A Short Skinny on Relations & the Algebra of Programming

I’ve been reading about the Algebra of Programming lately and lovin’ it. See J.N. Oliveira’s draft text in particular and the links in the references. I’ve started exploring the stuff from this post and more over here: https://github.com/philzook58/rel

Why and What?

Relations can expand the power of functional programming for the purpose of specification.

The point of a specification is to be able to write down in a very compact and clear way your intent for a program, more clearly and compactly than a full implementation could be written. It therefore makes sense to add to your specification language constructs that are not necessarily executable or efficient for the sake of compactness and clarity. When one needs executability or efficiency, one writes an implementation whose behavior you can connect to the spec via a formal or informal process.

Functional programming, with it’s focus on the abstraction of the mathematical function, is a good trade-off between executability, efficiency, and expressibility. It lies in a reasonable location between the ideas amenable to reasoning by a human mind and the command-driven requirements of the machines.

Functions are a specialization of relations. Relations extend the mathematical notion of functions with constructs like nondeterministic choice, failure and converse. These constructs are not always obviously executable or efficient. However, they greatly extend the abilities of reasoning and the clarity of expression of a specification.

The point-free style of reasoning about functions extends to a point-free style reasoning about relations, which is known as relation algebra. There are rich analogies with databases, category theory, linear algebra, and other topics.

Plus, I think it is very neato for some reason. If anyone ever thinks something is really neato, isn’t it worth giving it a listen?

A Simple Representation of Relations in Haskell

The simplest description of relations is as a set of tuples. So first let’s talk a bit about the options for sets in Haskell.

Sets in Haskell

There are a couple different reasonable ways to represent sets in Haskell.

  • [a] or Vector a
  • a -> Bool
  • Set a — a tree based Set from the containers package.

These have different performance characteristics and different power. The list [a] is very simple and has specialized pleasant syntax available. The indicator function a -> Bool gives you no ability to produce values of type a, but can easily denote very sophisticated spaces. Set a is a good general purpose data structure with fast lookup. You might also choose to mix and match combinations of these. Interconversion is often possible, but expensive. This is not a complete list of possibilities for sets, for example you may want a representation with a stronger possibility for search.

Relations in Haskell

We can directly use the definition of relations as a set of tuples with the above

But we also have the option to “curry” our relation representations, sort of mixing and matching properties of these representations.

You might also choose to package up multiples of these representations, choosing the most appropriate as the situation requires, see for example the relation package, whose type holds both Map a (Set b) and Map b (Set a).

Despite fiendishly poor performance, for simplicity and list comprehension syntax we are going to be using type Rel a b = [(a,b)] for the remainder of the post.

I’m also taking the restriction that we’re working in bounded enumerable spaces for ease. I assume such a requirement can be lifted for many purposes, but finite spaces like these are especially well tamed. The following typeclass and definition is very useful in this case.

Functions and Relations

Functions can be thought of as relations with the special property that for each left part of the tuple, there is exactly one right side and every possible left side appears. The relation corresponding to a function f looks like F = \{(x,y) | x \in X, y \in Y, y = f (x)\}.

There is a natural and slightly clever lifting of function composition to relations. We now check whether there exists a value that is in the right side of one and the left side of the other.

Because of these two operations (and their properties of associativity and absorption), FinRel is a category. We do however need the Eq b restriction to make Rel an instance of the category typeclass, so it does not quite fit the definition of category in base. It is a constrained category.

We can lift the common arrow/categorical combinators up to relations for example.

With these combinators, you have access to many functions on basic non-recursive algebraic data types. By combining them in a point free style, you can build some other useful combinators.

An Aside: Relations, Linear Algebra, Databases

The composition operation described above is not so unfamiliar as it may first appear.

Relation algebra has a great similarity to linear algebra. This connection can be made more clear by considering sparsity patterns of matrices and tensors. Sparsity patterns are a useful abstraction of linear algebraic operations. Instead of considering matrices of numbers, instead the entries are “zero” and “possibly nonzero” or, if you prefer, a matrix of boolean values corresponding to those same questions.

The ordinary row times column matrix multiplication corresponds to relation composition. Replace * with AND and + with OR. If any of the numbers is zero, then multiplying them will result in zero. In summing two numbers, if either is possibly nonzero, then the result is possibly nonzero.

Another interesting way of looking at it is that we are replacing the summation binding form \sum_i with the logical quantifier \exists_i. Both introduce a scoped “dummy variable” i and have a lot of syntactic similarity. Other related forms include \lambda i, \forall i, \int di, \max_i .

There is also an analog of the point free relation algebra in linear algebra. Linear algebra has the most widely used point free notation in the world, matrix notation. Consider the expressions Ax=b and X = ABC as compared to \sum_j A_{ij} x_j = b_i and X_{il} = \sum_{jk} A_{ij} B_{jk} C_{kl} . Matrix notation is SO much better for certain calculations. Other pieces of the matrix notation include transpose, inverse, Kronecker product, the Khatri-Rao product, and Hadamard product. Their properties are more clear in the index free form in my opinion. I believe even massive tensor expressions can be written index free using these operators. There are also analogies to be drawn between the graphical notations in these different fields.

Databases can be thought of very similarly to sparse matrices. In principle, you could enumerate all the possible values for a column of a database. So you could think of a database as a giant matrix with a 1 if the item is in the database and 0 if not. Databases are very very sparse from this perspective, and you would never store them this way. The join operation is a relative of relational composition, however join usually operates via looking at the column names, whereas our join is position based.

Query optimization in databases has interesting analogs in sparse linear algebra. For example, the Taco compiler http://tensor-compiler.org/ is doing something very akin to a query optimizer.

Inverting Relations

Unlike functions, Relations are always “invertible”. We call this the converse of a relation. When a function is invertible, it corresponds to the converse. In terms of the tuples underlying our representation, it just swaps them. Relations also possess operations trans and untrans that may be thought of as a kind of currying or as a partial inverse on a single parameter.

Orderings can also be lifted to relations (\leq) = \{(a,b) | a \leq b\}. The composition of relations also respects the usual composition of ordering.

Nondeterministic choice is sometimes represented in Haskell using Set returning functions a -> [b]. You may recall this from the context of the List monad. In fact in this case, we have an isomorphism as evidenced by tabulateSearch and searchRel.

Similarly partial functions can be reflected into relations

A useful trick is to lift sets/subsets to relations as a diagonal relation. \{(a,a) | a \in S \}. Projection onto the set can be achieve by composing with this relation. The identity results if you are talking about the entire set S.

Comparing Relations

We can compare sets by asking if one is a subset of the other A\subseteq B . Relations can also be compared by this operation, which we call relational inclusion.

A subservient notion to this is relational equality.

Relational algebra is chockful of inequality style reasoning, which is richer and slightly more complicated than equality style reasoning. This is one of the benefits of moving from functional descriptions to a relational description.

Relations also form a lattice with respect to these comparisons. What the hell are lattices? In the context of finite relations, lattices may be over powered mathematical machinery, but it really is useful down the line. They give you binary operators that play nice with some kind of ordering, in our case relational inclusion. These two operations are the meet and the join, which find the greatest lower bound and least upper bound of the operands respectively. For our relations, these correspond to the more familiar notion of set intersection and union. The intersection of two sets is the biggest set that is in both of them. The union is the smallest set for which both sets are a subset of it.

Using meet/join vs intersection/union becomes more interesting when the domain is fancier than relations over finite domains. Issues of infinity can make this interesting, or when using a representation that can’t explicitly represent arbitrary unions or intersections, but that instead has to approximate them. My favorite example is polyhedra. Polyhedra are not closed under unions. So in this case the join and union do not coincide. You need to take a convex hull of the union instead, which is the best approximation. Concretely, polyhedra can be represented as a list of their vertices, which generate the polyhedron. There is no way to express a union in this representation. Concatenating the lists represents taking the convex hull of the union.

An additional property that a lattice may possess is a largest and small element, called top (\top ) and bottom (\bot). Our finite domain relations do have these.

Relational Division

And now finally we get to one of the most interesting, powerful, and confusing operations: relational division. Relational division is a kind of pseudo inverse to relational composition. In linear algebra, the pseudo inverse is a matrix that does the best job it can to invert another matrix in a least squares sense. If the matrix is actually invertible, it equals the inverse. Relational division does the best job it can to invert a relational composition. Instead of taking least squares as a criteria, it ensures that the result doesn’t over approximate. If you had the inequality X \cdot Y \subseteq Z and you want to solve for X, relational division is the thing that does that. The right division Q = Z/Y is the largest relation such that Q \cdot Y \subseteq Z.

A helpful example is the similar operation of division in database tables.

And here is an implementation that I think is correct. I’ve goofed it up a couple times, it is a rather confusing construct.

There also exists a very similar operation of ldiv.

Relational division encapsulates many notions of searching or optimizing. I invite you to read more about it in J.N. Oliveira’s text or the Bird & de Moor text.

Properties and QuickCheck

Oh. Mah. Glob. You guys. So many properties. (Artwork courtesy of David)

Relation algebra is so chock full of properties. This is a perfect opportunity for some QuickCheck , a randomized property testing framework. There are so many more to test. I need to dig through to collect up all the identities.

Bits and Bobbles

  • Relations over continuous spaces. Vector subspaces (Linear Relations), Polyhedra (Linear inequality relations).
  • Non Bool-valued Relations. Replace \exists_x with \max_x. The weighted edgelist of a graph is a natural relation. By using composition we can ask about paths. We still have a comparison operator \subseteq which now respects the ordering of weights
  • Galois connections are cool.
  • Relations combined with recursion schemes. Recursion schemes are the point free way of describing recursion.
  • Moving into infinite spaces. How do we cope?
  • Faster search. Some relations are best specified by functions, Maps, others, mixes and matching.
  • If you “singletonize” relations a la the Agda project https://github.com/scmu/aopa, you get very interesting interconnections with profunctors, which people say are a categorical generalization of relations.
  • Point-free DSLs are interesting and pleasant. Many worries about alpha renaming are gone, at the expense of point-free pain. A DSL like this may be necessary to choose good relational query plans

References

Edit: An interesting comment and related library from /u/stevana

Leave a Reply

Your email address will not be published. Required fields are marked *