Computational Category Theory in Python I: Dictionaries for FinSet

Category theory is a mathematical theory with a reputation for being very abstract.

Category theory is an algebraic theory of functions. It has the flavor of connecting up little pipes and ports that is reminiscent of dataflow languages or circuits, but with some hearty mathematical underpinnings.

So is this really applicable to programming at all? Yes, I think so.

Here’s one argument. Libraries present an interface to their users. One of the measures of the goodness or badness of an interface is how often you are inclined to peek under the hood to get it to do the thing that you need. Designing these interfaces is hard. Category theory has taken off as a field because it has been found to be a useful and uniform interface to a surprising variety of very different mathematics. I submit that it is at least plausible that software interfaces designed with tasteful mimicry of category theory may achieve similar uniformity across disparate software domains. This is epitomized for me in Conal Elliott’s Compiling to Categories. http://conal.net/papers/compiling-to-categories/

I think it is easy to have the misconception that a fancy language like Haskell or Agda is necessary to even begin writing software that encapsulates category-theory-based ideas, but this is simply not the case. I've been under this misapprehension before.

It just so happens that category theory is especially useful in those languages for explaining some programming patterns, particularly those concerning polymorphism. See Bartosz Milewski's Category Theory for Programmers.

But this is not the only way to use category theory.

There’s a really delightful book by Rydeheard and Burstall called Computational Category Theory. The first time I looked at it, I couldn’t make heads or tails of it, going on the double uphill battle of category theory and Standard ML. But looking at it now, it seems extremely straightforward and well presented. It’s a cookbook of how to build category theoretic interfaces for software.

So I think it is interesting to perform some translation of its concepts and style into python, the lingua franca of computing today.

In particular, there is a dual opportunity to both build a unified interface between some of the most commonly used powerful libraries in the python ecosystem and also use these implementations to help explain categorical concepts in concrete detail. I hope to have the attention span to do the following:

A very simple category is that of finite sets. The objects in the category can be represented by python sets. The morphisms can be represented by python dictionaries. Nothing abstract here. We can rip and tear these things apart any which way we please.

The manipulations are made even more pleasant by the python features of set and dictionary comprehension, which mimic the definitions you'll find on the Wikipedia pages for these constructions quite nicely.

Composition is defined as making a new dictionary by feeding the output of the first dictionary into the second. The identity dictionary over a set is one that has the same values as keys. The definition of products and coproducts (disjoint union) are probably not too surprising.
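Concretely, a minimal sketch of those definitions (my own function names; a dictionary morphism carries its domain implicitly as its keys):

```python
def compose(g, f):
    # feed the output of f into g
    return {x: g[f[x]] for x in f}

def identity(A):
    return {x: x for x in A}

def product(A, B):
    # cartesian product object, along with its two projection morphisms
    AxB = {(a, b) for a in A for b in B}
    proj1 = {p: p[0] for p in AxB}
    proj2 = {p: p[1] for p in AxB}
    return AxB, proj1, proj2

def coproduct(A, B):
    # disjoint union object: tag elements so A and B can't collide
    AplusB = {(0, a) for a in A} | {(1, b) for b in B}
    inl = {a: (0, a) for a in A}
    inr = {b: (1, b) for b in B}
    return AplusB, inl, inr
```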

One really interesting thing about the Rydeheard and Burstall presentation is noticing what the inputs to these constructions are and what the outputs are. Do you need to hand it objects? Morphisms? How many? How can we represent the universal property? We do so by outputting functions that construct the required universal morphisms. They describe this as a kind of skolemization. The constructive, programmatic presentation of these things is incredibly helpful to my understanding, and I hope it is to yours as well.

Here is a python class for FinSet. I’ve implemented a couple of interesting constructions, such as pullbacks and detecting monomorphisms and epimorphisms.
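The actual class is in the original post; a rough sketch of the constructions mentioned might look like this:

```python
def is_mono(f):
    # monomorphism in FinSet = injective: distinct keys map to distinct values
    return len(set(f.values())) == len(f)

def is_epi(f, codomain):
    # epimorphism in FinSet = surjective onto the given codomain
    return set(f.values()) == codomain

def pullback(f, g):
    # pullback of f : A -> C and g : B -> C: the pairs that agree in C,
    # together with the two projection morphisms
    P = {(a, b) for a in f for b in g if f[a] == g[b]}
    p1 = {x: x[0] for x in P}
    p2 = {x: x[1] for x in P}
    return P, p1, p2
```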

I'm launching you into the deep end here if you have never seen category theory before (although goddamn does it get deeper). Do not be surprised if this doesn't make that much sense. Try reading Rydeheard and Burstall chapters 3 and 4 first, or other resources.

Here are some fun exercises (Ok. Truth time. It's because I got lazy). Try to implement exponentials and pushouts for this category.

Uniform Continuity is Kind of Like a Lens

A really interesting topic is exact real arithmetic. It turns out, there are systematic ways of calculating numerical results with arbitrarily fine accuracy.

In practice this is not used much as it is complicated and slow.

There are deep waters here.

The problem is made rather difficult by the fact that you can't compute real numbers strictly; you have to, in some sense, compute better and better finite approximations.

One way of doing this is to compute a stream of arbitrarily good approximations. If someone needs a better approximation than you’ve already given, they pop the next one off.
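In Python, the stream might be a generator. A toy sketch for \sqrt{2} by interval bisection:

```python
from fractions import Fraction

def sqrt2_stream():
    # each yielded midpoint is twice as accurate as the previous one
    lo, hi = Fraction(1), Fraction(2)
    while True:
        mid = (lo + hi) / 2
        yield mid
        if mid * mid < 2:
            lo = mid
        else:
            hi = mid

s = sqrt2_stream()
approximations = [next(s) for _ in range(5)]  # pull until satisfied
```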

Streams give you some inverted control flow. They allow the results to pull on the input, going against the grain of the ordinary direction of computation. If you are interested in a final result of a certain accuracy, they seem somewhat inefficient. You have to search for the right amount to pull the incoming streams, and the intermediate computations may not be helpful.

Haskell chews infinite lists up for breakfast, so it’s a convenient place for such things https://wiki.haskell.org/Exact_real_arithmetic https://hackage.haskell.org/package/exact-real

A related but slightly different set of methods comes in the form of interval arithmetic. Interval arithmetic also gives precise statements of accuracy, maintaining bounds on the accuracy as a number is carried along the computation.

Interval arithmetic is very much like forward mode differentiation. In forward mode differentiation, you compute on dual numbers (x,dx) and carry along the derivatives as you go.

Conceptually, differentiation and these validated bounds are connected as well. They are both telling you something about how the function is behaving nearby. The derivative is mostly meaningful at exactly the point it is evaluated. It is extremely local. The verified bounds being carried along are sort of a very principled finite difference approximation.

But reverse mode differentiation is often where it is at. This is the algorithm that drives deep learning. Reverse mode differentiation can be modeled functionally as a kind of lens. http://www.philipzucker.com/reverse-mode-differentiation-is-kind-of-like-a-lens-ii/. The thing that makes reverse mode confusing is the backward pass. This is also inverted control flow, where the output pushes information to the input. The Lens structure does this too.

It carries a function that goes in the reverse direction, and these are composed in the opposite direction of ordinary control flow. These functions are the "setters" in the ordinary usage of the Lens, but they are the backproppers for differentiation.

By analogy one might try
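Sketched in Python rather than the original Haskell, the analogous shape would be a function returning the output value together with a backwards map from output accuracy to required input accuracy:

```python
import math

# continuity "lens": x -> (y, epsilon -> delta)
def sin_cont(x):
    y = math.sin(x)
    # |sin'| <= 1 everywhere, so input accuracy delta = epsilon suffices
    return y, lambda eps: eps
```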

There is something pleasing here compared to interval arithmetic in that the output epsilon drives the input delta. The second function is kind of a Skolemized \delta(\epsilon) from the definition of continuity.

Although it kind of makes sense, there is something unsatisfying about this. How do you compute the x -> y? You already need to know the accuracy before you can even make this function.

So it seems to me that actually a better definition is
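Again in Python rather than the original Haskell, the shape is roughly: take the requested output accuracy up front, and return both the input accuracy required and a map on approximations.

```python
# uniform-continuity "lens": epsilon -> (delta, x_approx -> y_approx)
def identity_ulens(eps):
    # for the identity map, delta = epsilon and the approximation passes through
    return eps, lambda x: x
```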

This type surprised me and is rather nice in many respects. It lets you actually calculate x -> y, has that lazy pull-based feel without infinite streams, and has delta as a function of epsilon.

I have heard, although don’t understand, that uniform continuity is the more constructive definition (see constructive analysis by Bridger) https://en.wikipedia.org/wiki/Uniform_continuity This definition seems to match that.

In addition we are able to use approximations of the actual function if we know the accuracy it needs to be computed to. For example, given we know we need 0.01 accuracy of the output, we know we only need 0.009 accuracy in the input and we only need the x term of a Taylor series of sine (the total inaccuracy of the input and the inaccuracy of our approximation of sine combine to give total inaccuracy of output). If we know the needed accuracy allows it, we can work with fast floating point operations. If we need better we can switch over to mpfr, etc.
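Here is a sketch of that sine example in this style. The way the error budget is split between input inaccuracy and series truncation is my own bookkeeping, just to show the flavor:

```python
def sin_ulens(eps):
    # split the output error budget: 90% for input inaccuracy,
    # 10% for truncating the Taylor series of sine after the x term
    delta = 0.9 * eps          # |sin(x) - sin(x')| <= |x - x'|
    trunc = 0.1 * eps          # need |sin(x) - x| <= |x|**3 / 6 <= trunc
    def f(x):
        assert abs(x) ** 3 / 6 <= trunc, "x too large for the one-term series"
        return x               # plain float arithmetic is fine at this accuracy
    return delta, f

delta, f = sin_ulens(0.01)     # delta = 0.009, matching the numbers above
```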

This seems nice for MetaOcaml staging or other compile time macro techniques. If the epsilon required is known at compile time, it makes sense to me that one could use MetaOcaml to produce fast unrolled code. In addition, if you know the needed accuracy you can switch between methods and avoid the runtime overhead. The stream based approach seems to have a lot of context switching and perhaps unnecessary intermediate computations. It isn’t as bad as it seems, since these intermediate computations are usually necessary to compute anyhow, but still.

We can play the same monoidal category games with these lenses as ever. We can use dup, par, add, mul, sin, cos etc. and wire things up in diagrams and what have you.

This might be a nice type for use in a theorem prover. The Lens type combined with the appropriate properties that the intervals go to zero and stay consistent for arbitrary epsilon seems like enough? { Realf | something something something}

Relation to Backwards error analysis?

Does this have nice properties like backprop on high dimensional inputs? That's where backprop really shines: high to low dimensional functions.

Categorical Combinators for Convex Optimization and Model Predictive Control using Cvxpy

We’re gonna pump this well until someone MAKES me stop.

This particular example is something that I’ve been trying to figure out for a long time, and I am pleasantly surprised at how simple it all seems to be. The key difference with my previous abortive attempts is that I’m not attempting the heavy computational lifting myself.

We can take pointful DSLs and convert them into a point-free, category theory inspired interface. In this case a very excellent pointful DSL for convex optimization: cvxpy. Some similar and related posts converting DSLs to categorical form

A convex optimization problem optimizes a convex objective function with constraints that define a convex set like polytopes or balls. They are polynomial time tractable and shockingly useful. We can make a category out of convex optimization problems. We consider some variables to be “input” and some to be “output”. This choice is somewhat arbitrary as is the case for many “relation” feeling things that aren’t really so rigidly oriented.

Convex programming problems do have a natural notion of composition. Check out the last chapter of Rockafellar, where he talks about the convex algebra of bifunctions. Instead of summing over the inner composition variable like in Vect \sum_j A_{ij}B_{jk}, or existentializing like in Rel \{ (a,c) | \exists b. (a,b)\in A, (b,c) \in B \}, we instead minimize over the inner composition variable \min_y A(x,y) + B(y,z). These are similar operations in that they all produce bound variables.

The identity morphism is just the simple constraint that the input variables equal the output variables, with an objective function of 0. This is an affine constraint, hence convex.

In general, if we ignore the objective part entirely by just setting it to zero, we’re actually working in a very computationally useful subcategory of Rel, ConvexRel, the category of relations which are convex sets. Composition corresponds to an existential operation, which is quickly solvable by convex optimization techniques. In operations research terms, these are feasibility problems rather than optimization problems. Many of the combinators do nothing to the objective.

The monoidal product just stacks variables side by side and adds the objectives and combines the constraints. They really are still independent problems. Writing things in this way opens up a possibility for parallelism. More on that some other day.

We can code this all up in python with little combinators that return the input, output, objective, constraintlist. We need to hide these in inner lambdas for appropriate fresh generation of variables.
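A sketch of what such combinators might look like (my own names; a morphism here is a thunk returning (input, output, objective, constraints), thunked so that every use generates fresh cvxpy variables):

```python
import cvxpy as cvx

def idd(n):
    # identity morphism: input equals output, objective 0
    def res():
        x, y = cvx.Variable(n), cvx.Variable(n)
        return x, y, 0, [x == y]
    return res

def compose(g, f):
    # glue the output of f to the input of g; the solver then minimizes
    # over the hidden inner variable, as in min_y A(x,y) + B(y,z)
    def res():
        x, y1, o1, c1 = f()
        y2, z, o2, c2 = g()
        return x, z, o1 + o2, c1 + c2 + [y1 == y2]
    return res

def par(f, g):
    # monoidal product: stack the two problems side by side
    def res():
        x1, y1, o1, c1 = f()
        x2, y2, o2, c2 = g()
        return (x1, x2), (y1, y2), o1 + o2, c1 + c2
    return res
```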

Now for a somewhat more concrete example: Model Predictive control of an unstable (linearized) pendulum.

Model predictive control is where you solve an optimization problem of the finite time rollout of a control system online. In other words, you take measurement of the current state, update the constraint in an optimization problem, ask the solver to solve it, and then apply the force or controls that the solver says is the best.

This gives an advantage over the LQR controller in that you can set hard inequality bounds on the total force available, or on the positions where you wish to allow the thing to go. You don't want your system crashing into some wall or falling over some cliff for example. These really are useful constraints in practice. You can also include possibly time dependent aspects of the dynamics, which can help you model the nonlinear dynamics of your system.

I have more posts on model predictive control here http://www.philipzucker.com/model-predictive-control-of-cartpole-in-openai-gym-using-osqp/ http://www.philipzucker.com/flappy-bird-as-a-mixed-integer-program/

Here we model the unstable point of a pendulum and ask the controller to find forces to balance the pendulum.
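The post's actual model is in the original code; here is a minimal sketch of the shape of such a problem in raw cvxpy, with my own constants and a crude Euler discretization:

```python
import numpy as np
import cvxpy as cvx

# linearized inverted pendulum: theta'' ~ (g/l) theta + u, state = (theta, theta_dot)
dt, g_over_l = 0.1, 1.0
A = np.array([[1.0, dt], [g_over_l * dt, 1.0]])
B = np.array([[0.0], [dt]])

N = 10                                   # lookahead horizon
x = cvx.Variable((N + 1, 2))
u = cvx.Variable((N, 1))
x0 = cvx.Parameter(2)                    # measured state, updated each cycle

constraints = [x[0] == x0]
for t in range(N):
    constraints.append(x[t + 1] == A @ x[t] + B @ u[t])
constraints.append(cvx.abs(u) <= 2.0)    # hard force bound: the MPC advantage

objective = cvx.Minimize(cvx.sum_squares(x) + 0.1 * cvx.sum_squares(u))
prob = cvx.Problem(objective, constraints)

x0.value = np.array([0.3, 0.0])          # current measurement
prob.solve()
force = u.value[0]                       # apply only the first control, then re-solve
```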

We can interpret the controller in GraphCat in order to produce a diagram of the 10 step lookahead controller. This is an advantage of the categorical style, a la compiling to categories.

We can also actually run it in model predictive control configuration in simulation.

And some curves. How bout that.

Bits and Bobbles

LazySets https://github.com/JuliaReach/LazySets.jl

ADMM – It’s a “lens”. I’m pretty sure I know how to do it pointfree. Let’s do it next time.

The minimization can be bubbled out to the top if we are always minimizing. If we mix in maximization, then we can't, and we're working on a more difficult problem. This is similar to what happens in Rel when you have relational division, which is a kind of universal quantification \forall. Mixed quantifier problems in general are very tough. These kinds of problems include games, synthesis, and robustness. More on this some other day.

Convex-Concave programming minimax https://web.stanford.edu/~boyd/papers/pdf/dccp_cdc.pdf https://web.stanford.edu/class/ee364b/lectures/cvxccv.pdf

The minimization operation can be related to the summation operation by the method of steepest descent in some cases. The method of steepest descent approximates a sum or integral by evaluating it at its most dominant position and expanding out from there, hence converting a linear algebra thing into an optimization problem. Examples include the connection between statistical mechanics and thermodynamics, and between classical mechanics and quantum mechanics.

Legendre Transformation ~ Laplace Transformation via steepest descent https://en.wikipedia.org/wiki/Convex_conjugate yada yada, all kinds of good stuff.

Intersection is easy. Join/union is harder. Does MIP help?

Quantifier elimination

MIP via adjunction

Rough Ideas on Categorical Combinators for Model Checking Petri Nets using Cvxpy

Petri nets are a framework for modeling dynamical systems that is very intuitive to some people. The vanilla version is that there are discrete tokens at nodes on a graph representing resources of some kind and tokens can be combined according to the firing of transition rules into new tokens in some other location.

This is a natural generalization of chemical reaction kinetics, where tokens are particular kinds of atoms that need to come together. It also is a useful notion for computer systems, where tokens represent some computational resource.

To me, this becomes rather similar to a flow problem or circuit problem. Tokens feel a bit like charge, and transitions are a bit like current (although not necessarily conservative). In a circuit, one can have such a small current that the particulate nature of electric current in terms of electrons is important. This happens for shot noise or for coulomb blockade for example.

If the number of tokens is very large, it seems intuitively sensible to me that one can well approximate the behavior by relaxing to a continuum. Circuits have discrete electrons and yet are very well approximated by ohm’s laws and the like. Populations are made of individuals, and yet in the thousands their dynamics are well described by differential equations.

It seems to me that mixed integer programming is a natural fit for this problem. Mixed integer programming has had its implementations and theory heavily refined for over 70 years so now very general purpose and performant off the shelf solvers are available. The way mixed integer programs are solved is by considering their quickly solved continuous relaxation (allowing fractional tokens and fractional transitions more akin to continuous electrical circuit flow) and using this information to systematically inform a discrete search process. This seems to me a reasonable starting approximation. Another name for petri nets is Vector Addition Systems, which has more of the matrix-y flavor.

We can encode bounded model checking for reachability of a petri net into a mixed integer program fairly easily. We use 2-index variables, the first of which is used for the time step. We again turn to the general purpose functional way of encoding pointful DSLs into pointfree ones as I have done here and here. The key point is that you need to be careful where you generate fresh variables. This is basically copycatting my posts here. http://www.philipzucker.com/categorical-combinators-for-graphviz-in-python/ http://www.philipzucker.com/a-sketch-of-categorical-relation-algebra-combinators-in-z3py/

I’m like 70% sure what I did here makes sense, but I’m pretty sure the general idea makes sense with some fiddling.

The big piece is the weighted_block function. It lets you build a combinator with an internal state and input and output flow variables. You give matrices with entries for every possible transition. Whether transitions occurred between t and t+1 is indicated by integer variables. There is also possible accumulation of tokens at nodes, which also requires integer variables. Perhaps we'd want to expose the token state of the nodes to the outside too?

Weighted block schematically looks something like this
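I can't reproduce the original figure, but here is a simplified sketch of the kind of encoding it depicts (not the post's actual weighted_block; my own names):

```python
import cvxpy as cvx

def petri_bmc(in_w, out_w, init, target, T):
    """Bounded reachability: can `target` tokens be reached from `init`
    in T steps? in_w / out_w are (places x transitions) matrices of
    tokens consumed / produced by each transition."""
    n_places, n_trans = in_w.shape
    fired = cvx.Variable((T, n_trans), integer=True)       # firings between t and t+1
    state = cvx.Variable((T + 1, n_places), integer=True)  # token accumulation at nodes
    cons = [fired >= 0, state >= 0, state[0] == init, state[T] == target]
    for t in range(T):
        cons.append(state[t + 1] == state[t] + fired[t] @ (out_w - in_w).T)
    return cvx.Problem(cvx.Minimize(0), cons)  # a feasibility problem
```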

We can also get out a graphical representation of the net by reinterpreting our program into GraphCat. This is part of the power of the categorical interface. http://www.philipzucker.com/categorical-combinators-for-graphviz-in-python/

When I talked to Zach about this, he seemed skeptical, suspecting that MIP solvers would die a horrible death if you threw a moderately large petri net into them. Hard to say without trying.

Thoughts

There is an interesting analogy to be found with quantum field theory in that if you lift up to considering distributions of tokens, it looks like an occupation number representation. See Baez. http://math.ucr.edu/home/baez/stoch_stable.pdf

If you relax the integer constraint and the positivity constraints, this really is a finite difference formulation for capacitor circuits. The internal states would then be the charge of the capacitor. Would the positivity constraint be useful for diodes?

I wonder how relevant the chunky nature of petri nets might be for considering superconducting circuits, which have quantization of quantities from quantum mechanical effects.

Cvxpy lets you describe convex regions. We can use this to implement the convex subcategory of Rel, of which you can ask reachability questions. Relational division won't work, probably. I wonder if there is something fun there with respect to the integerizing operation and galois connections.

Edit: I should have googled a bit first. https://www.sciencedirect.com/science/article/pii/S0377221705009240 mathematical programming techniques for petri net reachability. So it has been tried, and it sounds like the results weren't insanely bad.

Categorical Combinators for Graphviz in Python

Graphviz is a graph visualization tool https://www.graphviz.org/. In Conal Elliott’s Compiling to categories http://conal.net/papers/compiling-to-categories/, compiling code to its corresponding graphviz representation was one very compelling example. These graphs are very similar to the corresponding string diagram of the monoidal category expression, but they also look like boolean circuit diagrams. Besides in Conal Elliott’s Haskell implementation, there is an implementation in the Julia Catlab.jl project https://epatters.github.io/Catlab.jl/stable/

I’ve basically implemented a toy version of a similar thing in python. It lets you do things like this

Why python?

  • Python is the lingua franca of computing these days. Many people encounter it, even people whose main thing isn’t computers.
  • The python ecosystem is nutso good.
  • Julia is kind of an uphill battle for me. I’m fighting the battle ( https://github.com/philzook58/Rel.jl ) , but I already know python pretty well. I can rip this out and move on.

What I did was implement some wrappers around the graphviz python package. It exposes a not very feature rich stateful interface. It is pretty nice that it prints the graphs inline in jupyter notebooks though.

The code is written in a style very similar to (and hopefully overloadable with) that of the z3py relation algebra. http://www.philipzucker.com/a-sketch-of-categorical-relation-algebra-combinators-in-z3py/. There is a fairly general cookbook method here for categorifying DSLs. Since graphviz does not directly expose fresh node creation as far as I can tell, I made my own using a random number generator. The actual combinators are graphviz object processors, so we build up a giant functional chain of processors and then actually execute it with run. The inputs and outputs are represented by lists of node names. There is some design space to consider here.
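My reconstruction of the flavor (not the exact code): a morphism is a processor that mutates a Digraph and maps a list of input node names to a list of output node names.

```python
import random
import graphviz

def fresh():
    # graphviz has no gensym, so make fresh node names from random bits
    return str(random.getrandbits(64))

def box(label, n_in, n_out):
    # an n_in -> n_out operation drawn as a single labelled node
    def process(dot, inputs):
        node = fresh()
        dot.node(node, label)
        for i in inputs:
            dot.edge(i, node)
        return [node] * n_out
    return process

def compose(g, f):
    return lambda dot, inputs: g(dot, f(dot, inputs))

def par(f, g, n_f):
    # n_f = how many of the inputs belong to f
    return lambda dot, inputs: f(dot, inputs[:n_f]) + g(dot, inputs[n_f:])

def run(f, n_in):
    dot = graphviz.Digraph()
    inputs = [fresh() for _ in range(n_in)]
    for i in inputs:
        dot.node(i, "input")
    f(dot, inputs)
    return dot
```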

I also had to use class based wrappers. Based on the precedent set by the python 3 matrix multiplication operator @, I think it is a requirement that this also be used for category composition. id is a keyword or something in python unfortunately. For the monoidal product, I feel like overloading power ** looks nice even if it is a nonsensical analogy; * is also not too bad. I went with * for now.

The graphviz graphs aren’t quite string diagrams. They make no promise to preserve the ordering of your operations, but they seem to tend to.

Here’s some example usage
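With the sketch above, usage might look like:

```python
add = box("add", 2, 1)
neg = box("neg", 1, 1)

# the 2-input diagram for -(x + y)
diagram = run(compose(neg, add), 2)
diagram.render("example", format="png")  # or just display `diagram` in a notebook
```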

Class based overloading is the main paradigm of overloading in python. You overload a program into different categories by making the program take in the appropriate category class as a parameter.

For example, something like this ought to work. Then you can get the graph of some matrix computation to go along with its numpy implementation.
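Schematically, with hypothetical GraphCat and NumpyCat classes exposing the same combinators:

```python
def program(K):
    # one point-free program, interpreted in whatever category K you pass in
    return K.compose(K.add(), K.par(K.idd(), K.idd()))

# program(GraphCat) would draw the diagram;
# program(NumpyCat) would build the corresponding matrix computation
```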

Maybe outputting tikz is promising? https://github.com/negrinho/sane_tikz

A Sketch of Categorical Relation Algebra Combinators in Z3Py

I’ve discussed relation algebra before. Relations are sets of tuples. There, I implemented the relations naively using lists for sets. This is very simple, and very clean especially with list comprehension syntax. It is however horrifically inefficient, and we could only deal with finitely enumerable domains. The easiest path to fixing these problems is to cash out to an external solver, in this case z3.

There are many beautifully implemented solvers out there and equally beautiful DSL/modeling languages. Examples in mind include sympy, cvxpy, and z3. These modeling languages require you to instantiate variable objects and build expressions out of them and then hand it off to the solver. This is a reasonable interface, but there are advantages to a more categorical/point-free style DSL.

Point-free languages are ones that do not include binding forms that introduce bound/dummy variables. Examples of binding forms like this are \lambda \sum \max \min \int \sup \lim \forall \exists. One problem lies in the fact that the names of bound variables don't matter, and that they end up accidentally smashing into each other. You may have experienced this in physics or math class as the dummy indices or dummy variable problem causing you to screw up your calculation of some cross product identity or some complicated tensor sum. These are surprisingly subtle problems, very difficult to diagnose and get right. De Bruijn indices are a technique for giving the bound variables canonical names, but it sucks to implement in its own way. When you make a DSL point free, it is a joy to manipulate, optimize, and search. I think this may be the core of why category theory is a good language for mathematics and programming.

Point-free style also tends to have significant economy of size, for better or worse. Sometimes it is better to have an expression very dense in information. This is important if you are about to algebraically manipulate an expression with paper and pencil. Every manipulation requires a great deal of mind numbing copying as you proceed line by line, so it can be excruciating if your notation has a lot of unnecessary bulk.

Relations are like functions. The two pieces of the tuple can be roughly thought of as the “input” and the “output”. Relations are only loosely directional though. Part of the point of relations is that the converse (inverse) of a relation is easy to define.

When we implement relations, we have a choice. Do we want the relation to produce its variables, accept its variables, or accept one and produce the other? There are advantages to each. When relations were [(a,b)], a -> b -> Bool, and a -> [b], converting between these forms was a rather painful enumeration process. The sting of converting between them is taken out by the fact that the conversion is no longer a very computationally expensive process, since we're working at the modeling layer.

When you’re converting a pointful DSL to pointfree DSL, you have to be careful where you instantiate fresh variables or else you’ll end up with secret relations that you didn’t intend. Every instantiation of id needs to be using fresh variables for example. You don’t want the different id talking to each other. Sometimes achieving this involves a little currying and/or thunking.

There is a pattern that I have noticed when I'm using modeling languages. When you have a function or relation on variables, there are constraints produced that you have to keep a record of. The pythonic way is to have a Model or Solver object, and then have that object mutate an internal record of the set of constraints. I don't particularly enjoy this style though. It feels like too much boilerplate.

In Haskell, I would use something like a Writer monad to automatically record the constraints that are occurring. Monads are not really all that pleasant even in Haskell, and especially a no go in python without “do” syntax.

However, because we are going point free it is no extra cost at all to include this pipework along for the ride in the composition operation.

Here are implementations of the identity and composition for three different styles. Style 1 is fully receptive, style 2 is mixed (function feeling) and style 3 is fully productive of variables.
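A sketch of my reconstruction of the three styles, showing just identity and composition:

```python
from z3 import FreshInt, Exists, And, BoolVal

# Style 1: fully receptive. A relation takes both variables and returns a formula.
def id1():
    return lambda x, y: x == y

def compose1(f, g):
    def res(x, z):
        y = FreshInt()
        return Exists([y], And(f(x, y), g(y, z)))
    return res

# Style 2: mixed, function-feeling. Takes x, produces (y, constraint).
def id2():
    return lambda x: (x, BoolVal(True))

def compose2(f, g):
    def res(x):
        y, c1 = f(x)
        z, c2 = g(y)
        return z, And(c1, c2)
    return res

# Style 3: fully productive. A thunk producing (x, y, constraint);
# thunking keeps the variables fresh at each use.
def id3():
    x = FreshInt()
    return x, x, BoolVal(True)

def compose3(f, g):
    def res():
        x, y1, c1 = f()
        y2, z, c2 = g()
        return x, z, And(c1, c2, y1 == y2)
    return res
```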

Fair warning, I’m being sketchy here. I haven’t really tried this stuff out.

z3 is a simply typed language. You can get away with some polymorphism at the python level (for example the == dispatches correctly according to the object), but sometimes you need to manually specify the sort of the variables. Given these types, the different styles are interconvertible.

We can create the general cadre of relation algebra operators. Here is a somewhat incomplete list

Questions about relation algebra expressions are often phrased in terms of relational inclusion. You can construct a relation algebra expression, use rsub in the appropriate way, and ask z3's prove function if it is true.
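For example, with the style-1 encoding above, a sketch:

```python
from z3 import Ints, ForAll, Implies, prove

def rsub(R, S):
    # relational inclusion R <= S as a z3 formula
    x, y = Ints("x y")
    return ForAll([x, y], Implies(R(x, y), S(x, y)))

# y == x + 2 implies y > x, so this inclusion holds
prove(rsub(lambda x, y: y == x + 2,
           lambda x, y: y > x))
```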

Z3 has solvers for

  • Combinatorial Relations
  • Linear Relations
  • Polyhedral Relations
  • Polynomial Relations
  • Interval Relations – A point I was confused on. I thought interval relations were not interesting. But I was interpreting the term incorrectly. I was thinking of relations on AxB that are constrained to take the form of a product of intervals. In this case, the choice of A has no effect on the possible B whatsoever, so it feels non relational. However, there is also I_A x I_B, relations over the intervals of A and B. This is much closer to what is actually being discussed in interval arithmetic.

Applications we can use this for:

  • Graph Problems. The edges of a graph can be thought of as a relation between vertices. Relation composition using the starn operator is a way to ask for paths through the graph.
  • Linear Relations – To some degree this might supplant my efforts on linear relations. http://www.philipzucker.com/linear-relation-algebra-of-circuits-with-hmatrix/ Z3 is fully capable of understanding linear relations.
  • Safety and liveness of control systems. Again, a transition relation is natural here. It is conceivable that the state space can be heterogeneous in time, which is the interesting power of the categorical style. I feel like traditional control systems usually maintain the same state space throughout.
  • Program verification
  • Games? Nash equilibria?

Other Thoughts

  • Maybe we are just building a shitty version of alloy. https://alloytools.org/
  • What about uninterpreted relations? What about higher order relations? What about reflecting into a z3 ADT for a relational language? Then we could do relational program synthesis. This is one style: just hand everything off to smt. https://github.com/nadia-polikarpova/cse291-program-synthesis/tree/master/lectures
  • I should try to comply with python conventions, in particular numpy and pandas conventions. @ should be composition for example, since relation composition has a lot of flavor of matrix composition. I should overload a lot of operators, but then I’d need to wrap in a class 🙁
  • Z3 has special support for some relations. How does that play in? https://theory.stanford.edu/~nikolaj/programmingz3.html#sec-special-relations https://z3prover.github.io/api/html/ml/Z3.Relation.html
  • As long as you only use composition, there is a chaining of existentials, which really isn’t so bad.
  • What we’ve done here is basically analogous/identical to what John Wiegley did compiling to the category of z3. Slightly different in that he only allowed for existential composition rather than relational division. http://newartisans.com/2017/04/haskell-and-z3/
  • We can reduce the burden on z3 if we know the constructive proof objects that witness our various operations. Z3 is gonna do better if we can tell it exactly which y witnesses the composition of operators, or give clues to which branch of an Or it should use.
  • It’s a bummer, but when you use quantifiers, you don’t see countermodels? Maybe there is some hook where you can, or in the dump of the proof object.
  • What about recursion schemes? The exact nature of z3's ability to handle unbounded problems is fuzzy to me. It does have the support to define recursive functions. Also explicit induction predicates can go through sometimes. Maybe look at the Cata I made in the fancy relation algebra post.
  • I think most proof assistants have implementations of relation algebra available. I find you can do a surprising amount in z3.

Solving the Laplace Equations with Linear Relations

The Laplace equation is ubiquitous in physics and engineering.

\nabla^2 \phi = \partial_x^2 \phi + \partial_y^2 \phi = 0

It and slight variants of it describe electrostatics, magnetostatics, steady state heat flow, elastic flex, pressure, and velocity potentials.

There are a couple reasons for that.

  • It results from minimizing the squared gradient of a field |\nabla \phi |^2 which can make sense from an energy minimization perspective.
  • Similarly it results from the combination of a flow conservation law and a linear constitutive relation connecting flow and field (such as Ohm’s law, Fick’s law, or Hooke’s law).
  • It also gets used even if not particularly appropriate because we know how to mathematically deal with it, for example in image processing.

There are a couple of questions we may want to ask about a Laplace equation system

  • Given the field on the boundary, determine the field in the interior (Dirichlet problem)
  • Given the normal derivative of the field on the boundary, determine the field in the interior (Neumann problem)
  • Given sources in the interior and a 0 boundary condition, determine the field. The Laplace equation is called the Poisson equation when you allow a source term on the right hand side: \nabla^2 \phi = \rho.
  • Given the field at the boundary, determine the derivative at the boundary (Dirichlet-to-Neumann map or Poincare-Steklov operator).

Given the Dirichlet to Neumann map, you do not have to consider the interior of a region to use it. The Dirichlet to Neumann map is sort of the same thing as an effective resistance or scattering matrix. It gives you a black box representation of a region based solely on the variables at its boundary.

This linear relation algebra is useful for many things that I'd have considered a use case for the Schur complement. The Schur complement arises when you do Gaussian elimination on a blocked matrix. It is good that this pattern has a name, because once you know about it, you'll see it in many places: domain decomposition, marginalized Gaussian distributions, low-rank updates, scattering matrices.

By composing the linear relations corresponding to the Dirichlet-Neumann relations of regions, we can build the Dirichlet-Neumann relations of larger regions.

To make this more concrete, let us take the example of electrical circuits like before. A grid of resistors is a finite difference approximation to the continuous problem

-\nabla \phi = E (the electric field is the gradient of the potential)

E = \rho j (continuum Ohm's law)

\nabla \cdot j = 0 (current conservation)

In this post, I mentioned how you can make reasonable 2 dimensional diagrams out of a monoidal category, by sort of arbitrarily flipping one wire up and one wire down as in the diagram below. This defines a horizontal and vertical composition which have to do the required book-keeping (associations) to keep an arrow in canonical form. I had considered this for the management method of weights in neural networks, but it is way more natural as the actual geometrical layout of a finite difference grid of a Laplace equation.

So we can reuse our categorical circuit combinators to build a finite difference Laplace equation.

Just showing how you can bend a 4-wire monoidal box into a 2-d diagram. Ignore the labels.

This can be implemented in Haskell doing the following. Neato.

Bits and Bobbles

  • Not the tightest post, but I needed to get it out there. I have a short attention span.
  • Homology and defining simplices as categories. One way of describing Homology is about linear operators that are analogues of finite difference operators (or better yet, discrete differential geometry operators / exterior derivatives). To some degree, it is analyzing the required boundary conditions to fully define differential equations on weirdo topological surfaces, which correspond to geometrical loops/holes. You can figure this out by looking at subspaces and quotients of the difference operators. Here we have here a very category theory way of looking at partial differential equations. How does it all connect?
  • Continuous circuit models – https://en.wikipedia.org/wiki/Distributed-element_model Telegrapher’s equation is classic example.
  • Cody mentioned that I could actually build circuits and measure categorical identities in a sense. That’s kind of cool. Or I could draw conductive ink on carbon paper and actually make my string diagrams into circuits? That is also brain tickling
  • Network circuits
  • I really want to get coefficients that aren't just doubles. Allowing rational functions of a frequency \omega would allow analysis of capacitor/inductor circuits, but also tight binding model systems for fun things like topological insulators and the Haldane model http://www.philipzucker.com/topologically-non-trivial-circuit-making-haldane-model-gyrator/ . I may need to leave Haskell. I'm not seeing quite the functionality I need. Use Sympy? https://arxiv.org/abs/1605.02532 HLinear. Flint bindings for haskell? Looks unmaintained. Could also use a grobner basis package as dynamite for a mouse.
  • This is relevant for the boundary element method. Some really cool other stuff relevant here. http://people.maths.ox.ac.uk/martinsson/2014_CBMS/

Blah Blah blah: The subtlest aspect of differential equations is that of boundary conditions. It is more correct usually to consider the system of the interior differential equation and the boundary conditions as equally essential parts of the statement of the problem you are considering.

Categorical LQR Control with Linear Relations

Last time, I described and built an interesting expansion of describing systems via linear functions: linear relations. We remove the requirement that mappings be functional (exactly one output vector for any given input vector), which extends our descriptive capability. This is useful for describing "open" pieces of a linear system like electric circuits. This blog post is about another application: linear dynamical systems and control.

Linear relations are an excellent tool for reachability/controllability. In a control system, it is important to know where it is possible to get the system to. With linear dynamics x_{t+1} = Ax_{t} + B u_{t}, an easy controllability question is “can my system reach into some subspace”. This isn’t quite the most natural question physically, but it is natural mathematically. My first impulse would be to ask “given the state starts in this little region over here, can it get to this little region over here”, but the subspace question is a bit easier to answer. It’s a little funky but it isn’t useless. It can detect if the control parameter only touches 1 of 2 independent dynamical systems for example. We can write the equations of motion as a linear relation and iterate composition on them. See Erbele for more.

There is a set of different things (and kind of more interesting things) that are under the purview of linear control theory, LQR Controllers and Kalman filters. The LQR controller and Kalman filter are roughly (exactly?) the same thing mathematically. At an abstract mathematical level, they both rely on the fact that optimization of a quadratic objective x^T Q x + r^T x + c with a linear constraints Ax=b is solvable in closed form via linear algebra. The cost of the LQR controller could be the squared distance to some goal position for example. When you optimize a function, you set the derivative to 0. This is a linear equation for quadratic objectives. It is a useful framework because it has such powerful computational teeth.

The Kalman filter does a similar thing, except for the problem of state estimation. There are measurements linearly related to the state that you want to match with the prior knowledge of the linear dynamics of the system and the expected errors of measurement and environment perturbations to the dynamics.

There are a couple different related species of these filters. We could consider discrete time or continuous time. We can also consider infinite horizon or finite horizon. I feel that discrete time and finite horizon are the most simple and fundamental so that is what we’ll stick with. The infinite horizon and continuous time are limiting cases.

We also can consider the dynamics to be fixed for all time steps, or varying with time. Varying with time is useful for the approximation of nonlinear systems, where there are different good local approximations (Taylor series of the dynamics) depending on the current state.

There are a couple ways to solve a constrained quadratic optimization problem. In some sense, the most conceptually straightforward is just to solve for a span of the basis (the "V-Rep" of my previous post), plug that into the quadratic objective, and optimize over your now unconstrained problem. However, it is commonplace and elegant in many respects to use the Lagrange multiplier method. You introduce a new parameter for every linear equality and add a term to the objective \lambda^T (A x - b). Hypothetically this term is 0 for the constrained solution so we haven't changed the objective in a sense. It's a funny sleight of hand. If you optimize over x and \lambda, the equality constraint comes out as your gradient conditions on \lambda. Via this method, we convert a linearly constrained quadratic optimization problem into an unconstrained quadratic optimization problem with more variables.

Despite feeling very abstract, the value these Lagrangian variables take on has an interpretation. They are the “cost” or “price” of the equality they correspond to. If you moved the constraint by an amount Ax - b + \epsilon, you would change the optimal cost by an amount \lambda \epsilon. (Pretty sure I have that right. Units check out. See Boyd https://www.youtube.com/watch?v=FJVmflArCXc)

The Lagrange multipliers enforcing the linear dynamics are called the co-state variables. They represent the “price” that the dynamics cost, and are derivatives of the optimal value function V(s) (The best value that can be achieved from state s) that may be familiar to you from dynamic programming or reinforcement learning. See references below for more.

Let’s get down to some brass tacks. I’ve suppressed some terms for simplicity. You can also add offsets to the dynamics and costs.

A quadratic cost function with lagrange multipliers. Q is a cost matrix associated with the x state variables, R is a cost matrix for the u control variables.

C = \lambda_0^T (x_0 - \tilde{x}_0) + \sum_t \left[ x_t^T Q x_t + u_t^T R u_t + \lambda_{t+1}^T (x_{t+1} - A x_t - B u_t) \right]

Equations of motion results from optimizing with respect to \lambda by design.

\nabla_{\lambda_{t+1}} C = x_{t+1} - A x_t - B u_t  = 0.

The initial conditions are enforced by the zeroth multiplier.

\nabla_{\lambda_0} C = x_0 - \tilde{x}_0 = 0

Differentiation with respect to the state x has the interpretation of backwards iteration of value derivative, somewhat related to what one finds in the Bellman equation.

\nabla_{x_t} C = Q x_{t} + A^T \lambda_{t+1} - \lambda_{t} = 0 \Longrightarrow \lambda_{t} =  A^T \lambda_{t+1} + Q x_{t}

The final condition on value derivative is the one that makes it the most clear that the Lagrange multiplier has the interpretation of the derivative of the value function, as it sets it to that.

\nabla_{x_N} C = Q x_N - \lambda_{N} = 0

Finally, differentiation with respect to the control picks the best action given knowledge of the value function at that time step.

\nabla_{u_t} C = R u_{t} - B^T \lambda_{t+1}  = 0
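For reference, iterating these conditions backwards is what collapses into the standard Riccati recursion. A conventional numpy version (this is the textbook algorithm, not the linear-relation machinery of this post) might be:

```python
import numpy as np

def finite_horizon_lqr(A, B, Q, R, N):
    # backward Riccati recursion; the optimal control is u_t = -K_t x_t
    P = Q
    gains = []
    for _ in range(N):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    return gains[::-1]  # K_0 ... K_{N-1}
```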

Ok. Let's code it up using linear relations. Each of these conditions is a separate little conceptual box. We can build the optimal control step relation by combining these updates together.

The string diagram for a single time step of control

The following is a bit sketchy. I am confident it makes sense, but I’m not confident that I have all of the signs and other details correct. I also still need to add the linear terms to the objective and offset terms to the dynamics. These details are all kind of actually important. Still, I think it’s nice to set down in preliminary blog form.

Initial conditions and final conditions. Final conditions fix the final value derivative to Qx. Initial conditions set x to some value. Should there be a leg for lambda on the initial conditions?

Bits and Bobbles

  • The code in context https://github.com/philzook58/ConvexCat/blob/master/src/LinRel.hs
  • Some of the juicier stuff is nonlinear control. Gotta walk before we can run though. I have some suspicions that a streaming library may be useful here, or a lens-related approach. Also ADMM.
  • Reminiscent of Keldysh contour. Probably a meaningless coincidence.
  • In some sense H-Rep is the analog of (a -> b -> Bool) and V-rep [(a,b)]
  • Note that the equations of motion are relational rather than functional for a control system. The control parameters u describe undetermined variables under our control.
  • Loopback (trace) the value derivative for infinite horizon control.
  • Can solve the Laplace equation with a Poincare Steklov operator. That should be my next post.
  • There is something a touch unsatisfying even with the reduced goals of this post. Why did I have to write down the quadratic optimization problem and then hand translate it to linear relations? It's kind of unacceptable. The quadratic optimization problem is really the most natural statement, but I haven't figured out how to make it compositional. The last chapter of Rockafellar?

References

https://arxiv.org/abs/1405.6881 Baez and Erbele – Categories in Control

http://www.scholarpedia.org/article/Optimal_control

http://www.argmin.net/2016/05/18/mates-of-costate/ – interesting analogy to backpropagation. Lens connection?

https://courses.cs.washington.edu/courses/cse528/09sp/optimality_chapter.pdf

Linear Relation Algebra of Circuits with HMatrix

Oooh this is a fun one.

I’ve talked before about relation algebra and I think it is pretty neat. http://www.philipzucker.com/a-short-skinny-on-relations-towards-the-algebra-of-programming/. In that blog post, I used finite relations. In principle, they are simple to work with. We can perform relation algebra operations like composition, meet, and join by brute force enumeration.

Unfortunately, brute force may not always be an option. First off, the finite relations grow so enormous as to make this infeasible. Secondly, it is not insane to talk about relations or regions with an infinite number of elements, such as some continuous blob in 2D space. In that case, we can't even in principle enumerate all the points in the region. What are we to do? We need to develop some kind of finite parametrization of regions to manipulate. This parametrization basically can't possibly be complete in some sense, and we may choose more or less powerful systems of description for computational reasons.

In this post, we are going to be talking about linear or affine subspaces of a continuous space: points, lines, planes, and their higher dimensional analogs. Linear subspaces have to go through the origin, while affine spaces can have an offset from the origin.

In the previous post, I mentioned that the finite relations formed a lattice, with operations meet and join. These operations were the same as set intersection and union, so the introduction of the extra terminology meet and join felt a bit unwarranted. Now the meet and join aren't union and intersection anymore. We have chosen not to have the capability to represent the union of two subspaces; instead we can only represent the smallest subspace that contains them both, which is the union closed under vector addition. For example, the join of a line and a point will be the plane that goes through both.

Linear/Affine stuff is great because it is so computational. Most questions you want to ask are answerable by readily available numerical linear algebra packages. In this case, we'll use the Haskell package HMatrix, which is something like a numpy/scipy equivalent for Haskell. We're going to use type-level indices to denote the sizes and partitioning of these spaces, so we'll need some helper functions.

Matrices are my patronum. They make everything good. Artwork courtesy of David

In case I miss any extensions, make typos, etc, you can find a complete compiling version here https://github.com/philzook58/ConvexCat/blob/master/src/LinRel.hs

In analogy with sets of tuples for defining finite relations, we partition the components of the linear spaces to be “input” and “output” indices/variables \begin{bmatrix} x_1 & x_2 & x_3 & ... & y_1 & y_2 & y_3 & ... \end{bmatrix}. This partition is somewhat arbitrary and easily moved around, but the weakening of strict notions of input and output as compared to functions is the source of the greater descriptive power of relations.

Relations are extensions of functions, so linear relations are an extension of linear maps. A linear map has the form y = Ax. A linear relation has the form Ax + By = 0. An affine map has the form y = Ax + b and an affine relation has the form Ax + By = b.

There are at least two useful concrete representations for subspaces.

  1. We can write a matrix A and vector b down that corresponds to affine constraints. Ax = b. The subspace described is the nullspace of A plus a solution of the equation. The rows of A are orthogonal to the space.
  2. We can hold onto generators of subspace. x = A' l+b where l parametrizes the subspace. In other words, the subspace is generated by / is the span of the columns of A'. It is the range of A'.

We'll call these two representations the H-Rep and V-Rep, borrowing terminology from similar representations in polytopes (describing a polytope by the inequalities that define its faces or as the convex combination of its vertices). https://inf.ethz.ch/personal/fukudak/lect/pclect/notes2015/PolyComp2015.pdf These two representations are dual in many respects.

It is useful to have both reps and interconversion routines, because different operations are easy in the two representations. Any operations defined on one can be defined on the other by sandwiching between these conversion functions. Hence, we basically only need to define operations for one of the reps (if we don’t care too much about efficiency loss which, fair warning, is out the window for today). The bulk of computation will actually be performed by these interconversion routines. The HMatrix function nullspace performs an SVD under the hood and gathers up the space with 0 singular values.

These linear relations form a category. I'm not using the Category typeclass because I need BEnum constraints hanging around. The identity relation is x = y, aka Ix - Iy = 0.

Composing relations is done by combining the constraints of the two relations and then projecting out the interior variables. Taking the conjunction of constraints is easiest in the H-Rep, where we just need to vertically stack the individual constraints. Projection is easily done in the V-Rep, where you just need to drop the appropriate section of the generator vectors. So we implement this operation by flipping between the two.
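In numpy terms (a rough Python rendering of the Haskell, homogeneous case only, with my own encoding: an H-Rep is a matrix whose rows constrain the stacked vector [x; y]):

```python
import numpy as np
from scipy.linalg import null_space

def h2v(A):
    # columns of the result generate the subspace {v | A @ v = 0}
    return null_space(A)

def v2h(V):
    # rows of the result are orthogonal to the generator columns of V
    return null_space(V.T).T

def hid(n):
    # identity relation: x - y = 0
    return np.hstack([np.eye(n), -np.eye(n)])

def hcompose(B, A, m, k, n):
    # A relates R^m to R^k, B relates R^k to R^n.
    # Stack both constraint sets over (x, y, z) in the H-Rep...
    top = np.hstack([A, np.zeros((A.shape[0], n))])
    bot = np.hstack([np.zeros((B.shape[0], m)), B])
    V = h2v(np.vstack([top, bot]))
    # ...then project out the middle y block in the V-Rep
    Vxz = np.vstack([V[:m], V[m + k:]])
    return v2h(Vxz)

# smoke test: composing identities represents the identity subspace x = z again
print(hcompose(hid(2), hid(2), 2, 2, 2))
```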

We can implement the general cadre of relation operators, meet, join, converse. I feel the converse is the most relational thing of all. It makes inverting a function nearly a no-op.

Relational inclusion is the question of subspace inclusion. It is fairly easy to check if a VRep is in an HRep (just plug the generators into the constraints and see if they obey them), and by using the conversion functions we can define it for arbitrary combos of H and V.

It is useful to use the direct sum of the spaces as a monoidal product.

A side note: Void causes some consternation. Void is the type with no elements and is the index type of a 0 dimensional space. It is the unit object of the monoidal product. Unfortunately by an accident of the standard Haskell definitions, actual Void is not a BEnum. So, I did a disgusting hack. Let us not discuss it more.

Circuits

Baez and Fong have an interesting paper where they describe building circuits using a categorical graphical calculus. We have the pieces to go about something similar. What we have here is a precise way in which circuit diagrams can be thought of as string diagrams in a monoidal category of linear relations.

An idealized wire has two quantities associated with it, the current flowing through it and the voltage it is at.

When we connect wires, the currents must be conserved and the voltages must be equal. hid and hcompose from above still achieve that. Composing two independent circuits in parallel is achieved by hpar.

Independent resistors in parallel.

We will want some basic tinker toys to work with.

A resistor in series has the same current at both ends and a voltage drop proportional to the current
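In the flavor of the numpy encoding sketched earlier, with the wire variables ordered (V_in, I_in, V_out, I_out) (my own ordering), that is two constraint rows:

```python
import numpy as np

def resistor(r):
    # rows constrain (V_in, I_in, V_out, I_out)
    return np.array([
        [0.0, 1.0, 0.0, -1.0],   # current in = current out
        [1.0, -r, -1.0, 0.0],    # V_in - V_out = r * I_in
    ])

# composing resistor(r1) with resistor(r2) spans the same
# subspace as resistor(r1 + r2), matching the claim below
```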

Composing two resistors in series adds the resistance: (resistor r1) <<< (resistor r2) == resistor (r1 + r2)

A bridging resistor allows current to flow between the two branches

A bridging resistor

Composing two bridge circuits is putting the bridge resistors in parallel. The conductance G=\frac{1}{R} of resistors in parallel adds: hcompose (bridge r1) (bridge r2) == bridge (1 / (1/r1 + 1/r2)).

parallel resistors compose

An open circuit allows no current to flow and ends a wire. open ~ resistor infinity

At branching points, the voltage is maintained, but the current splits.

This cmerge combinator could also be built using a short == bridge 0 , composing a branch with open, and then absorbing the Void away.

We can bend wires up or down by using a composition of cmerge and open.

Voltage and current sources enforce current and voltage to be certain values

Measurements of circuits proceed by probes.

Inductors and capacitors could be included easily, but would require the entries of the HMatrix values to be polynomials in the frequency \omega, which it does not support (but it could!). We’ll leave those off for another day.

We can actually check by computation that the rules suggested above are being followed.

Bits and Bobbles

  • Homogeneous systems are usually a bit more elegant to deal with, although a bit more unfamiliar and abstract.
  • Could make a pandas like interface for linear relations that uses numpy/scipy.sparse for the computation. All the swapping and associating is kind of fun to design, not so much to use. Labelled n-way relations are nice for users.
  • Implicit/Lazy evaluation. We should let the good solvers do the work when possible. We implemented our operations eagerly. We don’t have to. By allowing hidden variables inside our relations, we can avoid the expensive linear operations until it is useful to actually compute on them.
  • Relational division = quotient spaces?
  • DSL. One of the beauties of the pointfree/categorical approach is that you avoid the need for binding forms. This makes for a very easily manipulated DSL. The transformations feel like those of ordinary algebra and you don’t have to worry about the subtleties of index renaming or substitution under binders.
  • Sparse is probably really good. We have lots of identity matrices and simple rearrangements. It is very wasteful to use dense operations on these.
  • Schur complements https://en.wikipedia.org/wiki/Schur_complement are the name of the game for projecting out pieces of linear problems. We have some overlap.
  • Linear relations -> Polyhedral relations -> Convex Relations. Linear is super computable, polyhedral can blow up. Rearrange a DSL to abuse Linear programming as much as possible for queries.
  • Network circuits. There is an interesting subclass of circuits that is designed to be pretty composable.

https://en.wikipedia.org/wiki/Two-port_network Two port networks are a very useful subclass of electrical circuits. They model transmission lines fairly well, and are easily composable for filter construction.

It is standard to describe these networks by giving a linear function between two of the variables and the other two variables. Depending on your choice of which variables depend on which, these are called the z-parameters, y-parameters, h-parameters, scattering parameters, or abcd parameters. There are tables of formulas for converting from one form to the others. The different parameters suit different use cases for composition and combining in parallel or series. From the perspective of linear relations this all seems rather silly. The necessity for so many descriptions and the confusing relationship between them comes from the unnecessary and overly rigid requirement of having a linear function-like relationship rather than just a general relation, which depending on the circuit may not even be available (there are degenerate configurations where two of the variables do not imply the values of the other two). A function relationship is always a lie (although a sometimes useful one), as there is always back-reaction of new connections.

The relation model also makes clearer how to build lumped models out of continuous ones. https://en.wikipedia.org/wiki/Lumped-element_model

https://en.wikipedia.org/wiki/Transmission_line https://en.wikipedia.org/wiki/Distributed-element_model

  • Because the type indices have no connection to the actual data types (they are phantom), it is wise to use smart constructors that check that the sizes of the matrices make sense (see the sketch after this list).
  • Nonlinear circuits. Grobner Bases and polynomial relations?
  • Quadratic optimization under linear constraints. Can’t get it to come out right yet. Clutch for Kalman filters. Nice for many formulations like least power, least action, minimum energy principles. Edit: I did more in this direction here http://www.philipzucker.com/categorical-lqr-control-with-linear-relations/
  • Quadratic Operators -> Convex operators. See last chapter of Rockafellar.
  • Duality of controllers and filters. It is well known (I think) that for every controller algorithm there is a filter algorithm that is basically the same thing.
    • LQR – Kalman
    • Viterbi filter – Value function table
    • particle filter – Monte Carlo control
    • Extended Kalman – iLQR-ish? Use local approximation of dynamics
    • unscented kalman – ?
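For instance, here is a minimal sketch of such a smart constructor. The names LinRel, KnownDim, and mkLinRel are hypothetical, made up for this illustration rather than taken from the post’s actual code:

    {-# LANGUAGE ScopedTypeVariables #-}
    import Data.Proxy (Proxy(..))
    import Numeric.LinearAlgebra (Matrix, Vector, cols, rows, size)

    -- a relation {(x, y) | a [x; y] = b}, with phantom index types x and y
    data LinRel x y = LinRel (Matrix Double) (Vector Double)

    -- hypothetical class giving the dimension denoted by a phantom index
    class KnownDim v where
      dimOf :: proxy v -> Int

    -- smart constructor: reject matrices whose shape disagrees with the indices
    mkLinRel :: forall x y. (KnownDim x, KnownDim y)
             => Matrix Double -> Vector Double -> LinRel x y
    mkLinRel a b
      | cols a == dimOf (Proxy :: Proxy x) + dimOf (Proxy :: Proxy y)
          && rows a == size b = LinRel a b
      | otherwise = error "mkLinRel: shape does not match phantom indices"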


Functors, Vectors, and Quantum Circuits

Vectors are dang useful things, and any energy you put into them seems to pay off massive dividends.

Vectors and Linear Algebra are useful for:

  • 2D, 3D, 4D geometry stuff. Computer graphics, physics etc.
  • Least Squares Fitting
  • Solving discretized PDEs
  • Quantum Mechanics
  • Analysis of Linear Dynamical Systems
  • Probabilistic Transition Matrices

There are certain analogies between Haskell Functors and Vectors that correspond to a style of computational vector mathematics that I think is pretty cool and don’t see talked about much.

Due to the expressivity of its type system, Haskell has a first class notion of container that many other languages don’t. In particular, I’m referring to the fact that Haskell has higher kinded types * -> * (types parametrized on other types) that you can refer to directly without filling them first. Examples in the standard library include Maybe, [], Identity, Const b, and Either b. Much more vector-y feeling examples can be found in Kmett’s linear package: V0, V1, V2, V3, V4. For example, consider the 4 dimensional vector type V4.
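Its definition is essentially the following (the real definition in Linear.V4 adds strictness annotations and many more instances):

    {-# LANGUAGE DeriveFunctor, DeriveFoldable, DeriveTraversable #-}
    data V4 a = V4 a a a a
      deriving (Show, Functor, Foldable, Traversable)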

This really isn’t such a strange, esoteric thing as it may appear. You wouldn’t blink an eye at the corresponding struct of four floating point fields in some other language. What makes Haskell special is how compositional and generic it is. We can build thousand element structs with ease via composition. What we have here is an alternative to the paradigm of computational vectors ~ arrays. Instead we have computational vectors ~ structs. In principle, I see no reason why this couldn’t be as fast as arrays, although with current compiler expectations it probably isn’t.

Monoidal categories are a mathematical structure that models this analogy well. They have been designed by mathematicians for aesthetic elegance, and it seems plausible that following their example leads us to interesting, useful, and pleasant vector combinators. And personally, I find something about category theory that just tickles me.

So to get started, let’s talk a bit about functors.

The Algebra of Functors

Functors in Haskell are a typeclass for containers. They allow you to map functions over all the items in the container. They are related to the categorical notion of functor, which is a mapping between categories.

You can lift the product and sum of types to the product and sum of Functors which you may find in Data.Functor.Product and Data.Functor.Sum. This is analogous to the lifting of ordinary addition and multiplication to the addition and multiplication of polynomials, which are kind of like numbers with a “hole”.
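For instance, a quick sketch using the fixed-size vectors from linear (the names Five and TwoOrThree are made up here):

    import Data.Functor.Product (Product(..))
    import Data.Functor.Sum (Sum(..))
    import Linear.V2 (V2(..))
    import Linear.V3 (V3(..))

    -- a 5-element container: the product of a 2-container and a 3-container
    type Five = Product V2 V3

    five :: Five Double
    five = Pair (V2 1 2) (V3 3 4 5)

    -- a container holding either 2 elements or 3 elements
    type TwoOrThree = Sum V2 V3

    twoOrThree :: TwoOrThree Double
    twoOrThree = InL (V2 1 2)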

Functors also compose. A container of containers of a is still a container of a. We can form composite containers by using the Compose newtype wrapper.

When you use this Compose newtype, instead of having to address the individual elements by using fmap twice, a single application of fmap will teleport you through both layers of the container.
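For example (a small sketch):

    import Data.Functor.Compose (Compose(..))
    import Linear.V2 (V2(..))
    import Linear.V3 (V3(..))

    grid :: Compose V2 V3 Double
    grid = Compose (V2 (V3 1 2 3) (V3 4 5 6))

    -- a single fmap reaches every element through both layers
    doubledGrid :: Compose V2 V3 Double
    doubledGrid = fmap (* 2) grid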

Product, Sum, and Compose are all binary operators on functors. As type constructors they have kind (* -> *) -> (* -> *) -> (* -> *).

Some other important functors from the algebra of types perspective are Const Void a, Const () a, and Identity a. These are the identity elements for Sum, Product, and Compose respectively.

You can define mappings between containers that don’t depend on the specifics of their contents. These mappings can only rearrange, copy and forget items of their contained type. This can be enforced at the type level by the polymorphic type signature
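Something like the following, where the ~> name is my own shorthand:

    {-# LANGUAGE RankNTypes, TypeOperators #-}
    type f ~> g = forall a. f a -> g a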

These mappings correspond in categorical terminology to natural transformations between the functors f and g. There is a category where the objects are Functors and the morphisms are natural transformations. Sum, Product, and Compose all obey the laws necessary to be a monoidal product on this category.

How the lifting of functions works for Compose is kind of neat.
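Here is a sketch; liftOuter is a name invented for this illustration, while ntf and fg match the discussion below:

    {-# LANGUAGE RankNTypes #-}
    import Data.Functor.Compose (Compose(..))

    -- lift a natural transformation over the outer layer of a Compose
    liftOuter :: (forall a. f a -> f' a) -> Compose f g a' -> Compose f' g a'
    liftOuter ntf (Compose fg) = Compose (ntf fg)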

Because natural transformations require polymorphic types, when you apply ntf to fg, the polymorphic variable a in the type of ntf is restricted to a ~ g a'.

Product and Sum have a straightforward notion of commutativity ((a,b) is isomorphic to (b,a)). Compose is more subtle. sequenceA from the Traversable typeclass can swap the ordering of composition. sequenceA . sequenceA may or may not be the identity depending on the functors in question, so it has some flavor of a braiding operation. This is an interesting post on that topic: https://parametricity.com/posts/2015-07-18-braids.html

Combinators of these sorts arise in at least the following contexts:

  • Data types a la carte – A systematic way of building extensible data types
  • GHC Generics – A system for building generic functions that operate on data types that can be described with sums, products, recursion, and holes.
  • In and around the Lens ecosystem

Also see the interesting post by Russell O’Connor on functor oriented programming http://r6.ca/blog/20171010T001746Z.html. I think the above is part of what he is referring to.

Vector Spaces as Shape

Vector spaces are made of two parts, the shape (dimension) of the vector space and the scalar.

Just as a type of kind * -> * can be thought of as a container modulo its held type, it can also be thought of as a vector modulo its held scalar type. The higher kinded type for a vector gives an explicit slot in which to place the scalar type.

The standard Haskell typeclass hierarchy gives you some of the natural operations on vectors if you so choose to abuse it in that way.

  • Functor ~> Scalar Multiplication: smul s = fmap (* s)
  • Applicative ~> Vector Addition: vadd x y = (+) <$> x <*> y
  • Traversable ~> Transposition. sequenceA has the type of transposition and works correctly for the linear style containers like V4.
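Spelled out as code, the analogy looks something like this sketch:

    smul :: (Functor f, Num a) => a -> f a -> f a
    smul s = fmap (* s)

    vadd :: (Applicative f, Num a) => f a -> f a -> f a
    vadd x y = (+) <$> x <*> y

    -- sequenceA swaps the two container layers, which is exactly transposition
    transpose :: (Traversable f, Applicative g) => f (g a) -> g (f a)
    transpose = sequenceA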

The linear library does use Functor for scalar multiplication, but defines a special typeclass for addition, Additive. I think this is largely for the purpose of bringing Map-like vectors into the fold, but I’m not sure.

Once we’ve got the basics down of addition and scalar multiplication, the next thing I want is operations for combining vector spaces. Two important ones are the Kronecker product and the direct sum. In terms of indices, the Kronecker product is a space that is indexed by the cartesian product (,) of its input space indices, and the direct sum is a space indexed by the Either of its input space indices. Both are very useful constructs. I use the Kronecker product all the time when I want to work on 2D or 3D grids, for example. If you’ll excuse my python, here is a toy 2-D finite difference Laplace equation example. We can lift the 1D second derivative matrix K = \partial_x^2 using the Kronecker product: K2 = K \otimes I + I \otimes K. The direct sum is useful as a notion of stacking matrices.
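A sketch of that toy example with scipy.sparse (variable names are mine):

    import scipy.sparse as sp

    n = 50
    # 1-D second derivative by finite differences: K ~ d^2/dx^2
    K = sp.diags([1, -2, 1], [-1, 0, 1], shape=(n, n))
    I = sp.identity(n)
    # 2-D Laplacian: K2 = K (x) I + I (x) K
    K2 = sp.kron(K, I) + sp.kron(I, K)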

The following is perhaps the most important point of the entire post.

Compose of vector functors gives the Kronecker product, and Product gives the direct sum (this can be confusing but it’s right. Remember, the sum in direct sum refers to the indices).

We can form the Kronecker product of vectors given a Functor constraint.
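A sketch of both pieces, with Kron as a synonym for Compose and DSum for Product:

    import Data.Functor.Compose (Compose(..))
    import Data.Functor.Product (Product)

    type Kron = Compose
    type DSum = Product

    -- the Kronecker (outer) product of two vector values:
    -- scale a copy of y by each entry of x
    kron :: (Num a, Functor f, Functor g) => f a -> g a -> Kron f g a
    kron x y = Compose (fmap (\s -> fmap (s *) y) x)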

Notice we have two distinct but related things called kron: Kron and kron. One operates on vector spaces and the other operates on vector values.

Building vector spaces out of small combinators like V2, V4, DSum, Kron is interesting for a number of reasons.

  • It is well typed. Similar to Nat indexed vectors, the types specify the size of the vector space. We can easily describe vector spaces as powers of 2, as in V16 = Kron V2 (Kron V2 (Kron V2 (Kron V2 V1))), similarly in terms of their prime factors, or we can do a binary expansion (least significant bit first), V5 = DSum V1 (Kron V2 (DSum V0 (Kron V2 V1))), or other things. We do this without going into quasi-dependently typed land or GADTs.
  • It often has better semantic meaning. It is nice to say Measurements, or XPosition or something rather than just denote the size of a vector space in terms of a nat. It is better to say a vector space is the Kron of two meaningful vector spaces than to just say it is a space of size m*n. I find it pleasant to think of the naturals as a free Semiring rather than as the Peano Naturals and I like the size of my vector space defined similarly.
  • Interesting opportunities for parallelism. See Conal Elliott’s paper on scans and FFT: http://conal.net/papers/generic-parallel-functional/

What do linear operators look like?

In the Vectors as shape methodology, Vectors look very much like Functors.

I have been tempted to lift the natural transformation type above to the following for linear operators.
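Roughly this, as a sketch:

    {-# LANGUAGE RankNTypes #-}
    type LinOp f g = forall a. Num a => f a -> g a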

In a sense this works: we could implement kron because many of the container types (V1, V2, V3, etc.) in the linear package implement Num. However, choosing Num is a problem. Why not Fractional? Why not Floating? Sometimes we want those. Why not just specifically Double?

We don’t really want to lock away the scalar in a higher rank polymorphic type. We want to ensure that everyone is working in the same scalar type before allowing things to proceed.

Note also that this type does not constrain us to linearity. Can we form the Kronecker product of linear operators? Yes, but I’m not in love with it. This is not nearly so beautiful as the little natural transformation dance.

This was a nice little head scratcher for me. Follow the types, my friend! I find this particularly true for uses of sequenceA. If I find that I want the containers swapped in ordering, sequenceA is usually the right call. It could be called transpose.

Giving the vector direct access to the scalar feels a bit off to me. I feel like it doesn’t leave enough “room” for compositionality. However, there is another possible definition of morphisms that I think is rather elegant.
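A sketch of that alternative, matching the LinOp1 name that shows up in Bits and Bobbles below (Kron is the synonym for Compose from earlier):

    {-# LANGUAGE RankNTypes #-}
    import Linear.Vector (Additive)

    type LinOp1 f g a = forall k. Additive k => Kron f k a -> Kron g k a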

Does this form actually enforce linearity? You may still rearrange objects. Great. You can also now add and scalar multiply them with the Additive k constraint. We also expose the scalar, so it can be enforced to be consistent.

One other interesting thing to note is that these forms allow nonlinear operations. fmap, liftU2 and liftI2 are powerful operations, but I think if we restricted Additive to just a correctly implemented scalar multiply, a vector addition, and zero, we’d be good.

We can recover the previous form by instantiating k to V1. V1, the 1-d vector space, is almost a scalar and can play the scalar’s role in many situations. V1 is the unit object with respect to the monoidal product Kron.

There seems to be a missing Additive instance that would be useful. There is probably a good reason it isn’t there, but I need it.
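The instance I have in mind is roughly the following sketch. It is an orphan instance, not part of the linear package:

    {-# LANGUAGE ScopedTypeVariables, InstanceSigs #-}
    import Data.Functor.Compose (Compose(..))
    import Linear.Vector (Additive(..))

    instance (Additive f, Additive g) => Additive (Compose f g) where
      zero :: forall a. Num a => Compose f g a
      zero = Compose (fmap (const (zero :: g a)) (zero :: f a))
      liftU2 f (Compose x) (Compose y) = Compose (liftU2 (liftU2 f) x y)
      liftI2 f (Compose x) (Compose y) = Compose (liftI2 (liftI2 f) x y)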

Monoidal Categories

The above analogy can be put into mathematical terms by noting that both vectors and functors form monoidal categories. I talked quite a bit about monoidal categories in a previous post http://www.philipzucker.com/a-touch-of-topological-computation-3-categorical-interlude/ .

Categories are the combo of a collection of objects and arrows between the objects. The arrows can compose as long as the head of one is on the same object as the tail of the other. On every object, there is always an identity arrow, which when composed will do nothing.

We need a little extra spice to turn categories into monoidal categories. One way of thinking about it is that monoidal categories have ordinary category composition and some kind of horizontal composition, putting things side to side. Ordinary composition is often doing something kind of sequentially, applying a sequence of functions, or a sequence of matrices. The horizontal composition is often something parallel feeling, somehow applying the two arrows separately to separate pieces of the system.

Why are they called Monoidal?

There is a funny game category people play where they want to lift ideas from other fields and replace the bits and pieces in such a way that the entire thing is defined in terms of categorical terminology. This is one such example.

A monoid is a binary operation that is associative and has an identity.

Sometimes people are more familiar with the concept of a group. If not, ignore the next sentence. Monoids are like groups without requiring an inverse.

Numbers are separately monoids under addition, multiplication, and minimization (and more), all of which are associative operations with identities (0, 1, and infinity respectively).

Exponentiation is a binary operation that is not a monoid, as it isn’t associative: (2^3)^2 = 64, but 2^(3^2) = 512.

The Monoid typeclass in Haskell demonstrates this http://hackage.haskell.org/package/base-4.12.0.0/docs/Data-Monoid.html

A common example of a monoid is list, where mempty is the empty list and mappend appends the lists.
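For example, in GHCi:

    ghci> mempty :: [Int]
    []
    ghci> [1,2] `mappend` [3,4]
    [1,2,3,4]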

There are different set-like intuitions for categories. One is that the objects in the category are big opaque sets. This is the case for Hask, Rel and Vect.

A different intuition is that the category itself is like a set, and the objects are the elements of that set. There just so happens to be some extra structure knocking around in there: the morphisms. This is more often the feel for examples like preorders or graphs. The word “monoidal” means that we have a binary operation on the objects. But in the category theory aesthetic, you also need that binary operation to “play nice” with the morphisms that are hanging around too.

Functors are the first thing that has something like this, and some other properties come along for the ride. A Functor is a map that takes objects to objects and arrows to arrows in a nice way. A binary functor takes two objects to an object, and two arrows to one arrow, in a way that plays nice (commutes) with arrow composition.

String diagrams

String diagrams are a graphical notation for monoidal categories. Again, I discussed this more here.

Morphisms are denoted by boxes. Regular composition is shown by plugging arrows together vertically. The monoidal product is denoted by putting arrows side by side.

When I was even trying to describe what a monoidal category was, I was already using language evocative of string diagrams.

You can see string diagrams in the documentation for the Arrow library. Many diagrams that people use in various fields can be formalized as string diagrams for some monoidal category. This is a big chunk of Applied Category Theory.

This is the connection to quantum circuits, which are after all a graphical notation for very Kroneckery linear operations.

[figure: example circuit]

There is an annoying amount of stupid repetitive bookkeeping with the associative structure of Kron. This can hopefully be largely avoided with coerce, but I’m not sure. I was having trouble with roles when doing it generically.

Bits and Bobbles

  • Woof. This post was more draining to write than I expected. I think there is still a lot left to say. Sorry about the editing, everyone! Bits and pieces of this post are scattered in this repo
  • How would you go about this in other languages? C, Rust, OCaml, C++, Agda
  • The discussion of Vect = * -> * is useful for discussion of 2-Vect, coming up next. What if we make vectors of Vect? Wacky shit.
  • Metrics and dual vectors. type Dual f a = f a -> a. type Dual1 f a = forall k. Additive k => Kron f k a -> k a
  • Adjunction diagrams have cups and caps. Since we have been using representable functors, they actually have a right adjunction that is tupling with the vector space index type. This gives us something that almost feels like a metric but a weirdly constrained metric.
  • LinOp1 form is yoneda? CPS? Universally quantified k is evocative of forall c. (a -> c) -> (b -> c)


Appendix

Representable/Naperian Functors

Containers that are basically big product types are also known as representable, Naperian, or logarithmic. Representable places emphasis on the isomorphism between such a container type and the function type (->) i: a container f indexed by a and holding values r satisfies (f r) ~ (a -> r) ~ r^a (a copies of r). They are called Naperian/Logarithmic because there is a relationship similar to that of exponents and logarithms between the index type and the container type. If you take the Product f g, the result is indexed by the sum Either a b, while Compose f g is indexed by the product (a,b). The arrow type is written as an exponential r^a because, for finite enumerable types, that is the number of possible functions (tabulations). The Sum of two representable functors is no longer representable; correspondingly, regular logarithms of sums Log(f + g) do not have good identities associated with them.

See Gibbons’ article. There is a good argument to be made that representable functors are a good match for vectors/well-typed tensor programming.

But note that there is a reasonable interpretation for container types with sum types in them. These can be thought of as subspaces, different bases, or choices of sparsity patterns. When you define addition, you’ll need to say how these subspaces reconcile with each other (for example, two bases at 45 degrees to each other).

Monoidal Products on Hask

Hask is a name for the category that has objects as Haskell types and morphisms as Haskell functions.

Note that it’s a curious mixing of type/value layers of Haskell. The objects are types whereas the function morphisms are Haskell values. Composition is given by (.) and the identity morphisms are given by id.

For Haskell, you can compose functions, but you can also smash functions together side by side. These combinators are held in Control.Arrow.

You can smash together types with the tuple (,) or with Either. Both of these are binary operators on types. The corresponding mappings on morphisms are given below.
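Specialized to ordinary functions (the Control.Arrow versions are more general and work over any Arrow):

    (***) :: (a -> b) -> (c -> d) -> (a, c) -> (b, d)
    f *** g = \(x, y) -> (f x, g y)

    (+++) :: (a -> b) -> (c -> d) -> Either a c -> Either b d
    f +++ g = either (Left . f) (Right . g)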

These are binary operators on morphisms that play nice with the composition structure of Haskell.

Monoidal Combinators of Functors

A monoidal category also has unit objects. For Compose, this is given by the Identity functor.
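The unit laws are witnessed by functions like these (a sketch; the unitor names are mine):

    import Data.Functor.Compose (Compose(..))
    import Data.Functor.Identity (Identity(..))

    -- Compose Identity f ~ f ~ Compose f Identity
    leftUnitor :: Compose Identity f a -> f a
    leftUnitor (Compose (Identity fa)) = fa

    rightUnitor :: Functor f => Compose f Identity a -> f a
    rightUnitor (Compose f) = fmap runIdentity f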

There is also a sense of associativity. It is just newtype rearrangement, so it can also be achieved with a coerce (although not polymorphically?).
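As a sketch:

    -- reassociating Compose is just newtype shuffling plus one fmap
    assoc :: Functor f => Compose (Compose f g) h a -> Compose f (Compose g h) a
    assoc (Compose (Compose fgh)) = Compose (fmap Compose fgh)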

Similarly, we can define a monoidal category structure using Product or Sum instead of Compose.

These are all actually just newtype rearrangements, so they should all just be instances of coerce, but I couldn’t get the roles to go through generically?