Uniform Continuity is Kind of Like a Lens

A really interesting topic is exact real arithmetic. It turns out, there are systematic ways of calculating numerical results with arbitrarily fine accuracy.

In practice this is not used much as it is complicated and slow.

There are deep waters here.

The problem is made rather difficult by the fact that you can’t compute real numbers strictly; you have to, in some sense, compute better and better finite approximations.

One way of doing this is to compute a stream of arbitrarily good approximations. If someone needs a better approximation than you’ve already given, they pop the next one off.

Streams give you some inverted control flow. They allow the results to pull on the input, going against the grain of the ordinary direction of computation. If you are interested in a final result of a certain accuracy, they seem somewhat inefficient. You have to search for the right amount to pull the incoming streams, and the intermediate computations may not be helpful.
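
As a minimal sketch of the stream idea (my own toy illustration, not a library), a computable real can be an infinite list of rational approximations where the nth entry is within 2^{-n} of the true value:

-- a toy computable real: the nth approximation is within 2^(-n) of the value
type CReal = [Rational]

-- pull approximations off the stream until the caller's accuracy demand is met
approx :: Rational -> CReal -> Rational
approx eps xs = head [x | (x, n) <- zip xs [(0 :: Integer) ..], 2 ^^ (-n) <= eps]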

Haskell chews infinite lists up for breakfast, so it’s a convenient place for such things: https://wiki.haskell.org/Exact_real_arithmetic, https://hackage.haskell.org/package/exact-real

A related but slightly different set of methods comes in the form of interval arithmetic. Interval arithmetic also gives precise statements of accuracy, maintaining bounds on the accuracy as a number is carried along.

Interval arithmetic is very much like forward mode differentiation. In forward mode differentiation, you compute on dual numbers (x,dx) and carry along the derivatives as you go.

type ForwardMode x dx y dy = (x,dx) -> (y,dy)
type IntervalFun x delx y dely = (x,delx) -> (y, dely)
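
For example (a minimal sketch of my own, not library code), multiplication in the two styles carries the extra data along in much the same way:

-- forward-mode dual-number multiply: the derivative is carried along
mulF :: Num a => ((a, a), (a, a)) -> (a, a)
mulF ((x, dx), (y, dy)) = (x * y, x * dy + dx * y)

-- interval-flavored multiply on (value, error radius) pairs: a worst-case bound is carried along
mulI :: ((Double, Double), (Double, Double)) -> (Double, Double)
mulI ((x, dx), (y, dy)) = (x * y, abs x * dy + abs y * dx + dx * dy)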

Conceptually, differentiation and these validated bounds are connected as well. They are both telling you something about how the function is behaving nearby. The derivative is mostly meaningful at exactly the point it is evaluated. It is extremely local. The verified bounds being carried along are sort of a very principled finite difference approximation.

But reverse mode differentiation is often where it is at. This is the algorithm that drives deep learning. Reverse mode differentiation can be modeled functionally as a kind of lens. http://www.philipzucker.com/reverse-mode-differentiation-is-kind-of-like-a-lens-ii/ The thing that makes reverse mode confusing is the backward pass. This is also inverted control flow, where the output pushes information to the input. The Lens structure does this too:

type Lens s t a b = s -> (a, b -> t)

It carries a function that goes in the reverse direction, and these reverse functions are composed in the opposite direction of ordinary control flow. These functions are the “setters” in the ordinary usage of the Lens, but they are the backpropagators for differentiation.

By analogy one might try

type RealF x delta y epsilon = Lens x delta y epsilon = x -> (y, epsilon -> delta)

There is something pleasing here compared to interval arithmetic in that the output epsilon drives the input delta. The second function is kind of a Skolemized \delta(\epsilon) from the definition of continuity.

Although it kind of makes sense, there is something unsatisfying about this. How do you compute the x -> y? You’d already need to know the accuracy before you could make this function.

So it seems to me that actually a better definition is

type RealF x delta y epsilon = Lens epsilon y delta x  = epsilon -> (delta, x -> y)

This type surprised me and is rather nice in many respects. It lets you actually calculate x -> y, has that lazy pull-based feel without infinite streams, and has delta as a function of epsilon.

I have heard, although don’t understand, that uniform continuity is the more constructive definition (see constructive analysis by Bridger) https://en.wikipedia.org/wiki/Uniform_continuity This definition seems to match that.
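
For reference, the uniform continuity statement being Skolemized is

\forall \epsilon > 0. \exists \delta > 0. \forall x, x'. |x - x'| < \delta \Rightarrow |f(x) - f(x')| < \epsilon

and the uniformity is exactly that \delta depends only on \epsilon and not on x, which is what the epsilon -> (delta, x -> y) shape expresses.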

In addition we are able to use approximations of the actual function if we know the accuracy it needs to be computed to. For example, given we know we need 0.01 accuracy of the output, we know we only need 0.009 accuracy in the input and we only need the x term of a Taylor series of sine (the total inaccuracy of the input and the inaccuracy of our approximation of sine combine to give total inaccuracy of output). If we know the needed accuracy allows it, we can work with fast floating point operations. If we need better we can switch over to mpfr, etc.
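
As a rough illustration of that idea (the numbers and names here are hypothetical, mine rather than a worked-out library):

-- the lens-shaped type from above, written out as plain Haskell
type RealF x delta y epsilon = epsilon -> (delta, x -> y)

-- since |d/dx sin x| <= 1, input error passes through undiminished; reserve
-- 90% of the error budget for the input and 10% for our approximation of sin
sinR :: RealF Double Double Double Double
sinR eps = (0.9 * eps, f) where
  f | eps > 1e-3 = \x -> x - x^3/6 -- truncated Taylor series; only sensible for smallish x
    | otherwise  = sin             -- fall back to the full library sin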

This seems nice for MetaOcaml staging or other compile time macro techniques. If the epsilon required is known at compile time, it makes sense to me that one could use MetaOcaml to produce fast unrolled code. In addition, if you know the needed accuracy you can switch between methods and avoid the runtime overhead. The stream based approach seems to have a lot of context switching and perhaps unnecessary intermediate computations. It isn’t as bad as it seems, since these intermediate computations are usually necessary to compute anyhow, but still.

We can play the same monoidal category games with these lenses as ever. We can use dup, par, add, mul, sin, cos etc. and wire things up in diagrams and what have you.

This might be a nice type for use in a theorem prover. The Lens type combined with the appropriate properties that the intervals go to zero and stay consistent for arbitrary epsilon seems like enough? { Realf | something something something}

Relation to Backwards error analysis?

Does this have nice properties like backprop on high dimensional inputs? That’s where backprop really shines: high to low dimensional functions.

Notes on Finally Tagless

For reading group this week we read finally tagless partially evaluated http://okmij.org/ftp/tagless-final/JFP.pdf. It took me a couple minutes to regain my footing on my preferred explanation of what tagless is. This is why we write blog posts, to remember that kind of thing.

One thing that is very confusing about finally tagless as presented is that people tend to be talking about dsls with binding forms, like embedded lambda calculi, or tensor summation and things. This is complicated and I think to some degree orthogonal to the core of the idea. Instead I’ll use Bool as my running example, which is so simple that it perhaps obscures the idea in the opposite direction.

When you define a data type, you define constructors. Constructors are just functions. This is more readily apparent using GADT syntax.

What makes constructors feel like more than just ordinary functions is that you can pattern match out of them too. Applying constructors and pattern matching out of them is a completely lossless process. The two processes are dual in some sense. In some sense, it seems like programming is one big shuffling game. In some sense. In some sense. In some sense.

In some sense. iN SoME SeNSe

Anyway, pattern matching is its own thing that doesn’t feel like other pieces of the language. But pattern matching can be captured as a first class object with the notion of an eliminator / recursor function. If you think about it, what pattern matching does is take a value of the data type and give you the stuff inside it. So pattern matching is the same as a function that takes in functions telling it what to do with that stuff for each case.

The Böhm-Berarducci encoding of data types makes the pattern matching function the data type itself.

data BoolI where
  TrueI  :: BoolI
  FalseI :: BoolI

type BoolC = forall s. s -> s -> s
truec :: BoolC
truec = \x y -> x
falsec :: BoolC 
falsec = \x y -> y

to :: BoolI -> BoolC
to TrueI = truec
to FalseI = falsec

from :: BoolC -> BoolI
from f = f TrueI FalseI

In the final encoding of the datatype, we replace the data keyword with the class keyword. We can witness the isomorphism with an instance for BoolI and an interpretation function from BoolI to BoolF.

class BoolF a where
   truef :: a
   falsef :: a

instance BoolF BoolI where
   truef = TrueI
   falsef = FalseI

interpf :: BoolF a => BoolI -> a
interpf TrueI = truef
interpf FalseI = falsef

However, there are some very nice features of this encoding. Typeclass resolution happens completely at compile time. This means that you can write something once and have it run many ways, with no runtime overhead. This is useful for dsls, especially ones you intend to just immediately interpret out of.

One way you can have something run many ways is by having a syntax tree for the thing you want to do. Then you can write different interpreters. But then you have the runtime cost of interpretation.

interpi :: BoolI -> Int
interpi TrueI = 40
interpi FalseI = 27

interps :: BoolI -> String
interps TrueI = "hi"
interps FalseF = "fred"

instance BoolF Int where
   truef = 40
   falsef = 27

instance BoolF String where
  truef = "hi"
  falsef = "fred" 

A second feature is the openness of typeclasses compared to data types. Suppose you wanted to add another field to BoolI. Now you need to correct all your functions. However, you can make the new field a new typeclass and all your old functions still work. You can require the power you need.

A third thing is that finally tagless does get you some of the type restriction available with GADTs in a language without them. GADTs are IN SOME SENSE just constructors without the most general inferred type. But they also let you recover the type information you hid away upon pattern matching.

We can see the correspondence in a different way. A typeclass constraint corresponds to the implicit supplying of a dictionary whose fields correspond to the methods of the typeclass.

s -> s -> s ~ (s,s) -> s ~ {truef :: s, falsef :: s} -> s ~ BoolF s => s
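
Spelled out (a small sketch of my own), the dictionary is just a record, and the eliminator against it mirrors BoolF s => s:

-- the constraint BoolF s corresponds to implicitly passing this record
data BoolDict s = BoolDict { truefD :: s, falsefD :: s }

elimBool :: BoolDict s -> BoolI -> s
elimBool d TrueI  = truefD d
elimBool d FalseI = falsefD d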

What is finally tagless not so good for? Brains mostly. It is quite a mind bending style to use. If you want to do deep pattern matching in some simplifier, it is possible, yet rather difficult to achieve. I’ve seen this done in some Oleg papers somewhere (on SQL query optimization I think?)

Here’s another example on lists:

class ListF f where
  cons :: a -> f a -> f a
  nil :: f a

instance ListF [] where
  cons = (:)
  nil = []

interpl :: ListF f => [a] -> f a
interpl (x : xs) = cons x (interpl xs)
interpl [] = nil


type ListC a = forall s. (a -> s -> s) -> s -> s

Going the other direction from finally tagless is interesting as well. If you take a typeclass and replace the keyword class with data, you get something like the “free” version of that class. Two cases in mind are the free monoid and the free monad. The usual presentation of these looks different though. That is because they have been put into a canonical form. These data types need to be thought of as “modulo” the laws of the typeclass, which probably shows up in a custom Eq instance. I’m a little hazy on exactly how to explain the Pure constructors, but you do need them I think.

data FreeMonoid a where
   Mappend :: FreeMonoid a -> FreeMonoid a -> FreeMonoid a
   Mempty :: FreeMonoid a
   Pure :: a -> FreeMonoid a
data FreeMonad f a where
   Bind :: FreeMonad f a -> (a -> FreeMonad f b) -> FreeMonad f b
   Return :: a -> FreeMonad f a
   Pure' :: f a -> FreeMonad f a
 

http://okmij.org/ftp/tagless-final/JFP.pdf – tagless final paper. Also some very interesting things related to partial evaluation

https://oleg.fi/gists/posts/2019-06-26-linear-church-encodings.html – interesting explanation of bohm-berarducci

http://okmij.org/ftp/tagless-final/course/lecture.pdf – oleg’s course

I thought reflection without remorse had some related form of free monad http://okmij.org/ftp/Haskell/zseq.pdf

A Sketch of Gimped Interval Propagation with Lenses

David Sanders (who lives in Julia land https://github.com/JuliaIntervals ) explained a bit of how his interval constraint propagation library works to me last night. He described it as being very similar to backpropagation, which sets off alarm bells for me.

Backpropagation can be implemented in a point-free functional style using the lens pattern. http://www.philipzucker.com/reverse-mode-differentiation-is-kind-of-like-a-lens-ii/ Lenses are, generally speaking, a natural way to express in a functional style a forward-backward pass algorithm that shares information between the two passes.

I also note Conal Elliot explicitly mentions interval computation in his compiling to categories work http://conal.net/papers/compiling-to-categories/ https://github.com/conal/concat and he does have something working there.

Interval arithmetic itself has already been implemented in Haskell in Ed Kmett’s interval package. https://hackage.haskell.org/package/intervals-0.9.1/docs/Numeric-Interval.html so we can just use that.

The interesting thing the backward pass gives you is that everything feels a bit more relational rather than functional. The backward pass allows you to infer new information using constraints given down the line. For example, fuse :: Lens (a,a) a lets you enforce that two variables are actually equal. The lens pattern lets you store the forward pass intervals in a closure, so that you can intersect them with the backwards pass intervals.

I make no guarantees what I have here is right. It’s a very rough first pass. It compiles, so that is cool I guess.

Here’s my repo in case I fix more things up and you wanna check it out https://github.com/philzook58/ad-lens/blob/master/src/Numeric/ADLens/Interval.hs
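
To give the flavor in a self-contained way (a toy sketch of mine, not the repo code; intervals are bare pairs rather than Kmett’s Interval type):

type Iv = (Double, Double) -- (lo, hi)

isect :: Iv -> Iv -> Iv
isect (l, h) (l', h') = (max l l', min h h')

-- an addition "lens": forward interval sum, backward refinement of the inputs
-- by intersecting with the forward intervals remembered in the closure
addI :: (Iv, Iv) -> (Iv, Iv -> (Iv, Iv))
addI (x@(xl, xh), y@(yl, yh)) = ((xl + yl, xh + yh), back) where
  back (zl, zh) = (isect x (zl - yh, zh - yl), isect y (zl - xh, zh - xl))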

Now having said that, to my knowledge Propagators are a more appropriate technique for this domain. https://www.youtube.com/watch?v=s2dknG7KryQ https://www.youtube.com/watch?v=nY1BCv3xn24 I don’t really know propagators though. It’s on my to do list.

Lens has a couple problems. It is probably doing way more work than it should, and we aren’t iterating to a fixed point.

Maybe an iterated lens would get us closer?

data Lens s t a b = Lens (a -> (b , (b -> (a, Lens s t a b))))

This is one way to go about the iterative process of updating a neural network in a functional way by evaluating it over and over and backpropagating. The updated weights will be stored in those closures. It seems kind of nice. It is clearly some relative of Iteratees and streaming libraries like pipes and conduit (which are also a compositional bidirectional programming pattern), the main difference being that it enforces a particular ordering of passes (for better or worse). Also I haven’t put in any monadic effects, which is to some degree the point of those libraries, but also extremely conceptually clouding to what is going on.

Another interesting possibility is the type

type Lens s t a b = s -> (a, b -> t)

Lens s (Interval s) a (Interval a)

This has pieces that might be helpful for talking about continuous functions in a constructive way. It has the forward definition of the function, and then the inverse image of intervals. The inverse image function depends on the original evaluation point? Does this actually make sense? The definition of continuity is that this inverse image function must make arbitrarily small image intervals as you give it smaller and smaller range intervals. Continuity is compositional and plays nice with many arithmetic and structural combinators. So maybe something like this might be a nice abstraction for proof carrying continuous functions in Coq or Agda? Pure conjecture.

Solving the Laplace Equations with Linear Relations

The Laplace equation is ubiquitous in physics and engineering.

\nabla^2 \phi = \partial_x^2 \phi + \partial_y^2 \phi = 0

It and slight variants of it describe electrostatics, magnetostatics, steady state heat flow, elastic flex, pressure, and velocity potentials.

There are a couple reasons for that.

  • It results from minimizing the squared gradient of a field |\nabla \phi |^2 which can make sense from an energy minimization perspective.
  • Similarly it results from the combination of a flow conservation law and a linear constitutive relation connecting flow and field (such as Ohm’s law, Fick’s law, or Hooke’s law).
  • It also gets used even if not particularly appropriate because we know how to mathematically deal with it, for example in image processing.

There are a couple of questions we may want to ask about a Laplace equation system

  • Given the field on the boundary, determine the field in the interior (Dirichlet problem)
  • Given the normal derivative of the field on the boundary determine the field in the interior (Neumann problem)
  • Given sources in the interior and 0 boundary condition, determine the field. The Laplace equation is called the Poisson equation when you allow a source term on the right hand side. \nabla^2 \phi = \rho.
  • Given the field at the boundary, determine the derivative at the boundary. Dirichlet-to-Neumann map or Poincare-Steklov operator.

Given the Dirichlet to Neumann map, you do not have to consider the interior of a region to use it. The Dirichlet to Neumann map is sort of the same thing as an effective resistance or scattering matrix. It gives you a black box representation of a region based solely on the variables at its boundary.

This linear relation algebra is useful for many things that I’d have considered a use case for the Schur complement. The Schur complement arises when you do Gaussian elimination on a blocked matrix. It is good that this pattern has a name, because once you know about it, you’ll see it in many places. Domain decomposition, marginalized gaussian distributions, low-rank update, Scattering matrices.
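
Concretely, for a blocked linear system, eliminating the second block of variables gives

\begin{bmatrix} A & B \\ C & D \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} f \\ g \end{bmatrix} \Rightarrow (A - B D^{-1} C) x = f - B D^{-1} g

and the matrix A - B D^{-1} C is the Schur complement (of the block D).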

By composing the linear relations corresponding to the Dirichlet-Neumann relations of regions, we can build the Dirichlet-Neumann relations of larger regions.

To make this more concrete, let us take the example of electrical circuits like before. A grid of resistors is a finite difference approximation to the continuous problem

-\nabla \phi = E (the electric field is the negative gradient of the potential)

E = \rho j (continuum Ohm’s law)

\nabla \cdot j = 0 (current conservation)

In this post, I mentioned how you can make reasonable 2 dimensional diagrams out of a monoidal category, by sort of arbitrarily flipping one wire up and one wire down as in the diagram below. This defines a horizontal and vertical composition which have to do the required book-keeping (associations) to keep an arrow in canonical form. I had considered this as a management method for the weights in neural networks, but it is way more natural as the actual geometrical layout of a finite difference grid of a Laplace equation.

So we can reuse our categorical circuit combinators to build a finite difference Laplace equation.

Just showing how you can bend a 4-wire monoidal box into a 2-d diagram. Ignore the labels.

This can be implemented in Haskell with the circuit combinators. Neato.
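
As a rough sketch of the flavor (assuming the bridge and hcompose combinators from the “Linear Relation Algebra of Circuits with HMatrix” post below, not the exact original code), one horizontal strip of the resistor grid is just a chain of bridge cells:

-- one row of the finite difference grid: n bridge cells composed in series (n >= 1)
laplaceRow :: Int -> Double -> HLinRel (Either IV IV) (Either IV IV)
laplaceRow n r = foldr1 hcompose (replicate n (bridge r))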

Bits and Bobbles

  • Not the tightest post, but I needed to get it out there. I have a short attention span.
  • Homology and defining simplices as categories. One way of describing Homology is about linear operators that are analogues of finite difference operators (or better yet, discrete differential geometry operators / exterior derivatives). To some degree, it is analyzing the required boundary conditions to fully define differential equations on weirdo topological surfaces, which correspond to geometrical loops/holes. You can figure this out by looking at subspaces and quotients of the difference operators. Here we have here a very category theory way of looking at partial differential equations. How does it all connect?
  • Continuous circuit models – https://en.wikipedia.org/wiki/Distributed-element_model Telegrapher’s equation is classic example.
  • Cody mentioned that I could actually build circuits and measure categorical identities in a sense. That’s kind of cool. Or I could draw conductive ink on carbon paper and actually make my string diagrams into circuits? That is also brain tickling
  • Network circuits
  • I really want to get coefficients that aren’t just doubles. Allowing rational functions of a frequency \omega would allow analysis of capacitor/inductor circuits, but also tight binding model systems for fun things like topological insulators and the Haldane model http://www.philipzucker.com/topologically-non-trivial-circuit-making-haldane-model-gyrator/ . I may need to leave Haskell. I’m not seeing quite the functionality I need. Use Sympy? https://arxiv.org/abs/1605.02532 HLinear. Flint bindings for Haskell? Looks unmaintained. Could also use a Gröbner basis package as dynamite for a mouse.
  • This is relevant for the boundary element method. Some really cool other stuff relevant here. http://people.maths.ox.ac.uk/martinsson/2014_CBMS/

Blah Blah blah: The subtlest aspect of differential equations is that of boundary conditions. It is more correct usually to consider the system of the interior differential equation and the boundary conditions as equally essential parts of the statement of the problem you are considering.

Categorical LQR Control with Linear Relations

Last time, I described and built an interesting expansion of describing systems via linear functions: linear relations. We remove the requirement that mappings be functional (exactly one output vector for any given input vector), which extends our descriptive capability. This is useful for describing “open” pieces of a linear system like electric circuits. This blog post is about another application: linear dynamical systems and control.

Linear relations are an excellent tool for reachability/controllability. In a control system, it is important to know where it is possible to get the system to. With linear dynamics x_{t+1} = Ax_{t} + B u_{t}, an easy controllability question is “can my system reach into some subspace”. This isn’t quite the most natural question physically, but it is natural mathematically. My first impulse would be to ask “given the state starts in this little region over here, can it get to this little region over here”, but the subspace question is a bit easier to answer. It’s a little funky but it isn’t useless. It can detect if the control parameter only touches 1 of 2 independent dynamical systems for example. We can write the equations of motion as a linear relation and iterate composition on them. See Erbele for more.

There is a set of different things (and kind of more interesting things) that are under the purview of linear control theory, LQR Controllers and Kalman filters. The LQR controller and Kalman filter are roughly (exactly?) the same thing mathematically. At an abstract mathematical level, they both rely on the fact that optimization of a quadratic objective x^T Q x + r^T x + c with linear constraints Ax=b is solvable in closed form via linear algebra. The cost of the LQR controller could be the squared distance to some goal position for example. When you optimize a function, you set the derivative to 0. This is a linear equation for quadratic objectives. It is a useful framework because it has such powerful computational teeth.

The Kalman filter does a similar thing, except for the problem of state estimation. There are measurements linearly related to the state, which you want to reconcile with prior knowledge of the linear dynamics of the system and the expected errors of measurement and of environmental perturbations to the dynamics.

There are a couple different related species of these filters. We could consider discrete time or continuous time. We can also consider infinite horizon or finite horizon. I feel that discrete time and finite horizon are the simplest and most fundamental, so that is what we’ll stick with. The infinite horizon and continuous time are limiting cases.

We also can consider the dynamics to be fixed for all time steps, or varying with time. Varying with time is useful for the approximation of nonlinear systems, where there are different good local linear approximations (Taylor series of the dynamics) depending on the current state.

There are a couple ways to solve a constrained quadratic optimization problem. In some sense, the most conceptually straightforward is just to solve for a span of the basis (the “V-Rep” of my previous post) and plug that in to the quadratic objective and optimize over your now unconstrained problem. However, it is commonplace and elegant in many respects to use the Lagrange multiplier method. You introduce a new parameter for every linear equality and add a term to the objective \lambda^T (A x - b). Hypothetically this term is 0 for the constrained solution so we haven’t changed the objective in a sense. It’s a funny sleight of hand. If you optimize over x and \lambda, the equality constraint comes out as your gradient conditions on \lambda. Via this method, we convert a linearly constrained quadratic optimization problem into an unconstrained quadratic optimization problem with more variables.
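
Concretely, for minimizing \frac{1}{2} x^T Q x + r^T x subject to Ax = b, setting the gradients of the Lagrangian \frac{1}{2} x^T Q x + r^T x + \lambda^T (Ax - b) to zero gives the linear system

\begin{bmatrix} Q & A^T \\ A & 0 \end{bmatrix} \begin{bmatrix} x \\ \lambda \end{bmatrix} = \begin{bmatrix} -r \\ b \end{bmatrix}

which is exactly the kind of thing linear algebra (and linear relations) can chew on directly.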

Despite feeling very abstract, the value these Lagrangian variables take on has an interpretation. They are the “cost” or “price” of the equality they correspond to. If you moved the constraint by an amount Ax - b + \epsilon, you would change the optimal cost by an amount \lambda \epsilon. (Pretty sure I have that right. Units check out. See Boyd https://www.youtube.com/watch?v=FJVmflArCXc)

The Lagrange multipliers enforcing the linear dynamics are called the co-state variables. They represent the “price” that the dynamics cost, and are derivatives of the optimal value function V(s) (The best value that can be achieved from state s) that may be familiar to you from dynamic programming or reinforcement learning. See references below for more.

Let’s get down to some brass tacks. I’ve suppressed some terms for simplicity. You can also add offsets to the dynamics and costs.

A quadratic cost function with Lagrange multipliers. Q is a cost matrix associated with the x state variables, R is a cost matrix for the u control variables.

C = \lambda_0^T (x_0 - \tilde{x}_0) + \sum_t \left[ x_t^T Q x_t + u_t^T R u_t + \lambda_{t+1}^T (x_{t+1} - A x_t - B u_t) \right]

The equations of motion result from optimizing with respect to \lambda, by design.

\nabla_{\lambda_{t+1}} C = x_{t+1} - A x_t - B u_t  = 0.

The initial conditions are enforced by the zeroth multiplier.

\nabla_{\lambda_0} C = x_0 - \tilde{x}_0 = 0

Differentiation with respect to the state x has the interpretation of backwards iteration of value derivative, somewhat related to what one finds in the Bellman equation.

\nabla_{x_t} C = Q x_{t} + A^T \lambda_{t+1} - \lambda_{t} = 0 \Longrightarrow \lambda_{t} =  A^T \lambda_{t+1} + Q x_{t}

The final condition on value derivative is the one that makes it the most clear that the Lagrange multiplier has the interpretation of the derivative of the value function, as it sets it to that.

\nabla_{x_N} C = Q x_N - \lambda_{N} = 0

Finally, differentiation with respect to the control picks the best action given knowledge of the value function at that time step.

\nabla_{u_t} C = R u_{t} - B^T \lambda_t  = 0

Ok. Let’s code it up using linear relations. Each of these conditions is a separate little conceptual box. We can build the optimal control step relation by combining these updates together.

The string diagram for a single time step of control

The following is a bit sketchy. I am confident it makes sense, but I’m not confident that I have all of the signs and other details correct. I also still need to add the linear terms to the objective and offset terms to the dynamics. These details are all kind of actually important. Still, I think it’s nice to set down in preliminary blog form.

Initial conditions and final conditions. Final conditions fix the final value derivative to Qx. Initial conditions set x to some value. Should there be a leg for lambda on the initial conditions?
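
To give the flavor in code (a sketch of my own, simplified relative to the repo linked in the notes below; X and U are hypothetical index types for the state and control dimensions, and the matrices a and b are assumed to have card @x rows), the equations-of-motion box is the relation A x_t + B u_t - x_{t+1} = 0:

-- x_{t+1} = A x_t + B u_t written as an HLinRel, in the style of the LinRel post
dyn :: forall x u. (BEnum x, BEnum u) => Matrix Double -> Matrix Double -> HLinRel (Either x u) x
dyn a b = HLinRel (a ||| b ||| (- (ident cx))) (vzero cx) where cx = card @x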

Bits and Bobbles

  • The code in context https://github.com/philzook58/ConvexCat/blob/master/src/LinRel.hs
  • Some of the juicier stuff is nonlinear control. Gotta walk before we can run though. I have some suspicions that a streaming library may be useful here, or a lens-related approach. Also ADMM.
  • Reminiscent of Keldysh contour. Probably a meaningless coincidence.
  • In some sense H-Rep is the analog of (a -> b -> Bool) and V-rep [(a,b)]
  • Note that the equations of motion are relational rather than functional for a control systems. The control parameters u describe undetermined variables under our control.
  • Loopback (trace) the value derivative for infinite horizon control.
  • Can solve the Laplace equation with a Poincare Steklov operator. That should be my next post.
  • There is something a touch unsatisfying even with the reduced goals of this post. Why did I have to write down the quadratic optimization problem and then hand translate it to linear relations? It’s kind of unacceptable. The quadratic optimization problem is really the most natural statement, but I haven’t figured out how to make it compositional. The last chapter of Rockafellar?

References

https://arxiv.org/abs/1405.6881 Baez and Erbele – Categories in Control

http://www.scholarpedia.org/article/Optimal_control

http://www.argmin.net/2016/05/18/mates-of-costate/ – interesting analogy to backpropagation. Lens connection?

https://courses.cs.washington.edu/courses/cse528/09sp/optimality_chapter.pdf

Linear Relation Algebra of Circuits with HMatrix

Oooh this is a fun one.

I’ve talked before about relation algebra and I think it is pretty neat. http://www.philipzucker.com/a-short-skinny-on-relations-towards-the-algebra-of-programming/. In that blog post, I used finite relations. In principle, they are simple to work with. We can perform relation algebra operations like composition, meet, and join by brute force enumeration.

Unfortunately, brute force may not always be an option. First off, the finite relations grow so enormous as to make this infeasible. Secondly, it is not insane to talk about relations or regions with an infinite number of elements, such as some continuous blob in 2D space. In that case, we can’t even in principle enumerate all the points in the region. What are we to do? We need to develop some kind of finite parametrization of regions to manipulate. This parametrization basically can’t possibly be complete in some sense, and we may choose more or less powerful systems of description for computational reasons.

In this post, we are going to be talking about linear or affine subspaces of a continuous space. These subspaces are flat things like points, lines, and planes (and their higher dimensional analogs). Linear subspaces have to go through the origin, while affine subspaces can have an offset from the origin.

In the previous post, I mentioned that the finite relations formed a lattice, with operations meet and join. These operations were the same as set intersection and union, so the introduction of the extra terminology meet and join felt a bit unwarranted. Now the meet and join aren’t union and intersection anymore. We have chosen not to have the capability to represent the union of two subspaces; instead we can only represent the smallest subspace that contains them both, which is the union closed under vector addition (the span). For example, the join of a line and a point will be the plane that goes through both.

Linear/Affine stuff is great because it is so computational. Most questions you want to ask are answerable by readily available numerical linear algebra packages. In this case, we’ll use the Haskell package HMatrix, which is something like a numpy/scipy equivalent for Haskell. We’re going to use type-level indices to denote the sizes and partitioning of these spaces so we’ll need some helper functions.

Matrices are my patronum. They make everything good. Artwork courtesy of David

In case I miss any extensions, make typos, etc, you can find a complete compiling version here https://github.com/philzook58/ConvexCat/blob/master/src/LinRel.hs

type BEnum a = (Enum a, Bounded a) 

-- cardinality. `size` was already taken by HMatrix :(
card :: forall a. (BEnum a) => Int
card = (fromEnum (maxBound @a)) - (fromEnum (minBound @a)) + 1

In analogy with sets of tuples for defining finite relations, we partition the components of the linear spaces to be “input” and “output” indices/variables \begin{bmatrix} x_1 & x_2 & x_3 & ... & y_1 & y_2 & y_3 & ... \end{bmatrix}. This partition is somewhat arbitrary and easily moved around, but the weakening of strict notions of input and output as compared to functions is the source of the greater descriptive power of relations.

Relations are extensions of functions, so linear relations are an extension of linear maps. A linear map has the form y = Ax. A linear relation has the form Ax + By = 0. An affine map has the form y = Ax + b and an affine relation has the form Ax + By = b.

There are at least two useful concrete representations for subspaces.

  1. We can write a matrix A and vector b down that corresponds to affine constraints. Ax = b. The subspace described is the nullspace of A plus a solution of the equation. The rows of A are orthogonal to the space.
  2. We can hold onto generators of the subspace: x = A' l + b, where l parametrizes the subspace. In other words, the subspace is generated by / is the span of the columns of A'. It is the range of A'.

We’ll call these two representations the H-Rep and V-Rep, borrowing terminology from similar representations in polytopes (describing a polytope by the inequalities that define its faces or as the convex combination of its vertices). https://inf.ethz.ch/personal/fukudak/lect/pclect/notes2015/PolyComp2015.pdf These two representations are dual in many respects.

-- HLinRel holds A x = b constraint
data HLinRel a b = HLinRel (Matrix Double) (Vector Double) deriving Show

-- x = A l + b. Generator constraint. 
data VLinRel a b = VLinRel (Matrix Double) (Vector Double) deriving Show

It is useful to have both reps and interconversion routines, because different operations are easy in the two representations. Any operations defined on one can be defined on the other by sandwiching between these conversion functions. Hence, we basically only need to define operations for one of the reps (if we don’t care too much about efficiency loss which, fair warning, is out the window for today). The bulk of computation will actually be performed by these interconversion routines. The HMatrix function nullspace performs an SVD under the hood and gathers up the space with 0 singular values.

-- if A x = b then x is in the nullspace + a vector b' solves the equation
h2v :: HLinRel a b -> VLinRel a b
h2v (HLinRel a b) = VLinRel a' b' where
        b' = a <\> b -- least squares solution
        a' = nullspace a

-- if x = A l + b, then A' . x = A' A l + A' b = A' b because A' A = 0
v2h :: VLinRel a b -> HLinRel a b
v2h (VLinRel a' b') = HLinRel a b where
        b = a #> b' -- matrix multiply
        a = tr $ nullspace (tr a') -- orthogonal space to range of a.
-- tr is transpose and not trace? A little bit odd, HMatrix.

These linear relations form a category. I’m not using the Category typeclass because I need BEnum constraints hanging around. The identity relation is x = y, aka Ix - Iy = 0.

hid :: forall a. BEnum a => HLinRel a a
hid =  HLinRel (i ||| (- i)) (vzero s) where 
                            s = card @a
                            i = ident s

Composing relations is done by combining the constraints of the two relations and then projecting out the interior variables. Taking the conjunction of constraints is easiest in the H-Rep, where we just need to vertically stack the individual constraints. Projection is easily done in the V-rep, where you just need to drop the appropriate section of the generator vectors. So we implement this operation by flipping between the two.

hcompose :: forall a b c. (BEnum a, BEnum b, BEnum c) => HLinRel b c -> HLinRel a b -> HLinRel a c
hcompose (HLinRel m b) (HLinRel m' b') = let a'' = fromBlocks [[       ma',           mb' ,    0       ],
                                                               [         0 ,    mb,        mc          ]] in
                                         let b'' = vjoin [b', b] in 
                                         let (VLinRel q p) = h2v (HLinRel a'' b'') in -- kind of a misuse
                                         let q' = (takeRows ca q)  -- drop rows belonging to @b
                                                       === 
                                                  (dropRows (ca + cb) q) in
                                         let [x,y,z] =  takesV [ca,cb,cc] p in
                                         let p'=  vjoin [x,z] in -- rebuild without rows for @b
                                         v2h (VLinRel q' p') -- reconstruct HLinRel
                                       where 
                                           ca = card @a
                                           cb = card @b 
                                           cc = card @c
                                           sb = size b -- number of constraints in first relation
                                           sb' = size b' -- number of constraints in second relation
                                           ma' = takeColumns ca m'
                                           mb' = dropColumns ca m'
                                           mb = takeColumns cb m
                                           mc = dropColumns cb m

(<<<) :: forall a b c. (BEnum a, BEnum b, BEnum c) => HLinRel b c -> HLinRel a b -> HLinRel a c
(<<<) = hcompose

We can implement the general cadre of relation operators, meet, join, converse. I feel the converse is the most relational thing of all. It makes inverting a function nearly a no-op.

hjoin :: HLinRel a b -> HLinRel a b -> HLinRel a b
hjoin v w = v2h $ vjoin' (h2v v) (h2v w)

-- hmatrix took vjoin from me :(
-- joining means combining generators and adding a new generator
-- Closed under affine combination l * x1 + (1 - l) * x2 
vjoin' :: VLinRel a b -> VLinRel a b -> VLinRel a b
vjoin' (VLinRel a b) (VLinRel a' b') = VLinRel (a ||| a' ||| (asColumn (b - b'))) b

-- no constraints, everything
-- trivially true
htop :: forall a b. (BEnum a, BEnum b) => HLinRel a b 
htop = HLinRel (vzero (1,ca + cb)) (konst 0 1) where 
                                      ca = card @a
                                      cb = card @b 

-- hbottom?
                              
hconverse :: forall a b. (BEnum a, BEnum b) => HLinRel a b -> HLinRel b a 
hconverse (HLinRel a b) = HLinRel ( (dropColumns ca a) |||  (takeColumns ca a)) b where 
    ca = card @a
    cb = card @b  
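
hmeet doesn’t appear in the listing above but is used in main below; since meet is intersection of subspaces, a sketch of it (my guess at the missing definition, not necessarily the repo’s) is just stacking the constraints of the two H-Reps:

-- meet = conjunction of constraints: stack the two constraint systems
hmeet :: HLinRel a b -> HLinRel a b -> HLinRel a b
hmeet (HLinRel a b) (HLinRel a' b') = HLinRel (a === a') (vjoin [b, b'])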

Relational inclusion is the question of subspace inclusion. It is fairly easy to check if a VRep is in an HRep (just plug the generators into the constraints and see if they obey them) and by using the conversion functions we can define it for arbitrary combos of H and V.


-- forall l. A' ( A l + b) == b'
-- is this numerically ok? I'm open to suggestions.
vhsub :: VLinRel a b -> HLinRel a b -> Bool
vhsub (VLinRel a b) (HLinRel a' b') = (naa' <=  1e-10 * (norm_2 a') * (norm_2 a)  ) && ((norm_2 ((a' #> b) - b')) <= 1e-10 * (norm_2 b')  ) where
          naa' = norm_2 (a' <> a)

hsub :: HLinRel a b -> HLinRel a b -> Bool
hsub h1 h2 = vhsub (h2v h1) h2

heq :: HLinRel a b -> HLinRel a b -> Bool
heq a b = (hsub a b) && (hsub b a)


instance Ord (HLinRel a b) where
  (<=) = hsub
  (>=) = flip hsub 

instance Eq (HLinRel a b) where
  (==) = heq

It is useful to use the direct sum of the spaces as a monoidal product.

hpar :: HLinRel a b -> HLinRel c d -> HLinRel (Either a c) (Either b d)
hpar (HLinRel mab v) (HLinRel mcd v') = HLinRel (fromBlocks [ [mab, 0], [0 , mcd]]) (vjoin [v, v']) where

hleft :: forall a b. (BEnum a, BEnum b) => HLinRel a (Either a b)
hleft = HLinRel ( i ||| (- i) ||| (konst 0 (ca,cb))) (konst 0 ca) where 
    ca = card @a
    cb = card @b  
    i = ident ca

hright :: forall a b. (BEnum a, BEnum b) => HLinRel b (Either a b)
hright = HLinRel ( i ||| (konst 0 (cb,ca)) ||| (- i) ) (konst 0 cb) where 
    ca = card @a
    cb = card @b  
    i = ident cb

htrans :: HLinRel a (Either b c) -> HLinRel (Either a b) c 
htrans (HLinRel m v) = HLinRel m v

hswap :: forall a b. (BEnum a, BEnum b) => HLinRel (Either a b) (Either b a)
hswap = HLinRel (fromBlocks [[ia ,0,0 ,-ia], [0, ib,-ib,0]]) (konst 0 (ca + cb)) where 
        ca = card @a
        cb = card @b  
        ia = ident ca
        ib = ident cb


hsum :: forall a. BEnum a => HLinRel (Either a a) a
hsum = HLinRel ( i ||| i ||| - i ) (konst 0 ca)  where 
        ca = card @a 
        i= ident ca

hdup :: forall a. BEnum a => HLinRel a (Either a a)
hdup = HLinRel (fromBlocks [[i, -i,0 ], [i, 0, -i]]) (konst 0 (ca + ca))  where 
        ca = card @a 
        i= ident ca

hdump :: HLinRel a Void
hdump = HLinRel 0 0

hlabsorb ::forall a. BEnum a => HLinRel (Either Void a) a
hlabsorb = HLinRel m v where (HLinRel m v) = hid @a 

A side note: Void causes some consternation. Void is the type with no elements and is the index type of a 0 dimensional space. It is the unit object of the monoidal product. Unfortunately by an accident of the standard Haskell definitions, actual Void is not a BEnum. So, I did a disgusting hack. Let us not discuss it more.

Circuits

Baez and Fong have an interesting paper where they describe building circuits using a categorical graphical calculus. We have the pieces to go about something similar. What we have here is a precise way in which circuit diagrams can be thought of as string diagrams in a monoidal category of linear relations.

An idealized wire has two quantities associated with it, the current flowing through it and the voltage it is at.

-- a 2d space at every wire: current and voltage.
data IV = I | V deriving (Show, Enum, Bounded, Eq, Ord)

When we connect wires, the currents must be conserved and the voltages must be equal. hid and hcompose from above still achieve that. Composing two independent circuits in parallel is achieved by hpar.

Independent resistors in parallel.

We will want some basic tinker toys to work with.

A resistor in series has the same current at both ends and a voltage drop proportional to the current

resistor :: Double -> HLinRel IV IV
resistor r = HLinRel ( (2><4)  [ 1,0,-1,   0,
                                 r, 1, 0, -1]) (konst 0 2) 

Composing two resistors in series adds the resistance: (resistor r1) <<< (resistor r2) == resistor (r1 + r2).

A bridging resistor allows current to flow between the two branches

bridge :: Double -> HLinRel (Either IV IV) (Either IV IV)
bridge r = HLinRel (  (4><8) [ 1,0, 1,  0, -1, 0, -1,  0, -- current conservation
                               0, 1, 0, 0, 0, -1 , 0,  0, --voltage maintained left
                               0, 0, 0, 1, 0,  0,  0, -1, -- voltage maintained right
                               r, 1, 0,-1, -r,  0,  0, 0  ]) (konst 0 4)  
A bridging resistor

Composing two bridge circuits puts the bridge resistors in parallel. The conductance G=\frac{1}{R} of resistors in parallel adds: hcompose (bridge r1) (bridge r2) == bridge (1 / (1/r1 + 1/r2)).

parallel resistors compose

An open circuit allows no current to flow and ends a wire. open ~ resistor infinity

open :: HLinRel IV Void
open = HLinRel (fromLists [[1,0]]) (konst 0 1)

At branching points, the voltage is maintained, but the current splits.

cmerge :: HLinRel (Either IV IV) IV
cmerge = HLinRel (fromLists [[1, 0, 1, 0, -1,  0],
                             [0, 1, 0, 0,  0, -1],
                             [0, 0, 0, 1,  0, -1]]) (konst 0 3)

This cmerge combinator could also be built using a short == bridge 0 , composing a branch with open, and then absorbing the Void away.

We can bend wires up or down by using a composition of cmerge and open.

cap :: HLinRel  (Either IV IV) Void
cap  = hcompose open cmerge

cup :: HLinRel Void (Either IV IV)
cup = hconverse cap

ground :: HLinRel IV Void
ground = HLinRel ( (1><2) [ 0 , 1 ]) (vzero 1) 

Voltage and current sources enforce current and voltage to be certain values

vsource :: Double -> HLinRel IV IV
vsource v = HLinRel ( (2><4) [ 1,0,-1,   0,
                               0, 1, 0, -1]) (fromList [0,v])  

isource :: Double -> HLinRel IV IV
isource i = HLinRel (fromLists [[1, 0, -1, 0], -- current conservation
                                [1, 0,  0, 0]]) (fromList [0,i])

Measurements of circuits proceed by probes.

type VProbe = ()
vprobe :: HLinRel IV VProbe
vprobe = HLinRel ( (2><3)  [1,0,0,
                            0,1,-1]) (konst 0 2)  

Inductors and capacitors could be included easily, but would require the entries of the HMatrix values to be polynomials in the frequency \omega, which it does not support (but it could!). We'll leave those off for another day.

We can actually check by computation that the rules suggested above are being followed.

r20 :: HLinRel IV IV
r20 = resistor 20

main :: IO ()
main = do
            print (r20 == (hid <<< r20))
            print (r20 == r20 <<< hid)
            print (r20 == (hmeet r20 r20))
            print $ resistor 50 == r20 <<< (resistor 30)
            print $ (bridge 10) <<< (bridge 10) == (bridge 5)
            print $ v2h (h2v r20) == r20
            print $ r20 <= htop
            print $ hconverse (hconverse r20) == r20
            print $ (open <<< r20) == open

Bits and Bobbles

  • Homogenous systems are usually a bit more elegant to deal with, although a bit more unfamiliar and abstract.
  • Could make a pandas like interface for linear relations that uses numpy/scipy.sparse for the computation. All the swapping and associating is kind of fun to design, not so much to use. Labelled n-way relations are nice for users.
  • Implicit/Lazy evaluation. We should let the good solvers do the work when possible. We implemented our operations eagerly. We don't have to. By allowing hidden variables inside our relations, we can avoid the expensive linear operations until it is useful to actually compute on them.
  • Relational division = quotient spaces?
  • DSL. One of the beauties of the pointfree/categorical approach is that you avoid the need for binding forms. This makes for a very easily manipulated DSL. The transformations feel like those of ordinary algebra and you don't have to worry about the subtleties of index renaming or substitution under binders.
  • Sparse is probably really good. We have lots of identity matrices and simple rearrangements. It is very wasteful to use dense operations on these.
  • Schur complement https://en.wikipedia.org/wiki/Schur_complement are the name in the game for projecting out pieces of linear problems. We have some overlap.
  • Linear relations -> Polyhedral relations -> Convex Relations. Linear is super computable, polyhedral can blow up. Rearrange a DSL to abuse Linear programming as much as possible for queries.
  • Network circuits. There is an interesting subclass of circuits that is designed to be pretty composable.

https://en.wikipedia.org/wiki/Two-port_network Two port networks are a very useful subclass of electrical circuits. They model transmission lines fairly well, and are easily composable for filter construction.

It is standard to describe these networks by giving a linear function from two of the four port variables to the other two. Depending on your choice of which variables depend on which, these are called the z-parameters, y-parameters, h-parameters, scattering parameters, abcd parameters. There are tables of formula for converting from one form to the others. The different parameters hold different use cases for composition and combining in parallel or series. From the perspective of linear relations this all seems rather silly. The necessity for so many descriptions and the confusing relationship between them comes from the unnecessary and overly rigid requirement of having a linear function-like relationship rather than just a general relation, which depending on the circuit may not even be available (there are degenerate configurations where two of the variables do not imply the values of the other two). A function relationship is always a lie (although a sometimes useful one), as there is always back-reaction of new connections.
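
For example, the z-parameters present the port voltages as a linear function of the port currents,

\begin{bmatrix} V_1 \\ V_2 \end{bmatrix} = \begin{bmatrix} z_{11} & z_{12} \\ z_{21} & z_{22} \end{bmatrix} \begin{bmatrix} I_1 \\ I_2 \end{bmatrix}

whereas the relational view just keeps a single homogeneous system A \begin{bmatrix} V_1 & I_1 & V_2 & I_2 \end{bmatrix}^T = b and lets you solve for whichever variables are actually determined.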

-- voltage divider
divider :: Double -> Double -> HLinRel (Either IV IV) (Either IV IV)
divider r1 r2 = hcompose (bridge r2) (hpar (resistor r1) hid) 

The relation model also makes clearer how to build lumped models out of continuous ones. https://en.wikipedia.org/wiki/Lumped-element_model

https://en.wikipedia.org/wiki/Transmission_line https://en.wikipedia.org/wiki/Distributed-element_model

  • Because the type indices have no connection to the actual data types (they are phantom) it is a wise idea to use smart constructors that check that the sizes of the matrices make sense.

-- smart constructors
hLinRel :: forall a b. (BEnum a, BEnum b) => Matrix Double -> Vector Double -> Maybe (HLinRel a b) 
hLinRel m v | cols m == (ca + cb) &&  (size v == rows m)  = Just (HLinRel m v)
            |  otherwise = Nothing  where 
                 ca = card @a
                 cb = card @b  
  • Nonlinear circuits. Grobner Bases and polynomial relations?
  • Quadratic optimization under linear constraints. Can't get it to come out right yet. Clutch for Kalman filters. Nice for many formulations like least power, least action, minimum energy principles. Edit: I did more in this direction here http://www.philipzucker.com/categorical-lqr-control-with-linear-relations/
  • Quadratic Operators -> Convex operators. See last chapter of Rockafellar.
  • Duality of controllers and filters. It is well known (I think) that for every controller algorithm there is a filter algorithm that is basically the same thing.
    • LQR - Kalman
    • Viterbi filter - Value function table
    • particle filter - Monte Carlo control
    • Extended Kalman - iLQR-ish? Use local approximation of dynamics
    • unscented kalman - ?

Refs

Neural Networks with Weighty Lenses (DiOptics?)

I wrote a while back how you can make a pretty nice DSL for reverse mode differentiation based on the same type as Lens. I’d heard some interesting rumblings on the internet around these ideas and so was revisiting them.

type Lens s t a b = s -> (a, b -> t)
type AD x dx y dy = x -> (y, dy -> dx)

Composition is defined identically for reverse mode just as it is for lens.

The forward computation shares info with the backwards differential propagation, which corresponds to a transposed Jacobian

After chewing on it a while, I realized this really isn’t that exotic. How it works is that you store the reverse mode computation graph, and all necessary saved data from the forward pass in the closure of the (dy -> dx). I also have a suspicion that if you defunctionalized this construction, you’d get the Wengert tape formulation of reverse mode ad.

Second, Lens is just a nice structure for bidirectional computation, with one forward pass and one backward pass which may or may not be getting/setting. There are other examples for using it like this.

It is also pretty similar to the standard “dual number” form type FAD x dx y dy = (x,dx)->(y,dy) for forward mode AD. We can bring the two closer by a CPS/Yoneda transformation and then some rearrangement.

     x -> (y, dy -> dx) 
==>  x -> (y, forall s. (dx -> s) -> (dy -> s))
==>  forall s. (x, dx -> s) -> (y, dy -> s) 

and meet it in the middle with

(x,dx) -> (y,dy)
==> forall s. (x, s -> dx) -> (y, s -> dy)

I ended the previous post somewhat unsatisfied by how ungainly writing that neural network example was, and I called for Conal Elliot’s compiling to categories plugin as a possible solution. The trouble is piping the weights all over the place. This piping is very frustrating in point-free form, especially when you know it’d be so trivial to do pointfully. While the inputs and outputs of layers of the network compose nicely (you no longer need to know about the internal computations), the weights do not. As we get more and more layers, we get more and more weights. The weights are in some sense not as compositional as the inputs and outputs of the layers, or compose in a different way that you need to maintain access to.

I thought of a very slight conceptual twist that may help.

The idea is we keep the weights out to the side in their own little type parameter slots. Then we define composition such that it composes input/outputs while tupling the weights. Basically we throw the repetitive complexity appearing in piping the weights around into the definition of composition itself.

These operations are easily seen as 2 dimensional diagrams.

Three layers composed, exposing the weights from all layers
The 2-D arrow things can be built out of the 1-d arrows of the original basic AD lens by bending the weights up and down. Ultimately they are describing the same thing

Here are the core reverse-mode lens AD combinators:

import Control.Arrow ((***))

type Lens'' a b = a -> (b, b -> a)

comp :: (b -> (c, (c -> b))) -> (a -> (b, (b -> a))) -> (a -> (c, (c -> a)))
comp f g x = let (b, dg) = g x in
             let (c, df) = f b in
             (c, dg . df)

id' :: Lens'' a a
id' x = (x, id) 

relu' :: (Ord a, Num a) => Lens'' a a
relu' = \x -> (frelu x, brelu x) where
        frelu x | x > 0 = x
                | otherwise = 0
        brelu x dy | x > 0 = dy
                   | otherwise = 0

add' :: Num a => Lens'' (a,a) a 
add' = \(x,y) -> (x + y, \ds -> (ds, ds))

dup' :: Num a => Lens'' a (a,a)
dup' = \x -> ((x,x), \(dx,dy) -> dx + dy)

sub' :: Num a => Lens'' (a,a) a 
sub' = \(x,y) -> (x - y, \ds -> (ds, -ds))

mul' :: Num a => Lens'' (a,a) a 
mul' = \(x,y) -> (x * y, \dz -> (dz * y, x * dz))

recip' :: Fractional a => Lens'' a a 
recip' = \x-> (recip x, \ds -> - ds / (x * x))

div' :: Fractional a => Lens'' (a,a) a 
div' = (\(x,y) -> (x / y, \d -> (d/y,-x*d/(y * y))))

sin' :: Floating a => Lens'' a a
sin' = \x -> (sin x, \dx -> dx * (cos x))

cos' :: Floating a => Lens'' a a
cos' = \x -> (cos x, \dx -> -dx * (sin x))

pow' :: Num a => Integer -> Lens'' a a
pow' n = \x -> (x ^ n, \dx -> (fromInteger n) * dx * x ^ (n-1)) 

--cmul :: Num a => a -> Lens' a a
--cmul c = lens (* c) (\x -> \dx -> c * dx)

exp' :: Floating a => Lens'' a a
exp' = \x -> let ex = exp x in
                      (ex, \dx -> dx * ex)

fst' :: Num b => Lens'' (a,b) a
fst' = (\(a,b) -> (a, \ds -> (ds, 0)))

snd' :: Num a => Lens'' (a,b) b
snd' = (\(a,b) -> (b, \ds -> (0, ds)))

-- some monoidal combinators
swap' :: Lens'' (a,b) (b,a)
swap' = (\(a,b) -> ((b,a), \(db,da) -> (da, db)))

assoc' :: Lens'' ((a,b),c) (a,(b,c))
assoc' = \((a,b),c) -> ((a,(b,c)), \(da,(db,dc)) -> ((da,db),dc))

assoc'' :: Lens'' (a,(b,c)) ((a,b),c)
assoc'' = \(a,(b,c)) -> (((a,b),c), \((da,db),dc)->  (da,(db,dc)))

par' :: Lens'' a b -> Lens'' c d -> Lens'' (a,c) (b,d)
par' l1 l2 = l3 where
    l3 (a,c) = let (b , j1) = l1 a in
               let (d, j2) = l2 c in
               ((b,d) , j1 *** j2) 
first' :: Lens'' a b -> Lens'' (a, c) (b, c)
first' l = par' l id'

second' :: Lens'' a b -> Lens'' (c, a) (c, b)
second' l = par' id' l

labsorb :: Lens'' ((),a) a
labsorb (_,a) = (a, \a' -> ((),a'))

labsorb' :: Lens'' a ((),a)
labsorb' a = (((),a), \(_,a') -> a')

rabsorb :: Lens'' (a,()) a
rabsorb = comp labsorb swap'

And here are the two dimensional combinators. I tried to write them point-free in terms of the combinators above to demonstrate that there is no monkey business going on.

type WAD' w w' a b = Lens'' (w,a) (w',b)
type WAD'' w a b = WAD' w () a b -- terminate the weights for a closed network
{- For any monoidal category we can construct this composition? -}
-- horizontal composition
hcompose :: forall w w' w'' w''' a b c. WAD' w' w'' b c -> WAD' w w''' a b -> WAD' (w',w) (w'',w''') a c
hcompose f g = comp f' g' where 
               f' :: Lens'' ((w',r),b) ((w'',r),c)
               f' = (first' swap') `comp` assoc'' `comp` (par' id' f) `comp` assoc' `comp`  (first' swap') 
               g' :: Lens'' ((r,w),a) ((r,w'''),b)
               g' = assoc'' `comp` (par' id' g) `comp` assoc' 



rotate :: WAD' w w' a b -> WAD' a b w w'                                      
rotate f = swap' `comp` f `comp` swap'

-- vertical composition of weights
vcompose :: WAD' w'  w'' c d -> WAD' w w' a b -> WAD' w w'' (c, a) (d, b)
vcompose f g = rotate (hcompose (rotate f)  (rotate g) )                             

-- a double par.
diagpar :: forall w w' a b w'' w''' c d. WAD' w  w' a b -> WAD' w'' w''' c d 
           -> WAD' (w,w'') (w',w''') (a, c) (b, d)
diagpar f g = t' `comp` (par' f g) `comp` t where
                t :: Lens'' ((w,w''),(a,c)) ((w,a), (w'',c)) -- yikes. just rearrangements.
                t =  assoc'' `comp` (second' ((second' swap') `comp` assoc' `comp` swap')) `comp` assoc'
                t' :: Lens'' ((w',b), (w''',d)) ((w',w'''),(b,d)) -- the tranpose of t
                t' =  assoc'' `comp` (second'  ( swap'  `comp` assoc'' `comp` (second' swap')))  `comp` assoc'

id''' :: WAD' () () a a
id''' = id'






-- rotate:: WAD' w a a w
-- rotate = swap'

liftIO :: Lens'' a b -> WAD' w w a b
liftIO = second'

liftW :: Lens'' w w' -> WAD' w w' a a
liftW = first'


wassoc' = liftW assoc' 
wassoc'' = liftW assoc'' 

labsorb'' :: WAD' ((),w) w a a
labsorb'' = first' labsorb

labsorb''' :: WAD' w ((),w) a a
labsorb''' = first' labsorb'

wswap' :: WAD' (w,w') (w',w) a a
wswap' = first' swap'
-- and so on we can lift all combinators

I wonder if this is actually nice?

I asked around and it seems like this idea may be what davidad is talking about when he refers to dioptics

http://events.cs.bham.ac.uk/syco/strings3-syco5/slides/dalrymple.pdf

Perhaps this will initiate a convo.

Edit: He confirms that what I’m doing appears to be a dioptic. Also he gave a better link http://events.cs.bham.ac.uk/syco/strings3-syco5/papers/dalrymple.pdf

He is up to some interesting diagrams

https://twitter.com/davidad/status/1179760373030801408?s=20

Bits and Bobbles

  • Does this actually work or help make things any better?
  • Recurrent neural nets flip my intended role of weights and inputs.
  • Do conv-nets naturally require higher dimensional diagrams?
  • This weighty style seems like a good fit for my Gauss-Seidel and iterative LQR solvers. A big problem I hit there was getting all the information to the outside, which is a similar issue to getting the weights around in a neural net.

Functors, Vectors, and Quantum Circuits

Vectors are dang useful things, and any energy you put into them seems to pay off massive dividends.

Vectors and Linear Algebra are useful for:

  • 2D, 3D, 4D geometry stuff. Computer graphics, physics etc.
  • Least Squares Fitting
  • Solving discretized PDEs
  • Quantum Mechanics
  • Analysis of Linear Dynamical Systems
  • Probabilistic Transition Matrices

There are certain analogies between Haskell Functors and Vectors that correspond to a style of computational vector mathematics that I think is pretty cool and don’t see talked about much.

Due to the expressivity of its type system, Haskell has a first class notion of container that many other languages don’t. In particular, I’m referring to the fact that Haskell has higher kinded types * -> * (types parametrized on other types) that you can refer to directly without filling them first. Examples in the standard library include Maybe, [], Identity, Const b, and Either b. Much more vector-y feeling examples can be found in Kmett’s linear package V0, V1, V2, V3, V4. For example, the 4 dimensional vector type V4

data V4 a = V4 a a a a

This really isn’t such a strange, esoteric thing as it may appear. You wouldn’t blink an eye at the type

struct V4 {
   double x, y, z, w;
}

in some other language. What makes Haskell special is how compositional and generic it is. We can build thousand element structs with ease via composition. What we have here is an alternative to the paradigm of computational vectors ~ arrays. Instead we have computational vectors ~ structs. In principle, I see no reason why this couldn’t be as fast as arrays, although with current compiler expectations it probably isn’t.
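
For instance, here is a minimal sketch (assuming Kmett's linear package and Data.Functor.Compose; the type names V16/V256/V1024 are mine) of building large fixed-size "structs" purely by composing small ones:

import Data.Functor.Compose (Compose (..))
import Linear (V4 (..))

-- 16, 256, and 1024 element containers built by composition rather than arrays
type V16   = Compose V4 V4
type V256  = Compose V16 V16
type V1024 = Compose V4 V256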

Monoidal categories are a mathematical structure that models this analogy well. They have been designed by mathematicians for aesthetic elegance, and it seems plausible that following their example leads us to interesting, useful, and pleasant vector combinators. And I personally find something that tickles me about category theory.

So to get started, let’s talk a bit about functors.

The Algebra of Functors

Functors in Haskell are a typeclass for containers. They allow you to map functions over all the items in the container. They are related to the categorical notion of functor, which is a mapping between categories.

type Container = * -> * -- Note: This is actually a kind signature.
-- Kinds and types are the same thing in Haskell.

You can lift the product and sum of types to the product and sum of Functors which you may find in Data.Functor.Product and Data.Functor.Sum. This is analogous to the lifting of ordinary addition and multiplication to the addition and multiplication of polynomials, which are kind of like numbers with a “hole”.

data Product f g a = Pair (f a) (g a)
data Sum f g a = InL (f a) | InR (g a)

Functors also compose. A container of containers of a is still a container of a. We can form composite containers by using the Compose newtype wrapper.

newtype Compose f g a = Compose (f (g a))

When you use this Compose newtype, instead of having to address the individual elements by using fmap twice, a single application of fmap will teleport you through both layers of the container.
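
For example (a small sketch; the nested value and the names are mine):

import Data.Functor.Compose (Compose (..))

nested :: Compose [] Maybe Int
nested = Compose [Just 1, Nothing, Just 3]

-- a single fmap reaches through both the list and the Maybe layers
bumped :: Compose [] Maybe Int
bumped = fmap (+ 1) nested   -- Compose [Just 2, Nothing, Just 4]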

Product, Sum, and Compose are all binary operators on functors. The type constructors have kind

{- Enter into ghci -}
:kind Compose ==> (* -> *) -> (* -> *) -> (* -> *)
:kind Product ==> (* -> *) -> (* -> *) -> (* -> *)
:kind Sum ==> (* -> *) -> (* -> *) -> (* -> *)

Some other important functors from the algebra of types perspective are Const Void a, Const () a, and Identity a. These are the identity elements for Sum, Product, and Compose respectively.

You can define mappings between containers that don’t depend on the specifics of their contents. These mappings can only rearrange, copy and forget items of their contained type. This can be enforced at the type level by the polymorphic type signature

type f ~> g = forall a. f a -> g a

These mappings correspond in categorical terminology to natural transformations between the functors f and g. There is a category where the objects are Functors and the morphisms are natural transformations. Sum, Product, and Compose all obey the laws necessary to be a monoidal product on this category.
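
Here are a couple of tiny natural transformations of this shape, a sketch of the "rearrange, copy and forget" idea (safeHead and dup are just example names of mine):

-- using the (~>) synonym above (needs RankNTypes)
import Data.Functor.Identity (Identity (..))

safeHead :: [] ~> Maybe          -- forget everything but the first element
safeHead []      = Nothing
safeHead (x : _) = Just x

dup :: Identity ~> []            -- copy the single held element
dup (Identity x) = [x, x]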

How the lifting of functions works for Compose is kind of neat.

mon_prod :: Functor f' => (f ~> f') -> (g ~> g') -> (Compose f g ~> Compose f' g')
mon_prod ntf ntg (Compose fg) = Compose (fmap ntg (ntf fg))
-- or equivalently Compose (ntf (fmap ntg fg)) with a (Functor f) typeclass requirement.

Because the natural transformations require polymorphic types, when you apply ntf to fg the polymorphic variable a in the type of ntf restricts to a ~ g a'.

Product and Sum have a straightforward notion of commutativity ((a,b) is isomorphic to (b,a)). Compose is more subtle. sequenceA from the Traversable typeclass can swap the ordering of composition. sequenceA . sequenceA may or may not be the identity depending on the functors in question, so it has some flavor of a braiding operation. This is an interesting post on that topic https://parametricity.com/posts/2015-07-18-braids.html
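
A quick ghci check of that last point:

{- Enter into ghci -}
sequenceA [Just 1, Just 2]                  ==> Just [1,2]
(sequenceA . sequenceA) [Just 1, Just 2]    ==> [Just 1,Just 2]
(sequenceA . sequenceA) [Nothing, Just 1]   ==> [Nothing]   -- so not the identity in general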

Combinators of these sorts arise in at least the following contexts

  • Data types a la carte – A systematic way of building extensible data types
  • GHC Generics – A system for building generic functions that operate on data types that can be described with sums, products, recursion, and holes.
  • In and around the Lens ecosystem

Also see the interesting post by Russell O’Connor on functor oriented programming http://r6.ca/blog/20171010T001746Z.html. I think the above is part of that to which he is referring.

Vector Spaces as Shape

Vector spaces are made of two parts, the shape (dimension) of the vector space and the scalar.

Just as a type of kind * -> * can be thought of as a container modulo its held type, it can also be a vector modulo its held scalar type. The higher kinded type for vector gives an explicit slot to place the scalar type.

type Vect = * -> *

The standard Haskell typeclass hierarchy gives you some of the natural operations on vectors if you so choose to abuse it in that way.

  • Functor ~> Scalar Multiplication: smul s = fmap (* s)
  • Applicative ~> Vector Addition: vadd x y = (+) <$> x <*> y
  • Traversable ~> Transposition. sequenceA has the type of transposition and works correctly for the linear style containers like V4 (see the sketch below).
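
Spelled out as a sketch (using linear's V2/V3; the names smul and vadd are mine):

import Linear (V2 (..), V3 (..))

smul :: (Functor f, Num a) => a -> f a -> f a
smul s = fmap (* s)

vadd :: (Applicative f, Num a) => f a -> f a -> f a
vadd x y = (+) <$> x <*> y

-- sequenceA as transpose: a V2-of-V3 becomes a V3-of-V2
-- sequenceA (V2 (V3 1 2 3) (V3 4 5 6)) == V3 (V2 1 4) (V2 2 5) (V2 3 6)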

The linear library does use Functor for scalar multiplication, but defines a special typeclass for addition, Additive. I think this is largely for the purpose of bringing Map-like vectors into the fold, but I’m not sure.

Once we’ve got the basics down of addition and scalar multiplication, the next thing I want is operations for combining vector spaces. Two important ones are the Kronecker product and direct sum. In terms of indices, the Kronecker product is a space that is indexed by the cartesian product (,) of its input space indices and the direct sum is a space indexed by the Either of its input space indices. Both are very useful constructs. I use the Kronecker product all the time when I want to work on 2D or 3D grids for example. If you’ll excuse my python, here is a toy 2-D finite difference Laplace equation example. We can lift the 1D second derivative matrix K = \partial_x^2 using the kronecker product K2 = K \otimes I + I \otimes K. The direct sum is useful as a notion of stacking matrices.

import numpy as np
from scipy.linalg import toeplitz # toeplitz makes constant diagonal matrices
N = 10 # We're making a 10x10 grid
row = np.zeros(N)
row[0] = -2
row[1] = 1
K = toeplitz(row, row)
I = np.eye(N) # identity matrix
K2 = np.kron(K, I) + np.kron(I, K)

The following is perhaps the most important point of the entire post.

type Kron = Compose
type DSum = Product

Compose of vector functors gives the Kronecker product, and Product gives the direct sum (this can be confusing but it's right. Remember, the sum in direct sum refers to the indices).

We can form the Kronecker product of vectors given a Functor constraint.

kron :: (Num a, Functor f, Functor g) => f a -> g a -> Kron f g a
kron f g = Compose $ fmap (\s1 -> fmap (\s2 -> s1 * s2) g) f

dsum :: f a -> g a -> DSum f g a
dsum f g = Pair f g

Notice we have two distinct but related things called kron: Kron and kron. One operates on vector spaces and the other operates on vector values.
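
A couple of concrete values, as a sketch (assuming linear's V2 and V3 and the definitions above; ex1 and ex2 are just my names):

import Data.Functor.Compose (Compose (..))
import Data.Functor.Product (Product (..))
import Linear (V2 (..), V3 (..))

ex1 :: Kron V2 V2 Double
ex1 = kron (V2 1 2) (V2 3 4)     -- Compose (V2 (V2 3.0 4.0) (V2 6.0 8.0))

ex2 :: DSum V2 V3 Double
ex2 = dsum (V2 1 2) (V3 3 4 5)   -- Pair (V2 1.0 2.0) (V3 3.0 4.0 5.0)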

Building vector spaces out of small combinators like V2, V4, DSum, Kron is interesting for a number of reasons.

  • It is well typed. Similar to Nat indexed vectors, the types specify the size of the vector space. We can easily describe vector spaces as powers of 2 as V16 = Kron V2 (Kron V2 (Kron V2 (Kron V2 V1))), similarly in terms of its prime factors, or we can do a binary expansion (least significant bit first) V5 = DSum V1 (Kron V2 (DSum V0 (Kron V2 V1))) or other things. We do it without going into quasi-dependently typed land or GADTs.
  • It often has better semantic meaning. It is nice to say Measurements, or XPosition or something rather than just denote the size of a vector space in terms of a nat. It is better to say a vector space is the Kron of two meaningful vector spaces than to just say it is a space of size m*n. I find it pleasant to think of the naturals as a free Semiring rather than as the Peano Naturals and I like the size of my vector space defined similarly.
  • Interesting opportunities for parallelism. See Conal Elliott’s paper on scans and FFT: http://conal.net/papers/generic-parallel-functional/

What do linear operators look like?

In the Vectors as shape methodology, Vectors look very much like Functors.

I have been tempted to lift the natural transformation type above to the following for linear operators.

type LinOp f g = forall a. (Num a) => f a -> g a

In a sense this works; we could implement kron because many of the container types (V1, V2, V3, etc.) in the linear package implement Num. However, choosing Num is a problem. Why not Fractional? Why not Floating? Sometimes we want those. Why not just specifically Double?

We don’t really want to lock away the scalar in a higher rank polymorphic type. We want to ensure that everyone is working in the same scalar type before allowing things to proceed.

type LinOp f g a = f a -> g a

Note also that this type does not constrain us to linearity. Can we form the Kronecker product of linear operators? Yes, but I’m not in love with it. This is not nearly so beautiful as the little natural transformation dance.

kron''' :: (Applicative f', Applicative g', Traversable f, Traversable g') =>
    (f a -> f' a) -> (g a -> g' a) -> (Kron f g a -> Kron f' g' a)
kron''' lf lg (Compose fga) = Compose $ sequenceA $ (fmap lf) $ sequenceA $ (fmap lg fga)

This was a nice little head scratcher for me. Follow the types, my friend! I find this particularly true for uses of sequenceA: if I want the containers swapped in ordering, sequenceA is usually the right call. It could be called transpose.

Giving the vector direct access to the scalar feels a bit off to me. I feel like it doesn’t leave enough “room” for compositionality. However, there is another possible definition of morphisms that I think is rather elegant.

type LinOp1 f g a = forall k. Additive k => Kron f k a -> Kron g k a

Does this form actually enforce linearity? You may still rearrange objects. Great. You can also now add and scalar multiply them with the Additive k constraint. We also expose the scalar, so it can be enforced to be consistent.

One other interesting thing to note is that these forms allow nonlinear operations. fmap, liftU2 and liftI2 are powerful operations, but I think if we restricted Additive to just a correctly implemented scalar multiply and vector addition operation, and zero, we’d be good.

class Additive' f where
  smul :: Num a => a -> f a -> f a
  vadd :: Num a => f a -> f a -> f a
  zero :: Num a => f a

We can recover the previous form by instantiating k to V1. V1, the 1-d vector space, is almost a scalar and can play the scalar's role in many situations. V1 is the unit object with respect to the monoidal product Kron.

There seems to be a missing instance of Additive that is useful. There is probably a good reason it isn’t there, but I need it.

instance (Additive g, Additive f) => Additive (Compose f g) where
    (Compose v) ^+^ (Compose w) = Compose (liftU2 (^+^) v w)
    -- build an outer skeleton of zeros and fill it with inner zeros
    -- (the f Int annotation needs ScopedTypeVariables)
    zero = Compose (fmap (const zero) (zero :: f Int))
    liftU2 f (Compose x) (Compose y) = Compose $ liftU2 (liftU2 f) x y
    liftI2 f (Compose x) (Compose y) = Compose $ liftI2 (liftI2 f) x y

Monoidal Categories

The above analogy can be put into mathematical terms by noting that both vector spaces and functors form monoidal categories. I talked quite a bit about monoidal categories in a previous post http://www.philipzucker.com/a-touch-of-topological-computation-3-categorical-interlude/ .

Categories are the combo of a collection of objects and arrows between the objects. The arrows can compose as long as the head of one is on the same object as the tail of the other. On every object, there is always an identity arrow, which when composed will do nothing.

We need a little extra spice to turn categories into monoidal categories. One way of thinking about it is that monoidal categories have ordinary category composition and some kind of horizontal composition, putting things side to side. Ordinary composition is often doing something kind of sequentially, applying a sequence of functions, or a sequence of matrices. The horizontal composition is often something parallel feeling, somehow applying the two arrows separately to separate pieces of the system.

Why are they called Monoidal?

There is a funny game category people play where they want to lift ideas from other fields and replace the bits and pieces in such a way that the entire thing is defined in terms of categorical terminology. This is one such example.

A monoid is a binary operation that is associative and has an identity.

Sometimes people are more familiar with the concept of a group. If not, ignore the next sentence. Monoids are like groups without requiring an inverse.

Numbers are separately monoids under addition, multiplication, and minimization (and more), all of which are associative operations with identity (0, 1, and infinity respectively).

Exponentiation is a binary operation that is not a monoid, as it isn’t associative.

The Monoid typeclass in Haskell demonstrates this http://hackage.haskell.org/package/base-4.12.0.0/docs/Data-Monoid.html

class Semigroup a => Monoid a where
        mempty  :: a
        mappend :: a -> a -> a

A common example of a monoid is the list, where mempty is the empty list and mappend appends lists.
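
For example:

{- Enter into ghci -}
[1,2] <> [3,4]     ==> [1,2,3,4]   -- mappend for lists is (++)
mempty :: [Int]    ==> []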

There are different set-like intuitions for categories. One is that the objects in the category are big opaque sets. This is the case for Hask, Rel and Vect.

A different intuition is that the category itself is like a set, and the objects are the elements of that set. There just so happens to be some extra structure knocking around in there: the morphisms. This is more often the feel for the examples of preorders or graphs. The word “monoidal” means that we have a binary operation on the objects. But in the category theory aesthetic, you also need that binary operation to “play nice” with the morphisms that are hanging around too.

Functors are the first thing that has something like this. They have other properties that come along for the ride. A Functor is a map that takes objects to objects and arrows to arrows in a nice way. A binary functor takes two objects to an object, and two arrows to one arrow, in a way that plays nice (commutes) with arrow composition.

String diagrams

String diagrams are a graphical notation for monoidal categories. Again, I discussed this more here.

Morphisms are denoted by boxes. Regular composition is shown by plugging arrows together vertically. Monoidal product is denoted by putting the arrows side to side.

When I was even trying to describe what a monoidal category was, I was already using language evocative of string diagrams.

You can see string diagrams in the documentation for the Arrow library. Many diagrams that people use in various fields can be formalized as the string diagrams for some monoidal category. This is a big chunk of Applied Category Theory.

This is the connection to quantum circuits, which are after all a graphical notation for very Kroneckery linear operations.

type Qubit = V2
type C = Complex Double

assoc :: Functor f => (Kron (Kron f g) h) ~> (Kron f (Kron g h))
assoc = Compose . (fmap Compose) . getCompose . getCompose

assoc' :: Functor f =>  (Kron f (Kron g h)) ~> (Kron (Kron f g) h)
assoc' (Compose x)  = Compose $ Compose $ (fmap getCompose x) 

kron'' :: (Additive f, Additive g, Additive k, Additive f', Additive g') => 
  LinOp1 f f' a -> LinOp1 g g' a -> Kron (Kron f g) k a -> Kron (Kron f' g') k a
kron'' lf lg fgk = let v = (assoc fgk) in assoc' (Compose $ fmap lg $ getCompose (lf v))

sigx' :: LinOp1 Qubit Qubit C 
sigx' (Compose (V2 up down)) = Compose $ V2 down up  

sigz' :: LinOp1 Qubit Qubit C 
sigz' (Compose (V2 up down)) = Compose  $ V2 up ((-1) *^ down) 

sigy' :: LinOp1 Qubit Qubit C 
sigy' (Compose (V2 up down)) = Compose $ V2 ((-i) *^ down) (i *^ up) where i = 0 :+ 1

swap' :: (Traversable f, Applicative g) => LinOp1 (Kron f g) (Kron g f) a 
swap' (Compose (Compose fgk)) = Compose $ Compose $ sequenceA fgk 

cnot :: LinOp1 (Kron Qubit Qubit) (Kron Qubit Qubit) a 
cnot (Compose (Compose (V2 u (V2 x y)))) = Compose $ Compose $ V2 u (V2 y x) -- flip the target qubit when the control is |1>

phase :: Double -> LinOp1 Qubit Qubit C
phase phi (Compose (V2 up down)) = Compose $ V2 up ((cis phi) *^ down)

lefting :: (Additive f, Additive k, Additive g) => LinOp1 f g a -> LinOp1 (Kron f k) (Kron g k) a
lefting l = kron'' l id -- Qubit.assoc' . l . Qubit.assoc 

righting :: (Additive k, Additive f, Additive g) => LinOp1 f g a -> LinOp1 (Kron k f) (Kron k g) a
righting l = kron'' id l -- (Compose (Compose fkk)) = Compose $ Compose $ (fmap (getCompose . l . Compose) fkk)

example :: LinOp1 (Kron Qubit Qubit) (Kron Qubit Qubit) C 
example = (lefting sigx') . (lefting sigy') . (righting sigz') . swap'
[Figure: example circuit]
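
To actually apply one of these operators, we can instantiate the dangling index k at V1, the unit object for Kron. A hypothetical usage sketch (ket00 is my name for the |00> basis state; this relies on the Additive instance for Compose given earlier):

import Linear (V1 (..), V2 (..))

-- the basis state |00>, with the extra index terminated at V1
ket00 :: Kron (Kron Qubit Qubit) V1 C
ket00 = Compose (Compose (V2 (V2 (V1 1) (V1 0)) (V2 (V1 0) (V1 0))))

out :: Kron (Kron Qubit Qubit) V1 C
out = example ket00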

There is an annoying amount of stupid repetitive book keeping with the associative structure of Kron. This can largely be avoided hopefully with coerce, but I’m not sure. I was having trouble with roles when doing it generically.

Bits and Bobbles

  • Woof. This post was more draining to write than I expected. I think there is still a lot left to say. Sorry about the editing everyone! Bits and pieces of this post are scattered in this repo
  • How would you go about this in other languages? C, Rust, OCaml, C++, Agda
  • The discussion of Vect = * -> * is useful for discussion of 2-Vect, coming up next. What if we make vectors of Vect? Wacky shit.
  • Metrics and dual vectors. type Dual f a = f a -> a. type Dual1 f a = forall k. Additive k => Kron f k a -> k a
  • Adjunction diagrams have cups and caps. Since we have been using representable functors, they actually have a right adjunction that is tupling with the vector space index type. This gives us something that almost feels like a metric but a weirdly constrained metric.
  • LinOp1 form is yoneda? CPS? Universally quantified k is evocative of forall c. (a -> c) -> (b -> c)

Appendix

Representable/Naperian Functors

Containers that are basically big product types are also known as representable, Naperian, or logarithmic functors. “Representable” emphasizes the isomorphism between such a container type f and the function type (->) i, which by the algebra of types is a^i (i copies of a). They are called Naperian/logarithmic because there is a relationship like that between logarithms and exponentials between the index type and the container type: if f is indexed by a, then f r ~ r^a. Product f g is indexed by (a + b) = Either a b, and Compose f g is indexed by the product (a,b), just as r^a * r^b = r^(a+b) and (r^b)^a = r^(a*b). The arrow type is written as an exponential b^a because if you have finite enumerable types a and b, that is the number of possible tabulations. The Sum of two representable functors is no longer representable, and regular logarithms of sums Log(f + g) do not have good identities associated with them either.
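
A tiny sketch of that isomorphism for a 2-element container (the names P2, index2, tabulate2 are mine):

data P2 a = P2 a a   -- "represented" by the index type Bool

index2 :: P2 a -> (Bool -> a)
index2 (P2 x _) False = x
index2 (P2 _ y) True  = y

tabulate2 :: (Bool -> a) -> P2 a
tabulate2 f = P2 (f False) (f True)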

See Gibbons' article. There is a good argument to be made that representable functors are a good match for vectors/well typed tensor programming.

But note that there is a reasonable interpretation for container types with sum types in them. These can be thought of as subspaces, different bases, or as choices of sparsity patterns. When you define addition, you’ll need to say how these subspaces reconcile with each other.

data V2_45 a = XY a a | XY' a a -- two bases at 45 degrees to each other
data Maybe a = Just a | Nothing -- a 1-d vector space with a special marker for the zero vector
data Maybe2 a = Just2 a a | Nothing2 -- a 2d vector space with a special zero marker

Monoidal Products on Hask

Hask is a name for the category that has objects as Haskell types and morphisms as Haskell functions.

Note that it’s a curious mixing of type/value layers of Haskell. The objects are types whereas the function morphisms are Haskell values. Composition is given by (.) and the identity morphisms are given by id.

For Haskell, you can compose functions, but you can also smash functions together side by side. These combinators are held in Control.Arrow.

You can smash together types with tuple (,) or with Either. Both of these are binary operators on types. The corresponding mapping on morphisms are given by

(***) :: Arrow a => a b c -> a b' c' -> a (b, b') (c, c') 
(+++) :: ArrowChoice a => a b c -> a b' c' -> a (Either b b') (Either c c') 

These are binary operators on morphisms that play nice with the composition structure of Haskell.
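
For plain functions these look like:

{- Enter into ghci -}
import Control.Arrow
((+ 1) *** not) (3, True)                   ==> (4,False)
(show +++ not) (Left 3 :: Either Int Bool)  ==> Left "3"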

Monoidal Combinators of Functors

A monoidal category also has unit objects. This is given by the Identity functor

rightUnitor :: Functor f => Compose f Identity a -> f a
rightUnitor (Compose f) = fmap runIdentity f

rightUnitor' :: Functor f => f ~> Compose f Identity
rightUnitor' = Compose . fmap Identity

leftUnitor' :: f ~> Compose Identity f
leftUnitor' = Compose . Identity

leftUnitor :: Compose Identity f ~> f
leftUnitor = runIdentity . getCompose

There is also a sense of associativity. It is just newtype rearrangement, so it can also be achieved with a `coerce` (although not polymorphically?).

assoc :: Functor f => Compose (Compose f g) h ~> Compose f (Compose g h)
assoc = Compose . (fmap Compose) . getCompose . getCompose

assoc' :: Functor f => Compose f (Compose g h) ~> Compose (Compose f g) h
assoc' (Compose x) = Compose $ Compose $ (fmap getCompose x)

Similarly, we can define a monoidal category structure using Product or Sum instead of Compose.

These are all actually just newtype rearrangement, so they should all just be instances of coerce, but I couldn’t get the roles to go through generically?

Concolic Weakest Precondition is Kind of Like a Lens

That’s a mouthful.

Lenses are described as functional getters and setters. The simple lens type is

type Lens a b = a -> (b, b -> a)

The getter is the a -> b part and the setter is the a -> b -> a part.

This type does not constrain lenses to obey the usual laws of getters and setters. So we can use/abuse lens structures for nontrivial computations that have forward and backwards passes that share information. Jules Hedges in particular seems to be a proponent of this idea.

I’ve described before how to encode reverse mode automatic differentiation in this style. I have suspicions that you can make iterative LQR and Gauss-Seidel iteration have this flavor too, but I’m not super sure. My attempts ended somewhat unsatisfactorily a while back but I think it’s not hopeless. The trouble was that you usually want the whole vector back, not just its ends.

I’ve got another example in imperative program analysis that kind of makes sense and might be useful though. Toy repo here: https://github.com/philzook58/wp-lens

In program analysis it sometimes helps to run a program both concretely and symbolically. Concolic = CONCrete / symbOLIC. Symbolic stuff can slowly find hard things and concrete execution just sprays super fast and can find the dumb things really quick.  

We can use a lens structure to organize a DSL for describing a simple imperative language

The forward pass is for the concrete execution. The backward pass is for transforming the post condition to a pre condition in a weakest precondition analysis. Weakest precondition semantics is a way of specifying what is occurring in an imperative language. It tells how each statement transforms post conditions (predicates about the state after the execution) into pre conditions (predicates about before the execution). The concrete execution helps unroll loops and avoid branching if-then-else behavior that would make the symbolic stuff harder to process. I’ve been flipping through Dijkstra’s book on this. Interesting stuff, interesting man.

I often think of a state machine as a function taking s -> s. However, this is kind of restrictive. It is possible to have heterogeneous transformations s -> s’. Why not? I think I am often thinking about finite state machines, which we really don’t intend to have a changing state size. Perhaps we allocated new memory or something or brought something into or out of scope. We could model this by assuming the memory was always there, but it seems wasteful and perhaps confusing. We would need to know a priori everything we will need, which seems like it might break compositionality.

We could model our language by making some data type like

data Imp = Skip | Print String | Assign String Expr | Seq Imp Imp | ...

and then build an interpreter for Imp.

interp :: Imp -> s -> s'

But we can also cut out the middle man and directly define our language using combinators.

type Stmt s s' = s -> s'

To me this has some flavor of a finally tagless style.


Likewise for expressions. Expressions evaluate to something in the context of the state (they can lookup variables), so let’s just use

type Expr s a = s -> a

And, confusingly (sorry), I think it makes sense to use Lens in their original getter/setter intent for variables. So Lens structure is playing double duty.

type Var s a = Lens' s a

With that said, here we go.


type Stmt s s' = s -> s' 
type Lens' a b = a -> (b, b -> a)
set l s a = let (_, f) = l s in f a

type Expr s a = s -> a
type Var s a = Lens' s a

skip :: Stmt s s
skip = id

sequence :: Stmt s s' -> Stmt s' s'' -> Stmt s s''
sequence = flip (.)

assign :: Var s a -> Expr s a -> Stmt s s
assign v e = \s -> set v s (e s)

(===) :: Var s a -> Expr s a -> Stmt s s
v === e = assign v e

ite :: Expr s Bool -> Stmt s s' -> Stmt s s' -> Stmt s s'
ite e stmt1 stmt2 = \s -> if (e s) then stmt1 s else stmt2 s

while :: Expr s Bool -> Stmt s s -> Stmt s s
while e stmt = \s -> if (e s) then ((while e stmt) (stmt s)) else s

assert :: Expr s Bool -> Stmt s s  
assert e = \s -> if (e s) then s else undefined 

abort :: Stmt s s'  
abort = const undefined
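
Here is a small usage sketch of this concrete interpreter, assuming a two-variable state (Int, Int) with a hand-rolled lens x for the first component (the names and the little program are mine):

x :: Var (Int, Int) Int
x = \(a, b) -> (a, \a' -> (a', b))

-- x := 1; while (x < 5) x := x + 1
prog :: Stmt (Int, Int) (Int, Int)
prog = while (\s -> fst s < 5) (x === (\s -> fst s + 1)) . (x === const 1)

-- prog (0, 0)  ==>  (5, 0)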

Weakest precondition can be done similarly; instead, we start from the end and work backwards.

Predicates are roughly sets. A simple type for sets is

type Pred s = s -> Bool 

Now, this doesn’t have much deductive power, but I think it demonstrates the principles simply. We could replace Pred with perhaps an SMT solver expression, or some data type for predicates, for which we’ll need to implement things like substitution. Let’s not today.

A function a -> b is equivalent to forall c. (b -> c) -> (a -> c). This is some kind of CPS / Yoneda transformation thing. A state transformer s -> s' becoming a predicate transformer (s' -> Bool) -> (s -> Bool) is somewhat evocative of that. I’m not being very precise here at all.

Without further ado, here’s how I think a weakest precondition looks roughly.


type Lens' a b = a -> (b, b -> a)
set l s a = let (_, f) = l s in f a

type Expr s a = s -> a
type Var s a = Lens' s a
type Pred s = s -> Bool
type Stmt s s' = Pred s' -> Pred s 

skip :: Stmt s s
skip = \post -> let pre = post in pre -- if

sequence :: Stmt s s' -> Stmt s' s'' -> Stmt s s''
sequence = (.)

assign :: Var s a -> Expr s a -> Stmt s s
assign v e = \post -> let pre s = post (set v s (e s)) in pre

(===) :: Var s a -> Expr s a -> Stmt s s
v === e = assign v e

ite :: Expr s Bool -> Stmt s s' -> Stmt s s' -> Stmt s s'
ite e stmt1 stmt2 = \post -> let pre s = if (e s) then (stmt1 post) s else (stmt2 post) s in pre

abort :: Stmt s s'  
abort = \post -> const False

assert :: Expr s Bool -> Stmt s s  
assert e = \post -> let pre s = (e s) && (post s) in pre

{-
-- tougher. Needs loop invariant
while :: Expr s Bool -> Stmt s s -> Stmt s s
while e stmt = \post -> let pre s = if (e s) then ((while e stmt) (stmt post)) s else  in pre
-}
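
A small usage sketch of this predicate transformer version: the weakest precondition of “x := x + 1” against the postcondition “x > 3” should be “x > 2”. Here the state is just the single variable x (the names are mine):

x :: Var Int Int
x = \s -> (s, \s' -> s')   -- the state *is* the variable

inc :: Stmt Int Int
inc = assign x (\s -> s + 1)

-- (inc (> 3)) 2  ==>  False   -- i.e. the precondition is "x > 2"
-- (inc (> 3)) 3  ==>  True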

Finally here is a combination of the two above that uses the branching structure of the concrete execution to aid construction of the precondition. Although I haven’t expanded it out, we are using the full s t a b parametrization of lens in the sense that states go forward and predicates come back.


type Lens' a b = a -> (b, b -> a)
set l s a = let (_, f) = l s in f a


type Expr s a = s -> a
type Var s a = Lens' s a
type Pred a = a -> Bool
type Stmt s s' = s -> (s', Pred s' -> Pred s) -- eh. Screw the newtype

skip :: Stmt s s
skip = \x -> (x, id)


sequence :: Stmt s s' -> Stmt s' s'' -> Stmt s s''
sequence f g =   \s -> let (s', j) = f s in
                       let (s'', j') = g s' in
                           (s'', j . j')
assign :: Var s a -> Expr s a -> Stmt s s
assign v e = \s -> (set v s (e s), \p -> \s -> p (set v s (e s)))

--if then else
ite :: Expr s Bool -> Stmt s s' -> Stmt s s' -> Stmt s s'
ite e stmt1 stmt2 = \s -> 
                    if (e s) 
                    then let (s', wp) = stmt1 s in
                         (s', \post -> \s -> (e s) && (wp post s))
                    else let (s', wp) = stmt2 s in
                            (s', \post -> \s -> (not (e s)) && (wp post s))

assert :: Pred s -> Stmt s s
assert p = \s -> (s, \post -> let pre s = (post s) && (p s) in pre)

while :: Expr s Bool -> Stmt s s -> Stmt s s
while e stmt = \s -> if e s then let (s1, wp1) = stmt s in
                                 let (s2, wp2) = (while e stmt) s1 in
                                 (s2, \post -> \s' -> (e s') && (wp1 (wp2 post) s'))
                            else (s, \post -> \s' -> (not (e s')) && (post s'))

{-

-- declare and forget can change the size and shape of the state space.
-- These are heterogeneous state commands
declare :: Iso (s,Int) s' -> Int -> Stmt s s'
declare iso defalt = \s -> (to iso (s, defalt), \p -> \s -> p $ to iso (s, defalt))

forget :: Lens' s s' -> Stmt s s' -- forgets a chunk of state

declare_bracket :: Iso (s,Int) s' -> Int ->  Stmt s' s' -> Stmt s s
declare_bracket iso defalt stmt = (declare iso defalt) . stmt . (forget (_1 . iso))

-}

Neat. Useful? Me dunno.

Linear Algebra of Types

It gives my brain a pleasant thrum to learn new mathematics which mimics the algebra I learned in middle school. Basically this means that the new system has operations with properties that match those of regular numbers as much as possible. Two pretty important operations are addition and multiplication with the properties of distributivity and associativity. Roughly this corresponds to the mathematical notion of a Semiring.

Some examples of semirings include 

  • Regular multiplication and addition
  • And-Or
  • Min-plus 
  • Matrices.
  • Types

I have written before about how types also form a semiring, using Either for plus and (,) for times. These constructions don’t obey distributivity or associativity “on the nose”, but instead are isomorphic to the rearranged type, which when you squint is pretty similar to equality.
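
For instance, distributivity only holds up to an explicit isomorphism (the names distribute and factor are mine):

distribute :: (a, Either b c) -> Either (a, b) (a, c)
distribute (a, Left b)  = Left (a, b)
distribute (a, Right c) = Right (a, c)

factor :: Either (a, b) (a, c) -> (a, Either b c)
factor (Left (a, b))  = (a, Left b)
factor (Right (a, c)) = (a, Right c)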

Matrices are grids of numbers which multiply by “row times column”. You can form matrices out of other semirings besides just numbers. One somewhat trivial but interesting example is block matrices, where the elements of the matrix itself are also matrices. Another interesting example is that of relations, which can be thought of as matrices of boolean values. Matrix multiplication using the And-Or semiring on the elements corresponds to relational composition.

What if we put our type peanut butter in our matrix chocolate and consider matrices of types, using the Either(,) semiring?

The simplest implementation to show how this could go uses the naive list-based representation of vectors and matrices. We can directly lift this representation to the type level and the appropriate value-level functions to type families.

type a :*: b = (a,b)
type a :+: b = Either a b

type family Dot v v' where
    Dot '[x] '[y] = x :*: y 
    Dot (x : xs) (y : ys) = (x :*: y) :+: (Dot xs ys)

type family MVMult m v where
    MVMult '[r] v = '[Dot r v]
    MVMult (r : rs) v = (Dot r v) : (MVMult rs v)

type family VMMult v m where
    VMMult v '[c] = '[Dot v c]
    VMMult v (c : cs) = (Dot v c) : (VMMult v cs)

type family MMMult' m m' where
    MMMult' '[r] cs = '[VMMult r cs]
    MMMult' (r : rs) cs = (VMMult r cs) : (MMMult' rs cs)

type family MMMult m m' where
    MMMult m m' = MMMult' m (Transpose m')

type family Transpose m where
    Transpose ((r1 : rs') : rs) = (r1 : (Heads rs)) : (Conss rs' (Transpose (Tails rs)))
    Transpose '[] = '[]

-- some mapped helper functions
-- verrrrrry ugly. Eh. Get 'er dun
type family Heads v where
    Heads ((v : vs) : xs) = v : (Heads xs)
    Heads '[] = '[]
type family Tails v where
    Tails ((v : vs) : xs) = vs : (Tails xs)
    Tails '[] = '[]
type family Conss v vs where
    Conss (x : xs) (y : ys) = (x : y) : (Conss xs ys)
    Conss '[] '[] = '[]

type family Index v i where
    Index (x : xs) 0 = x
    Index (x : xs) n = Index xs (n-1)
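
A quick sanity check in ghci (with the above families in scope); :kind! forces type family reduction:

{- Enter into ghci -}
:kind! Dot '[Bool, Int] '[String, Word]
-- reduces to (Bool :*: String) :+: (Int :*: Word),
-- i.e. Either (Bool, String) (Int, Word)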

This was just for demonstration purposes. It is not my favorite representation of vectors. You can lift a large fraction of possible ways to encode vector spaces at the value level up to the type level, such as the linear package, or using dual vectors type V2 a = a -> a -> a. Perhaps more on that another day.

What is the point?

Ok. That’s kind of neat, but why do it? Well, one way to seek an answer to that question is to ask “what are matrices useful for anyway?”

One thing they can do is describe transition systems. You can write down a matrix whose entry a_{ij} describes something about the transition from state i to state j. For example the entry could be:

  • The cost of getting from i to j (min-plus gives shortest path),
  • The count of ways to get from i to j (combinatorics of paths)
  • The connectivity of the system from i to j using boolean values and the and-or semiring
  • The probability of transition from i to j
  • The quantum amplitude of going from i to j if we’re feeling saucy.

If we form a matrix describing a single time step, then multiplying this matrix by itself gives 2 time steps and so on.

Lifting this notion to types, we can build a type exactly representing all the possible paths from state i to j.

Concretely, consider the following humorously bleak transition system: You are going between home and work. Every 1 hour period you can make a choice to do a home activity, commute, or work. There are different options of activities at each.

data Commute = Drive
data Home = Sleep | Eat
data Work = TPSReport | Bitch | Moan


This is described by the following transition diagram

[Transition diagram image. Caption: Stapler.]

The transitions are described by the following matrix type:

type T = '[ '[Home    ,  Commute ],  
            '[Commute ,  Work    ]]

What is the data type that describes all possible 4-hour days? You’ll find the appropriate data types in the following matrix.

type FourHour = MMMult T (MMMult T (MMMult T T))

Now, time to come clean. I don’t think this is necessarily the best way to go about this problem. There are alternative ways of representing it.

Here are two data types that describe an indefinite number of transition steps.

data HomeChoice = StayHome Home HomeChoice | GoWork Commute WorkChoice
data WorkChoice = StayWork Work WorkChoice | GoHome Commute HomeChoice

Another style would hold the current state as a type parameter in the type using a GADT.

data Path state where   
   StayWork :: Work -> Path Work -> Path Work
   CommuteHome :: Commute -> Path Home ->  Path Work
   StayHome :: Home -> Path Home -> Path Home
   CommuteWork :: Commute -> Path Work ->  Path Home

We could construct types that are to the above types as Vec n is to [] by including an explicit step size parameter.

Still, food for thought.

Further Thoughts

The reason I was even thinking about this is because we can lift the above construction to perform a linear algebra of vector spaces. And I mean the spaces, not the vectors themselves. This is a confusing point.

Vector spaces also have two natural operations on them that act like addition and multiplication: the direct sum and the Kronecker product. These operations do form a semiring, although again not on the nose.

This is connected to the above algebra of types picture by considering the index types of these vector spaces. The simplest way to denote this in Haskell is using the free vector space construction as shown in this Dan Piponi post. The Kronecker product makes tuples of the indices and the direct sum has an index that is the Either of the original index types.

type Vec b r = [(b, r)]
-- Example 2D vector space type
type V2D = Vec Bool Double
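
As a sketch (the names vkron and vdsum are mine), on this representation the Kronecker product pairs up the indices and the direct sum tags them with Either:

vkron :: Num r => Vec a r -> Vec b r -> Vec (a, b) r
vkron v w = [((i, j), x * y) | (i, x) <- v, (j, y) <- w]

vdsum :: Vec a r -> Vec b r -> Vec (Either a b) r
vdsum v w = [(Left i, x) | (i, x) <- v] ++ [(Right j, y) | (j, y) <- w]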

This is by far not the only way to go about it. We can also consider using the Compose-Product semiring on functors (Compose is Kron, Product is DSum) to get a more index-free kind of feel and work with dense vectors.

Going down this road (plus a couple layers of mathematical sophistication) leads to a set of concepts known as 2Vect. Dan Roberts and James Vicary produced a Mathematica package for 2Vect which is basically incomprehensible to me. It seems to me that typed functional programming is a more appropriate venue for this kind of pursuit, given how evocative of / well modeled by category theory it can be. These mathematical ideas are applicable to describing anyonic vector spaces. See my previous post below. It is not a coincidence that the Path data type above is so similar to the FibTree data type. The root type variable takes the place of the work/home state, and the tuple structure takes the place of a Vec-like size parameter n.

More on this to come, probably, as I figure out how to explain it cleanly.

Edit: WordPress, your weird formatting is killing me.

Edit: Hoo Boy. This is why we write blog posts. Some relevant material was pointed out to me that I was not aware of. Thanks @DrEigenbastard.

https://link.springer.com/chapter/10.1007/978-3-319-19797-5_6

http://blog.sigfpe.com/2010/08/constraining-types-with-regular.html

http://blog.sigfpe.com/2010/08/divided-differences-and-tomography-of.html