Neural Networks with Weighty Lenses (DiOptics?)

I wrote a while back how you can make a pretty nice DSL for reverse mode differentiation based on the same type as Lens. I’d heard some interesting rumblings on the internet around these ideas and so was revisiting them.

type Lens s t a b = s -> (a, b -> t)
type AD x dx y dy = x -> (y, dy -> dx)

Composition is defined identically for reverse mode just as it is for lens.

The forward computation shares info with the backwards differential propagation, which corresponds to a transposed Jacobian

After chewing on it a while, I realized this really isn’t that exotic. How it works is that you store the reverse mode computation graph, and all necessary saved data from the forward pass in the closure of the (dy -> dx). I also have a suspicion that if you defunctionalized this construction, you’d get the Wengert tape formulation of reverse mode ad.

Second, Lens is just a nice structure for bidirectional computation, with one forward pass and one backward pass which may or may not be getting/setting. There are other examples for using it like this.

It is also pretty similar to the standard “dual number” form type FAD x dx y dy = (x,dx)->(y,dy) for forward mode AD. We can bring the two closer by a CPS/Yoneda transformation and then some rearrangement.

     x -> (y, dy -> dx) 
==>  x -> (y, forall s. (dx -> s) -> (dy -> s))
==>  forall s. (x, dx -> s) -> (y, dx -> s) 

and meet it in the middle with

(x,dx) -> (y,dy)
==> forall s. (x, s -> dx) -> (y, s -> dy)

I ended the previous post somewhat unsatisfied by how ungainly writing that neural network example was, and I called for Conal Elliot’s compiling to categories plugin as a possible solution. The trouble is piping the weights all over the place. This piping is very frustrating in point-free form, especially when you know it’d be so trivial pointful. While the inputs and outputs of layers of the network compose nicely (you no longer need to know about the internal computations), the weights do not. As we get more and more layers, we get more and more weights. The weights are in some sense not as compositional as the inputs and outputs of the layers, or compose in a different way that you need to maintain access to.

I thought of a very slight conceptual twist that may help.

The idea is we keep the weights out to the side in their own little type parameter slots. Then we define composition such that it composes input/outputs while tupling the weights. Basically we throw the repetitive complexity appearing in piping the weights around into the definition of composition itself.

These operations are easily seen as 2 dimensional diagrams.

Three layers composed, exposing the weights from all layers
The 2-D arrow things can be built out of the 1-d arrows of the original basic AD lens by bending the weights up and down. Ultimately they are describing the same thing

Here’s the core reverse lens ad combinators

import Control.Arrow ((***))

type Lens'' a b = a -> (b, b -> a)

comp :: (b -> (c, (c -> b))) -> (a -> (b, (b -> a))) -> (a -> (c, (c -> a)))
comp f g x = let (b, dg) = g x in
             let (c, df) = f b in
             (c, dg . df)

id' :: Lens'' a a
id' x = (x, id) 

relu' :: (Ord a, Num a) => Lens'' a a
relu' = \x -> (frelu x, brelu x) where
        frelu x | x > 0 = x
                | otherwise = 0
        brelu x dy | x > 0 = dy
                   | otherwise = 0

add' :: Num a => Lens'' (a,a) a 
add' = \(x,y) -> (x + y, \ds -> (ds, ds))

dup' :: Num a => Lens'' a (a,a)
dup' = \x -> ((x,x), \(dx,dy) -> dx + dy)

sub' :: Num a => Lens'' (a,a) a 
sub' = \(x,y) -> (x - y, \ds -> (ds, -ds))

mul' :: Num a => Lens'' (a,a) a 
mul' = \(x,y) -> (x * y, \dz -> (dz * y, x * dz))

recip' :: Fractional a => Lens'' a a 
recip' = \x-> (recip x, \ds -> - ds / (x * x))

div' :: Fractional a => Lens'' (a,a) a 
div' = (\(x,y) -> (x / y, \d -> (d/y,-x*d/(y * y))))

sin' :: Floating a => Lens'' a a
sin' = \x -> (sin x, \dx -> dx * (cos x))

cos' :: Floating a => Lens'' a a
cos' = \x -> (cos x, \dx -> -dx * (sin x))

pow' :: Num a => Integer -> Lens'' a a
pow' n = \x -> (x ^ n, \dx -> (fromInteger n) * dx * x ^ (n-1)) 

--cmul :: Num a => a -> Lens' a a
--cmul c = lens (* c) (\x -> \dx -> c * dx)

exp' :: Floating a => Lens'' a a
exp' = \x -> let ex = exp x in
                      (ex, \dx -> dx * ex)

fst' :: Num b => Lens'' (a,b) a
fst' = (\(a,b) -> (a, \ds -> (ds, 0)))

snd' :: Num a => Lens'' (a,b) b
snd' = (\(a,b) -> (b, \ds -> (0, ds)))

-- some monoidal combinators
swap' :: Lens'' (a,b) (b,a)
swap' = (\(a,b) -> ((b,a), \(db,da) -> (da, db)))

assoc' :: Lens'' ((a,b),c) (a,(b,c))
assoc' = \((a,b),c) -> ((a,(b,c)), \(da,(db,dc)) -> ((da,db),dc))

assoc'' :: Lens'' (a,(b,c)) ((a,b),c)
assoc'' = \(a,(b,c)) -> (((a,b),c), \((da,db),dc)->  (da,(db,dc)))

par' :: Lens'' a b -> Lens'' c d -> Lens'' (a,c) (b,d)
par' l1 l2 = l3 where
    l3 (a,c) = let (b , j1) = l1 a in
               let (d, j2) = l2 c in
               ((b,d) , j1 *** j2) 
first' :: Lens'' a b -> Lens'' (a, c) (b, c)
first' l = par' l id'

second' :: Lens'' a b -> Lens'' (c, a) (c, b)
second' l = par' id' l

labsorb :: Lens'' ((),a) a
labsorb (_,a) = (a, \a' -> ((),a'))

labsorb' :: Lens'' a ((),a)
labsorb' a = (((),a), \(_,a') -> a')

rabsorb :: Lens'' (a,()) a
rabsorb = comp labsorb swap'

And here are the two dimensional combinators. I tried to write them point-free in terms of the combinators above to demonstrate that there is no monkey business going on. We

type WAD' w w' a b = Lens'' (w,a) (w',b)
type WAD'' w a b = WAD' w () a b -- terminate the weights for a closed network
{- For any monoidal category we can construct this composition? -}
-- horizontal composition
hcompose :: forall w w' w'' w''' a b c. WAD' w' w'' b c -> WAD' w w''' a b -> WAD' (w',w) (w'',w''') a c
hcompose f g = comp f' g' where 
               f' :: Lens'' ((w',r),b) ((w'',r),c)
               f' = (first' swap') `comp` assoc'' `comp` (par' id' f) `comp` assoc' `comp`  (first' swap') 
               g' :: Lens'' ((r,w),a) ((r,w'''),b)
               g' = assoc'' `comp` (par' id' g) `comp` assoc' 

rotate :: WAD' w w' a b -> WAD' a b w w'                                      
rotate f = swap' `comp` f `comp` swap'

-- vertical composition of weights
vcompose :: WAD' w'  w'' c d -> WAD' w w' a b -> WAD' w w'' (c, a) (d, b)
vcompose f g = rotate (hcompose (rotate f)  (rotate g) )                             

-- a double par.
diagpar :: forall w w' a b w'' w''' c d. WAD' w  w' a b -> WAD' w'' w''' c d 
           -> WAD' (w,w'') (w',w''') (a, c) (b, d)
diagpar f g = t' `comp` (par' f g) `comp` t where
                t :: Lens'' ((w,w''),(a,c)) ((w,a), (w'',c)) -- yikes. just rearrangements.
                t =  assoc'' `comp` (second' ((second' swap') `comp` assoc' `comp` swap')) `comp` assoc'
                t' :: Lens'' ((w',b), (w''',d)) ((w',w'''),(b,d)) -- the tranpose of t
                t' =  assoc'' `comp` (second'  ( swap'  `comp` assoc'' `comp` (second' swap')))  `comp` assoc'

id''' :: WAD' () () a a
id''' = id'

-- rotate:: WAD' w a a w
-- rotate = swap'

liftIO :: Lens'' a b -> WAD' w w a b
liftIO = second'

liftW :: Lens'' w w' -> WAD' w w' a a
liftW = first'

wassoc' = liftW assoc' 
wassoc'' = liftW assoc'' 

labsorb'' :: WAD' ((),w) w a a
labsorb'' = first' labsorb

labsorb''' :: WAD' w ((),w) a a
labsorb''' = first' labsorb'

wswap' :: WAD' (w,w') (w',w) a a
wswap' = first' swap'
-- and so on we can lift all combinators

I wonder if this is actually nice?

I asked around and it seems like this idea may be what davidad is talking about when he refers to dioptics

Perhaps this will initiate a convo.

Edit: He confirms that what I’m doing appears to be a dioptic. Also he gave a better link

He is up to some interesting diagrams

Bits and Bobbles

  • Does this actually work or help make things any better?
  • Recurrent neural nets flip my intended role of weights and inputs.
  • Do conv-nets naturally require higher dimensional diagrams?
  • This weighty style seems like a good fit for my gauss seidel and iterative LQR solvers. A big problem I hit there was getting all the information to the outside, which is a similar issue to getting the weights around in a neural net.

Gröbner Bases and Optics

Geometrical optics is a pretty interesting topic. It really is almost pure geometry/math rather than physics.

Huygens principle says that we can compute the propagation of a wave by considering the wavelets produced by each point on the wavefront separately.

In physical optics, this corresponds to the linear superposition of the waves produced at each point by a propagator function \int dx' G(x,x'). In geometrical optics, which was Huygens original intent I think (these old school guys were VERY geometrical), this is the curious operation of taking the geometrical envelope of the little waves produced by each point.

The gist of Huygens principles. Ripped from wikipedia The envelope is an operation on a family of curves. You can approximate it by a finite difference procedure. Take two subsequent curves close together in the family, find their intersection. Keep doing that and the join up all the intersections. This is roughly the approach I took in this post

Taking the envelope of a family of lines. Ripped from wikipedia

You can describe a geometrical wavefront implicitly with an equations \phi(x,y) = 0. Maybe the wavefront is a circle, or a line, or some wacky shape.

The wavelet produced by the point x,y after a time t is described implicitly by d(\vec{x},\vec{x'})^2 - t^2 = (x-x')^2 + (y-y')^2 - t^2 = 0.

This described a family of curves, the circles produced by the different points of the original wavefront. If you take the envelope of this family you get the new wavefront at time t.

How do we do this? Grobner bases is one way if we make \phi a polynomial equation. For today’s purposes, Grobner bases are a method for solving multivariate polynomial equations. Kind of surprising that such a thing even exists. It’s actually a guaranteed terminating algorithm with horrific asymptotic complexity. Sympy has an implementation. For more on Grobner bases, the links here are useful Especially check out the Cox Little O’Shea books

The algorithm churns on a set of multivariate polynomials and spits out a new set that is equivalent in the sense that the new set is equal to zero if and only if the original set was. However, now (if you ask for the appropriate term ordering) the polynomials are organized in such a way that they have an increasing number of variables in them. So you solve the 1-variable equation (easy), and substitute into the 2 variable equation. Then that is a 1-variable equation, which you solve (easy) and then you substitute into the three variable equation, and so on. It’s analogous to gaussian elimination.

So check this out

from sympy import *

x1, y1, x2, y2, dx, dy = symbols('x1, y1, x2, y2, dx, dy')

def dist(x,y,d):
    return x**2 + y**2 - d**2

e1 = dist(x1,y1,2) # the original circle of radius 2
e2 = dist(x1-x2 ,y1 - y2 , 1) # the parametrized wavefront after time 1

# The two envelope conditions.
e3 = diff(e1,x1)*dx + diff(e1,y1)*1
e4 = diff(e2,x1)*dx + diff(e2,y1)*1

envelope = groebner([e1,e2,e3,e4], y1, x1, dx, dy, x2, y2, order='lex')[-1]
plot_implicit(e1, show=False)
plot_implicit(envelope, show = True)

The envelope conditions can be found by introducing two new differential variables dx, and dy. They are constrained to lie tangent to the point on the original circle by the differential equation e3, and then we require that the two subsequent members of the curve family intersect by the equation e4. Hence we get the envelope. Ask for the Grobner basis with that variable ordering gives us an implicit equations for x2, y2 with no mention of the rest if we just look at the last equation of the Grobner basis.

I set arbitrarily dy = 1 because the overall scale of them does not matter, only the direction. If you don’t do this, the final equation is scaled homogenously in dy.

This does indeed show the two new wavefronts at radius 1 and radius 3.

Original circle radius = 2

x1**2 + y1**2 – 4 = 0
Evolved circles found via grobner basis.

(x2**2 + y2**2 – 9)*(x2**2 + y2**2 – 1) = 0

Here’s a different one of a parabola using e1 =  y1 – x1 + x1**2

Original curve y1 – x1 + x1**2 = 0
After 1 time step.

16*x2**6 – 48*x2**5 + 16*x2**4*y2**2 + 32*x2**4*y2 + 4*x2**4 – 32*x2**3*y2**2 – 64*x2**3*y2 + 72*x2**3 + 32*x2**2*y2**3 + 48*x2**2*y2 – 40*x2**2 – 32*x2*y2**3 + 16*x2*y2**2 – 16*x2*y2 – 4*x2 + 16*y2**4 + 32*y2**3 – 20*y2**2 – 36*y2 – 11 = 0

The weird lumpiness here is plot_implicit’s inability to cope, not the actually curve shape Those funky prongs are from a singularity that forms as the wavefront folds over itself.

I tried using a cubic curve x**3 and the grobner basis algorithm seems to crash my computer. 🙁 Perhaps this is unsurprising. ?

I don’t know how to get the wavefront to go in only 1 direction? As is, it is propagating into the past and the future. Would this require inequalities? Sum of squares optimization perhaps?


It’s been suggested on reddit that I’d have better luck using other packages, like Macaulay2, MAGMA, or Singular. Good point

Also it was suggested I use the Dixon resultant, for which there is an implementation in sympy, albeit hidden. Something to investigate

Another interesting angle might be to try to go numerical with a homotopy continuation method with phcpy

or pybertini

Functors, Vectors, and Quantum Circuits

Vectors are dang useful things, and any energy you put into them seems to pay off massive dividends.

Vectors and Linear Algebra are useful for:

  • 2D, 3D, 4D geometry stuff. Computer graphics, physics etc.
  • Least Squares Fitting
  • Solving discretized PDEs
  • Quantum Mechanics
  • Analysis of Linear Dynamical Systems
  • Probabilistic Transition Matrices

There are certain analogies between Haskell Functors and Vectors that corresponds to a style of computational vector mathematics that I think is pretty cool and don’t see talked about much.

Due to the expressivity of its type system, Haskell has a first class notion of container that many other languages don’t. In particular, I’m referring to the fact that Haskell has higher kinded types * -> * (types parametrized on other types) that you can refer to directly without filling them first. Examples in the standard library include Maybe, [], Identity, Const b, and Either b. Much more vector-y feeling examples can be found in Kmett’s linear package V0, V1, V2, V3, V4. For example, the 4 dimensional vector type V4

data V4 a = V4 a a a a

This really isn’t such a strange, esoteric thing as it may appear. You wouldn’t blink an eye at the type

struct V4 {
   double x, y, z, w;

in some other language. What makes Haskell special is how compositional and generic it is. We can build thousand element structs with ease via composition. What we have here is an alternative to the paradigm of computational vectors ~ arrays. Instead we have computational vectors ~ structs. In principle, I see no reason why this couldn’t be as fast as arrays, although with current compiler expectations it probably isn’t.

Monoidal categories are a mathematical structure that models this analogy well. It has been designed by mathematicians for aesthetic elegance, and it seems plausible that following its example leads us to interesting, useful, and pleasant vector combinators. And I personally find something that tickles me about category theory.

So to get started, let’s talk a bit about functors.

The Algebra of Functors

Functors in Haskell are a typeclass for containers. They allow you to map functions over all the items in the container. They are related to the categorical notion of functor, which is a mapping between categories.

type Container = * -> * -- Note: This is actually a kind signature.
-- Kinds and types are the same thing in Haskell.

You can lift the product and sum of types to the product and sum of Functors which you may find in Data.Functor.Product and Data.Functor.Sum. This is analogous to the lifting of ordinary addition and multiplication to the addition and multiplication of polynomials, which are kind of like numbers with a “hole”.

newtype Product f g a = Pair (f a , g a)
newtype Sum f g a = Sum (Either (f a) (g a))

Functors also compose. A container of containers of a is still a container of a. We can form composite containers by using the Compose newtype wrapper.

newtype Compose f g a = Compose (f (g a))

When you use this Compose newtype, instead of having to address the individual elements by using fmap twice, a single application of fmap will teleport you through both layers of the container.

Product, Sum, and Compose are all binary operator on functors. The type constructor has kind

{- Enter into ghci -}
:kind Compose ==> (* -> *) -> (* -> *) -> (* -> *)
:kind Product ==> (* -> *) -> (* -> *) -> (* -> *)
:kind Sum ==> (* -> *) -> (* -> *) -> (* -> *)

Some important other functors from the algebra of types perspective are Const Void a, Const () a, and Identity a. These are identity elements for Sum, Product, and Compose respectively.

You can define mappings between containers that don’t depend on the specifics of their contents. These mappings can only rearrange, copy and forget items of their contained type. This can be enforced at the type level by the polymorphic type signature

type f ~> g = forall a. f a -> g a

These mappings correspond in categorical terminology to natural transformations between the functors f and g. There is a category where objects are Functors and morphisms are natural transformations. Sum, Product, and Compose all obeys the laws necessary to be a monoidal product on this category.

How the lifting of functions works for Compose is kind of neat.

mon_prod :: Functor f' => (f ~> f') -> (g ~> g') -> (Compose f g ~> Compose f' g')
mon_prod ntf ntg (Compose fg) = Compose (fmap ntg (ntf fg))
-- or equvalently Compose (ntf (fmap ntg fg)) with a (Functor f) typeclass requirement.

Because the natural transformations require polymorphic types, when you apply ntf to fg the polymorphic variable a in the type of ntf restricts to a ~ g a'.

Product and Sum have a straight forward notion of commutativity ( (a,b) is isomorphic to (b,a)) . Compose is more subtle. sequenceA from the Traversable typeclass can swap the ordering of composition. sequenceA . sequenceA may or may not be the identity depending on the functors in question, so it has some flavor of a braiding operation. This is an interesting post on that topic

Combinators of these sorts are used arise in at least the following contexts

  • Data types a la carte – A systematic way of building extensible data types
  • GHC Generics – A system for building generic functions that operate on data types that can be described with sums, products, recursion, and holes.
  • In and around the Lens ecosystem

Also see the interesting post by Russell O’Connor and functor oriented programming I think the above is part of that to which he is referring.

Vector Spaces as Shape

Vector spaces are made of two parts, the shape (dimension) of the vector space and the scalar.

Just as a type of kind * -> * can be thought of as a container modulo it’s held type, it can also be a vector modulo its held scalar type. The higher kinded type for vector gives an explicit slot to place the scalar type.

type Vect = * -> *

The standard Haskell typeclass hierarchy gives you some of the natural operations on vectors if you so choose to abuse it in that way.

  • Functor ~> Scalar Multiplication: smul s = fmap (* s)
  • Applicative ~> Vector Addition: vadd x y = (+) <$> x <*> y
  • Traversable ~> Tranposition. sequenceA has the type of transposition and works correctly for the linear style containers like V4.

The linear library does use Functor for scalar multiplication, but defines a special typeclass for addition, Additive. I think this is largely for the purposes for bringing Map like vectors into the fold, but I’m not sure.

Once we’ve got the basics down of addition and scalar multiplication, the next thing I want is operations for combining vector spaces. Two important ones are the Kronecker product and direct sum. In terms of indices, the Kronecker product is a space that is indexed by the cartesian product (,) of its input space indices and the direct sum is a space indexed by the Either of its input space indices. Both are very useful constructs. I use the Kronecker product all the time when I want to work on 2D or 3D grids for example. If you’ll excuse my python, here is a toy 2-D finite difference Laplace equation example. We can lift the 1D second derivative matrix K = \partial_x^2 using the kronecker product K2 = K \otimes I + I \otimes K. The direct sum is useful as a notion of stacking matrices.

import numpy as np
N = 10 # We're making a 10x10 grid
row = np.zeros(N)
row[0] = -2
row[1] = 1
K = np.toeplitz(row,row) #toeplitz makes constant diagonal matrices
I = np.eye(N) #identity matrix
K2 = np.kron(K,I) + np.kron(I,K)

The following is perhaps the most important point of the entire post.

type Kron = Compose
type DSum = Product

Compose of vector functors gives the Kronecker product, and Product gives the direct sum (this can be confusing but its right. Remember, the sum in direct sum refers to the indices).

We can form the Kronecker product of vectors given a Functor constraint.

kron :: (Num a, Functor f, Functor g) => f a -> g a -> Kron f g a
kron f g = Compose $ fmap (\s1 -> fmap (\s2 -> s1 * s2) g) f

dsum :: f a -> g a -> DSum f g a
dsum f g = Pair f g

Notice we have two distinct but related things called kron: Kron and kron. One operates on vectors spaces and the other operates on vector values.

Building vector spaces out of small combinators like V2, V4, DSum, Kron is interesting for a number of reasons.

  • It is well typed. Similar to Nat indexed vectors, the types specify the size of the vector space. We can easily describe vector spaced as powers of 2 as V16 = Kron V2 (Kron V2 (Kron V2 (Kron V2 V1))), similarly in terms of its prime factors, or we can do a binary expansion (least significant bit first) V5 = DSum V1 (Kron V2 (DSum V0 (Kron V2 V1))) or other things. We do it without going into quasi-dependently typed land or GADTs.
  • It often has better semantic meaning. It is nice to say Measurements, or XPosition or something rather than just denote the size of a vector space in terms of a nat. It is better to say a vector space is the Kron of two meaningful vector spaces than to just say it is a space of size m*n. I find it pleasant to think of the naturals as a free Semiring rather than as the Peano Naturals and I like the size of my vector space defined similarly.
  • Interesting opportunities for parallelism. See Conal Elliott’s paper on scans and FFT:

What do linear operators look like?

In the Vectors as shape methodology, Vectors look very much like Functors.

I have been tempted to lift the natural transformation type above to the following for linear operators.

type LinOp f g = forall a. (Num a) => f a -> g a

In a sense this works, we could implement kron because many of the container type (V1, V2, V3, etc) in the linear package implement Num. However, choosing Num is a problem. Why not Fractional? Why not Floating? Sometimes we want those. Why not just specifically Double?

We don’t really want to lock away the scalar in a higher rank polymorphic type. We want to ensure that everyone is working in the same scalar type before allowing things to proceed.

type LinOp f g a = f a -> g a

Note also that this type does not constrain us to linearity. Can we form the Kronecker product of linear operators? Yes, but I’m not in love with it. This is not nearly so beautiful as the little natural transformation dance.

kron''' :: (Applicative f', Applicative g', Traversable f, Traversable g') =>
    (f a -> f' a) -> (g a -> g' a) -> (Kron f g a -> Kron f' g' a)
kron'' lf lg (Compose fga) = Compose $ sequenceA $ (fmap lf) $ sequenceA $ (fmap lg fga)

This was a nice little head scratcher for me. Follow the types, my friend! I find this particularly true for uses of sequenceA. I find that if I want the containers swapped in ordering. In that situation sequenceA is usually the right call. It could be called transpose.

Giving the vector direct access to the scalar feels a bit off to me. I feel like it doesn’t leave enough “room” for compositionally. However, there is another possibility for a definition of morphisms could be that I think is rather elegant.

type LinOp1 f g a = forall k. Additive k => Kron f k a -> Kron g k a

Does this form actually enforce linearity? You may still rearrange objects. Great. You can also now add and scalar multiply them with the Additive k constraint. We also expose the scalar, so it can be enforced to be consistent.

One other interesting thing to note is that these forms allow nonlinear operations. fmap, liftU2 and liftI2 are powerful operations, but I think if we restricted Additive to just a correctly implemented scalar multiply and vector addition operation, and zero, we’d be good.

class Additive' f where
  smul :: Num a => a -> f a -> f a
  vadd :: Num a => f a -> f a -> f a
  zero :: Num a => f a

We can recover the previous form by instantiation k to V1. V1, the 1-d vector space, is almost a scalar and can play the scalars role in many situations. V1 is the unit object with respect to the monoidal product Kron.

There seems to be a missing instance to Additive that is useful. There is probably a good reason it isn’t there, but I need it.

instance (Additive g, Additive f) => Additive (Compose f g) where
    (Compose v) ^+^ (Compose w) = Compose (liftU2 (^+^) v w)
    zero = zero
    liftU2 f (Compose x) (Compose y) = Compose $ liftU2 (liftU2 f) x y
    liftI2 f (Compose x) (Compose y) = Compose $ liftI2 (liftI2 f) x y

Monoidal Categories

The above analogy can be put into mathematical terms by noting that both vectors and functor are monoidal categories. I talked a quite a bit about monoidal categories in a previous post .

Categories are the combo of a collection of objects and arrows between the objects. The arrows can compose as long as the head of one is on the same object as the tail of the other. On every object, there is always an identity arrow, which when composed will do nothing.

We need a little extra spice to turn categories into monoidal categories. One way of thinking about it is that monoidal categories have ordinary category composition and some kind of horizontal composition, putting things side to side. Ordinary composition is often doing something kind of sequentially, applying a sequence of functions, or a sequence of matrices. The horizontal composition is often something parallel feeling, somehow applying the two arrows separately to separate pieces of the system.

Why are they called Monoidal?

There is funny game category people play where they want to lift ideas from other fields and replace the bits and pieces in such a way that the entire thing is defined in terms of categorical terminology. This is one such example.

A monoid is a binary operations that is associative and has an identity.

Sometimes people are more familiar with the concept of a group. If not, ignore the next sentence. Monoids are like groups without requiring an inverse.

Numbers are seperately monoids under both addition, multiplication and minimization (and more), all of which are associative operations with identity (0, 1, and infinity respectively).

Exponentiation is a binary operation that is not a monoid, as it isn’t associative.

The Monoid typeclass in Haskell demonstrates this

class Semigroup a => Monoid a where
        mempty  :: a
        mappend :: a -> a -> a

A common example of a monoid is list, where mempty is the empty list and mappend appends the lists.

There are different set-like intuitions for categories. One is that the objects in the category are big opaque sets. This is the case for Hask, Rel and Vect.

A different intuitiion is that the category itself is like a set, and the objects are the elements of that set. There just so happens to be some extra structure knocking around in there: the morphisms. This is the often more the feel for the examples of preorders or graphs. The word “monoidal” means that they we a binary operation on the objects. But in the category theory aesthetic, you also need that binary operation to “play nice” with the morphisms that are hanging around too.

Functors are the first thing that has something like this. It has other properties that come along for the ride. A Functor is a map that takes objects to objects and arrows to arrows in a nice way. A binary functor takes two objects to and object, and two arrows to one arrow in a way that plays nice (commutes) with arrow composition.

String diagrams

String diagrams are a graphical notation for monoidal categories. Agin I discussed this more here.

Morphisms are denoted by boxes. Regular composition is shown by plugging arrows together vertically. Monoidal product is denoted by putting the arrows side to side.

When I was even trying to describe what a monoidal category was, I was already using language evocative of string diagrams.

You can see string diagrams in the documentation for the Arrow library. Many diagrams that people use in various fields can be formalized as the string diagrams for some monoidal category. This is big chunk of Applied Category Theory.

This is the connection to quantum circuits, which are after all a graphical notation for very Kroneckery linear operations.

type Qubit = V2
type C = Complex Double

assoc :: Functor f => (Kron (Kron f g) h) ~> (Kron f (Kron g h))
assoc = Compose . (fmap Compose) . getCompose . getCompose

assoc' :: Functor f =>  (Kron f (Kron g h)) ~> (Kron (Kron f g) h)
assoc' (Compose x)  = Compose $ Compose $ (fmap getCompose x) 

kron'' :: (Additive f, Additive g, Additive k, Additive f', Additive g') => 
  LinOp1 f f' a -> LinOp1 g g' a -> Kron (Kron f g) k a -> Kron (Kron f' g') k a
kron'' lf lg fgk = let v = (assoc fgk) in assoc' (Compose $ fmap lg $ getCompose (lf v))

sigx' :: LinOp1 Qubit Qubit C 
sigx' (Compose (V2 up down)) = Compose $ V2 down up  

sigz' :: LinOp1 Qubit Qubit C 
sigz' (Compose (V2 up down)) = Compose  $ V2 up ((-1) *^ down) 

sigy' :: LinOp1 Qubit Qubit C 
sigy' (Compose (V2 up down)) = Compose $ V2 ((-i) *^ down) (i *^ up) where i = 0 :+ 1

swap' :: (Traversable f, Applicative g) => LinOp1 (Kron f g) (Kron g f) a 
swap' (Compose (Compose fgk)) = Compose $ Compose $ sequenceA fgk 

cnot :: LinOp1 (Kron Qubit Qubit) (Kron Qubit Qubit) a 
cnot (Compose (Compose (V2 (V2 up1 down1) v))) = Compose $ Compose $ V2 (V2 down1 up1) v

phase :: Double -> LinOp1 Qubit Qubit C
phase phi (Compose (V2 up down)) = Compose $ V2 up ((cis phi) *^ down)

lefting :: (Additive f, Additive k, Additive g) => LinOp1 f g a -> LinOp1 (Kron f k) (Kron g k) a
lefting l = kron'' l id -- Qubit.assoc' . l . Qubit.assoc 

righting :: (Additive k, Additive f, Additive g) => LinOp1 f g a -> LinOp1 (Kron k f) (Kron k g) a
righting l = kron'' id l -- (Compose (Compose fkk)) = Compose $ Compose $ (fmap (getCompose . l . Compose) fkk)

example :: LinOp1 (Kron Qubit Qubit) (Kron Qubit Qubit) C 
example = (lefting sigx') . (lefting sigy') . (righting sigz') . swap'
example circuit

There is an annoying amount of stupid repetitive book keeping with the associative structure of Kron. This can largely be avoided hopefully with coerce, but I’m not sure. I was having trouble with roles when doing it generically.

Bit and Bobbles

  • Woof. This post was more draining to write than I expected. I think there is still a lot left to say. Sorry about the editing everyone! Bits and pieces of this post are scattered in this repo
  • How would you go about this in other languages? C, Rust, OCaml, C++, Agda
  • The discussion of Vect = * -> * is useful for discussion of 2-Vect, coming up next. What if we make vectors of Vect? Wacky shit.
  • Metrics and Duals vectors. type Dual f a = f a -> a. type Dual1 f a = forall k. Additive k => Kron f k a -> k a
  • Adjunction diagrams have cups and caps. Since we have been using representable functors, they actually have a right adjunction that is tupling with the vector space index type. This gives us something that almost feels like a metric but a weirdly constrained metric.
  • LinOp1 form is yoneda? CPS? Universally quantified k is evocative of forall c. (a -> c) -> (b -> c)



Representable/Naperian Functors

Containers that are basically big product types are also known as representable, Naperian, or logarithmic. Representable places emphasis on the isomorphism between such a container type and the type (->) i which by the algebra of types correspond is isomorphic to a^i (i copies of a). They are called Naperian/Logarithmic because there is a relationship similar to exponentiation between the index type a and the container type f. If you take the Product f g, this container is indexed by (a + b) = Either a b. Compose f g is indexed by the product (a,b). (f r) ~ r^a The arrow type is written as an exponential b^a because if you have finite enumerable types a and b, that is the number of possible tabulations available for f. The Sum of two representable functors is no longer representable. Regular logarithms of sums Log(f + g) do not have good identities associated with them.

See Gibbons article. There is a good argument to be made that representable functors are a good match for vectors/well typed tensor programming.

But note that there is a reasonable interpretation for container types with sum types in them. These can be thought of as subspaces, different bases, or as choices of sparsity patterns. When you define addition, you’ll need to say how these subspaces reconcile with each other.
— two bases at 45 degrees to each other.

data V2_45 a = XY a a | XY' a a
data Maybe a = Just a | Notinhg -- a 1-d vectro space with a special marker for the zero vector.
data Maybe2 a = Just2 a a | Nothing2 -- a 2d vector space with special zero marker

Monoidal Products on Hask

Hask is a name for the category that has objects as Haskell types and morphisms as Haskell functions.

Note that it’s a curious mixing of type/value layers of Haskell. The objects are types whereas the function morphisms are Haskell values. Composition is given by (.) and the identity morphisms are given by id.

For Haskell, you can compose functions, but you can also smash functions together side by side. These combinators are held in Control.Arrow.

You can smash together types with tuple (,) or with Either. Both of these are binary operators on types. The corresponding mapping on morphisms are given by

(***) :: a b c -> a b' c' -> a (b, b') (c, c') 
(+++) :: a b c -> a b' c' -> a (Either b b') (Either c c') 

These are binary operators on morphisms that play nice with the composition structure of Haskell.

Monoidal Combinators of Functors

A monoidal category also has unit objects. This is given by the Identity functor

rightUnitor :: Functor f => Compose f Identity a -> f a
rightUnitor (Compose f) = fmap runIdentity f

rightUnitor' :: f ~> Compose f Identity
rightUnitor' = Compose . fmap Identity

leftUnitor' :: f ~> Compose Identity f
leftUnitor' = Compose . Identity

leftUnitor :: Identity *** f ~> f
leftUnitor = runIdentity . getCompose

There is also a sense of associativity. It is just newtype rearrangement, so it can also be achieved with a `coerce` (although not polymorphically?).

assoc :: Functor f => ((f *** g) *** h) ~> (f *** (g *** h))
assoc = Compose . (fmap Compose) . getCompose . getCompose

assoc' :: Functor f => (f *** (g *** h)) ~> ((f *** g) *** h)
assoc' (Compose x) = Compose $ Compose $ (fmap getCompose x)

Similarly, we can define a monoidal category structure using Product or Sum instead of Compose.

These are all actually just newtype rearrangement, so they should all just be instances of coerce, but I couldn’t get the roles to go through generically?

Concolic Weakest Precondition is Kind of Like a Lens

That’s a mouthful.

Lens are described as functional getters and setters. The simple lens type is

type Lens a b = a -> (b, b -> a)

. The setter is


and the getter is

a -> b -> a

This type does not constrain lenses to obey the usual laws of getters and setters. So we can use/abuse lens structures for nontrivial computations that have forward and backwards passes that share information. Jules Hedges is particular seems to be a proponent for this idea.

I’ve described before how to encode reverse mode automatic differentiation in this style. I have suspicions that you can make iterative LQR and guass-seidel iteration have this flavor too, but I’m not super sure. My attempts ended somewhat unsatisfactorily a whiles back but I think it’s not hopeless. The trouble was that you usually want the whole vector back, not just its ends.

I’ve got another example in imperative program analysis that kind of makes sense and might be useful though. Toy repo here:

In program analysis it sometimes helps to run a program both concretely and symbolically. Concolic = CONCrete / symbOLIC. Symbolic stuff can slowly find hard things and concrete execution just sprays super fast and can find the dumb things really quick.  

We can use a lens structure to organize a DSL for describing a simple imperative language

The forward pass is for the concrete execution. The backward pass is for transforming the post condition to a pre condition in a weakest precondition analysis. Weakest precondition semantics is a way of specifying what is occurring in an imperative language. It tells how each statement transforms post conditions (predicates about the state after the execution) into pre conditions (predicates about before the execution).  The concrete execution helps unroll loops and avoid branching if-then-else behavior that would make the symbolic stuff harder to process. I’ve been flipping through Djikstra’s book on this. Interesting stuff, interesting man.

I often think of a state machine as a function taking s -> s. However, this is kind of restrictive. It is possible to have heterogenous transformations s -> s’. Why not? I think I am often thinking about finite state machines, which we really don’t intend to have a changing state size. Perhaps we allocated new memory or something or brought something into or out of scope. We could model this by assuming the memory was always there, but it seems wasteful and perhaps confusing. We need to a priori know everything we will need, which seems like it might break compositionally.

We could model our language making some data type like
data Imp = Skip | Print String | Assign String Expr | Seq Imp Imp | ...
and then build an interpreter

interp :: Imp -> s -> s'


But we can also cut out the middle man and directly define our language using combinators.

type Stmt s s' = s ->s'

To me this has some flavor of a finally tagless style.

Likewise for expressions. Expressions evaluate to something in the context of the state (they can lookup variables), so let’s just use

type Expr s a = s -> a

And, confusingly (sorry), I think it makes sense to use Lens in their original getter/setter intent for variables. So Lens structure is playing double duty.

type Var s a = Lens' s a

With that said, here we go.

type Stmt s s' = s -> s' 
type Lens' a b = a -> (b, b -> a)
set l s a = let (_, f) = l s in f a

type Expr s a = s -> a
type Var s a = Lens' s a

skip :: Stmt s s
skip = id

sequence :: Stmt s s' -> Stmt s' s'' -> Stmt s s''
sequence = flip (.)

assign :: Var s a -> Expr s a -> Stmt s s
assign v e = \s -> set v s (e s)

(===) :: Var s a -> Expr s a -> Stmt s s
v === e = assign v e

ite :: Expr s Bool -> Stmt s s' -> Stmt s s' -> Stmt s s'
ite e stmt1 stmt2 = \s -> if (e s) then stmt1 s else stmt2 s

while :: Expr s Bool -> Stmt s s -> Stmt s s
while e stmt = \s -> if (e s) then ((while e stmt) (stmt s)) else s

assert :: Expr s Bool -> Stmt s s  
assert e = \s -> if (e s) then s else undefined 

abort :: Stmt s s'  
abort = const undefined

Weakest precondition can be done similarly, instead we start from the end and work backwards

Predicates are roughly sets. A simple type for sets is

type Pred s = s -> Bool 
Now, this doesn’t have much deductive power, but I think it demonstrates the principles simply. We could replace Pred with perhaps an SMT solver expression, or some data type for predicates, for which we’ll need to implement things like substitution. Let’s not today.

A function

a -> b 
is equivalent to
forall c. (b -> c) -> (a -> c)
. This is some kind of CPS / Yoneda transformation thing. A state transformer
s -> s'
to predicate transformer
(s' -> Bool) -> (s -> Bool)
is somewhat evocative of that. I’m not being very precise here at all.

Without further ado, here’s how I think a weakest precondition looks roughly.

type Lens' a b = a -> (b, b -> a)
set l s a = let (_, f) = l s in f a

type Expr s a = s -> a
type Var s a = Lens' s a
type Pred s = s -> Bool
type Stmt s s' = Pred s' -> Pred s 

skip :: Stmt s s
skip = \post -> let pre = post in pre -- if

sequence :: Stmt s s' -> Stmt s' s'' -> Stmt s s''
sequence = (.)

assign :: Var s a -> Expr s a -> Stmt s s
assign v e = \post -> let pre s = post (set v s (e s)) in pre

(===) :: Var s a -> Expr s a -> Stmt s s
v === e = assign v e

ite :: Expr s Bool -> Stmt s s' -> Stmt s s' -> Stmt s s'
ite e stmt1 stmt2 = \post -> let pre s = if (e s) then (stmt1 post) s else (stmt2 post) s in pre

abort :: Stmt s s'  
abort = \post -> const False

assert :: Expr s Bool -> Stmt s s  
assert e = \post -> let pre s = (e s) && (post s) in pre

-- tougher. Needs loop invariant
while :: Expr s Bool -> Stmt s s -> Stmt s s
while e stmt = \post -> let pre s = if (e s) then ((while e stmt) (stmt post)) s else  in pre

Finally here is a combination of the two above that uses the branching structure of the concrete execution to aid construction of the precondition. Although I haven’t expanded it out, we are using the full s t a b parametrization of lens in the sense that states go forward and predicates come back.

type Lens' a b = a -> (b, b -> a)
set l s a = let (_, f) = l s in f a

type Expr s a = s -> a
type Var s a = Lens' s a
type Pred a = a -> Bool
type Stmt s s' = s -> (s', Pred s' -> Pred s) -- eh. Screw the newtype

skip :: Stmt s s
skip = \x -> (x, id)

sequence :: Stmt s s' -> Stmt s' s'' -> Stmt s s''
sequence f g =   \s -> let (s', j) = f s in
                       let (s'', j') = g s' in
                           (s'', j . j')
assign :: Var s a -> Expr s a -> Stmt s s
assign v e = \s -> (set v s (e s), \p -> \s -> p (set v s (e s)))

--if then else
ite :: Expr s Bool -> Stmt s s' -> Stmt s s' -> Stmt s s'
ite e stmt1 stmt2 = \s -> 
                    if (e s) 
                    then let (s', wp) = stmt1 s in
                         (s', \post -> \s -> (e s) && (wp post s))
                    else let (s', wp) = stmt2 s in
                            (s', \post -> \s -> (not (e s)) && (wp post s))

assert :: Pred s -> Stmt s s
assert p = \s -> (s, \post -> let pre s = (post s) && (p s) in pre)

while :: Expr s Bool -> Stmt s s -> Stmt s s
while e stmt = \s -> if e s then let (s' , wp) = (while e stmt) s in
                                 (s', \post -> let pre s'' = (post s'') && (wp post s'') in pre)   
                            else (s, \p -> p)


-- declare and forget can change the size and shape of the state space.
-- These are heterogenous state commpands
declare :: Iso (s,Int) s' -> Int -> Stmt s s'   
declare iso defalt = (\s -> to iso (s, defalt), \p -> \s -> p $ to iso (s, defalt)) 

forget :: Lens' s s' -> Stmt s s' -- forgets a chunk of state

declare_bracket :: Iso (s,Int) s' -> Int ->  Stmt s' s' -> Stmt s s
declare_bracket iso defalt stmt = (declare iso default) . stmt . (forget (_1 . iso))

Neat. Useful? Me dunno.

Flappy Bird as a Mixed Integer Program

My birds.

Mixed Integer Programming is a methodology where you can specify convex (usually linear) optimization problems that include integer/boolean variables.

Flappy Bird is a game about a bird avoiding pipes.

We can use mixed integer programming to make a controller for Flappy Bird. Feel free to put this as a real-world application in your grant proposals, people.

While thinking about writing a MIP for controlling a lunar lander game, I realized how amenable to mixed integer modeling flappy bird is. Ben and I put together the demo on Saturday. You can find his sister blog post here.

The bird is mostly in free fall, on parabolic trajectories. This is a linear dynamic, so it can directly be expressed as a linear constraint. It can discretely flap to give itself an upward impulse. This is a boolean force variable at every time step. Avoiding the ground and sky is a simple linear constraint. The bird has no control over its x motion, so that can be rolled out as concrete values. Because of this, we can check what pipes are relevant at time points in the future and putting the bird in the gap is also a simple linear constraint.

There are several different objectives one might want to consider and weight. Perhaps you want to save the poor birds energy and minimize the sum of all flaps cvx.sum(flap). Or perhaps you want to really be sure it doesn’t hit any pipes by maximizing the minimum distance from any pipe. Or perhaps minimize the absolute value of the y velocity, which is a reasonable heuristic for staying in control. All are expressible as linear constraints. Perhaps you might want a weighted combo of these. All things to fiddle with.

There is a pygame flappy bird clone which made this all the much more slick. It is well written and easy to understand and modify. Actually figuring out the appropriate bounding boxes for pipe avoidance was finicky. Figuring out the right combo of bird size and pipe size is hard, combined with computer graphics and their goddamn upside down coordinate system.

We run our solver in a model predictive control configuration. Model predictive control is where you roll out a trajectory as an optimization problem and resolve it at every action step. This turns an open loop trajectory solve into a closed loop control, at the expense of needing to solve a perhaps very complicated problem in real time. This is not always feasible.

My favorite mip modeling tool is cvxpy. It gives you vectorized constraints and slicing, which I love. More tools should aspire to achieve numpy-like interfaces. I’ve got lots of other blog posts using this package which you can find in my big post list the side-bar 👀.

The github repo for the entire code is here:

And here’s the guts of the controller:

import cvxpy as cvx
import numpy as np
import matplotlib.pyplot as plt

N = 20 # time steps to look ahead
path = cvx.Variable((N, 2)) # y pos and vel
flap = cvx.Variable(N-1, boolean=True) # whether or not the bird should flap in each step
last_solution = [False, False, False]
last_path = [(0,0),(0,0)]

SKY = 0
GROUND = (512*0.79)-1

def getPipeConstraints(x, y, lowerPipes):
    constraints = []
    for pipe in lowerPipes:
        dist_from_front = pipe['x'] - x - BIRDDIAMETER
        dist_from_back = pipe['x'] - x + PIPEWIDTH
        if (dist_from_front < 0) and (dist_from_back > 0):
            #print(pipe['y'] + BIRDDIAMETER,  pipe['y'] + PIPEGAPSIZE)
            constraints += [y <= (pipe['y'] - BIRDDIAMETER)] # y above lower pipe
            constraints += [y >= (pipe['y'] - PIPEGAPSIZE)] # y below upper pipe
    #if len(constraints) > 0:
    return constraints

def solve(playery, playerVelY, lowerPipes):
    global last_path, last_solution

    pipeVelX = -4
    playerAccY    =   1   # players downward accleration
    playerFlapAcc =  -14   # players speed on flapping

    # unpack variables
    y = path[:,0]

    vy = path[:,1]

    c = [] #constraints
    c += [y <= GROUND, y >= SKY]
    c += [y[0] == playery, vy[0] == playerVelY]

    x = PLAYERX
    xs = [x]
    for t in range(N-1):
        dt = t//10 + 1
        #dt = 1
        x -= dt * pipeVelX
        xs += [x]
        c += [vy[t + 1] ==  vy[t] + playerAccY * dt + playerFlapAcc * flap[t] ]
        c += [y[t + 1] ==  y[t] + vy[t + 1]*dt ]
        c += getPipeConstraints(x, y[t+1], lowerPipes)

    #objective = cvx.Minimize(cvx.sum(flap)) # minimize total fuel use
    objective = cvx.Minimize(cvx.sum(flap) + 10* cvx.sum(cvx.abs(vy))) # minimize total fuel use

    prob = cvx.Problem(objective, c)
        prob.solve(verbose = False, solver="GUROBI")
        last_path = list(zip(xs, y.value))
        last_solution = np.round(flap.value).astype(bool)
        return last_solution[0], last_path
        last_solution = last_solution[1:]
        last_path = [((x-4), y) for (x,y) in last_path[1:]]
        return last_solution[0], last_path

I think it is largely self explanatory but here are some notes. The dt = t//10 + 1 thing is about decreasing our time resolution the further out from the current time step. This increases the time horizon without the extra computation cost. Intuitively modeling accuracy further out in time should matter less. The last_solution stuff is for in case the optimization solver fails for whatever reason, in which case it’ll follow open-loop the last trajectory it got.

Bits and Bobbles

  • We changed the dynamics slightly from the python original to make it easier to model. We found this did not change the feel of the game. The old dynamics were piecewise affine though, so are also modelable using mixed integer programming. . Generally check out the papers coming out of the Tedrake group. They are sweet. Total fanboy over here.
  • The controller as is is not perfect. It fails eventually, and it probably shouldn’t. A bug? Numerical problems? Bad modeling of the pipe collision? A run tends to get through about a hundred pipes before something gets goofy.
  • Since we had access to the source code, we could mimic the dynamics very well. How robust is flappy bird to noise and bad modeling? We could add wind, or inaccurate pipe data.
  • Unions of Convex Region. Giving the flappy bird some x position control would change the nature of the problem. We could easily cut up the allowable regions of the bird into rectangles, and represent the total space as a union of convex regions, which is also MIP representable.
  • Verification – The finite difference scheme used is crude. It is conceivable for the bird to clip a pipe. Since basically we know the closed form of the trajectories, we could verify that the parabolas do not intersect the regions. For funzies, maybe use sum of squares optimization?
  • Realtime MIP. Our solver isn’t quite realtime. Maybe half real time. One might pursue methods to make the mixed integer program faster. This might involve custom branching heuristics, or early stopping. If one can get the solver fast enough, you might run the solver in parallel and only query a new path plan every so often.
  • 3d flappy bird? Let the bird rotate? What about a platformer (Mario) or lunar lander? All are pretty interesting piecewise affine systems.
  • Is this the best way to do this? Yes and no. Other ways to do this might involve doing some machine learning, or hardcoding a controller that monitors the pipe locations and has some simple feedback. You can find some among the forks of FlapPyBird. I have no doubt that you could write these quickly, fiddle with them and get them to work better and faster than this MIP controller. However, for me there is a difference between could work and should work. You can come up with a thousand bizarre schemes that could work. RL algorithms fall in this camp. But I have every reason to believe the MIP controller should work, which makes it easier to troubleshoot.