Casadi is something I’ve been aware of and not really explored much. It is a C++ / python / matlab library for modelling optimization problems for optimal control with bindings to IPOpt and other solvers. It can produce C code and has differentiation stuff. See below for some examples after I ramble.

I’ve enjoyed cvxpy, but cvxpy is designed specifically for convex problems, of which many control problems are not.

Casadi gives you a nonlinear modelling language and easy access to IPOpt, an interior point solver that works pretty good (along with some other solvers, many of which are proprietary however).

While the documentation visually looks very slick I actually found it rather confusing in contents at first. I’m not sure why. Something is off.

It also has a bunch of helper classes for DAE building and other things. They honestly really put me off. The documentation is confusing enough that I am not convinced they give you much.

The integrator classes give you access to external smart ode solvers from the Sundials suite. They give you good methods for difficult odes and dae (differential algebraic equations, which are ODEs with weird constraints like x^1 + y^1 == 1) Not clear to me if you can plug those in to an optimization, other than by a shooting method.

Casadi can also output C which is pretty cool.

I kind of wondered about Casadi vs Sympy. Sympy has lot’s of general purpose symbolic abilities. Symbolic solving, polynomial smarts, even some differential equation understanding. There might be big dividends to using it. But it is a little harder to get going. I feel like there is an empty space for a mathemtical modelling language that uses sympy as it’s underlying representation. I guess monkey patching sympy expressions into casadi expression might not be so hard. Sympy can also output fast C code. Sympy doesn’t really have any support for sparseness that I know of.

As a side note, It can be useful to put these other languages into numpy if you need extended reshaping abilities. The other languages often stop at matrices, which is odd to me.

Hmm. Casadi actually does have access to mixed integer programs via bonmin (and commercial solvers). That’s interesting. Check out lotka volterra minlp example

The optim interface makes some of this look better. optim.minimize and subject_to. Yeah, this is more similar to the interfaces I’m used to. It avoids the manual unpacking of the solution and the funky feel of making everything into implicit == 0 expressions.

Here is a simple harmonic oscillator example using the more raw casadi interface. x is positive, v is velocity, u is a control force. I’m using a very basic leap frog integration. You tend to have to stack things into a single vector with vertcat when building the final problem.

Let’s use the opti interface, which is pretty slick. Here is a basic cartpole https://web.casadi.org/blog/opti/

Very fast. Very impressive. Relatively readable code. I busted this out in like 15 minutes. IPopt solves the thing in the blink of an eye (about 0.05s self reported). Might be even faster if I warm start it with a good solution, as I would in online control (which may be feasible at this speed). Can add the initial condition as a parameter to the problem

I should try this on an openai gym.

Thoughts on Faking Some of GADTs in Rust

I’m a guy who is somewhat familiar with Haskell who is trying to learn Rust. So I thought I’d try to replicate some cool Haskell functionality in Rust. I would love to hear comments, because I’m trying to learn. I have no sense of Rust aesthetics yet and in particular I have no idea how this interacts with the borrow system. What follows is a pretty rough brain dump

GADTs (Generalized algebraic data types) are an extension in Haskell that allows you to write constrained type signatures for your data constructors. They also change how the type checking of pattern matching is processed.

GADTs are sometimes described/faked by being built by making data types that hold equality/unification constraints. Equality constraints in Haskell like a ~ Int are fairly magical and the Rust compiler does not support them in an obvious way. Maybe this is the next project. Figure out how to fake ’em if one can. I don’t think this is promising though, because faking them will be a little wonky, and then GADTs are a little wonky on top of that. See https://docs.rs/refl/0.1.2/refl/ So we’ll go another (related) road.

And here is one style of encoding using smart constructors and a typeclass for elimination (pattern matching is replicated as a function that takes callbacks for the data held in the different cases). Regular functions can have a restricted type signature than the most general one their implementation implies. The reason to use a typeclass is so that we can write the eliminator as returning the same type that the GADT supplies. There isn’t an explicit equality constraint. A kind of Leibnitz equality

is hiding in the eliminator. The Leibnitz equality can be used in place of (~) constraints at some manual cost. http://code.slipthrough.net/2016/08/10/approximating-gadts-in-purescript/

https://jesper.sikanda.be/files/leibniz-equality.pdf

The

is a problem for Rust. Rust does not have higher kinded types, although they can be faked to some degree. https://gist.github.com/CMCDragonkai/a5638f50c87d49f815b8 There are murmurs of Associated Type Constructors / GATs , whatever those are , that help ease the pain, but I’m pretty sure they are not implemented anywhere yet.

I’m going to do something related, a defunctionalization of the higher kinded types. We make an application trait, that will apply the given type function tag to the argument. What I’m doing is very similar to what happens in the singletons library, so we may be getting some things for free.

Then in order to define a new typelevel function rip out a quick tag type and an App impl.

It might be possible to sugar this up with a macro. It may also be possible to write typelevel functions in a point free style without defining new function tag names. The combinators Id, Comp, Par, Fst, Snd, Dup, Const are all reasonably definable and fairly clear for small functions. Also the combinator S if you want to talk SKI combinatory calculus, which is unfit for humans. https://en.wikipedia.org/wiki/SKI_combinator_calculus For currying, I used a number for how many arguments are left to be applied (I’m not sure I’ve been entirely consistent with these numbers). You need to do currying quite manually. It may be better to work with tuplized arguments

Anyway, the following is a translation of the above Haskell (well, I didn’t wrap an actual i64 or bool in there but I could have I think). You need to hide the actual constructors labeled INTERNAL in a user inaccessible module.

The smart constructors put the right type in the parameter spot

Then pattern matching is a custom trait per gadtified type. Is it possible to unify the different elimination traits that will come up into a single Elim trait? I’m 50-50 about whether this is possible. What we’re doing is a kind of fancy map_or_else if that helps you.

https://doc.rust-lang.org/std/option/enum.Option.html#method.map_or_else

Usage. You have to explicitly pass the return type function to the eliminator. No inference is done for you. It’s like Coq’s match but worse. BTW the dbg! macro is the greatest thing on earth. Well done, Rust.

You can make helpers that don’t require explicit types to be given

One could also make an Eq a b type with Refl similarly. Then we need typelevel function tags that take two type parameter. Which, with currying or tupling, we may already have.

Questions:

Is this even good? Or is it a road of nightmares? Is this even emulating GADTs or am I just playing phantom type games?

We aren’t at full gadt. We don’t have existential types. Rust has some kind of existential story evolving (already there?), but the state of it is confusing to me. Something to play with. Higher rank functions would help?

Are overlapping typeclasses a problem in Rust?

Again, I have given nearly zero thought to borrowing and how it interacts with this. I’m a Rust n00b. I should think about it. Different eliminators based on whether you own or are borrowing?.

How much of singleton style dependent types do we get from this? It feels like we have already paid the cost of defunctionalizing. http://hackage.haskell.org/package/singletons

My current playground for this is at https://github.com/philzook58/typo

Cvxpy and NetworkX Flow Problems

Networkx outputs scipy sparse incidence matrices

https://networkx.github.io/documentation/networkx-1.9/reference/generated/networkx.linalg.graphmatrix.incidence_matrix.html#networkx.linalg.graphmatrix.incidence_matrix

https://docs.scipy.org/doc/scipy/reference/sparse.html

Networkx also has it’s own flow solvers, but cvxpy gives you some interesting flexibility, like turning the problem mixed integer, quadratic terms, and other goodies. Plus it is very easy to get going as you’ll see.

So here’s a basic example of putting these two together. Very straightforward and cool.

Here was a cool visual from a multi commodity flow problem (nx.draw_networkx_edges)

Nice, huh.

A Little Bloop on Typed Template Haskell

I’ve found looking at my statistics that short, to the point, no bull crap posts are the most read and probably most useful.

I’ve been tinkering around with typed template Haskell, which a cursory internet search doesn’t make obvious even exists. The first thing to come up is a GHC implementation wiki that seems like it might be stalled in some development branch. No. Typed template Haskell is already in GHC since GHC 7.8. And the first thing that comes up on a template haskell google search is the style of Template Haskell where you get deep into the guts of the syntax tree, which is much less fun.

Here’s a basic addition interpreter example.

The typed splice  takes a out of TExpQ a. The typed quote [|| ||] puts it in. I find that you tend to be able to follow the types to figure out what goes where. If you’re returning a TExpQ, you probably need to start a quote. If you need to recurse, you often need to splice. Etc.

You tend to have to put the actual use of the splice in a different file. GHC will complain otherwise.

At the top of your file put this to have the template haskell splices dumped into a file

Or have your package.yaml look something like this

If you’re using stack, you need to dive into .stack/dist/x86/Cabal/build and then /src or into the executable folder something-exe-tmp/app to find .dump-splices files.

Nice. I don’t know whether GHC might have optimized this thing down anyhow or not, but we have helped the process along.

Some more examples: unrolling the recursion on a power function (a classic)

You can unroll a fibonacci calculation

This is blowing up in calculation though (no memoization, you silly head). We can implement sharing in the code using let expression (adapted from https://wiki.haskell.org/The_Fibonacci_sequence). Something to think about.

Tinkering around, you’ll occasionally find GHC can’t splice and quote certain things. This is related to cross stage persistence and lifting, which are confusing to me. You should look in the links below for more info. I hope I’ll eventually get a feel for it.

If you want to see more of my fiddling in context to figure out how to get stuff to compile here’s my github that I’m playing around in https://github.com/philzook58/staged-fun

Simon Peyton Jones has a very useful talk slides. Extremely useful https://www.cl.cam.ac.uk/events/metaprog2016/Template-Haskell-Aug16.pptx

Matthew Pickering’s post keyed me into that Typed Template Haskell is even there.

http://mpickering.github.io/posts/2019-02-14-stage-3.html

MuniHac 2018: Keynote: Beautiful Template Haskell

https://markkarpov.com/tutorial/th.html Mark Karpov has a useful Template Haskell tutorial

Oleg Kiselyov: I’m still trying to unpack the many things that are going on here. Oleg’s site and papers are a very rich resource.

I don’t think that staged metaprogramming requires finally tagless style, as I first thought upon reading some of his articles. It is just very useful.

http://okmij.org/ftp/meta-programming/

http://okmij.org/ftp/meta-programming/tutorial/

Typed Template Haskell is still unsound (at least with the full power of the Q templating monad) http://okmij.org/ftp/meta-programming/#TExp-problem

You can write things in a finally tagless style such that you can write it once and have it interpreted as staged or in other ways. There is a way to also have a subclass of effects (Applicatives basically?) less powerful than Q that is sound.

A Touch of Topological Quantum Computation 3: Categorical Interlude

Welcome back, friend.

In the last two posts, I described the basics of how to build and manipulate the Fibonacci anyon vector space in Haskell.

As a personal anecdote, trying to understand the category theory behind the theory of anyons is one of the reasons I started learning Haskell. These spaces are typically described using the terminology of category theory. I found it very frustrating that anyons were described in an abstract and confusing terminology. I really wondered if people were just making things harder than they have to be. I think Haskell is a perfect playground to clarify these constructions. While the category theory stuff isn’t strictly necessary, it is interesting and useful once you get past the frustration.

but I hope everyone can get something out of it. Give it a shot if you’re interested, and don’t sweat the details.

The Aroma of Categories

I think Steve Awodey gives an excellent nutshell of category theory in the introductory section to his book:

“What is category theory? As a first approximation, one could say that category theory is the mathematical study of (abstract) algebras of functions. Just as group theory is the abstraction of the idea of a system of permutations of a set or symmetries of a geometric object, so category theory arises from the idea of a system of functions among some objects.”

For my intuition, a category is any “things” that plug together. The “in” of a thing has to match the “out” of another thing in order to hook them together. In other words, the requirement for something to be a category is having a notion of composition. The things you plug together are called the morphisms of the category and the matching ports are the objects of the category. The additional requirement of always having an identity morphism (a do-nothing connection wire) is usually there once you have composition, although it is good to take especial note of it.

Category theory is an elegant framework for how to think about these composing things in a mathematical way. In my experience, thinking in these terms leads to good abstractions, and useful analogies between disparate things.

It is helpful for any abstract concept to list some examples to expose the threads that connect them. Category theory in particular has a ton of examples connecting to many other fields because it is a science of analogy. These are the examples of categories I usually reach for. Which one feels the most comfortable to you will depend on your background.

• Hask. Objects are types. Morphisms are functions between those types
• Vect. Objects are vector spaces, morphisms are linear maps (roughly matrices).
• Preorders. Objects are values. Morphisms are the inequalities between those values.
• Sets. Objects are Sets. Morphisms are functions between sets.
• Cat. Objects are categories, Morphisms are functors. This is a pretty cool one, although complete categorical narcissism.
• Systems and Processes.
• The Free Category of a directed graphs. Objects are vertices. Morphisms are paths between vertices

Generic Programming and Typeclasses

The goal of generic programming is to run programs that you write once in many way.

There are many ways to approach this generic programming goal, but one way this is achieved in Haskell is by using Typeclasses. Typeclasses allow you to overload names, so that they mean different things based upon the types involved. Adding a vector is different than adding a float or int, but there are programs that can be written that reasonably apply in both situations.

Writing your program in a way that it applies to disparate objects requires abstract ways of talking about things. Mathematics is an excellent place to mine for good abstractions. In particular, the category theory abstraction has demonstrated itself to be a very useful unified vocabulary for mathematical topics. I, and others, find it also to be a beautiful aesthetic by which to structure programs.

In the Haskell base library there is a Category typeclass defined in base. In order to use this, you need to import the Prelude in an unusual way.

The Category typeclass is defined on the type that corresponds to the morphisms of the category. This type has a slot for the input type and a slot for the output type. In order for something to be a category, it has to have an identity morphisms and a notion of composition.

The most obvious example of this Category typeclass is the instance for the ordinary Haskell function (->). The identity corresponds to the standard Haskell identity function, and composition to ordinary Haskell function composition.

Another example of a category that we’ve already encountered is that of linear operators which we’ll call LinOp. LinOp is an example of a Kliesli arrow, a category built using monadic composition rather than regular function composition. In this case, the monad Q from my first post takes care of the linear pipework that happens between every application of a LinOp. The fish <=< operator is monadic composition from Control.Monad.

A related category is the FibOp category. This is the category of operations on Fibonacci anyons, which are also linear operations. It is LinOp specialized to the Fibonacci anyon space. All the operations we’ve previously discussed (F-moves, braiding) are in this category.

The “feel” of category theory takes focus away from the objects and tries to place focus on the morphisms. There is a style of functional programming called “point-free” where you avoid ever giving variables explicit names and instead use pipe-work combinators like (.), fst, snd, or (***). This also has a feel of de-emphasizing objects. Many of the combinators that get used in this style have categorical analogs. In order to generically use categorical typeclasses, you have to write your program in this point free style.

It is possible for a program written in the categorical style to be a reinterpreted as a program, a linear algebra operation, a circuit, or a diagram, all without changing the actual text of the program. For more on this, I highly recommend Conal Elliot’s  compiling to categories, which also puts forth a methodology to avoid the somewhat unpleasant point-free style using a compiler plug-in. This might be an interesting place to mine for a good quantum programming language. YMMV.

Monoidal Categories.

Putting two processes in parallel can be considered a kind of product. A category is monoidal if it has this product of this flavor, and has isomorphisms for reassociating objects and producing or consuming a unit object. This will make more sense when you see the examples.

We can sketch out this monoidal category concept as a typeclass, where we use () as the unit object.

Instances

In Haskell, the standard monoidal product for regular Haskell functions is (***) from Control.Arrow. It takes two functions and turns it into a function that does the same stuff, but on a tuple of the original inputs. The associators and unitors are fairly straightforward. We can freely dump unit () and get it back because there is only one possible value for it.

The monoidal product we’ll choose for LinOp is the tensor/outer/Kronecker product.

Otherwise, LinOp is basically a monadically lifted version of (->). The one dimensional vector space Q () is completely isomorphic to just a number. Taking the Kronecker product with it is basically the same thing as scalar multiplying (up to some shuffling).

Now for a confession. I made a misstep in my first post. In order to make our Fibonacci anyons jive nicely with our current definitions, I should have defined our identity particle using type Id = () rather than data Id. We’ll do that now. In addition, we need some new primitive operations for absorbing and emitting identity particles that did not feel relevant at that time.

With these in place, we can define a monoidal instance for FibOp. The extremely important and intriguing F-move operations are the assoc operators for the category. While other categories have assoc that feel nearly trivial, these F-moves don’t feel so trivial.

This is actually useful

The parC operation is extremely useful to explicitly note in a program. It is an opportunity for optimization. It is possible to inefficiently implement parC in terms of other primitives, but it is very worthwhile to implement it in new primitives (although I haven’t here). In the case of (->), parC is an explicit location where actual computational parallelism is available. Once you perform parC, it is not longer obviously apparent that the left and right side of the tuple share no data during the computation. In the case of LinOp and FibOp, parC is a location where you can perform factored linear computations. The matrix vector product $(A \otimes B)(v \otimes w)$ can be performed individually $(Av)\otimes (Bw)$. In the first case, where we densify $A \otimes B$ and then perform the multiplication, it costs $O((N_A N_B)^2)$ time, whereas performing them individually on the factors costs $O( N_A^2 + N_B^2)$ time, a significant savings. Applied category theory indeed.

Laws

Like many typeclasses, these monoidal morphisms are assumed to follow certain laws. Here is a sketch (for a more thorough discussion check out the wikipedia page):

• Functions with a tick at the end like assoc' should be the inverses of the functions without the tick like assoc, e.g. assoc . assoc' = id
• The parC operation is (bi)functorial, meaning it obeys the commutation law parC (f . f') (g . g') = (parC f g) . (parC f' g') i.e. it doesn’t matter if we perform composition before or after the parC.
• The pentagon law for assoc: Applying leftbottom is the same as applying topright
• The triangle law for the unitors:

String Diagrams

String diagrams are a diagrammatic notation for monoidal categories. Morphisms are represented by boxes with lines.

Composition g . f is made by connecting lines.

The identity id is a raw arrow.

The monoidal product of morphisms $f \otimes g$ is represented by placing lines next to each other.

The diagrammatic notion is so powerful because the laws of monoidal categories are built so deeply into it they can go unnoticed. Identities can be put in or taken away. Association doesn’t even appear in the diagram. The boxes in the notation can naturally be pushed around and commuted past each other.

This corresponds to the property

$(id \otimes g) \circ (f \otimes id) = (f \otimes id) \circ (id \otimes g)$

What expression does the following diagram represent?

Is it $(f \circ f') \otimes (g \circ g')$ (in Haskell notation parC (f . f') (g . g') )?

Or is it $(f \otimes g) \circ (f' \otimes g')$ (in Haskell notation (parC f g) . (parC f' g')?

Answer: It doesn’t matter because the functorial requirement of parC means the two expressions are identical.

There are a number of notations you might meet in the world that can be interpreted as String diagrams. Three that seem particular pertinent are:

• Quantum circuits
• Anyon Diagrams!

Braided and Symmetric Monoidal Categories: Categories That Braid and Swap

Some monoidal categories have a notion of being able to braid morphisms. If so, it is called a braided monoidal category (go figure).

The over and under morphisms are inverse of each other over . under = id. The over morphism pulls the left morphism over the right, whereas the under pulls the left under the right. The diagram definitely helps to understand this definition.

These over and under morphisms need to play nice with the associator of the monoidal category. These are laws that valid instance of the typeclass should follow. We actually already met them in the very first post.

If the over and under of the braiding are the same the category is a symmetric monoidal category. This typeclass needs no extra functions, but it is now intended that the law over . over = id is obeyed.

When we draw a braid in a symmetric monoidal category, we don’t have to be careful with which one is over and under, because they are the same thing.

The examples that come soonest to mind have this symmetric property, for example (->) is a symmetric monoidal category..

Similarly LinOp has an notion of swapping that is just a lifting of swap

However, FibOp is not symmetric! This is perhaps at the core of what makes FibOp so interesting.

Automating Association

Last time, we spent a lot of time doing weird typelevel programming to automate the pain of manual association moves. We can do something quite similar to make the categorical reassociation less painful, and more like the carefree ideal of the string diagram if we replace composition (.) with a slightly different operator

Before defining reassoc, let’s define a helper LeftCollect typeclass. Given a typelevel integer n, it will reassociate the tree using a binary search procedure to make sure the left branch l at the root has Count l = n.

Once we have LeftCollect, the typeclass ReAssoc is relatively simple to define. Given a pattern tree, we can count the elements in it’s left branch and LeftCollect the source tree to match that number. Then we recursively apply reassoc in the left and right branch of the tree. This means that every node has the same number of children in the tree, hence the trees will end up in an identical shape (modulo me mucking something up).

It seems likely that one could write equivalent instances that would work for an arbitrary monoidal category with a bit more work. We are aided somewhat by the fact that FibOp has a finite universe of possible leaf types to work with.

Closing Thoughts

While our categorical typeclasses are helpful and nice, I should point out that they are not going to cover all the things that can be described as categories, even in Haskell. Just like the Functor typeclass does not describe all the conceptual functors you might meet. One beautiful monoidal category is that of Haskell Functors under the monoidal product of Functor Composition. More on this to come, I think. https://parametricity.com/posts/2015-07-18-braids.html

We never even touched the dot product in this post. This corresponds to another doodle in a string diagram, and another power to add to your category. It is somewhat trickier to work with cleanly in familiar Haskell terms, I think because (->) is at least not super obviously a dagger category?

You can find a hopefully compiling version of all my snippets and more in my chaotic mutating Github repo https://github.com/philzook58/fib-anyon

See you next time.

References

The Rosetta Stone paper by Baez and Stay is probably the conceptual daddy of this entire post (and more).

Bartosz Milewski’s Category Theory for Programmer’s blog (online book really) and youtube series are where I learned most of what I know about category theory. I highly recommend them (huge Bartosz fanboy).

https://www.math3ma.com/blog/what-is-category-theory-anyway

There are fancier embeddings of category theory and monoidal categories than I’ve shown here. Often you want constrained categories and the ability to choose unit objects. I took a rather simplistic approach here.

https://parametricity.com/posts/2015-07-18-braids.html

Applicative Bidirectional Programming and Automatic Differentiation

I got referred to an interesting paper by a comment of /u/syrak.

http://www2.sf.ecei.tohoku.ac.jp/~kztk/papers/kztk_jfp_am_2018.pdf

Applicative bidirectional programming (PDF), by Kazutaka Matsuda and Meng Wang

In it, they use a couple interesting tricks to make Lens programming more palatable. Lens often need to be be programmed in a point free style, which is rough, but by using some combinators, they are able to program lenses in a pointful style (with some pain points still left over). It is a really interesting, well-written paper. Lots ‘o’ Yoneda and laws. I’m not doing it justice. Check it out!

A while back I noted that reverse mode auto-differentiation has a type very similar to a lens and in fact you can build a working reverse mode automatic differentiation DSL out of lenses and lens-combinators. Many things about lenses, but not all, transfer over to automatic differentiation. The techniques of Matsuda and Wang do appear to transfer fairly well.

This is interesting to me for another reason. Their lift2 and unlift2 functions remind me very much of my recent approach to compiling to categories. The lift2 function is fanning a lens pair. This is basically what my FanOutput typeclass automated. unlift2 is building the input for a function function by supplying a tuple of projection lenses. This is what my BuildInput typeclass did. I think their style may extend many monoidal cartesian categories, not just lenses.

One can use the function b -> a in many of the situations one can use a in. You can do elegant things by making a Num typeclass of b -> a for example. This little fact seems to extend to other categories as well. By making a Num typeclass for Lens s a when a is a Num, we can use reasonable looking notation for arithmetic.

They spend some time discussing the necessity of a Poset typeclass. For actual lawful lenses, the dup implementation needs a way to recombine multiple adjustments to the same object. In the AD-lens case, dup takes care of this by adding together the differentials. This means that everywhere they needed an Eq typeclass, we can use a Num typeclass. There may be usefulness to building a wrapped type data NZ a = Zero | NonZero a like their Tag type to accelerate the many 0 values that may be propagating through the system.

Unfortunately, as is, the performance of this is abysmal. Maybe there is a way to fix it? Unlifting and lifting destroys a lot of sharing and often necessitates adding many redundant zeros. Why are you doing reverse mode differentiation unless you care about performance? Forward mode is simpler to implement. In the intended use case of Matsuda and Wang, they are working with actual lawful lenses, which have far less computational content than AD-lenses. Good lawful lenses should just be shuffling stuff around a little. Maybe one can hope GHC is insanely intelligent and can optimize these zeros away. One point in favor of that is that our differentiation is completely pure (no mutation). Nevertheless, I suspect it will not without help. Being careful and unlifting and lifting manually may also help. In principle, I think the Lensy approach could be pretty fast (since all it is is just packing together exactly what you need to differentiate into a data type), but how to make it fast while still being easily programmable? It is also nice that it is pretty simple to implement. It is the simplest method that I know of if you needed to port operable reverse mode differentiation to a new library (Massiv?) or another functional language (Futhark?). And a smart compiler really does have a shot at finding optimizations/fusions.

While I was at it, unrelated to the paper above, I think I made a working generic auto differentiable fold lens combinator. Pretty cool. mapAccumL is a hot tip.

For practical Haskell purposes, all of this is a little silly with the good Haskell AD packages around, the most prominent being

It is somewhat interesting to note the similarity of type forall s. Lens s appearing in the Matsuda and Wang approach to those those of the forall s. BVar s monad appearing in the backprop package. In this case I believe that the s type variable plays the same role it does in the ST monad, protecting a mutating Wengert tape state held in the monad, but I haven’t dug much into it. I don’t know enough about backprop to know what to make of this similarity.

The github repo with my playing around and stream of consciousness commentary is here

A Touch of Topological Quantum Computation in Haskell Pt. II: Automating Drudgery

Last time we built the basic pieces we need to describe anyons in Haskell. Anyon models describe interesting physical systems where a set of particles (Tau and Id in our case) have certain splitting rules and peculiar quantum properties. The existence of anyons in a system are the core physics necessary to support topological quantum computation. In topological quantum computing, quantum gates are applied by braiding the anyons and measurements performed by fusing anyons together and seeing what particle comes out. Applying gates in this way has inherent error correcting properties.

The tree of particle production with particle labelled leaves picks a basis (think the collection $\{\hat{x}, \hat{y}, \hat{z}\}$ ) for the anyon quantum vector space. An individual basis vector (think $\hat{x}$ ) from this basis is specified by labelling the internal edges of the tree. We built a Haskell data type for a basic free vector space and functions for the basic R-moves for braiding two anyons and reassociating the tree into a new basis with F-moves. In addition, you can move around your focus within the tree by using the function lmap and rmap. The github repo with that and what follows below is here.

Pain Points

We’ve built the atomic operations we need, but they work very locally and are quite manual. You can apply many lmap and rmap to zoom in to the leaves you actually wish to braid, and you can manually perform all the F-moves necessary to bring nodes under the same parent, but it will be rather painful.

The standard paper-and-pencil graphical notation for anyons is really awesome. You get to draw little knotty squiggles to calculate. It does not feel as laborious. The human eye and hand are great at applying a sequence of reasonably optimal moves to untangle the diagram efficiently. Our eye can take the whole thing in and our hand can zip around anywhere.

To try and bridge this gap, we need to build functions that work in some reasonable way on the global anyon tree and that automate simple tasks.

A Couple Useful Functions

Our first useful operation is pullLeftLeaf. This operation will rearrange the tree using F-moves to get the leftmost leaf associated all the way to the root. The leftmost leaf will then have the root as a parent.

Because the tree structure is in the FibTree a b data type, we need the tuple tree type of the pulled tree. This is a slightly non-trivial type computation.

In order to do this, we’ll use a bit of typelevel programming. If this is strange and alarming stuff for you, don’t sweat it too much. I am not the most elegant user of these techniques, but I hope that alongside my prose description you can get the gist of what we’re going for.

(Sandy Maguire has a new book on typelevel programming in Haskell out. Good stuff. Support your fellow Haskeller and toss him some buckos.)

The resulting tree type b is an easily computable function of the starting tree type a. That is what the “functional dependency” notation | a -&gt; b in the typeclass definition tells the compiler.

The first 4 instances are base cases. If you’re all the way at the leaf, you basically want to do nothing. pure is the function that injects the classical tree description into a quantum state vector with coefficient 1.

The meat is in the last instance. In the case that the tree type matches ((a,b),c), we recursively call PullLeftLeaf on (a,b) which returns a new result (a',b'). Because of the recursion, this a' is the leftmost leaf. We can then construct the return type by doing a single reassociation step. The notation ~ forces two types to unify. We can use this conceptually as an assignment statement at the type level. This is very useful for building intermediate names for large expressions, as assert statements to ensure the types are as expected, and also occasionally to force unification of previously unknown types. It’s an interesting operator for sure.

The recursion at the type level is completely reflected in the actual function definition. We focus on the piece (a,b) inside t by using lmap. We do a recursive call to pullLeftLeaf, and finally fmove' performs the final reassociation move. It is all rather verbose, but straightforward I hope.

You can also build a completely similar PullRightLeaf.

A Canonical Right Associated Basis

One common way of dealing with larger trees is to pick a canonical basis of fully right associated trees. The fully right associated tree is a list-like structure. Its uniformity makes it easier to work with.

By recursively applying pullLeftLeaf, we can fully right associate any tree.

This looks quite similar to the implementation of pullLeftLeaf. It doesn’t actually have much logic to it. We apply pullLeftLeaf, then we recursively apply rightAssoc in the right branch of the tree.

B-Moves: Braiding in the Right Associated Basis

Now we have the means to convert any structure to it’s right associated canonical basis. In this basis, one can apply braiding to neighboring anyons using B-moves, which can be derived from the braiding R-moves and F-moves.

The B-move applies one F-move so that the two neighboring leaves share a parent, uses the regular braiding R-move, then applies the inverse F-move to return back to the canonical basis. Similarly, bmove'  is the same thing except applies the under braiding braid' rather that the over braiding braid.

(Image Source : Preskill’s notes)

Indexing to Leaves

We also may desire just specifying the integer index of where we wish to perform a braid. This can be achieved with another typeclass for iterated rmaping. When the tree is in canonical form, this will enable us to braid two neighboring leaves by an integer index. This index has to be a typelevel number because the output type depends on it.

In fact there is quite a bit of type computation. Given a total tree type s and an index n this function will zoom into the subpart a of the tree at which we want to apply our function. The subpart a is replaced by b, and then the tree is reconstructed into t. t is s with the subpart a mapped into b.  I have intentionally made this reminiscent of the type variables of the lens type Lens s t a b .

This looks much noisier that it has to because we need to work around some of the unfortunate realities of using the typeclass system to compute types. We can’t just match on the number n in order to pick which instance to use because the patterns 0 and n are overlapping. The pattern n can match the number 0 if n ~ 0. The pattern matching in the type instance is not quite the same as the regular Haskell pattern matching we use to define functions. The order of the definitions does not matter, so you can’t have default cases. The patterns you use cannot be unifiable. In order to fix this, we make the condition if n is greater than 0 an explicit type variable gte. Now the different cases cannot unify. It is a very common trick to need a variable representing some branching condition.

For later convenience, we define rmapN which let’s us not need to supply the necessary comparison type gte.

Parentifying Leaves Lazily

While it is convenient to describe anyon computations in a canonical basis, it can be quite inefficient. Converting an arbitrary  anyon tree into the standard basis will often result in a dense vector. A natural thing to do for the sake of economy is only do reassociation on demand.

The algorithm for braiding two neighboring leaves is pretty straightforward. We need to reassociate these leaves so that they have the same parent. First we need the ability to map into the least common ancestor of the two leaves. To reassociate these two leaves to have a common parent we pullrightLeaf the left subtree and then pullLeftLeaf the left subtree. Finally, there is a bit extra bit of shuffling to actually get them to be neighbors.

As a first piece, we need a type level function to count the number of leaves in a tree. In this case, I am inclined to use type families rather than multi parameter typeclasses as before, since I don’t need value level stuff coming along for the ride.

Next, we make a typeclass for mapping into the least common ancestor position.

We find the least common ancestor position by doing a binary search on the size of the left subtrees at each node. Once the size of the left subtree equals n, we’ve found the common ancestor of leaf n and leaf n+1.

Again, this LCAMap typeclass has a typelevel argument gte that directs it which direction to go down the tree.

The Twiddle typeclass will perform some final cleanup after we’ve done all the leaf pulling. At that point, the leaves still do not have the same parent. They are somewhere between 0 and 2 F-moves off depending on whether the left or right subtrees may be just a leaf or larger trees. twiddle is not a recursive function.

Putting this all together we get the nmap function that can apply a function after parentifying two leaves. By far the hardest part is writing out that type signature.

Usage Example

Here’s some simple usage:

Note that rmapN is 0-indexed but nmap is 1-indexed. This is somewhat horrifying, but that is what was natural in the implementation.

Here is a more extended example showing how to fuse some particles.

I started with the tree at the top and traversed downward implementing each braid and fusion. Implicitly all the particles shown in the diagram are Tau particles. The indices refer to particle position, not to the particles “identity” as you would trace it by eye on the page. Since these are identical quantum  particles, the particles don’t have identity as we classically think of it anyhow.

The particle pairs are indexed by the number on the left particle. First braid 1 over 2, then 2 over 3, fuse 1 and 2, braid 2 under 3, fuse 2 and 3, and then fuse 1 and 2. I got an amplitude for the process of -0.618, corresponding to a probability of 0.382. I would give myself 70% confidence that I implemented all my signs and conventions correctly. The hexagon and pentagon equations from last time being correct gives me some peace of mind.

Syntax could use a little spit polish, but it is usable. With some readjustment, one could use the Haskell do notation removing the need for explicit &gt;&gt;=.

Next Time

Anyons are often described in categorical terminology. Haskell has a category culture as well. Let’s explore how those mix!

Compiling to Categories 3: A Bit Cuter

Ordinary Haskell functions form a cartesian closed category. Category means you can compose functions. Cartesian basically means you can construct and deconstruct tuples and Closed means that you have first class functions you can pass around.

Conal Elliott’s Compiling to Categories is a paradigm for reinterpreting ordinary functions as the equivalent in other categories. At an abstract level, I think you could describe it as a mechanism to build certain natural law abiding Functors from Hask to other categories. It’s another way to write things once and have them run many ways, like polymorphism or generic programming. The ordinary function syntax is human friendly compared to writing raw categorical definitions, which look roughly like point-free programming (no named variables). In addition, by embedding it as a compiler pass within GHC, he gets to leverage existing GHC optimizations as optimizations for other categories. Interesting alternative categories include the category of graphs, circuits, and automatically differentiable functions. You can find his implementation here

I’ve felt hesitance at using a GHC plugin plus I kind of want to do it in a way I understand, so I’ve done different versions of this using relatively normal Haskell (no template haskell, no core passes, but a butt ton of hackery).

The first used explicit tags. Give them to the function and see where they come out. That is one way to probe a simple tuple rearrangement function.

The second version worked almost entirely at the typelevel. It worked on the observation that a completely polymorphic tuple type signature completely specifies the implementation. You don’t have to look at the term level at all. It unified the polymorphic values in the input with a typelevel number indexed Tag type. Then it searched through the input type tree to find the piece it needed. I did end up passing stuff in to the term level because I could use this mechanism to embed categorical literals. The typeclass hackery I used to achieve this all was heinous.

I realized today another way related to both that is much simpler and fairly direct. It has some pleasing aesthetic properties and some bad ones. The typeclass hackery is much reduced and the whole thing fits on a screen, so I’m pleased.

Here are the basic categorical definitions. FreeCat is useful for inspection in GHCi of what comes out of toCCC.

And here is the basic toCCC implementation

Here is some example usage

What we do is generate a tuple to give to your function. The function is assumed to be polymorphic again but now not necessarily totally polymorphic (this is important because Num typeclass usage will unify variables). Once we hit a leaf of the input tuple, we place the categorical morphism that would extract that piece from the input. For example for the input type (a,(b,c)) we pass it the value (fstC ,(fstC . sndC, sndC . sndC )). Detecting when we are at a leaf again requires somehow detecting a polymorphic location, which is a weird thing to do. We use the Incoherent IsTup instance from last time to do this. It is close to being safe, because we immediately unify the polymorphic variable with a categorial type, so if something goes awry, a type error should result. We could make it more ironclad by unifying immediately to something that contains the extractor and a user inaccessible type.

We apply the function to this input. Now the output is a tuple tree of morphisms. We recursively traverse down this tree with a fanC for every tuple. This all barely requires any typelevel hackery. The typelevel stuff that is there is just so that I can traverse down tuple trees basically.

One benefit is that we can now use some ordinary typeclasses. We can make a simple implementation of Num for (k z a) like how we would make it for (z -> a). This let’s us use the regular (+) and (*)  operators for example.

What is not good is the performance. As it is, the implementation takes many global duplication of the input to create all the fanCs. In many categories, this is very wasteful.This may be a fixable problem, either via passing in more sophisticated objects that just the bare extraction morphisms to to input (CPS-ified? Path Lists?) or via the GHC rewrite rules mechanism. I have started to attempt that, but have not been successful in getting any of my rewrite rules to fire yet, because I have no idea what I’m doing. If anyone could give me some advice, I’d be much obliged. You can check that out here. For more on rewrite rules, check out the GHC user manual and this excellent tutorial by Mark Karpov here.

Another possibility is to convert to FreeCat, write regular Haskell function optimization passes over the FreeCat AST and then interpret it. This adds interpretation overhead, which may or may not be acceptable depending on your use case. It would probably not be appropriate for automatically differentiable functions, but may be for outputting circuits or graphs.

Another problem is dealing with boolean operations. The booleans operators and comparison operators are not sufficiently polymorphic in the Prelude. We could define new operators that work as drop in replacements in the original context, but I don’t really have the ability to overload the originals. It is tough because if we do things like this, it feels like we’re really kind of building a DSL more than we are compiling to categories. We need to write our functions with the DSL in mind and can’t just import and use some function that had no idea about the compiling to categories stuff.

I should probably just be using Conal’s concat project. This is all a little silly.

Bouncing a Ball with Mixed Integer Programming

Just gonna dump this draft out there since I’ve moved on (I’ll edit this if I come back to it). You can embed collisions in mixed integer programming.  I did it below using a strong acceleration force that turns on when you enter the floor. What this corresponds to is a piecewise linear potential barrier.

Such a formulation might be interesting for the trajectory optimization of shooting a hoop, playing Pachinko, Beer Pong, or Pinball.

More things to consider:

Is this method trash? Yes. You can actually embed the mirror law of collisions directly without needing to using a funky barrier potential.

You can extend this to ball trapped in polygon, or a ball that is restricted from entering obstacle polygons. Check out the IRIS project – break up region into convex regions

https://github.com/rdeits/ConditionalJuMP.jl Gives good support for embedding conditional variables.

https://github.com/joehuchette/PiecewiseLinearOpt.jl On a related note, gives a good way of defining piecewise linear functions using Mixed Integer programming.

Pajarito is another interesting Julia project. A mixed integer convex programming solver.

Russ Tedrake papers – http://groups.csail.mit.edu/locomotion/pubs.shtml

Break up obstacle objects into delauney triangulated things.

www.mit.edu/~jvielma/presentations/MINLPREPSOLJUL_NORTHE18.pdf

A Simple Interior Point Linear Programming Solver in Python

This solver is probably not useful for anything. For almost all purposes, let me point you to cvxpy.

If you want an open source solver CBC/CLP and GLPK and OSQP are good.

If you want proprietary, you can get a variable number constrained trial license to Gurobi for free.

Having said that, here we go.

The simplex method gets more press, and certainly has it’s advantages, but the interior point method makes much more sense to me. What follows is the basic implementation described in Stephen Boyd’s course and book http://web.stanford.edu/~boyd/cvxbook/

In the basic interior point method, you can achieve your inequality constraints $\phi(x) \ge 0$ by using a logarithmic potential to punish getting close to them $-\gamma \ln (\phi(x))$ where $\gamma$ is a parameter we’ll talk about in a bit.  From my perspective, the logarithm is a somewhat arbitrary choice. I believe some properties of the logarithmic potential is necessary for some convergence guarantees.

The basic unconstrained newton step takes a locally quadratic approximation to the function you’re trying to optimize and finds the minimum of that. This basically comes down to taking a step that is the inverse hessian applied to the gradient.

$\min_{dx} f(x_0+dx) \approx f(x_0) + \nabla f(x_0)dx + \frac{1}{2} dx^T H dx$

$(H)_{ij} = \partial_{ij}f(x_0)$

$\nabla f(x_0) +H dx = 0 \rightarrow dx =- H^{-1}\nabla f$

We can maintain a linear constraint on the variable x during this newton step. Instead of setting the gradient to zero, we set it so that it is perpendicular to the constraint plane using the Lagrange multiplier procedure.

$\nabla f(x_0) +H dx = -A^T \lambda \rightarrow Hdx + A^T \lambda = - \nabla f$

$A(x_0 + dx) = b$

This is a block linear system

$\begin{bmatrix} H & A^T \\ A & 0 \\ \end{bmatrix} \begin{bmatrix} dx \\ \lambda \end{bmatrix} = \begin{bmatrix} -\nabla f \\ b - Ax_0 \end{bmatrix}$

Despite the logarithm potential, there is no guarantee that the newton step would not take us outside the allowed region. This is why we need a line search on top of the newton step. We scale the newton dx to $\alpha dx$. Because the function we’re optimizing is convex and the region we’re in is convex, there is some step length in that newton direction that will work. So if we keep decreasing the overall step size, we’ll eventually find something acceptable.

As part of the interior point method, once it has converged we decrease the parameter $\gamma$ applied to the logarithm potential. This will allow the inequality constraints to satisfied tighter and tighter with smaller gamma.

The standard form of an LP is

$\min c^T x$

$A x = b$

$x \ge 0$

This doesn’t feel like the form you’d want. One way you can construct this is by adding slack variables and splitting regular variables into a positive and negative piece

$x = x_+ - x_-$

$Ax \ge b \rightarrow Ax +s = b, s \ge 0$

The interior point formulation of this is

$\min c^T x- \gamma \sum_i \ln(x_i)$

$Ax = b$

The Hessian and gradient are quite simple here

$\nabla f = -\frac{\gamma}{x_i}$

$(H)_{ij} = \delta_{ij} \frac{\gamma}{x_i^2}$

The optimum conditions for this are

$\nabla (c^T x - \gamma \ln(x))= c - \gamma \frac{1}{x} = 0$

$Ax=b$

Now in the above, I’m not sure I got all the signs right, but I did implement it in python. The result seems to be correct and does work. I haven’t tested extensively, YMMV. It’s a useful starting point.

Musings:

I wanted to build this because I’ve been getting really into mixed integer programming and have been wondering how much getting deep in the guts of the solver might help. Given my domain knowledge of the problems at hand, I have probably quite good heuristics. In addition, I’ve been curious about a paper that has pointed out an interesting relatively unexploited territory, combining machine learning with mixed integer programming https://arxiv.org/pdf/1811.06128

For these purposes, I want a really simple optimization solver.

But this is silly. I should use CLP or OSQP as a black box if I really want to worry about the mixed integer aspect.

MIOSQP is interesting.

It is interesting how the different domains of discrete optimization and search seem to have relatively similar sets of methods. Maybe I’m crazy. Maybe at the loose level I’m gonna talk almost anything is like almost anything else.

Clause learning and Cutting plane addition feel rather similar.

Relaxation to LP and unit propagation are somewhat similar. Or is unit propagation like elimination?

Mixed integer programs build their own heuristics.

Fourier Motzkin and resolution are similar methods. In Fourier motzkin, you eliminate variables in linear inequalities by using algebra to bring that variable by itself on one side of the inequality and then matching up all the <= to all the uses of >= of that variable. There are packages that compute these things. See CDD or Polyhedra.jl

Resolution takes boolean formula. You can eliminate a variable q from a CNF formula by taking all the negated instances $\not q$ and combining them with all positive instances.