## Computational Category Theory in Python I: Dictionaries for FinSet

Category theory is a mathematical theory with reputation for being very abstract.

Category theory is an algebraic theory of functions. It has the flavor of connecting up little pipes and ports that is reminiscent of dataflow languages or circuits, but with some hearty mathematical underpinnings.

So is this really applicable to programming at all? Yes, I think so.

Here’s one argument. Libraries present an interface to their users. One of the measures of the goodness or badness of an interface is how often you are inclined to peek under the hood to get it to do the thing that you need. Designing these interfaces is hard. Category theory has taken off as a field because it has been found to be a useful and uniform interface to a surprising variety of very different mathematics. I submit that it is at least plausible that software interfaces designed with tasteful mimicry of category theory may achieve similar uniformity across disparate software domains. This is epitomized for me in Conal Elliott’s Compiling to Categories. http://conal.net/papers/compiling-to-categories/

I think it is easy to have the miscomprehension that a fancy language like Haskell or Agda is necessary to even begin writing software that encapsulates category theory based ideas, but this is simply not the case. I’ve been under this misapprehension before.

It just so happens that category theory is especially useful in those languages for explaining some programming patterns especially those concerning polymorphism. See Bartosz Milewski’s Category theory for Programmers.

But this is not the only way to use category theory.

There’s a really delightful book by Rydeheard and Burstall called Computational Category Theory. The first time I looked at it, I couldn’t make heads or tails of it, going on the double uphill battle of category theory and Standard ML. But looking at it now, it seems extremely straightforward and well presented. It’s a cookbook of how to build category theoretic interfaces for software.

So I think it is interesting to perform some translation of its concepts and style into python, the lingua franca of computing today.

In particular, there is a dual opportunity to both build a unified interface between some of the most commonly used powerful libraries in the python ecosystem and also use these implementations to help explain categorical concepts in concrete detail. I hope to have the attention span to do this following:

A very simple category is that of finite sets. The objects in the category can be represented by python sets. The morphisms can be represented by python dictionaries. Nothing abstract here. We can rip and tear these things apart any which way we please.

The manipulations are made even more pleasant by the python features of set and dictionary comprehension which will mimic the definitions you’ll find on the wikipedia page for these constructions quite nicely.

Composition is defined as making a new dictionary by feeding the output of the first dictionary into the second. The identity dictionary over a set is one that has the same values as keys. The definition of products and coproducts (disjoint union) are probably not too surprising.

One really interesting thing about the Rydeheard and Burstall presentation is noticing what are the inputs to these constructions and what are the outputs. Do you need to hand it objects? morphisms? How many? How can we represent the universal property? We do so by outputting functions that construct the required universal morphisms. They describe this is a kind of skolemization . The constructive programmatic presentation of the things is incredibly helpful to my understanding, and I hope it is to yours as well.

Here is a python class for FinSet. I’ve implemented a couple of interesting constructions, such as pullbacks and detecting monomorphisms and epimorphisms.

I’m launching you into the a deep end here if you have never seen category theory before (although goddamn does it get deeper). Do not be surprised if this doesn’t make that much sense. Try reading Rydeheard and Burstall chapter 3 and 4 first or other resources.

Here’s some fun exercises (Ok. Truth time. It’s because I got lazy). Try to implement exponential and pushout for this category.

## Rough Ideas on Categorical Combinators for Model Checking Petri Nets using Cvxpy

Petri nets are a framework for modeling dynamical systems that is very intuitive to some people. The vanilla version is that there are discrete tokens at nodes on a graph representing resources of some kind and tokens can be combined according to the firing of transition rules into new tokens in some other location.

This is a natural generalization of chemical reaction kinetics, where tokens are particular kinds of atoms that need to come together. It also is a useful notion for computer systems, where tokens represent some computational resource.

To me, this becomes rather similar to a flow problem or circuit problem. Tokens feel a bit like charge transitions are a bit like current (although not necessarily conservative). In a circuit, one can have such a small current that the particulate nature of electric current in terms of electrons is important. This happens for shot noise or for coulomb blockade for example.

If the number of tokens is very large, it seems intuitively sensible to me that one can well approximate the behavior by relaxing to a continuum. Circuits have discrete electrons and yet are very well approximated by ohm’s laws and the like. Populations are made of individuals, and yet in the thousands their dynamics are well described by differential equations.

It seems to me that mixed integer programming is a natural fit for this problem. Mixed integer programming has had its implementations and theory heavily refined for over 70 years so now very general purpose and performant off the shelf solvers are available. The way mixed integer programs are solved is by considering their quickly solved continuous relaxation (allowing fractional tokens and fractional transitions more akin to continuous electrical circuit flow) and using this information to systematically inform a discrete search process. This seems to me a reasonable starting approximation. Another name for petri nets is Vector Addition Systems, which has more of the matrix-y flavor.

We can encode a bounded model checking for reachability of a petri net into a mixed integer program fairly easily. We use 2-index variables, the first of which will be used for time step. We again turn to the general purpose functional way of encoding pointful dsls into pointfree ones as I have done here and here. The key point is that you need to be careful where you generate fresh variables. This is basically copy catting my posts here. http://www.philipzucker.com/categorical-combinators-for-graphviz-in-python/ http://www.philipzucker.com/a-sketch-of-categorical-relation-algebra-combinators-in-z3py/

I’m like 70% sure what I did here makes sense, but I’m pretty sure the general idea makes sense with some fiddling.

The big piece is the weighted_block function. It let’s you build a combinator with an internal state and input and output flow variables. You give matrices with entries for every possible transition. Whether transitions occurred between $t$ and $t+1$ is indicated by integer variables. There is also possible accumulation of tokens at nodes, which also requires integer variables. Perhaps we’d want to expose the token state of the nodes to the outside too?

We can also get out a graphical representation of the net by reinterpreting our program into GraphCat. This is part of the power of the categorical interface. http://www.philipzucker.com/categorical-combinators-for-graphviz-in-python/

When I talked to Zach about this, he seemed skeptical that MIP solvers wouldn’t eat die a horrible death if you threw a moderately large petri net into them. Hard to say without trying.

#### Thoughts

There is an interesting analogy to be found with quantum field theory in that if you lift up to considering distributions of tokens, it looks like an occupation number representation. See Baez. http://math.ucr.edu/home/baez/stoch_stable.pdf

If you relax the integer constraint and the positivity constraints, this really is a finite difference formulation for capacitor circuits. The internal states would then be the charge of the capacitor. Would the positivity constraint be useful for diodes?

I wonder how relevant the chunky nature of petri nets might be for considering superconducting circuits, which have quantization of quantities from quantum mechanical effects.

Cvxpy let’s you describe convex regions. We can use this to implement a the convex subcategory of Rel which you can ask reachability questions. Relational division won’t work probably. I wonder if there is something fun there with respect the the integerizing operation and galois connections.

Edit: I should have googled a bit first. https://www.sciencedirect.com/science/article/pii/S0377221705009240 mathemtical programming tecchniques for petri net reachability. So it has been tried, and it sounds like the results weren’t insanely bad.

## Categorical Combinators for Graphviz in Python

Graphviz is a graph visualization tool https://www.graphviz.org/. In Conal Elliott’s Compiling to categories http://conal.net/papers/compiling-to-categories/, compiling code to its corresponding graphviz representation was one very compelling example. These graphs are very similar to the corresponding string diagram of the monoidal category expression, but they also look like boolean circuit diagrams. Besides in Conal Elliott’s Haskell implementation, there is an implementation in the Julia Catlab.jl project https://epatters.github.io/Catlab.jl/stable/

I’ve basically implemented a toy version of a similar thing in python. It lets you do things like this

Why python?

• Python is the lingua franca of computing these days. Many people encounter it, even people whose main thing isn’t computers.
• The python ecosystem is nutso good.
• Julia is kind of an uphill battle for me. I’m fighting the battle ( https://github.com/philzook58/Rel.jl ) , but I already know python pretty well. I can rip this out and move on.

What I did was implement some wrappers around the graphviz python package. It exposes a not very feature rich stateful interface. It is pretty nice that it prints the graphs inline in jupyter notebooks though.

The code is written in a style very similar (and hopefully overloadable with) to that of z3py relation algebra. http://www.philipzucker.com/a-sketch-of-categorical-relation-algebra-combinators-in-z3py/ . There is a fairly general cookbook method here for categorifying dsl. Since graphviz does not directly expose fresh node creation as far as I can tell, I made my own using a random number generator. The actual combinators are graphviz object processors, so we build up a giant functional chain of processors and then actually execute it with run. The inputs and outputs are represented by lists of node names. The is some design space to consider here.

I also had to use class based wrappers Based on the precedent set by the python 3 matrix multiplication operator @, I think it is a requirement that this also be used for category composition. id is a keyword or something in python unfortunately. For monoidal product, I feel like overloading power ** looks nice even if it is a nonsensical analogy, * is also not too bad. I went with * for now.

The graphviz graphs aren’t quite string diagrams. They make no promise to preserve the ordering of your operations, but they seem to tend to.

Here’s some example usage

For example something like this ought to work. Then you can get the graph of some matrix computation to go along with it’s numpy implementation

Maybe outputting tikz is promising? https://github.com/negrinho/sane_tikz

## Stupid is as Stupid Does: Floating Point in Z3Py

Floating points are nice and all. You can get pretty far pretending they are actually numbers. But they don’t obey some mathematical properties that feel pretty obvious. A classic to glance through is “What Every Computer Scientist Should Know About Floating-Point Arithmetic” https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html

We can check some properties with z3py. Here are a couple simple properties that succeed for mathematical integers and reals, but fail for floating point

I recently saw on twitter a reference to a Sylvie Boldo paper https://hal.archives-ouvertes.fr/hal-01148409/ “Stupid is as Stupid Does: Taking the Square Root of the Square of a Floating-Point Number”.

In it, she uses FlocQ and Coq to prove a somewhat surprising result that the naive formula $\sqrt{x^2} = |x|$ actually is correct for the right rounding mode of floating point, something I wouldn’t have guessed.

Z3 confirms for Float16. I can’t get Float32 to come back after even a day on a fairly beefy computer. If I use FPSort(ebits,sbits) rather than a standard size, it just comes back unknown, so i can’t really see where the cutoff size is. This does not bode well for checking properties of floating point in z3 in general. I think a brute force for loop check of 32 bit float properties is feasible. I might even be pretty fast. To some degree, if z3 is taking forever to find a counterexample, I wonder to what to degree the property is probably true.

If anyone has suggestions, I’m all ears.

## A Sketch of Categorical Relation Algebra Combinators in Z3Py

I’ve discussed relation algebra before. Relations are sets of tuples. There, I implemented the relations naively using lists for sets. This is very simple, and very clean especially with list comprehension syntax. It is however horrifically inefficient, and we could only deal with finitely enumerable domains. The easiest path to fixing these problems is to cash out to an external solver, in this case z3.

There are many beautifully implemented solvers out there and equally beautiful DSL/modeling languages. Examples in mind include sympy, cvxpy, and z3. These modeling languages require you to instantiate variable objects and build expressions out of them and then hand it off to the solver. This is a reasonable interface, but there are advantages to a more categorical/point-free style DSL.

Point-free languages are ones that do not include binding forms that introduce bound/dummy variables. Examples of binding forms like this are $\lambda \sum \max \min \int \sup \lim \forall \exists$. One problem lies in the fact that the names of bound variables don’t matter, and that they end up accidentally smashing into each other. You may have experienced this in physics or math class as the dummy indices or dummy variable problem causing you to screw up your calculation of some cross product identity or some complicated tensor sum. These are surprisingly subtle problems, very difficult to diagnose and get right. de Bruijn indices is a technique for giving the bound variables canonical names, but it sucks to implement in its own way. When you make a DSL point free, it is a joy to manipulate, optimize, and search. I think this may be the core of why category theory is good language for mathematics and programming.

Point-free style also tends to have significant economy of size, for better or worse. Sometimes it is better to have an expression very dense in information. This is important if you are about the algebraically manipulate an expression with paper and pencil. Every manipulation requires a great deal of mind numbing copying as you proceed line by line, so it can be excruciating if your notation has a lot of unnecessary bulk.

Relations are like functions. The two pieces of the tuple can be roughly thought of as the “input” and the “output”. Relations are only loosely directional though. Part of the point of relations is that the converse (inverse) of a relation is easy to define.

When we are implement relations, we have a choice. Do we want the relation to produce its variables, accept its variable, or accept one and produce the other? There are advantages to each. When relations were [(a,b)], a -> b -> Bool, and a -> [b] converting between these forms was a rather painful enumeration process. The sting of converting between them is taken out by the fact that the conversion is no longer a very computationally expensive process, since we’re working at the modeling layer.

When you’re converting a pointful DSL to pointfree DSL, you have to be careful where you instantiate fresh variables or else you’ll end up with secret relations that you didn’t intend. Every instantiation of id needs to be using fresh variables for example. You don’t want the different id talking to each other. Sometimes achieving this involves a little currying and/or thunking.

There is a pattern that I have notice when I’m using modeling languages. When you have a function or relation on variables, there are constraints produced that you have to keep a record of. The pythonic way is to have a Model or Solver object, and then have that objects mutate an internal record of the set of constraints. I don’t particularly enjoy this style though. It feels like too much boiler plate.

In Haskell, I would use something like a Writer monad to automatically record the constraints that are occurring. Monads are not really all that pleasant even in Haskell, and especially a no go in python without “do” syntax.

However, because we are going point free it is no extra cost at all to include this pipework along for the ride in the composition operation.

Here are implementations of the identity and composition for three different styles. Style 1 is fully receptive, style 2 is mixed (function feeling) and style 3 is fully productive of variables.

Fair warning, I’m being sketchy here. I haven’t really tried this stuff out.

z3 is a simply typed language. You can get away with some polymorphism at the python level (for example the == dispatches correctly accord to the object) but sometimes you need to manually specify the sort of the variables. Given these types, the different styles are interconvertible

We can create the general cadre of relation algebra operators. Here is a somewhat incomplete list

Questions about relation algebra expressions are often phrased in term of relational inclusion. You can construct a relation algebra expression, use the rsub in the appropriate way and ask z3’s prove function if it is true.

Z3 has solvers for

• Combinatorial Relations
• Linear Relations
• Polyhedral Relations
• Polynomial Relations
• Interval Relations – A point I was confused on. I thought interval relations were not interesting. But I was interpetting the term incorrectly. I was thinking of relations on AxB that are constrained to take the form of a product of intervals. In this case, the choice of A has no effect on the possible B whatsoever, so it feels non relational. However, there is also I_A x I_B , relations over the intervals of A and B. This is much closer to what is actually being discussed in interval arithmetic.

Applications we can use this for:

• Graph Problems. The Edges can be thought of as a relation between vertices. Relation composition Using the starn operator is a way to ask for paths through the graph.
• Linear Relations – To some degree this might supplant my efforts on linear relations. http://www.philipzucker.com/linear-relation-algebra-of-circuits-with-hmatrix/ Z3 is fully capable of understanding linear relations.
• Safety and liveness of control systems. Again. a transition relation is natural here. It is conceivable that the state space can be heterogenous in time, which is the interesting power of the categorical style. I feel like traditional control systems usually maintain the same state space throughout.
• Program verification
• Games? Nash equilibria?

### Other Thoughts

• Maybe we are just building a shitty version of alloy. https://alloytools.org/
• What about uninterpeted relations? What about higher order relations? What about reflecting into a z3 ADT for a relational language. Then we could do relational program synthesis. This is one style, just hand everything off to smt. https://github.com/nadia-polikarpova/cse291-program-synthesis/tree/master/lectures
• I should try to comply with python conventions, in particular numpy and pandas conventions. @ should be composition for example, since relation composition has a lot of flavor of matrix composition. I should overload a lot of operators, but then I’d need to wrap in a class 🙁
• Z3 has special support for some relations. How does that play in? https://theory.stanford.edu/~nikolaj/programmingz3.html#sec-special-relations https://z3prover.github.io/api/html/ml/Z3.Relation.html
• As long as you only use composition, there is a chaining of existentials, which really isn’t so bad.
• What we’ve done here is basically analogous/identical to what John Wiegley did compiling to the category of z3. Slightly different in that he only allowed for existential composition rather than relational division. http://newartisans.com/2017/04/haskell-and-z3/
• We can reduced the burden on z3 if we know the constructive proof objects that witness our various operations. Z3 is gonna do better if we can tell it exactly which y witness the composition of operators, or clues to which branch of an Or it should use.
• It’s a bummer, but when you use quantifiers, you don’t see countermodels? Maybe there is some hook where you can, or in the dump of the proof object.
• What about recursion schemes? The exact nature of z3 to handle unbounded problems is fuzzy to me. It does have the support to define recursive functions. Also explicit induction predicates can go through sometimes. Maybe look at the Cata I made in fancy relaion algebra post
• I think most proof assistants have implementations of relation algebra available. I find you can do a surprising amount in z3.

## Stupid Z3Py Tricks: Verifying Sorting Networks off of Wikipedia

Sorting networks are a circuit flavored take on sorting. Although you can build circuits for any size input, any particular circuit works for a fixed sized input. They are like an unrolling of the loops or recursion of more familiar sorting algorithms. They come up also in the context of parallel and gpu sorting

Here’s an interesting thing. We can go to Wikipedia and get a little python snippet for the comparison order of a Batcher odd-even mergesort. Kind of a confusing algorithm. Why does it even work? Is it even right? It’s written in some kind of funky, indexful generator style.

Well we can confirm this relatively straightforwardly using z3 by replacing the implementation of compare_and_swap with its z3 equivalent. We then ask z3 .

This comes back unsat, hence there are no inputs or executions that do not come back sorted. If I delete some elements from pair_to_compare, it comes back sat, showing that it doesn’t always sort.

The trick here is that the circuit is fixed size, so we have no need for induction, one of the main things z3 is rather finicky at.

It’s somewhat interesting to note that the output of odd_even_merge is a sequence of instructions, we can think of this as interpreting a very small 1 instruction programming language.

We can also confirm similarly a simple odd-even bubblesort and other similar algorithms.

Q: What about using uninterpreted sorts rather than integers? Integers is pretty convincing to me.

same_elems is slightly weaker than a permutation predicate. Wasn’t super obvious to me the best way to do a permutation predicate in z3. Would I want to internalize the array?

Edit: Upon further thought, actually the sort IS a nice predicate for permutation. How do we compute if two things are permutations of each other? By sorting them and forcing a zipped equality. Alternatively count the number of each element (a piece of bucket sort). Since this sort is done by composing swaps, it is somewhat intrinsically a permutation

As a bummer though, I think randomized testing on arrays would be equally or perhaps more convincing of the correctness of the algorithm. Oh well.

## Programming and Interactive Proving With Z3Py

I’ve been fiddling with z3py, figuring out some functionality and realizing some interesting things you could do with it. I think I’m at a point where it is nice to checkpoint myself with a blog post.

I’m a little surprised z3py doesn’t overload the & and | operators and some kind of implies operator for BoolRef. You can insert them later using this.

#### Functional Programming

Python is not the best functional programming environment imo. And by functional programming I implicitly mean roughly ML-like FP a la Haskell or OCaml. I don’t venture much into lisp land.

The lack of good algebraic datatypes (the class syntax is so ungainly) and a type system hurts. The lack of pattern matching hurts. The lambda keyword is so long it makes me sad.

But you have full access to z3 from the python bindings. Z3 does have algebraic data types, and a type system. It has built in substitution mechanisms and evaluation. And it has insane search procedures and the ability to prove things. Pretty damn cool!

Unfortunately the type system is rather simplistic, being basically simply typed rather than polymorphic or something else. But using python a a schema/macro system for z3 seems a plausible way forward.

To build templated types, you can have constructor functions in python for the appropriate types.

You can access the constructors from the returned types. Check this out. You get detector functions is_Nothing and is_Just , the extractor function fromJust and constructor functions Nothing and Just. I do a lot of dir exploration with z3py. It’s hard to know what’s available sometimes

It’s possible to build a general purpose match combinator on these types since you can introspect the number of constructors of the ADT using num_constructors, constructor, recognizer, and accessor. There might be a match inside z3py somewhere? I think it’s part of the SMTLIB standard now.

Example usage:

Z3 has a substitution mechanism built in. This is useful for instantiating ForAll and for evaluating Lambda. The substitute_vars function is what you want like so substitute_vars(f.body(), x, y, z)

It is possible to reflect the syntax in a fairly straightforward way back into python via a lambdify function, mimicking the equivalent very useful function from sympy. Lambdify is basically an interp function. Here is a start for such a function. I by no means have implemented interpretation of the entirety of z3. Also I feel like this implementation is very clunky. Some kind of CPS?

There is the ability to define recursive functions in z3. It is also plausible to define them via. In this way you can get a reversible functional programming language, maybe some subset of mercury / curry’s power.

### Interactive Theorem Proving

Z3 is awesome at thoerem proving. But somethings it just doesn’t handle right and needs human guidance.

Through searching, there are a couple interesting python interactive theorem prover projects. Cody pointed me to a project he worked on a while back, Boole https://github.com/avigad/boole . It has a dependently typed lambda calculus in it with the purpose of gluing together many systems, I think. He implemented a lot of stuff from scratch. I think I want to try to get less and do less. There is also holpy https://arxiv.org/abs/1905.05970 which appears to be being actively developed. It’s roughly a translation of hol to python I think. It’s available from a strange chinese github on the author’s website if you go looking for it.

This suggests an interesting approach. Most interactive theorem provers start unautomated and add it later. Instead we can iteratively build an interface to de-automate z3.

Altogether, this approach is more HOL flavored than Coq/Agda flavored. z3 terms are our logic and python is our manipulation metal language. Ideally, one would want to verify that every.

Python is so unprincipled that I can’t imagine that you could ever build a system up to the trustworthiness of the other theorem provers. But this is freeing in many ways. Since that is off the table, we can just do the best we can.

Using the z3 syntax tree and the z3 proof automation and z3 substitution mechanisms gives us a HUGE step up from implementing them from scratch. Ideally, we’d want to write as little python as possible, and especially as little python as possible that has to be trusted to be implemented correctly.

One big concern is accidental mutation of the proof under our feet by python. Perhaps using hashes and checking them might be a way to at least detect this. I need to have a good think about how to factor out a trusted core from all possible tactics.

I think it helps a little that z3 often will be able to verify the equivalence of small steps in proofs even if it can’t do the entire proof itself.

I think induction principles will need to be injected by hand. Z3 doesn’t really have that built in. There are definitely situations that after you introduce the induction, z3 can slam all the cases no problem. For example, check this out.

Another thing that might be nice is integration/translation to sympy. Sympy has a ton of useful functionality, at the very least differentiation.

Translation and integration with cvxpy for sum of squares proofs would also be quite neat. I already did something with this using sympy. I’m not super sure how you extract exact proofs from the floating point solutions SCS returns? I think there is a thing. I’ve heard the LLL algorithm can be used for this somehow (finding likely algebraic number matches to floating point numbers)?

So here are some sketched out ideas for tactics.

Another question is how to implement an apply tactic gracefully. Fully deconstructing syntax trees and unifying ourselves is not utilizing z3 well. If you have a good idea how to get unification out of z3, I’d be interested to hear from you here. https://stackoverflow.com/questions/59398955/getting-z3-instantiations-of-quantified-variables/59400838#59400838

Here’s an idea though. In the cold light of day, I am still not sure this reasoning makes much sense. Suppose we’re trying to apply forall x. a(x) -> b(x) to a c(y). If forall x. b(x) -> c(y) we’re good and by assumption that is obvious for some reason, like the syntactic instantiation of b gives c. We can ask z3 to prove that and it will hopefully easy. If we can prove forall x. a(x) in the current context, that would be sufficient, but not true typically. It is an overly difficult request. We really only need to prove a(x) for values pertinent to the proof of c(y). Here’s a suspicious strategem. Any a -> b can be weakened to (q -> a) -> (q -> b). In particular we can choose to weaken forall x. a(x) -> b(x) to forall x. ((c(y) -> b(x)) -> a(x)) -> ((c(y) -> b(x)) -> b(x)). Then we can replace the goal with forall x. ((c(y) -> b(x)) -> a(x)) after we prove that (forall x. (c(y) -> b(x)) -> b(x)) -> c(y). Maybe c(y) -> b(x) is sufficient to restrict the values of x? Not sure.

Another rough sketch of induction on Nat. Not right yet.

We could also make a simple induction for ADTs based on the similar introspection we used for match above. It’s ugly but I think it works.

I haven’t really though much about tacticals yet.

## Gröbner Bases and Optics

Geometrical optics is a pretty interesting topic. It really is almost pure geometry/math rather than physics.

Huygens principle says that we can compute the propagation of a wave by considering the wavelets produced by each point on the wavefront separately.

In physical optics, this corresponds to the linear superposition of the waves produced at each point by a propagator function $\int dx' G(x,x')$. In geometrical optics, which was Huygens original intent I think (these old school guys were VERY geometrical), this is the curious operation of taking the geometrical envelope of the little waves produced by each point.

https://en.wikipedia.org/wiki/Envelope_(mathematics) The envelope is an operation on a family of curves. You can approximate it by a finite difference procedure. Take two subsequent curves close together in the family, find their intersection. Keep doing that and the join up all the intersections. This is roughly the approach I took in this post http://www.philipzucker.com/elm-eikonal-sol-lewitt/

You can describe a geometrical wavefront implicitly with an equations $\phi(x,y) = 0$. Maybe the wavefront is a circle, or a line, or some wacky shape.

The wavelet produced by the point x,y after a time t is described implicitly by $d(\vec{x},\vec{x'})^2 - t^2 = (x-x')^2 + (y-y')^2 - t^2 = 0$.

This described a family of curves, the circles produced by the different points of the original wavefront. If you take the envelope of this family you get the new wavefront at time t.

How do we do this? Grobner bases is one way if we make $\phi$ a polynomial equation. For today’s purposes, Grobner bases are a method for solving multivariate polynomial equations. Kind of surprising that such a thing even exists. It’s actually a guaranteed terminating algorithm with horrific asymptotic complexity. Sympy has an implementation. For more on Grobner bases, the links here are useful http://www.philipzucker.com/dump-of-nonlinear-algebra-algebraic-geometry-notes-good-links-though/. Especially check out the Cox Little O’Shea books

The algorithm churns on a set of multivariate polynomials and spits out a new set that is equivalent in the sense that the new set is equal to zero if and only if the original set was. However, now (if you ask for the appropriate term ordering) the polynomials are organized in such a way that they have an increasing number of variables in them. So you solve the 1-variable equation (easy), and substitute into the 2 variable equation. Then that is a 1-variable equation, which you solve (easy) and then you substitute into the three variable equation, and so on. It’s analogous to gaussian elimination.

So check this out

The envelope conditions can be found by introducing two new differential variables dx, and dy. They are constrained to lie tangent to the point on the original circle by the differential equation e3, and then we require that the two subsequent members of the curve family intersect by the equation e4. Hence we get the envelope. Ask for the Grobner basis with that variable ordering gives us an implicit equations for x2, y2 with no mention of the rest if we just look at the last equation of the Grobner basis.

I set arbitrarily dy = 1 because the overall scale of them does not matter, only the direction. If you don’t do this, the final equation is scaled homogenously in dy.

This does indeed show the two new wavefronts at radius 1 and radius 3.

Here’s a different one of a parabola using e1 =  y1 – x1 + x1**2

The weird lumpiness here is plot_implicit’s inability to cope, not the actually curve shape Those funky prongs are from a singularity that forms as the wavefront folds over itself.

I tried using a cubic curve x**3 and the grobner basis algorithm seems to crash my computer. 🙁 Perhaps this is unsurprising. https://en.wikipedia.org/wiki/Elliptic_curve ?

I don’t know how to get the wavefront to go in only 1 direction? As is, it is propagating into the past and the future. Would this require inequalities? Sum of squares optimization perhaps?

Edit:

It’s been suggested on reddit that I’d have better luck using other packages, like Macaulay2, MAGMA, or Singular. Good point

Also it was suggested I use the Dixon resultant, for which there is an implementation in sympy, albeit hidden. Something to investigate

https://github.com/sympy/sympy/blob/master/sympy/polys/multivariate_resultants.py

https://nikoleta-v3.github.io/blog/2018/06/05/resultant-theory.html

Another interesting angle might be to try to go numerical with a homotopy continuation method with phcpy

https://www.semion.io/doc/solving-polynomial-systems-with-phcpy

## Flappy Bird as a Mixed Integer Program

Mixed Integer Programming is a methodology where you can specify convex (usually linear) optimization problems that include integer/boolean variables.

Flappy Bird is a game about a bird avoiding pipes.

We can use mixed integer programming to make a controller for Flappy Bird. Feel free to put this as a real-world application in your grant proposals, people.

While thinking about writing a MIP for controlling a lunar lander game, I realized how amenable to mixed integer modeling flappy bird is. Ben and I put together the demo on Saturday. You can find his sister blog post here.

The bird is mostly in free fall, on parabolic trajectories. This is a linear dynamic, so it can directly be expressed as a linear constraint. It can discretely flap to give itself an upward impulse. This is a boolean force variable at every time step. Avoiding the ground and sky is a simple linear constraint. The bird has no control over its x motion, so that can be rolled out as concrete values. Because of this, we can check what pipes are relevant at time points in the future and putting the bird in the gap is also a simple linear constraint.

There are several different objectives one might want to consider and weight. Perhaps you want to save the poor birds energy and minimize the sum of all flaps cvx.sum(flap). Or perhaps you want to really be sure it doesn’t hit any pipes by maximizing the minimum distance from any pipe. Or perhaps minimize the absolute value of the y velocity, which is a reasonable heuristic for staying in control. All are expressible as linear constraints. Perhaps you might want a weighted combo of these. All things to fiddle with.

There is a pygame flappy bird clone which made this all the much more slick. It is well written and easy to understand and modify. Actually figuring out the appropriate bounding boxes for pipe avoidance was finicky. Figuring out the right combo of bird size and pipe size is hard, combined with computer graphics and their goddamn upside down coordinate system.

We run our solver in a model predictive control configuration. Model predictive control is where you roll out a trajectory as an optimization problem and resolve it at every action step. This turns an open loop trajectory solve into a closed loop control, at the expense of needing to solve a perhaps very complicated problem in real time. This is not always feasible.

My favorite mip modeling tool is cvxpy. It gives you vectorized constraints and slicing, which I love. More tools should aspire to achieve numpy-like interfaces. I’ve got lots of other blog posts using this package which you can find in my big post list the side-bar 👀.

The github repo for the entire code is here: https://github.com/philzook58/FlapPyBird-MPC

And here’s the guts of the controller:

I think it is largely self explanatory but here are some notes. The dt = t//10 + 1 thing is about decreasing our time resolution the further out from the current time step. This increases the time horizon without the extra computation cost. Intuitively modeling accuracy further out in time should matter less. The last_solution stuff is for in case the optimization solver fails for whatever reason, in which case it’ll follow open-loop the last trajectory it got.

### Bits and Bobbles

• We changed the dynamics slightly from the python original to make it easier to model. We found this did not change the feel of the game. The old dynamics were piecewise affine though, so are also modelable using mixed integer programming. http://groups.csail.mit.edu/robotics-center/public_papers/Marcucci18.pdf . Generally check out the papers coming out of the Tedrake group. They are sweet. Total fanboy over here.
• The controller as is is not perfect. It fails eventually, and it probably shouldn’t. A bug? Numerical problems? Bad modeling of the pipe collision? A run tends to get through about a hundred pipes before something gets goofy.
• Since we had access to the source code, we could mimic the dynamics very well. How robust is flappy bird to noise and bad modeling? We could add wind, or inaccurate pipe data.
• Unions of Convex Region. Giving the flappy bird some x position control would change the nature of the problem. We could easily cut up the allowable regions of the bird into rectangles, and represent the total space as a union of convex regions, which is also MIP representable.
• Verification – The finite difference scheme used is crude. It is conceivable for the bird to clip a pipe. Since basically we know the closed form of the trajectories, we could verify that the parabolas do not intersect the regions. For funzies, maybe use sum of squares optimization?
• Realtime MIP. Our solver isn’t quite realtime. Maybe half real time. One might pursue methods to make the mixed integer program faster. This might involve custom branching heuristics, or early stopping. If one can get the solver fast enough, you might run the solver in parallel and only query a new path plan every so often.
• 3d flappy bird? Let the bird rotate? What about a platformer (Mario) or lunar lander? All are pretty interesting piecewise affine systems.
• Is this the best way to do this? Yes and no. Other ways to do this might involve doing some machine learning, or hardcoding a controller that monitors the pipe locations and has some simple feedback. You can find some among the forks of FlapPyBird. I have no doubt that you could write these quickly, fiddle with them and get them to work better and faster than this MIP controller. However, for me there is a difference between could work and should work. You can come up with a thousand bizarre schemes that could work. RL algorithms fall in this camp. But I have every reason to believe the MIP controller should work, which makes it easier to troubleshoot.

## The Classical Coulomb Gas as a Mixed Integer Quadratic Program

The coulomb gas is a model of electrostatics where you take the discreteness of charge into account. That is what makes it hard compared to the continuous charge problem. Previously, I’ve used mixed integer programming to find lowest energy states of the ising model. This is even more obvious application of mixed integer programming to a physics model.

We ordinarily consider electric charge to be a continuum, but it isn’t. It comes in chunks of the electron charge. Historically, people didn’t even know that for quite a while. It is usually a reasonable approximation for most purposes to consider electric charge to be continuous

If you consider a network of capacitors cooled to the the level that there is not enough thermal energy to borrow to get an electron to jump, the charges on the capacitors will be observably discretized. With a sufficiently slow cooling to this state, the charges should arrange themselves into the lowest energy state.

The coulomb gas model also is of interest due to its connections to the XY model, which I’ve taken a stab at with mixed integer programming before. The coulomb gas models the energy of vortices in that model. I think the connection between the models actually requires a statistical or quantum mechanical context though, whereas we’ve been looking at the classical energy minimization.

We can formulate the classical coulomb gas problem very straightforwardly as a mixed integer quadratic program. We can easily include an externally applied field and a charge conservation constraint if we so desire within the framework.

We write this down in python using the cvxpy library, which has a built in free MIQP solver, albeit not a very good one. Commercial solvers are probably quite a bit better.

The results seems reasonable. It makes sense for charge to go in the direction of the electric field. Going to the corners makes sense because then like charges are far apart. So this might be working. Who knows.

### Interesting places to go with this:

Prof Vanderbei shows how you can embed an FFT to enable making statements about both the time and frequency domain while keeping the problem sparse. The low time/memory $N log(N)$ complexity of the FFT is reflected in the spasity of the resulting linear program.

https://vanderbei.princeton.edu/tex/ffOpt/ffOptMPCrev4.pdf

Here’s a sketch about what this might look like. Curiously, looking at the actual number of nonzeros in the problem matrices, there are way too many. I am not sure what is going on. Something is not performing as i expect in the following code

The equivalent dense DFT:

It would be possible to use a frequency domain solution of the interparticle energy rather than the explicit coulomb law form. Hypothetically this might increase the sparsity of the problem.

It seems very possible to me in a similar manner to embed a fast multipole method or barnes-hut approximation within a MIQP. Introducing explicit charge summary variables for blocks would create a sparse version of the interaction matrix. So that’s fun.