Gettin’ that Robot some Tasty Apples: Solving a simple geometrical puzzle in Z3 python

At work there is a monthly puzzler.

“Design a robot that can pick up all 9 apples arranged on a 3 by 3 rectangular grid, and spaced 1m apart. The robot parts they have are faulty. The robot can only turn three times”

I think the intent of the puzzle is that the robot can drive in forward and reverse, but only actually turn 3 times. It’s not very hard to do by hand. I decided to take a crack at this one using Z3 for funzies. Z3 is an SMT solver. It is capable of solving a interesting wide variety of problems.

I interpret this as “find 4 lines that touch all points in the grid, such that each subsequent line intersects.”

It is fairly easy to directly translate this into a Z3 model.

A couple comments:

If we ask z3 to use only 3 lines, it returns unsat. Try to prove that by hand.

However, If the robot is on the projective plane, it is possible with 3 lines. It only needs to drive to infinity and turn twice. All lines intersect exactly once on the projective plane. How convenient.

The problem only seems somewhat difficult to computerize because of the seemingly infinite nature of geometry. If we only consider the lines that touch at least two points, all possible robot paths becomes extremely enumerable. Is there a proof that we only need these lines?

Another interesting approach might be to note that the points are described by the set of equations x*(x-1)*(x-2)=0 and y*(y-1)*(y-2)=0. I think we could then possibly use methods of nonlinear algebra (Groebner bases) to find the lines. Roughly an ideal containment question? Don’t have this one fully thought out yet. I think z3 might be doing something like this behind the scenes.





More Reinforcement Learning with cvxpy

So I spent thanksgiving doing this and playing Zelda. Even though that sounds like a hell of a day, seems a little sad for thanksgiving :(. I should probably make more of an effort to go home next year.

I tried implementing a more traditional q-learning pipeline using cvxpy (rather than the inequality trick of the last time). Couldn’t get it to work as well. And it’s still kind of slow despite a lot of rearrangement to vectorize operations (through batching basically).

I guess I’m still entranced with the idea of avoiding neural networks. In a sense, that is the old boring way of doing things. The Deep RL is the new stuff. Using ordinary function approximators is way older I think. But I feel like it takes a problem out of the equation (dealing with training neural nets). Also I like modeling languages/libraries.

I kept finding show stopping bugs throughout the day (incorrectly written maxaction functions, typos, cross episode data points, etc.), so I wouldn’t be surprised if there is one still in here. It’s very surprising how one can convince oneself that it is kind of working when it is actually impossible it’s working. All these environments are so simple, that I suspect I could randomly sample controllers out of a sack for the time I’ve been fiddling with this stuff and find a good one.


I also did the easy cartpole environment using the inequality trick.  Seems to work pretty well.



I also have some Work in Progress on getting full swingup cartpole. Currently is not really working. Seems to kind of be pumping about right? The continuous force control easy cartpole does work though.


Now I feel that a thing that matters quite a bit is what is your choice of action for the next time step. Hypothetically you want a ton of samples here. I now think that using an action that is just slightly perturbed from the actual action works well because the actual action is tending to become roughly the optimal one. Subsequent time steps have roughly the same data in them.

One advantage of discrete action space is that you can really search it all.

Does that mean I should seriously investigate the sum of squares form? A semidefinite variable per data point sounds bad. I feel like I’d have to seriously limit the amount of data I’m using. Maybe I’ll be pleasantly surprised.

I haven’t even gotten to playing with different polynomials yet. The current implementation is exponentially sized in the number of variables. But in kind of a silly way. I think it would be better to use all terms of a bounded total degree.


Q-Learning with Linear Programming (cvxpy, OpenAI Gym Pendulum)

There is a fun idea of using Linear Programming to do dynamic programming I originally saw in the underactuated robotics textbook.

In my experience reinforcement learning is finicky and depressing. It usually doesn’t work and is very hard to troubleshoot. Do you just need to run it for 10 minutes? 10 years? Is there a bug? God knows. I end up wriggling hyperparameters and praying a lot.

One part of this is the relative finickiness of neural network optimization compared to the technology of convex optimization. Convex optimization solvers are quite reliable and fast.

There is a way of phrasing Q learning as a linear programming problem

The linear programming approach relaxes the Bellman equations.

Q(s_t,a_t)=r_t + \gamma \max_a Q(s_{t+1},a)


\forall a. Q(s_t,a_t) \ge r_t +\gamma Q(s_{t+1},a)

We can approach this forall in a couple ways, one of which is just sampling actions somehow. To make the constraint tight in places you minimize a weighting of Q

\min \sum w_i * Q(s_i,a_i)

If Q is written as a linear combination of basis functions

Q(s,a)=\sum \alpha_i f_i(s,a)

The all of this put together is a linear program in the variables \alpha_i.


For ease, I used cvxpy. I don’t even store my state action pairs, which is quite lazy of me. Even here, compiling the linear program via cvxpy is kind of slow. This preprocessing step takes longer than the actual solve does. You could avoid cvxpy and directly interface a linear programming solver much faster, if that is your thing.

The whole process is still model free. I didn’t plug in pendulum dynamics anywhere. I run openAI gym and use the resulting state-action-state tuples to add inequalities to my cvxpy model. I weight where I want the inequalities to be tightest by using the actual states experienced.

Unfortunately, it still took a couple hours of hyper parameter tuning and fiddling to get the thing to work. So not a grand success on that point.

I made a lot of guesswork for what seemed reasonable

I parametrized the dependence of Q on a by a quadratic so that it is easy to maximize analytically. That is what the polyfit stuff is about. Maximum of ax^2+bx+c is at -b/2a. I really should be checking the sign of the a coefficient. I am just assuming it is positive. Naughty boy.

m assuming that it

Chebyshev polynomials are probably good.

It seemed to help to use a slight perturbation of the actual action used on the right hand side of the Bellman inequality. My reasoning here is that the pendulum is actually a continuous system, so we should be using the differential Bellman equation really.

Should I allow for some kind of slack in the equations? Getting a bad reward or data point or one weird unusual state could ruin things for everyone. Inequalities are unforgiving.

Gamma seemed to matter a decent amount

The regularization of alpha seemed largely irrelevant.

Epsilon greediness seems to not matter much either.



Future ideas:

Might be good to replace the sampling of a with a Sum of Squares condition over the variable a.

Should I damp the update in some way? Add a cost the changing alpha from it’s previous value. A kind of damped update / using a prior.





A improved version. Fixed the bug in my maxaction function. I shouldn’t have been assuming that it was always concave down.

Also vectorized slightly. Fairly significantly improves the solve time. Not much time is spent in cvxpy, now the solve is dominated by about 3 legitimate seconds in OSQP.

You can flip stuff in and out of loops to try different versions. This method is off-policy, so I could keep data around forever. However, it mostly just slowed the solve time.


A Touch of Topological Quantum Computation in Haskell Pt. I

Quantum computing exploits the massive vector spaces nature uses to describe quantum phenomenon.

The evolution of a quantum system is described by the application of matrices on a vector describing the quantum state of the system. The vector has one entry for every possible state of the system, so the number of entries can get very, very large. Every time you add a new degree of freedom to a system, the size of the total state space gets multiplied by the size of the new DOF, so you have a vector exponentially sized in the  number  of degrees of freedom.

Now, a couple caveats. We could have described probabilistic dynamics similarly, with a probability associated with each state. The subtle difference is that quantum amplitudes are complex numbers whereas probabilities are positive real numbers. This allows for interference. Another caveat is that when you perform a measurement, you only get a single state, so you are hamstrung by the tiny amount of information you can actually extract out of this huge vector. Nevertheless, there are a handful of situations where, to everyone’s best guess, you get a genuine quantum advantage over classical or probabilistic computation.

Topological quantum computing is based around the braiding of particles called anyons. These particles have a peculiar vector space associated with them and the braiding applies a matrix to this space. In fact, the space associated with the particles can basically only be manipulated by braiding and other states require more energy or very large scale perturbations to access. Computing using anyons has a robustness compared to a traditional quantum computing systems. It can be made extremely unlikely that unwanted states are accessed or unwanted gates applied. The physical nature of the topological quantum system has an intrinsic error correcting power. This situation is schematically similar in some ways to classical error correction on a magnetic hard disk. Suppose some cosmic ray comes down and flips a spin in your hard disk. The physics of magnets makes the spin tend to realign with it’s neighbors, so the physics supplies an intrinsic classical error correction in this case.

The typical descriptions of how the vector spaces associated with anyons work I have found rather confusing. What we’re going to do is implement these vector spaces in the functional programming language Haskell for concreteness and play around with them a bit.


In many systems, the splitting and joining of particles obey rules. Charge has to be conserved. In chemistry, the total number of each individual atom on each side of a reaction must be the same. Or in particle accelerators, lepton number and other junk has to be conserved.

Anyonic particles have their own system of combination rules. Particle A can combine with B to make C or D. Particle B combined with C always make A. That kind of stuff. These rules are called fusion rules and there are many choices, although they are not arbitrary. They can be described by a table N_{ab}^{c} that holds counts of the ways to combine a and b into c. This table has to be consistent with some algebraic conditions, the hexagon and pentagon equations, which we’ll get to later.

We need to describe particle production trees following these rules in order to describe the anyon vector space.

Fibonacci anyons are one of the simplest anyon systems, and yet sufficiently complicated to support universal quantum computation. There are only two particles types in the Fibonacci system, the I particle and the \tau  particle. The I particle is an identity particle (kind of like an electrically neutral particle). It combines with \tau to make a \tau. However, two \tau particle can combine in two different ways, to make another \tau particle or to make an I particle.

So we make a datatype for the tree structure that has one constructor for each possible particle split and one constructor (TLeaf, ILeaf) for each final particle type. We can use GADTs (Generalize Algebraic Data Types) to make only good particle production history trees constructible. The type has two type parameters carried along with it, the particle at the root of the tree and the leaf-labelled tree structure, represented with nested tuples.

Free Vector Spaces

We need to describe quantum superpositions of these anyon trees. We’ll consider the particles at the leaves of the tree to be the set of particles that you have at the current moment in a time. This is a classical quantity. You will not have a superposition of these leaf particles. However, there are some quantum remnants of the history of how these particles were made. The exact history can never be determined, kind of like how the exact history of a particle going through a double slit cannot be determined. However, there is still a quantum interference effect left over. When you bring particles together to combine them, depending on the quantum connections, you can have different possible resulting particles left over with different probabilities. Recombining anyons and seeing what results is a measurement of the system .

Vectors can be described in different basis sets. The bases for anyon trees are labelled by both a tree structure and what particles are at the root and leaves. Different tree associations are the analog of using some basis x vs some other rotated basis x’. The way we’ve built the type level tags in the FibTree reflects this perspective.

The labelling of inner edges of the tree with particles varies depending on which basis vector we’re talking about. A different inner particle is the analog of \hat{x} vs \hat{y}.

To work with these bases we need to break out of the mindset that a vector put on a computer is the same as an array. While for big iron purposes this is close to true, there are more flexible options. The array style forces you to use integers to index your space, but what if your basis does not very naturally map to integers?

A free vector space over some objects is the linear combination of those objects. This doesn’t have the make any sense. We can form the formal sum (3.7💋+2.3i👩‍🎨) over the emoji basis for example. Until we attach more meaning to it, all it really means is a mapping between emojis and numerical coefficients. We’re also implying by the word vector that we can add two of the combinations coefficient wise and multiply scalars onto them.

We are going to import free vectors as described by the legendary Dan Piponi as described here:

What he does is implement the Free vector space pretty directly. We represent a Vector space using a list of tuples [(a,b)]. The a are basis objects and b are the coefficients attached to them.


The Vector monad factors out the linear piece of a computation. Because of this factoring, the type constrains the mapping to be linear, in a similar way that monads in other contexts might guarantee no leaking of impure computations. This is pretty handy. The function you give to bind correspond to a selector columns of the matrix.

We need some way to zoom into a subtrees and then apply operations. We define the operations lmap and rmap.

You reference a node by the path it takes to get there from the root. For example,  (rmap . lmap . rmap) f applies f at the node that is at the right-left-right position down from the root.


For Fibonacci anyons, the only two non trivial braidings happen when you braid two \tau particles. 

We only have defined how to braid two particles that were split directly from the same particle. How do we describe the braiding for the other cases? Well we need to give the linear transformation for how to change basis into other tree structures. Then we have defined braiding for particles without the same immediate parent also.


We can transform to a new basis. where the histories differs by association. We can braid two particles by associating the tree until they are together. An association move does not change any of the outgoing leaf positions. It can, however, change a particle in an interior position. We can apply an F-move anywhere inside the tree, not only at the final leaves.


Fusion / Dot product

Two particles that split can only fuse back into themselves. So the definition is pretty trivial. This is like \hat{e}_i \cdot \hat{e}_j = \delta_{ij}.

Hexagon and Pentagon equations

The F and R matrices and the fusion rules need to obey consistency conditions called the hexagon and pentagon equations. Certain simple rearrangements have alternate ways of being achieved. The alternative paths need to agree.

Next Time:

With this, we have the rudiments of what we need to describe manipulation of anyon spaces. However, applying F-moves manually is rather laborious. Next time we’ll look into automating this using arcane type-level programming. You can take a peek at my trash WIP repo here


A big ole review on topological quantum computation:
Ady Stern on The fractional quantum hall effect and anyons:


Another good anyon tutorial:

Mathematica program that I still don’t get, but is very interesting:

Kitaev huge Paper:

Bonderson thesis:

Bernevig review:

More food for thought:

The Rosetta Stone: