Failing to Bound Kissing Numbers

https://en.wikipedia.org/wiki/Kissing_number

Cody brought up the other day the kissing number problem.Kissing numbers are the number of equal sized spheres you can pack around another one in d dimensions. It’s fairly self evident that the number is 2 for 1-d and 6 for 2d but 3d isn’t so obvious and in fact puzzled great mathematicians for a while. He was musing that it was interesting that he kissing numbers for some dimensions are not currently known, despite the fact that the first order theory of the real numbers is decidable https://en.wikipedia.org/wiki/Decidability_of_first-order_theories_of_the_real_numbers

I suggested on knee jerk that Sum of Squares might be useful here. I see inequalities and polynomials and then it is the only game in town that I know anything about.

Apparently that knee jerk was not completely wrong

https://arxiv.org/pdf/math/0608426.pdf

Somehow SOS/SDP was used for bounds here. I had an impulse that the problem feels SOS-y but I do not understand their derivation.

One way the problem can be formulated is by finding or proving there is no solution to the following set of equations constraining the centers x_i of the spheres. Set the central sphere at (0,0,0,…) . Make the radii 1. Then\forall i. |x_i|^2 = 2^2 and \forall i j.  |x_i - x_j|^2 \ge 2^2

I tried a couple different things and have basically failed. I hope maybe I’ll someday have a follow up post where I do better.

So I had 1 idea on how to approach this via a convex relaxation

Make a vector x = \begin{bmatrix} x_0 & y _0 & x_1 & y _1 & x_2 & y _2 & ... \end{bmatrix} Take the outer product of this vector x^T x = X Then we can write the above equations as linear equalities and inequalities on X. If we forget that we need X to be the outer product of x (the relaxation step), this becomes a semidefinite program. Fingers crossed, maybe the solution comes back as a rank 1 matrix. Other fingers crossed, maybe the solution comes back and says it’s infeasible. In either case, we have solved our original problem.

Didn’t work though. Sigh. It’s conceivable we might do better if we start packing higher powers into x?

Ok Round 2. Let’s just ask z3 and see what it does. I’d trust z3 with my baby’s soft spot.

It solves for 5 and below. Z3 grinds to a halt on N=6 and above. It ran for days doin nothing on my desktop.

Ok. A different tact. Try to use a positivstellensatz proof. If you have a bunch of polynomial inequalities and equalities if you sum polynomial multiples of these constraints, with the inequalities having sum of square multiples, in such a way to = -1, it shows that there is no real solution to them. We have the distance from origin as equality constraint and distance from each other as an inequality constraint. I intuitively think of the positivstellensatz as deriving an impossibility from false assumptions. You can’t add a bunch of 0 and positive numbers are get a negative number, hence there is no real solution.

I have a small set of helper functions for combining sympy and cvxpy for sum of squares optimization. I keep it here along with some other cute little constructs https://github.com/philzook58/cvxpy-helpers

and here is the attempted positivstellensatz.

It worked in 1-d, but did not work in 2d. At order 3 polynomials N=7, I maxed out my ram.

I also tried doing it in Julia, since sympy was killing me. Julia already has a SOS package

It was faster to encode, but it’s using the same solver (SCS), so basically the same thing.

I should probably be reducing the system with respect to equality constraints since they’re already in a Groebner basis. I know that can be really important for reducing the size of your problem

I dunno.

Blah blah blah blah A bunch of unedited trash

https://github.com/peterwittek/ncpol2sdpa Peter Wittek has probably died in an avalanche? That is very sad.

These notes

https://web.stanford.edu/class/ee364b/lectures/sos_slides.pdf

Positivstullensatz.

kissing number

Review of sum of squares

minimimum sample as LP. ridiculous problem
min t
st. f(x_i) – t >= 0

dual -> one dual variable per sample point
The only dual that will be non zero is that actually selecting the minimum.

Hm. Yeah, that’s a decent analogy.

How does the dual even have a chance of knowing about poly airhtmetic?
It must be during the SOS conversion prcoess. In building the SOS constraints,
we build a finite, limittted version of polynomial multiplication
x as a matrix. x is a shift matrix.
In prpducing the characterstic polynomial, x is a shift matrix, with the last line using the polynomial
known to be zero to
eigenvectors of this matrix are zeros of the poly.

SOS does not really on polynomials persay. It relies on closure of the suqaring operaiton

maybe set one sphere just at x=0 y = 2. That breaks some symmettry

set next sphere in plane something. random plane through origin?

order y components – breaks some of permutation symmettry.

no, why not order in a random direction. That seems better for symmettry breaking

Neural Networks with Weighty Lenses (DiOptics?)

I wrote a while back how you can make a pretty nice DSL for reverse mode differentiation based on the same type as Lens. I’d heard some interesting rumblings on the internet around these ideas and so was revisiting them.

Composition is defined identically for reverse mode just as it is for lens.

The forward computation shares info with the backwards differential propagation, which corresponds to a transposed Jacobian

After chewing on it a while, I realized this really isn’t that exotic. How it works is that you store the reverse mode computation graph, and all necessary saved data from the forward pass in the closure of the (dy -> dx). I also have a suspicion that if you defunctionalized this construction, you’d get the Wengert tape formulation of reverse mode ad.

Second, Lens is just a nice structure for bidirectional computation, with one forward pass and one backward pass which may or may not be getting/setting. There are other examples for using it like this.

It is also pretty similar to the standard “dual number” form type FAD x dx y dy = (x,dx)->(y,dy) for forward mode AD. We can bring the two closer by a CPS/Yoneda transformation and then some rearrangement.

and meet it in the middle with

I ended the previous post somewhat unsatisfied by how ungainly writing that neural network example was, and I called for Conal Elliot’s compiling to categories plugin as a possible solution. The trouble is piping the weights all over the place. This piping is very frustrating in point-free form, especially when you know it’d be so trivial pointful. While the inputs and outputs of layers of the network compose nicely (you no longer need to know about the internal computations), the weights do not. As we get more and more layers, we get more and more weights. The weights are in some sense not as compositional as the inputs and outputs of the layers, or compose in a different way that you need to maintain access to.

I thought of a very slight conceptual twist that may help.

The idea is we keep the weights out to the side in their own little type parameter slots. Then we define composition such that it composes input/outputs while tupling the weights. Basically we throw the repetitive complexity appearing in piping the weights around into the definition of composition itself.

These operations are easily seen as 2 dimensional diagrams.

Three layers composed, exposing the weights from all layers
The 2-D arrow things can be built out of the 1-d arrows of the original basic AD lens by bending the weights up and down. Ultimately they are describing the same thing

Here’s the core reverse lens ad combinators

And here are the two dimensional combinators. I tried to write them point-free in terms of the combinators above to demonstrate that there is no monkey business going on. We

I wonder if this is actually nice?

I asked around and it seems like this idea may be what davidad is talking about when he refers to dioptics

http://events.cs.bham.ac.uk/syco/strings3-syco5/slides/dalrymple.pdf

Perhaps this will initiate a convo.

Edit: He confirms that what I’m doing appears to be a dioptic. Also he gave a better link http://events.cs.bham.ac.uk/syco/strings3-syco5/papers/dalrymple.pdf

He is up to some interesting diagrams

https://twitter.com/davidad/status/1179760373030801408?s=20

Bits and Bobbles

  • Does this actually work or help make things any better?
  • Recurrent neural nets flip my intended role of weights and inputs.
  • Do conv-nets naturally require higher dimensional diagrams?
  • This weighty style seems like a good fit for my gauss seidel and iterative LQR solvers. A big problem I hit there was getting all the information to the outside, which is a similar issue to getting the weights around in a neural net.

Flappy Bird as a Mixed Integer Program

My birds.

Mixed Integer Programming is a methodology where you can specify convex (usually linear) optimization problems that include integer/boolean variables.

Flappy Bird is a game about a bird avoiding pipes.

We can use mixed integer programming to make a controller for Flappy Bird. Feel free to put this as a real-world application in your grant proposals, people.

While thinking about writing a MIP for controlling a lunar lander game, I realized how amenable to mixed integer modeling flappy bird is. Ben and I put together the demo on Saturday. You can find his sister blog post here.

The bird is mostly in free fall, on parabolic trajectories. This is a linear dynamic, so it can directly be expressed as a linear constraint. It can discretely flap to give itself an upward impulse. This is a boolean force variable at every time step. Avoiding the ground and sky is a simple linear constraint. The bird has no control over its x motion, so that can be rolled out as concrete values. Because of this, we can check what pipes are relevant at time points in the future and putting the bird in the gap is also a simple linear constraint.

There are several different objectives one might want to consider and weight. Perhaps you want to save the poor birds energy and minimize the sum of all flaps cvx.sum(flap). Or perhaps you want to really be sure it doesn’t hit any pipes by maximizing the minimum distance from any pipe. Or perhaps minimize the absolute value of the y velocity, which is a reasonable heuristic for staying in control. All are expressible as linear constraints. Perhaps you might want a weighted combo of these. All things to fiddle with.

There is a pygame flappy bird clone which made this all the much more slick. It is well written and easy to understand and modify. Actually figuring out the appropriate bounding boxes for pipe avoidance was finicky. Figuring out the right combo of bird size and pipe size is hard, combined with computer graphics and their goddamn upside down coordinate system.

We run our solver in a model predictive control configuration. Model predictive control is where you roll out a trajectory as an optimization problem and resolve it at every action step. This turns an open loop trajectory solve into a closed loop control, at the expense of needing to solve a perhaps very complicated problem in real time. This is not always feasible.

My favorite mip modeling tool is cvxpy. It gives you vectorized constraints and slicing, which I love. More tools should aspire to achieve numpy-like interfaces. I’ve got lots of other blog posts using this package which you can find in my big post list the side-bar 👀.

The github repo for the entire code is here: https://github.com/philzook58/FlapPyBird-MPC

And here’s the guts of the controller:

I think it is largely self explanatory but here are some notes. The dt = t//10 + 1 thing is about decreasing our time resolution the further out from the current time step. This increases the time horizon without the extra computation cost. Intuitively modeling accuracy further out in time should matter less. The last_solution stuff is for in case the optimization solver fails for whatever reason, in which case it’ll follow open-loop the last trajectory it got.

Bits and Bobbles

  • We changed the dynamics slightly from the python original to make it easier to model. We found this did not change the feel of the game. The old dynamics were piecewise affine though, so are also modelable using mixed integer programming. http://groups.csail.mit.edu/robotics-center/public_papers/Marcucci18.pdf . Generally check out the papers coming out of the Tedrake group. They are sweet. Total fanboy over here.
  • The controller as is is not perfect. It fails eventually, and it probably shouldn’t. A bug? Numerical problems? Bad modeling of the pipe collision? A run tends to get through about a hundred pipes before something gets goofy.
  • Since we had access to the source code, we could mimic the dynamics very well. How robust is flappy bird to noise and bad modeling? We could add wind, or inaccurate pipe data.
  • Unions of Convex Region. Giving the flappy bird some x position control would change the nature of the problem. We could easily cut up the allowable regions of the bird into rectangles, and represent the total space as a union of convex regions, which is also MIP representable.
  • Verification – The finite difference scheme used is crude. It is conceivable for the bird to clip a pipe. Since basically we know the closed form of the trajectories, we could verify that the parabolas do not intersect the regions. For funzies, maybe use sum of squares optimization?
  • Realtime MIP. Our solver isn’t quite realtime. Maybe half real time. One might pursue methods to make the mixed integer program faster. This might involve custom branching heuristics, or early stopping. If one can get the solver fast enough, you might run the solver in parallel and only query a new path plan every so often.
  • 3d flappy bird? Let the bird rotate? What about a platformer (Mario) or lunar lander? All are pretty interesting piecewise affine systems.
  • Is this the best way to do this? Yes and no. Other ways to do this might involve doing some machine learning, or hardcoding a controller that monitors the pipe locations and has some simple feedback. You can find some among the forks of FlapPyBird. I have no doubt that you could write these quickly, fiddle with them and get them to work better and faster than this MIP controller. However, for me there is a difference between could work and should work. You can come up with a thousand bizarre schemes that could work. RL algorithms fall in this camp. But I have every reason to believe the MIP controller should work, which makes it easier to troubleshoot.

The Classical Coulomb Gas as a Mixed Integer Quadratic Program

The coulomb gas is a model of electrostatics where you take the discreteness of charge into account. That is what makes it hard compared to the continuous charge problem. Previously, I’ve used mixed integer programming to find lowest energy states of the ising model. This is even more obvious application of mixed integer programming to a physics model.

We ordinarily consider electric charge to be a continuum, but it isn’t. It comes in chunks of the electron charge. Historically, people didn’t even know that for quite a while. It is usually a reasonable approximation for most purposes to consider electric charge to be continuous

If you consider a network of capacitors cooled to the the level that there is not enough thermal energy to borrow to get an electron to jump, the charges on the capacitors will be observably discretized. With a sufficiently slow cooling to this state, the charges should arrange themselves into the lowest energy state.

The coulomb gas model also is of interest due to its connections to the XY model, which I’ve taken a stab at with mixed integer programming before. The coulomb gas models the energy of vortices in that model. I think the connection between the models actually requires a statistical or quantum mechanical context though, whereas we’ve been looking at the classical energy minimization.

We can formulate the classical coulomb gas problem very straightforwardly as a mixed integer quadratic program. We can easily include an externally applied field and a charge conservation constraint if we so desire within the framework.

We write this down in python using the cvxpy library, which has a built in free MIQP solver, albeit not a very good one. Commercial solvers are probably quite a bit better.

A plot of charge in a constant external electric field.

The results seems reasonable. It makes sense for charge to go in the direction of the electric field. Going to the corners makes sense because then like charges are far apart. So this might be working. Who knows.

Interesting places to go with this:

Prof Vanderbei shows how you can embed an FFT to enable making statements about both the time and frequency domain while keeping the problem sparse. The low time/memory N log(N) complexity of the FFT is reflected in the spasity of the resulting linear program.

https://vanderbei.princeton.edu/tex/ffOpt/ffOptMPCrev4.pdf

Here’s a sketch about what this might look like. Curiously, looking at the actual number of nonzeros in the problem matrices, there are way too many. I am not sure what is going on. Something is not performing as i expect in the following code

The equivalent dense DFT:

It would be possible to use a frequency domain solution of the interparticle energy rather than the explicit coulomb law form. Hypothetically this might increase the sparsity of the problem.

It seems very possible to me in a similar manner to embed a fast multipole method or barnes-hut approximation within a MIQP. Introducing explicit charge summary variables for blocks would create a sparse version of the interaction matrix. So that’s fun.

A Basic Branch and Bound Solver in Python using Cvxpy

Branch and bound is a useful problem solving technique. The idea is, if you have a minimization problem you want to solve, maybe there is a way to relax the constraints to an easier problem. If so, the solution of the easier problem is a lower bound on the possible solution of the hard problem. If the solution of the easier problem just so happens to also obey the more constrained hard problem, then it must also be the solution to the hard problem. You can also use the lower bound coming from a relaxed problem to prune your search tree for the hard problem. If even the relaxed problem doesn’t beat the current best found, don’t bother going down that branch.

A standard place this paradigm occurs is in mixed integer programming. The relaxation of a binary constraint (either 0 or 1) can be all the values in between (any number between 0 and 1). If this relaxed problem can be expressed in a form amenable to a solver like a linear programming solver, you can use that to power the branch and bound search, also using returned solutions for possible heuristics.

I built a basic version of this that uses cvxpy as the relaxed problem solver. Cvxpy already has much much faster mixed integer solvers baked in (which is useful to make sure mine is returning correct results), but it was an interesting exercise. The real reason I’m toying around is I kind of want the ability to add custom branching heuristics or inspect and maintain the branch and bound search tree, which you’d need to get into the more complicated guts of the solvers bound to cvxpy to get at. Julia might be a better choice.

A somewhat similar (and better) project is https://github.com/oxfordcontrol/miosqp which doesn’t use cvxpy explicitly, but does have the branch and bound control in the python layer of the solver. There are also other projects that can use fairly arbitrary solvers like Bonmin

As a toy problem I’m using a knapsack problem where we have objects of different sizes and different values. We want to maximize the value while keeping the total size under the capacity of the bag. This can be phrased linearly like so: \max v \cdot x s.t. \sum_i s_i x_i<= capacity , x \in {0,1}. The basic heuristic I’m using is to branch on variables that are either 0 or 1 in even the relaxed solution. The alternative branch hopefully gets pruned fast.

This is at least solving the problem fairly quickly. It needs better heuristics and to be sped up, which is possible in lots of ways. I was not trying to avoid all performance optimizations. It takes maybe 5 seconds, whereas the cvxpy solver is almost instantaneous.

Edit : I should investigate the Parameter functionality of cvxpy. That would make a make faster version than the one above based on deepcopy. If you made the upper and lower vectors on the binary variables parameters, you could restrict the interval to 0/1 without rebuilding the problem or copying everything.

Mixed Integer Programming & Quantization Error

I though of another fun use case of mixed integer programming the other day. The quantization part of a digital to analog converter is difficult to analyze by the techniques taught in a standard signals course (linear analysis, spectral techniques, convolution that sort of thing). The way it is usually done is via assuming the quantization error is a kind of randomized additive noise.

Mixed Integer programming does have the ability to directly encode some questions about this quantization though. We can directly encode the integer rounding relations by putting the constraint that the quantized signal is exactly +-1/2 a quantization interval away from the original signal. Then we can run further analysis on the signals and compare them. For example, I wrote down a quick cosine transform. Then I ask for the worst case signal that leads to the most error on the quantized transform versus the transform of the unquantized signal. My measure of worst case performance was the sum of the difference of the two transforms. I chose this because it is tractable as a mixed integer linear program. Not all reasonable metrics one might want will be easily encodable in a mixed integer framework it seems. Some of them are maximizing over a convex function, which is naughty. (for example trying to maximize the L2 error \sum|x-y|^2 )

In a variant of this, it is also possible to directly encode the digital signal process in terms of logic gate construction and compare that to the intended analog transform, although this will be a great deal more computational expensive.

This is interesting as a relatively straightforward technique for the analysis of quantization errors.

This also might be an interesting place to use the techniques of Vanderbei https://vanderbei.princeton.edu/tex/ffOpt/ffOptMPCrev4.pdf . He does a neato trick where he partially embeds the FFT algorithm into an optimization problem by adding auxiliary variables. Despite the expense of adding these variables, it greatly increases the sparsity of the constraint matrices, which will probably be a win. I wonder if one might do something similar with a Fast Multipole Method , Barnes Hut, or Wavelet transform? Seems likely. Would be neat, although I’m not sure what for. I was thinking of simulating the coulomb gas. That seems like a natural choice. Oooh. I should do that.

Solving the XY Model using Mixed Integer Optimization in Python

There are many problems in physics that take the form of minimizing the energy. Often this energy is taken to be quadratic in the field. The canonical example is electrostatics. The derivative of the potential \phi gives the electric field E. The energy is given as \int (|\nabla \phi|^2 + \phi \rho) d^3 x . We can encode a finite difference version of this (with boundary conditions!) directly into a convex optimization modelling language like so.

The resulting logarithm potential

It is noted rarely in physics, but often in the convex optimization world that the barrier between easy and hard problems is not linear vs. nonlinear, it is actually more like convex vs. nonconvex. Convex problems are those that are bowl shaped, on round domains. When your problem is convex, you can’t get caught in valleys or on corners, hence greedy local methods like gradient descent and smarter methods work to find the global minimum. When you differentiate the energy above, it results in the linear Laplace equations \nabla^2 \phi = \rho. However, from the perspective of solvability, there is not much difference if we replace the quadratic energy with a convex alternative.

Materials do actually have non-linear permittivity and permeability, this may be useful in modelling that. It is also possible to consider the convex relaxation of truly hard nonlinear problems and hope you get the echoes of the phenomenology that occurs there.

Another approach is to go mixed integer. Mixed Integer programming allows you to force that some variables take on integer values. There is then a natural relaxation problem where you forget the integer variables have to be integers. Mixed integer programming combines a discrete flavor with the continuous flavor of convex programming. I’ve previously shown how you can use mixed integer programming to find the lowest energy states of the Ising model but today let’s see how to use it for a problem of a more continuous flavor.

As I’ve described previously, in the context of robotics, the non-convex constraint that variables lie on the surface of a circle can be approximated using mixed integer programming. We can mix this fairly trivially with the above to make a global solver for the minimum energy state of the XY model. The XY model is a 2d field theory where the value of the field is constrained to lie on a circle. It is a model of a number of physical systems, such as superconductivity, and is the playground for a number of interesting phenomenon, like the Kosterlitz-Thouless phase transition. Our encoding is very similar to the above except we make two copies of the field phi and we then force them to lie on a circle. I’m trying to factor out the circle thing into my library cvxpy-helpers, which is definitely a work in progress.

Now, this isn’t really an unmitigated success as is. I switched to an absolute value potential because GLPK_MI needs it to be linear. ECOS_BB works with a quadratic potential, but it was not doing a great job. The commercial solvers (Gurobi, CPlex, Mosek) are supposed to be a great deal better. Perhaps switching to Julia, with it’s richer ecosystem might be a good idea too. I don’t really like how the solution of the absolute value potential looks. Also, even at such a small grid size it still takes around a minute to solve. When you think about it, it is exploring a ridiculously massive space and still doing ok. There are hundreds of binary variables in this example. But there is a lot of room for tweaking and I think the approach is intriguing.

Musings:

  • Can one do steepest descent style analysis for low energy statistical mechanics or quantum field theory?
  • Is the trace of the mixed integer program search tree useful for perturbative analysis? It seems intuitively reasonable that it visits low lying states
  • The Coulomb gas is a very obvious candidate for mixed integer programming. Let the charge variables on each grid point = integers. Then use the coulomb potential as a quadratic energy. The coulomb gas is dual to the XY model. Does this exhibit itself in the mixed integer formalism?
  • Coulomb Blockade?
  • Nothing special about the circle. It is not unreasonable to make piecewise linear approximations or other convex approximations of the sphere or of Lie groups (circle is U(1) ). This is already discussed in particular about SO(3) which is useful in robotic kinematics and other engineering topics.

Edit: /u/mofo69extreme writes:

By absolute value potential, I mean using |del phi| as compared to a more ordinary quadratic |del phi|2.

This is where I’m getting confused. As you say later, you are actually using two fields, phi_x and phi_y. So I’m guessing your potential is the “L1 norm”

|del phi| = |del phi_x| + |del phi_y|

? This is the only linear thing I can think of.

I don’t feel that the exact specifics of the XY model actually matter all the much.

You should be careful here though. A key point in the XY model is the O(2) symmetry of the potential: you can multiply the vector (phi_x,phi_y) by a 2D rotation matrix and the Hamiltonian is unchanged. You have explicitly broken this symmetry down to Z_4 if your potential is as I have written above. In this case, the results of the famous JKKN paper and this followup by Kadanoff suggest that you’ll actually get a phase transition of the so-called “Ashkin-Teller” universality class. These are actually closely related to the Kosterlitz-Thouless transitions of the XY model; the full set of Ashkin-Teller phase transitions actually continuously link the XY transition with that of two decoupled Ising models.

You should still get an interesting phase transition in any case! Just wanted to give some background, as the physics here is extremely rich. The critical exponents you see will be different from the XY model, and you will actually get an ordered Z_4 phase at low temperatures rather than the quasi-long range order seen in the low temperature phase of the XY model. (You should be in the positive h_4 region of the bottom phase diagram of Figure 1 of the linked JKKN paper.)”

These are some interesting points and references.

2D Robot Arm Inverse Kinematics using Mixed Integer Programming in Cvxpy

Mixed Integer programming is crazy powerful. You can with ingenuity encode many problems into it. The following is a simplification of the ideas appearing in http://groups.csail.mit.edu/robotics-center/public_papers/Dai19.pdf . They do 3d robot arms, I do 2d. I also stick to completely linear approximations.

The surface of a circle is not a convex shape. If you include the interior of a circle it is. You can build a good approximation to the circle as polygons. A polygon is the union of it’s sides, each of which is a line segment. Line sgements are convex set. Unions of convex sets are encodable using mixed integer programming. What I do is sample N regular positions on the surface of a circle. These are the vertices of my polygon. Then I build boolean indicator variables for which segment we are on. Only one of them is be nonzero \sum s_i == 1. If we are on a segment, we are allowed to make positions x that interpolate between the endpoints x_i of that segment x = \lambda_1 x_1 + \lambda_2 x_2, where \lambda_i >= 0 and \sum \lambda=1. These \lambda are only allowed to be nonzero if we are on the segment, so we suppress them with the indicator variables \lambda_i <= s_i + s_{i+1}. That’s the gist of it.

image link

Given a point on the circle (basically sines and cosines of an angle) we can build a 2d rotation matrix R from it. Then we can write down the equations connecting subsequent links on the arm. p_{i+1}=p_{i} +Rl. By using global rotations with respect to the world frame, these equations stay linear. That is a subtle point. p and R are variables, whereas l is a constant describing the geometry of the robot arm. If we instead used rotation matrices connecting frame i to i+1 these R matrices would compound nonlinearly.

All in all, pretty cool!

The Beauty of the Cone: How Convex Cones Simplify Convex Programming

I watched the Stephen Boyd course to get me started in convex programming. At the beginning, he spends some time talking about convex sets rather than launching in convex optimization. I did not appreciate this sufficiently on the first pass. Convex sets are a very geometric topic and I think that for the most part, convex functions are best thought as a special case of them. The epigraph of a scalar valued convex function on R^d , the filled in area above a graph, is a d+1 dimensional convex set. Convex constraints on the domain can be thought of as further cutting this shape. Finding the minimum of the shape can be thought of as a geometrical problem of finding the furthest point in the -y direction.

There is another mathematical topic that I did not appreciate for how powerful and clean it is. If you check out this textbook by Fenchel, he starts with the topic of convex cones rather than sets, I now realize for good reason.

I was sketching out a programmatic representation of convex sets and was annoyed at how ugly things were turning out. First off, infinity is a huge problem. Many convex problems have infinite answers.

The simplest problem is \max_x c^T x with no constraints. The answer plunges off to infinity vaguely in the direction of c. The next simplest problem is \max_x c^T x , a^T x \geq 0. This either goes off to infinity away from the constraint plane, hits the constraint plane and goes off to infinity, or if c and a are parallel, it is an arbitrary location on the constraint plane.

In short, the very most simple convex problems have infinite answers. You actually need to have a fairly complex problem with many conditions before you can guarantee a finite answer. Once we have a bounded LP, or a positive definite quadratic problem do we start to guarantee boundedness.

In order to work with these problems, it is helpful (necessary?) to compactify your space. There are a couple options here. One is to arbitrarily make a box cutoff. If we limit ourselves to an arbitrary box of length 1e30, then every answer that came back as infinite before is now finite, albeit huge. This makes me queasy though. It is ad hoc, actually kind of annoying to program all the corner cases, and very likely to have numerical issues. Another possibility is to extend your space with rays. Rays are thought of as points at infinity. Now any optimization problem that has an infinite answer returns the ray in the direction the thing goes of to infinity at. It is also annoying to make every function work with either rays or points though.

Another slightly less bothersome aesthetic problem is that the natural representation of half spaces is a normal ray and offset a^T x \geq b The principles of duality make one want to make this object as similar to our representation of points as possible, but it has 1-extra dimension and 1 arbitrary degree of freedom (scalar multiplying a and b by the same positive constant does not change the geometrical half space described). This is ugly.

In the field of projective geometry, there is a very beautiful thing that arises. In projective geometry, all scalar multiples of a ray are considered the same thing. This ray is considered a “point”. The reason this makes sense is that projective geometry is a model of perspective and cameras. Two points are completely equivalent from the perspective of a pinhole camera if they lie on the same ray connecting to the pinhole. This ray continues inside the camera and hits the photographic screen. Hence points on the 2D screen correspond to rays in 3D space. It makes elegant sense to consider the pinhole to be the origin or your space, and hence you come to the previous abstract definition. Points at infinity in 3D (like stars effectively) are not a problem since they lie on finitely describable rays. Points at infinite edge of the 2D screen are not really a problem either. Perfectly reasonable points in 3D can map to the infinite edge of the screen in principle. Someone standing perfectly to the side of the pinhole in 3d space has a ray that goes perfectly horizontally into the camera, and in a sense would only hit a hypothetical infinite screen at infinity.

A great many wonderful (and practical!) things fall out of the projective homogenous coordinates. They are ubiquitous in computer graphics, computer vision, and robotics. The mathematical language describing translations and rotations is unified. Both can be described using a single matrix. This is not the intention, but it is a pleasant surprise. Other geometrical questions become simple questions of linear or vector algebra. It is very cool.

Can we use this method for describing the space we want to find convex sets in? I think not. Unfortunately, the topology of projective space is goofy. At the very least in 2D projective space, which can be thought of as a sphere with opposite points identified, do not necessarily have an inside and outside (I’m questioning this idea now)? So convex sets and talking about maximal half planes and such seems questionable.

But I think we can fix it. Cones are good. In a slight twist on the projective geometry idea, what if you only non negative multiples of rays \lambda \geq 0 as the same “point”. You can take as a canonical plane x_0 =1 similar to the pinhole camera. This plane can be thought of as your more ordinary affine space. Now half spaces touching the origin (cones) correspond to affine half spaces. We have a reasonable way of describing points at infinity on this plane, which correspond to rays. Arbitrary convex sets on this plane correspond to cones of rays.

Cones in this context are sets closed under arbitrary non-negative sums of points within them. Hence a cone always includes the origin. Cones are basically convex sets of rays.

By adding in an arbtrary-ish degree of freedom to points, we bring points and half spaces much closer in alignment. Now evaluating whether a point in a half space looks like a^T x \geq 0 with no ugly extra b.

So in closing, as convex sets are kind of a cleaner version of convex functions, so are convex cones a cleaner version of convex sets. This is actually useful, since when you’re programming, you’ll have to deal with way less corner infinite cases. The theory is also more symmetrical and beautiful

Casadi – Pretty Damn Slick

Casadi is something I’ve been aware of and not really explored much. It is a C++ / python / matlab library for modelling optimization problems for optimal control with bindings to IPOpt and other solvers. It can produce C code and has differentiation stuff. See below for some examples after I ramble.

I’ve enjoyed cvxpy, but cvxpy is designed specifically for convex problems, of which many control problems are not.

Casadi gives you a nonlinear modelling language and easy access to IPOpt, an interior point solver that works pretty good (along with some other solvers, many of which are proprietary however).

While the documentation visually looks very slick I actually found it rather confusing in contents at first. I’m not sure why. Something is off.

You should download the “example pack” folder. Why they don’t have these in html on the webpage is insane to me. https://github.com/casadi/casadi/releases/download/3.4.4/casadi-example_pack-v3.4.4.zip

It also has a bunch of helper classes for DAE building and other things. They honestly really put me off. The documentation is confusing enough that I am not convinced they give you much.

The integrator classes give you access to external smart ode solvers from the Sundials suite. They give you good methods for difficult odes and dae (differential algebraic equations, which are ODEs with weird constraints like x^1 + y^1 == 1) Not clear to me if you can plug those in to an optimization, other than by a shooting method.

Casadi can also output C which is pretty cool.

I kind of wondered about Casadi vs Sympy. Sympy has lot’s of general purpose symbolic abilities. Symbolic solving, polynomial smarts, even some differential equation understanding. There might be big dividends to using it. But it is a little harder to get going. I feel like there is an empty space for a mathemtical modelling language that uses sympy as it’s underlying representation. I guess monkey patching sympy expressions into casadi expression might not be so hard. Sympy can also output fast C code. Sympy doesn’t really have any support for sparseness that I know of.

As a side note, It can be useful to put these other languages into numpy if you need extended reshaping abilities. The other languages often stop at matrices, which is odd to me.

Hmm. Casadi actually does have access to mixed integer programs via bonmin (and commercial solvers). That’s interesting. Check out lotka volterra minlp example

https://groups.google.com/forum/#!topic/casadi-users/8xCHmP7UmpI

The optim interface makes some of this look better. optim.minimize and subject_to. Yeah, this is more similar to the interfaces I’m used to. It avoids the manual unpacking of the solution and the funky feel of making everything into implicit == 0 expressions.

https://web.casadi.org/blog/opti/

Here is a simple harmonic oscillator example using the more raw casadi interface. x is positive, v is velocity, u is a control force. I’m using a very basic leap frog integration. You tend to have to stack things into a single vector with vertcat when building the final problem.

Let’s use the opti interface, which is pretty slick. Here is a basic cartpole https://web.casadi.org/blog/opti/

Very fast. Very impressive. Relatively readable code. I busted this out in like 15 minutes. IPopt solves the thing in the blink of an eye (about 0.05s self reported). Might be even faster if I warm start it with a good solution, as I would in online control (which may be feasible at this speed). Can add the initial condition as a parameter to the problem

I should try this on an openai gym.

Haskell has bindings to casadi.

http://hackage.haskell.org/package/dynobud