The Beauty of the Cone: How Convex Cones Simplify Convex Programming

I watched the Stephen Boyd course to get me started in convex programming. At the beginning, he spends some time talking about convex sets rather than launching straight into convex optimization. I did not appreciate this sufficiently on the first pass. Convex sets are a very geometric topic, and I think that for the most part, convex functions are best thought of as a special case of them. The epigraph of a scalar valued convex function on R^d, the filled-in region above its graph, is a d+1 dimensional convex set. Convex constraints on the domain can be thought of as further cutting this shape. Finding the minimum of the function can be thought of as the geometrical problem of finding the point of this shape that lies furthest in the -y direction.

There is another mathematical topic that I did not appreciate for how powerful and clean it is: convex cones. If you check out this textbook by Fenchel, he starts with convex cones rather than convex sets, and I now realize there is good reason for that.

I was sketching out a programmatic representation of convex sets and was annoyed at how ugly things were turning out. First off, infinity is a huge problem. Many convex problems have infinite answers.

The simplest problem is \max_x c^T x with no constraints. The answer plunges off to infinity in the direction of c. The next simplest problem is \max_x c^T x subject to a^T x \geq 0. The optimum either goes off to infinity away from the constraint plane, hits the constraint plane and runs off to infinity along it, or, if c and a are anti-parallel, it is an arbitrary location on the constraint plane.

In short, the very simplest convex problems have infinite answers. You actually need a fairly complicated problem, with enough conditions, before you can guarantee a finite answer. Only once we have a bounded LP or a positive definite quadratic problem do we start to get guaranteed boundedness.

In order to work with these problems, it is helpful (necessary?) to compactify your space. There are a couple of options here. One is to impose an arbitrary box cutoff. If we limit ourselves to a box of side length 1e30, then every answer that came back as infinite before is now finite, albeit huge. This makes me queasy though. It is ad hoc, actually kind of annoying to program all the corner cases for, and very likely to have numerical issues. Another possibility is to extend your space with rays, which are thought of as points at infinity. Now any optimization problem that has an infinite answer returns the ray pointing in the direction the optimum runs off to infinity. It is also annoying to make every function work with either rays or points though.

Another, slightly less bothersome, aesthetic problem is that the natural representation of a half space is a normal vector and an offset, a^T x \geq b. The principle of duality makes one want this object to be as similar to our representation of points as possible, but it has one extra dimension and one arbitrary degree of freedom (scaling a and b by the same positive constant does not change the geometric half space described). This is ugly.

In the field of projective geometry, there is a very beautiful thing that arises. In projective geometry, all scalar multiples of a ray are considered the same thing. This ray is considered a “point”. The reason this makes sense is that projective geometry is a model of perspective and cameras. Two points are completely equivalent from the perspective of a pinhole camera if they lie on the same ray connecting to the pinhole. This ray continues inside the camera and hits the photographic screen. Hence points on the 2D screen correspond to rays in 3D space. It makes elegant sense to consider the pinhole to be the origin of your space, and hence you arrive at the previous abstract definition. Points at infinity in 3D (like stars, effectively) are not a problem, since they lie on finitely describable rays. Points at the infinite edge of the 2D screen are not really a problem either. Perfectly reasonable points in 3D can map to the infinite edge of the screen in principle. Someone standing perfectly to the side of the pinhole in 3D space has a ray that goes perfectly horizontally into the camera, and in a sense would only hit a hypothetical infinite screen at infinity.

A great many wonderful (and practical!) things fall out of projective homogeneous coordinates. They are ubiquitous in computer graphics, computer vision, and robotics. The mathematical language describing translations and rotations is unified: both can be described using a single matrix. This was not the original intention, but it is a pleasant surprise. Other geometrical questions become simple questions of linear or vector algebra. It is very cool.
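For instance, here is a little numpy sketch (my own toy example, not from any particular library) of a 2D rotation and translation packed into a single 3x3 homogeneous matrix:

```python
import numpy as np

theta = np.pi / 2
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
t = np.array([3.0, 1.0])

# 3x3 homogeneous transform: rotate by theta, then translate by t
T = np.eye(3)
T[:2, :2] = R
T[:2, 2] = t

p = np.array([1.0, 0.0, 1.0])   # the point (1, 0) written as (x, y, 1)
print(T @ p)                    # -> [3., 2., 1.], i.e. the point (3, 2)
```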

Can we use this method for describing the space we want to find convex sets in? I think not. Unfortunately, the topology of projective space is goofy. At the very least, in 2D projective space, which can be thought of as a sphere with opposite points identified, closed curves (projective lines, for example) do not necessarily separate the space into an inside and an outside (I’m questioning this idea now). So talking about convex sets, maximal half planes, and such seems questionable.

But I think we can fix it. Cones are good. In a slight twist on the projective geometry idea, what if you only identify non-negative multiples \lambda \geq 0 of a vector as the same “point”, i.e. work with rays rather than full lines. You can take the plane x_0 = 1 as a canonical plane, similar to the pinhole camera screen. This plane can be thought of as your more ordinary affine space. Now half spaces touching the origin (which are cones) correspond to affine half spaces on that plane. We have a reasonable way of describing points at infinity on this plane, which correspond to rays with x_0 = 0. Arbitrary convex sets on this plane correspond to cones of rays.

Cones in this context are sets closed under arbitrary non-negative linear combinations of points within them. Hence a cone always includes the origin. Cones are basically convex sets of rays.

By adding an arbitrary-ish degree of freedom (the overall positive scale) to points, we bring points and half spaces much closer into alignment. Now evaluating whether a point is in a half space looks like a^T x \geq 0, with no ugly extra b.
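Concretely, the bookkeeping is tiny (the helper names here are just illustrative):

```python
import numpy as np

def homogenize_point(x):
    # the affine point x becomes the ray through (1, x)
    return np.concatenate(([1.0], x))

def homogenize_halfspace(a, b):
    # the affine half space a.x >= b becomes the conic half space (-b, a).(x0, x) >= 0
    return np.concatenate(([-b], a))

x = np.array([2.0, 0.5])
a, b = np.array([1.0, -1.0]), 1.0        # the half space x1 - x2 >= 1
h = homogenize_halfspace(a, b)
print(h @ homogenize_point(x) >= 0)      # same answer as a @ x >= b, one dot product, no offset
```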

So in closing, just as convex sets are kind of a cleaner version of convex functions, convex cones are a cleaner version of convex sets. This is actually useful, since when you’re programming, you have to deal with far fewer infinite corner cases. The theory is also more symmetrical and beautiful.

Casadi – Pretty Damn Slick

Casadi is something I’ve been aware of but have not really explored much. It is a C++ / Python / MATLAB library for modelling optimization problems for optimal control, with bindings to Ipopt and other solvers. It can produce C code and has automatic differentiation built in. See below for some examples after I ramble.

I’ve enjoyed cvxpy, but cvxpy is designed specifically for convex problems, of which many control problems are not.

Casadi gives you a nonlinear modelling language and easy access to Ipopt, an interior point solver that works pretty well (along with some other solvers, many of which are proprietary, however).

While the documentation looks very slick visually, I actually found its contents rather confusing at first. I’m not sure why. Something is off.

You should download the “example pack” folder. Why they don’t have these in html on the webpage is insane to me. https://github.com/casadi/casadi/releases/download/3.4.4/casadi-example_pack-v3.4.4.zip

It also has a bunch of helper classes for DAE building and other things. They honestly really put me off. The documentation is confusing enough that I am not convinced they give you much.

The integrator classes give you access to smart external ODE solvers from the Sundials suite. They give you good methods for difficult ODEs and DAEs (differential algebraic equations, which are ODEs with extra algebraic constraints like x^2 + y^2 == 1). It is not clear to me if you can plug those into an optimization, other than by a shooting method.
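For the plain ODE flavor, the 3.x interface looks roughly like this (a sketch; the DAE version additionally takes algebraic variables 'z' and an 'alg' entry and uses the IDAS plugin instead of CVODES):

```python
import casadi as ca

# integrate the undamped oscillator x' = v, v' = -x over one second with CVODES
x = ca.SX.sym('x', 2)
dae = {'x': x, 'ode': ca.vertcat(x[1], -x[0])}
F = ca.integrator('F', 'cvodes', dae, {'tf': 1.0})
print(F(x0=[1.0, 0.0])['xf'])
```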

Casadi can also output C which is pretty cool.

I kind of wondered about Casadi vs Sympy. Sympy has lots of general purpose symbolic abilities: symbolic solving, polynomial smarts, even some differential equation understanding. There might be big dividends to using it, but it is a little harder to get going. I feel like there is an empty space for a mathematical modelling language that uses sympy as its underlying representation. I guess monkey patching sympy expressions into casadi expressions might not be so hard. Sympy can also output fast C code. Sympy doesn’t really have any support for sparseness that I know of.

As a side note, it can be useful to stuff the expression objects from these modelling languages into numpy arrays if you need extended reshaping abilities. The languages themselves often stop at matrices, which is odd to me.

Hmm. Casadi actually does have access to mixed integer programming via Bonmin (and commercial solvers). That’s interesting. Check out the Lotka-Volterra MINLP example:

https://groups.google.com/forum/#!topic/casadi-users/8xCHmP7UmpI

The Opti interface makes some of this look better: opti.minimize and opti.subject_to. Yeah, this is more similar to the interfaces I’m used to. It avoids the manual unpacking of the solution and the funky feel of making everything into implicit == 0 expressions.

https://web.casadi.org/blog/opti/

Here is a simple harmonic oscillator example using the rawer casadi interface. x is position, v is velocity, u is a control force. I’m using a very basic leapfrog-style integration. You tend to have to stack things into a single vector with vertcat when building the final problem.
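Roughly, the skeleton looks like this (a sketch of the pattern rather than the exact script; the horizon length, time step, and quadratic cost are placeholder choices):

```python
import casadi as ca
import numpy as np

N, dt = 50, 0.1
x = ca.SX.sym('x', N)        # position
v = ca.SX.sym('v', N)        # velocity
u = ca.SX.sym('u', N - 1)    # control force, one per interval

g = []                       # constraints, all written as expr == 0
g += [x[0] - 1.0, v[0] - 0.0]                                # initial condition
for k in range(N - 1):
    g.append(x[k + 1] - (x[k] + dt * v[k]))                  # position update
    g.append(v[k + 1] - (v[k] + dt * (-x[k + 1] + u[k])))    # velocity update: spring force -x plus control

cost = ca.sumsqr(x) + ca.sumsqr(v) + 0.1 * ca.sumsqr(u)

nlp = {'x': ca.vertcat(x, v, u), 'f': cost, 'g': ca.vertcat(*g)}
solver = ca.nlpsol('solver', 'ipopt', nlp)
sol = solver(x0=np.zeros(3 * N - 1), lbg=0, ubg=0)
print(sol['x'])
```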

Let’s use the opti interface, which is pretty slick. Here is a basic cartpole https://web.casadi.org/blog/opti/
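The cartpole itself follows that blog post closely; to show the shape of the Opti calls (variable, parameter, subject_to, minimize), here is the oscillator toy again in Opti form, with the initial condition passed in as a parameter:

```python
import casadi as ca

N, dt = 50, 0.1
opti = ca.Opti()

X = opti.variable(2, N)          # row 0: position, row 1: velocity
U = opti.variable(1, N - 1)      # control force
x_init = opti.parameter(2)       # initial condition as a parameter

for k in range(N - 1):
    # same simple integration scheme as above
    opti.subject_to(X[0, k + 1] == X[0, k] + dt * X[1, k])
    opti.subject_to(X[1, k + 1] == X[1, k] + dt * (-X[0, k + 1] + U[0, k]))

opti.subject_to(X[:, 0] == x_init)
opti.subject_to(opti.bounded(-1, U, 1))      # force limits
opti.minimize(ca.sumsqr(X) + 0.1 * ca.sumsqr(U))

opti.solver('ipopt')
opti.set_value(x_init, [1.0, 0.0])
sol = opti.solve()
print(sol.value(X)[:, -1])
```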

Very fast. Very impressive. Relatively readable code. I busted this out in like 15 minutes. Ipopt solves the thing in the blink of an eye (about 0.05 s self-reported). It might be even faster if I warm started it with a good solution, as I would in online control (which may be feasible at this speed). You can add the initial condition as a parameter to the problem.

I should try this on an openai gym.

Haskell has bindings to casadi.

http://hackage.haskell.org/package/dynobud

A Simple Interior Point Linear Programming Solver in Python

This solver is probably not useful for anything. For almost all purposes, let me point you to cvxpy.

If you want an open source solver CBC/CLP and GLPK and OSQP are good.

If you want proprietary, you can get a free trial license to Gurobi, limited in the number of variables and constraints.

Having said that, here we go.

 

The simplex method gets more press, and certainly has its advantages, but the interior point method makes much more sense to me. What follows is the basic implementation described in Stephen Boyd’s course and book http://web.stanford.edu/~boyd/cvxbook/

In the basic interior point method, you can enforce your inequality constraints \phi(x) \ge 0 by using a logarithmic potential -\gamma \ln (\phi(x)) to punish getting close to them, where \gamma is a parameter we’ll talk about in a bit. From my perspective, the logarithm is a somewhat arbitrary choice. I believe certain properties of the logarithmic potential are necessary for some of the convergence guarantees.

The basic unconstrained Newton step takes a locally quadratic approximation to the function you’re trying to optimize and finds the minimum of that. This basically comes down to taking a step that is the inverse Hessian applied to the negative gradient.

\min_{dx} f(x_0+dx) \approx \min_{dx} \left[ f(x_0) + \nabla f(x_0)^T dx + \frac{1}{2} dx^T H dx \right]

(H)_{ij} = \partial_{ij}f(x_0)

\nabla f(x_0) +H dx = 0 \rightarrow dx =- H^{-1}\nabla f

We can maintain a linear constraint on the variable x during this Newton step. Instead of setting the gradient to zero, we require it to be perpendicular to the constraint plane (a combination of the constraint normals), which is the Lagrange multiplier procedure.

\nabla f(x_0) +H dx = -A^T \lambda \rightarrow Hdx + A^T \lambda = - \nabla f

A(x_0 + dx) = b

This is a block linear system

\begin{bmatrix}  H & A^T \\  A & 0 \\  \end{bmatrix}  \begin{bmatrix}  dx \\ \lambda  \end{bmatrix}  = \begin{bmatrix}  -\nabla f \\ b - Ax_0  \end{bmatrix}
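In numpy, that step is just assembling and solving the block matrix (a little sketch, with names of my own choosing):

```python
import numpy as np

def newton_step(grad, H, A, b, x0):
    """One equality-constrained Newton step: solve the KKT block system above."""
    n, m = len(x0), A.shape[0]
    KKT = np.block([[H, A.T],
                    [A, np.zeros((m, m))]])
    rhs = np.concatenate([-grad, b - A @ x0])
    sol = np.linalg.solve(KKT, rhs)
    return sol[:n], sol[n:]      # dx and the multipliers lambda
```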

Despite the logarithmic potential, there is no guarantee that the Newton step won’t take us outside the allowed region. This is why we need a line search on top of the Newton step. We scale the Newton step dx to \alpha dx. Because the function we’re optimizing is convex and the region we’re in is convex, there is some step length in that Newton direction that will work. So if we keep decreasing the overall step size, we’ll eventually find something acceptable.

As part of the interior point method, once the Newton iteration has converged we decrease the parameter \gamma on the logarithmic potential. This allows the inequality constraints to be approached more and more tightly as \gamma shrinks.

The standard form of an LP is

\min c^T x

A x = b

x \ge 0

This doesn’t feel like the form you’d want. One way you can construct this is by adding slack variables and splitting regular variables into a positive and negative piece

x = x_+ - x_-

Ax \le b \rightarrow Ax + s = b, \quad s \ge 0
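Mechanically, the conversion looks something like this (a sketch using the convention A x \le b and stacking z = [x_+, x_-, s]):

```python
import numpy as np

def to_standard_form(c, A, b):
    """Rewrite min c.x s.t. A x <= b (x free) as min c_std.z s.t. A_std z = b, z >= 0."""
    m, n = A.shape
    c_std = np.concatenate([c, -c, np.zeros(m)])   # c.(x+ - x-); slacks cost nothing
    A_std = np.hstack([A, -A, np.eye(m)])          # A(x+ - x-) + s = b
    return c_std, A_std, b

# afterwards, recover x from a standard-form solution z as x = z[:n] - z[n:2*n]
```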

 

The interior point formulation of this is

\min c^T x- \gamma \sum_i \ln(x_i)

Ax = b

The Hessian and gradient are quite simple here

(\nabla f)_i = c_i - \frac{\gamma}{x_i}

(H)_{ij} = \delta_{ij} \frac{\gamma}{x_i^2}

The optimum conditions for this are

\nabla (c^T x - \gamma \sum_i \ln(x_i)) = c - \gamma \frac{1}{x} = 0 (with \frac{1}{x} taken elementwise)

Ax=b

 

Now in the above, I’m not sure I got all the signs right, but I did implement it in python. The result seems to be correct and does work. I haven’t tested extensively, YMMV. It’s a useful starting point.
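A minimal sketch of it looks something like this (not the exact code; the barrier shrink factor, iteration caps, and the tiny test LP are arbitrary choices, and a real implementation would also backtrack on the objective in the line search):

```python
import numpy as np

def solve_lp(c, A, b, gamma=1.0, tol=1e-8):
    """min c.x  s.t.  A x = b, x >= 0, via the log barrier + Newton scheme above."""
    m, n = A.shape
    x = np.ones(n)                 # strictly positive start (need not satisfy A x = b yet)
    lam = np.zeros(m)
    while gamma > tol:
        for _ in range(50):        # Newton iterations at this barrier weight
            grad = c - gamma / x                     # gradient of c.x - gamma*sum(log x)
            H = np.diag(gamma / x**2)                # its (diagonal) Hessian
            KKT = np.block([[H, A.T],
                            [A, np.zeros((m, m))]])
            rhs = np.concatenate([-grad, b - A @ x])
            step = np.linalg.solve(KKT, rhs)
            dx, lam = step[:n], step[n:]
            alpha = 1.0                              # crude line search: stay strictly positive
            while np.any(x + alpha * dx <= 0):
                alpha *= 0.5
            x = x + alpha * dx
            if np.linalg.norm(dx) < tol:
                break
        gamma *= 0.1               # tighten the barrier and re-solve
    return x, lam

# tiny smoke test: min -x1 - x2  s.t.  x1 + x2 + s = 1, all variables >= 0
c = np.array([-1.0, -1.0, 0.0])
A = np.array([[1.0, 1.0, 1.0]])
b = np.array([1.0])
print(solve_lp(c, A, b)[0])        # roughly [0.5, 0.5, ~0]
```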

 

 

Musings:

I wanted to build this because I’ve been getting really into mixed integer programming and have been wondering how much getting deep into the guts of the solver might help. Given my domain knowledge of the problems at hand, I probably have quite good heuristics. In addition, I’ve been curious about a paper that points out an interesting, relatively unexploited territory: combining machine learning with mixed integer programming. https://arxiv.org/pdf/1811.06128

For these purposes, I want a really simple optimization solver.

But this is silly. I should use CLP or OSQP as a black box if I really want to worry about the mixed integer aspect.

MIOSQP is interesting.

It is interesting how the different domains of discrete optimization and search seem to have relatively similar sets of methods. Maybe I’m crazy. Maybe at the loose level I’m going to talk at, almost anything is like almost anything else.

Clause learning and cutting plane addition feel rather similar.

Relaxation to LP and unit propagation are somewhat similar. Or is unit propagation like elimination?

Mixed integer programs build their own heuristics.

Fourier-Motzkin and resolution are similar methods. In Fourier-Motzkin, you eliminate a variable from a system of linear inequalities by using algebra to isolate that variable on one side of each inequality and then matching up every \le bound with every \ge bound on that variable. There are packages that compute these things. See CDD or Polyhedra.jl.
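A rough sketch of one elimination step on a system A x \le b (the helper and the tiny example are my own; real packages do far more bookkeeping):

```python
import numpy as np

def eliminate(A, b, j):
    """Eliminate variable j from the inequalities A x <= b by Fourier-Motzkin."""
    pos = [i for i in range(len(b)) if A[i, j] > 0]   # rows giving upper bounds on x_j
    neg = [i for i in range(len(b)) if A[i, j] < 0]   # rows giving lower bounds on x_j
    zer = [i for i in range(len(b)) if A[i, j] == 0]  # rows not involving x_j
    newA, newb = [A[i] for i in zer], [b[i] for i in zer]
    # pair every lower bound with every upper bound; the x_j coefficients cancel
    for p in pos:
        for n in neg:
            newA.append(A[p] / A[p, j] - A[n] / A[n, j])
            newb.append(b[p] / A[p, j] - b[n] / A[n, j])
    return np.array(newA), np.array(newb)

# eliminate y from { x + y <= 4, -y <= 0, x - 2y <= 1 }: get x <= 4 and 1.5 x <= 4.5
A = np.array([[1.0, 1.0], [0.0, -1.0], [1.0, -2.0]])
b = np.array([4.0, 0.0, 1.0])
print(eliminate(A, b, 1))
```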

Resolution works on boolean formulas. You can eliminate a variable q from a CNF formula by taking every clause containing the negated literal \neg q and combining it with every clause containing the positive literal q.
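The analogous sketch for resolution, with clauses represented as sets of signed integers (again just illustrative, not tied to any SAT library):

```python
def eliminate_var(clauses, q):
    """Eliminate variable q from a CNF (list of frozensets of signed-int literals)."""
    pos = [c for c in clauses if q in c]
    neg = [c for c in clauses if -q in c]
    rest = [c for c in clauses if q not in c and -q not in c]
    resolvents = []
    for cp in pos:
        for cn in neg:
            r = (cp - {q}) | (cn - {-q})          # resolve the pair on q
            if not any(-lit in r for lit in r):   # drop tautological resolvents
                resolvents.append(frozenset(r))
    return rest + resolvents

# eliminate q=1 from (1 or 2) and (-1 or 3): the resolvent is (2 or 3)
print(eliminate_var([frozenset({1, 2}), frozenset({-1, 3})], 1))
```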

Trajectory Optimization of a Pendulum with Mixed Integer Linear Programming

There is a reasonable piecewise linear approximation for the pendulum: replace the sinusoidal potential with two quadratic potentials (one around the top and one around the bottom). This translates to a triangle wave torque.
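Concretely, one way of gluing the two quadratic potentials gives a torque of \theta near the bottom and \pi - \theta near the top, which is the triangle wave; a quick numpy sketch of it next to the exact \sin \theta:

```python
import numpy as np

def triangle_torque(theta):
    # wrap to [-pi, pi); slope +1 near the bottom, -1 past +-pi/2 toward the top
    th = (theta + np.pi) % (2 * np.pi) - np.pi
    return np.where(np.abs(th) <= np.pi / 2, th, np.sign(th) * (np.pi - np.abs(th)))

theta = np.linspace(-np.pi, np.pi, 9)
print(np.round(np.vstack([np.sin(theta), triangle_torque(theta)]), 2))
```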

Cvxpy curiously has support for Mixed Integer Programming.

Cbc is probably better than GLPK MI. However, GLPK is way easier to get installed. Just brew install glpk and pip install cvxopt.

Getting cbc working was a bit of a journey. I had to actually BUILD Cylp (god forbid) and fix some problems.

Special Ordered Set (SOS2) constraints are useful for piecewise linear approximations. An SOS2 constraint takes a set of variables and makes it so that only two consecutive ones can be nonzero at a time. Solvers often have built-in support for them, which can be much faster than just blasting them with general constraints. I emulated it by adding a binary variable for every consecutive pair. These binary variables suppress the continuous ones: setting the sum of the binary variables to 1 means only one pair is selected, so only the two consecutive continuous variables in that pair can be nonzero.
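In cvxpy, the emulation looks roughly like this (a sketch rather than my exact code; the breakpoints and toy objective are placeholders, and GLPK_MI is used since that’s the easy one to install):

```python
import cvxpy as cvx
import numpy as np

n = 5
breakpoints = np.linspace(-np.pi, np.pi, n)     # grid for the piecewise linear approximation
lam = cvx.Variable(n, nonneg=True)              # interpolation weights
z = cvx.Variable(n - 1, boolean=True)           # one binary per consecutive pair

constraints = [cvx.sum(lam) == 1, cvx.sum(z) == 1]
for i in range(n):
    # lam[i] may only be nonzero if the selected pair contains index i
    constraints.append(lam[i] <= cvx.sum(z[max(i - 1, 0):min(i + 1, n - 1)]))

theta = breakpoints @ lam                       # the interpolated point
torque = np.sin(breakpoints) @ lam              # piecewise linear stand-in for sin(theta)

prob = cvx.Problem(cvx.Maximize(torque), constraints)
prob.solve(solver=cvx.GLPK_MI)
print(lam.value, theta.value)
```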

 

One downside is that every evaluation of these nonlinear functions requires a new set of continuous and binary variables, which is possibly quite expensive.

For some values of total time steps and step length, the solver can go off the rails and never return.

At the moment, the solve is not fast enough for real time control with CBC (~2 s). I wonder how much some kind of warm start might help, or more fiddling with heuristics, or having access to the built-in SOS2 constraints rather than hacking them in. Commercial solvers are also usually faster. Still, it is interesting.

Blue is angle, orange is the applied torque. The torque is running up against the limits I placed on it.

Gettin’ that Robot some Tasty Apples: Solving a simple geometrical puzzle in Z3 python

At work there is a monthly puzzler.

“Design a robot that can pick up all 9 apples arranged on a 3 by 3 rectangular grid, and spaced 1m apart. The robot parts they have are faulty. The robot can only turn three times”

I think the intent of the puzzle is that the robot can drive forward and in reverse, but only actually turn 3 times. It’s not very hard to do by hand. I decided to take a crack at this one using Z3 for funzies. Z3 is an SMT solver, capable of solving an interestingly wide variety of problems.

I interpret this as “find 4 lines that together touch all the points in the grid, such that each line intersects the next one.”

It is fairly easy to directly translate this into a Z3 model.
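The encoding I have in mind looks roughly like this (a sketch; each line is a x + b y + c = 0, normalized to rule out the degenerate case, and “consecutive lines intersect” is encoded as “consecutive lines are not parallel”):

```python
from z3 import Real, Solver, Or, sat

n_lines = 4
lines = [(Real(f'a{i}'), Real(f'b{i}'), Real(f'c{i}')) for i in range(n_lines)]
s = Solver()

for a, b, c in lines:
    s.add(a * a + b * b == 1)        # rule out the degenerate "line" a = b = 0

for x in range(3):
    for y in range(3):
        # every apple must lie on at least one of the lines
        s.add(Or([a * x + b * y + c == 0 for a, b, c in lines]))

for (a1, b1, _), (a2, b2, _) in zip(lines, lines[1:]):
    # consecutive lines must intersect, i.e. must not be parallel
    s.add(a1 * b2 - a2 * b1 != 0)

res = s.check()
print(res)                           # sat with 4 lines; unsat if you set n_lines = 3
if res == sat:
    print(s.model())
```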

A couple comments:

If we ask z3 to use only 3 lines, it returns unsat. Try to prove that by hand.

However, if the robot is on the projective plane, it is possible with 3 lines. It only needs to drive to infinity and turn twice. Any two distinct lines intersect exactly once on the projective plane. How convenient.

The problem only seems somewhat difficult to computerize because of the seemingly infinite nature of geometry. If we only consider the lines that touch at least two grid points, all possible robot paths become eminently enumerable. Is there a proof that we only need to consider these lines?

Another interesting approach might be to note that the points are described by the set of equations x*(x-1)*(x-2)=0 and y*(y-1)*(y-2)=0. I think we could then possibly use methods of nonlinear algebra (Groebner bases) to find the lines. Roughly an ideal containment question? Don’t have this one fully thought out yet. I think z3 might be doing something like this behind the scenes.