Gröbner Bases and Optics

Geometrical optics is a pretty interesting topic. It really is almost pure geometry/math rather than physics.

Huygens’ principle says that we can compute the propagation of a wave by considering the wavelets produced by each point on the wavefront separately.

In physical optics, this corresponds to the linear superposition of the waves produced at each point by a propagator function \int dx' G(x,x'). In geometrical optics, which I think was Huygens’ original intent (these old school guys were VERY geometrical), this is the curious operation of taking the geometrical envelope of the little waves produced by each point.

The gist of Huygens’ principle. Ripped from Wikipedia.

The envelope is an operation on a family of curves. You can approximate it by a finite difference procedure: take two subsequent curves close together in the family and find their intersection. Keep doing that and then join up all the intersections. This is roughly the approach I took in this post.

Taking the envelope of a family of lines. Ripped from wikipedia
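The finite difference recipe can be sketched in a few lines of NumPy. This is a minimal sketch. It assumes the classical family of lines x/s + y/(1-s) = 1 for 0 < s < 1, which is not from the post. The envelope of that family is known to be the curve sqrt(x) + sqrt(y) = 1, so intersecting neighbouring lines should land close to it.

```python
import numpy as np

def line(s):
    """Line x/s + y/(1-s) = 1 from the (assumed) family, as (normal vector, offset)."""
    return np.array([1.0 / s, 1.0 / (1.0 - s)]), 1.0

def finite_difference_envelope(params):
    """Intersect each pair of neighbouring lines in the family."""
    points = []
    for s0, s1 in zip(params[:-1], params[1:]):
        n0, c0 = line(s0)
        n1, c1 = line(s1)
        A = np.vstack([n0, n1])                       # 2x2 linear system
        points.append(np.linalg.solve(A, np.array([c0, c1])))
    return np.array(points)

params = np.linspace(0.05, 0.95, 200)
pts = finite_difference_envelope(params)

# Distance from the exact envelope sqrt(x) + sqrt(y) = 1.
residual = np.sqrt(pts[:, 0]) + np.sqrt(pts[:, 1]) - 1.0
print(np.abs(residual).max())
```

The intersection of neighbouring lines s0 and s1 is exactly (s0*s1, (1-s0)*(1-s1)). The residual is therefore second order in the parameter spacing, so halving the spacing should cut it by roughly a factor of four.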

You can describe a geometrical wavefront implicitly with an equation \phi(x,y) = 0. Maybe the wavefront is a circle, or a line, or some wacky shape.

The wavelet produced by the point (x', y') after a time t is described implicitly by d(\vec{x},\vec{x'})^2 - t^2 = (x-x')^2 + (y-y')^2 - t^2 = 0.

This describes a family of curves: the circles produced by the different points of the original wavefront. If you take the envelope of this family, you get the new wavefront at time t.

How do we do this? Gröbner bases are one way, if we make \phi a polynomial equation. For today’s purposes, Gröbner bases are a method for solving systems of multivariate polynomial equations. Kind of surprising that such a thing even exists. It’s actually a guaranteed-terminating algorithm with horrific asymptotic complexity. SymPy has an implementation. For more on Gröbner bases, the links here are useful. Especially check out the Cox, Little, O’Shea books.

The algorithm churns on a set of multivariate polynomials and spits out a new set that is equivalent, in the sense that the new set is all zero if and only if the original set was. However, now (if you ask for the appropriate term ordering, such as lex) the polynomials are organized so that they have an increasing number of variables in them. So you solve the 1-variable equation (easy) and substitute into the 2-variable equation. That is then a 1-variable equation, which you solve (easy), and then you substitute into the 3-variable equation, and so on. It’s analogous to Gaussian elimination.
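Here is the triangular "solve, then back-substitute" workflow on a toy system that is not from the post: the unit circle intersected with the line x = y. This is a sketch. It relies on SymPy returning the lex basis sorted so that the last element involves only the smallest variable, which is the same assumption the `[-1]` indexing below makes.

```python
from sympy import symbols, groebner, solve, simplify

x, y = symbols('x y')

# Toy system: the unit circle intersected with the line x = y.
system = [x**2 + y**2 - 1, x - y]

# Lex order with x > y: the last basis element should involve y alone.
G = groebner(system, x, y, order='lex')
polys = list(G.exprs)
last = polys[-1]
print(G)

# Back-substitution, as in Gaussian elimination.
solutions = []
for y_val in solve(last, y):                          # 1-variable equation in y
    remaining = [p.subs(y, y_val) for p in polys[:-1]]
    for sol in solve(remaining, x, dict=True):        # now 1-variable in x
        solutions.append((sol[x], y_val))
print(solutions)
```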

So check this out

from sympy import *

x1, y1, x2, y2, dx, dy = symbols('x1, y1, x2, y2, dx, dy')

def dist(x,y,d):
    return x**2 + y**2 - d**2

e1 = dist(x1,y1,2) # the original circle of radius 2
e2 = dist(x1-x2 ,y1 - y2 , 1) # the parametrized wavefront after time 1

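# The two envelope conditions. (dx, dy) is a direction tangent to the original
# curve; only the direction matters, so dy is set to 1.
e3 = diff(e1,x1)*dx + diff(e1,y1)*1
e4 = diff(e2,x1)*dx + diff(e2,y1)*1

# With lex order the last basis element should only involve x2 and y2.
envelope = groebner([e1,e2,e3,e4], y1, x1, dx, dy, x2, y2, order='lex')[-1]
p = plot_implicit(e1, show=False)               # original curve
p.extend(plot_implicit(envelope, show=False))   # evolved wavefront on the same axes
p.show()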

The envelope conditions can be found by introducing two new differential variables, dx and dy. Equation e3 constrains them to lie tangent to the original circle at the point (x1, y1). Equation e4 then requires that two subsequent members of the curve family intersect. Hence we get the envelope. Asking for the Gröbner basis with that variable ordering gives us an implicit equation for x2, y2 with no mention of the other variables, if we just look at the last equation of the Gröbner basis.

I arbitrarily set dy = 1 because the overall scale of (dx, dy) does not matter, only the direction. If you don’t do this, the final equation comes out homogeneous in dy.
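For comparison, here is the textbook parametric form of the envelope. It imposes F = 0 and ∂F/∂s = 0 and then eliminates the parameter s. This swaps in a different technique from the dx, dy tangent trick above. The example is hypothetical: a flat wavefront along y = 0 and time t = 1. The expected answer is two parallel lines y = ±1, one moving forward and one backward, just as the circle example gives both radius 1 and radius 3.

```python
from sympy import symbols, groebner, diff, solve

x, y, s = symbols('x y s')

# Hypothetical example: wavelets of radius t = 1 emitted from every point (s, 0)
# of the flat wavefront y = 0.
F = (x - s)**2 + y**2 - 1

# Envelope conditions F = 0 and dF/ds = 0; lex order with s first
# pushes s out of the last basis element.
G = groebner([F, diff(F, s)], s, x, y, order='lex')
envelope = G.exprs[-1]
print(envelope)
print(solve(envelope, y))
```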

This does indeed show the two new wavefronts at radius 1 and radius 3.

Original circle radius = 2

x1**2 + y1**2 - 4 = 0
Evolved circles found via Gröbner basis.

(x2**2 + y2**2 - 9)*(x2**2 + y2**2 - 1) = 0

Here’s a different one, a parabola, using e1 = y1 - x1 + x1**2

Original curve y1 - x1 + x1**2 = 0
After 1 time step.

16*x2**6 - 48*x2**5 + 16*x2**4*y2**2 + 32*x2**4*y2 + 4*x2**4 - 32*x2**3*y2**2 - 64*x2**3*y2 + 72*x2**3 + 32*x2**2*y2**3 + 48*x2**2*y2 - 40*x2**2 - 32*x2*y2**3 + 16*x2*y2**2 - 16*x2*y2 - 4*x2 + 16*y2**4 + 32*y2**3 - 20*y2**2 - 36*y2 - 11 = 0

The weird lumpiness here is plot_implicit’s inability to cope, not the actual curve shape. Those funky prongs are from a singularity that forms as the wavefront folds over itself.

I tried using a cubic curve x**3, and the Gröbner basis algorithm seems to crash my computer. 🙁 Perhaps this is unsurprising.

I don’t know how to get the wavefront to go in only one direction. As is, it propagates into both the past and the future. Would this require inequalities? Sum of squares optimization perhaps?


It’s been suggested on Reddit that I’d have better luck using other packages, like Macaulay2, MAGMA, or Singular. Good point.

Also, it was suggested I use the Dixon resultant, for which there is an implementation in SymPy, albeit hidden. Something to investigate.
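As a taste of the resultant route, here is SymPy’s ordinary two-polynomial `resultant` (the Sylvester resultant), not the Dixon resultant. It eliminates the parameter from the parametric envelope conditions. The sketch reuses the parabola y = x - x**2 from above, parametrized as (s, s - s**2), with unit circles as the wavelets. The parametrization and the check points are my own choices.

```python
from sympy import symbols, diff, resultant, expand, Rational

x, y, s = symbols('x y s')

# Unit circles centred on the parabola y = x - x**2, parametrized as (s, s - s**2).
F = (x - s)**2 + (y - (s - s**2))**2 - 1

# Eliminate s from F = 0 and dF/ds = 0 with the Sylvester resultant.
R = expand(resultant(F, diff(F, s), s))
print(R.free_symbols)

# At the vertex (1/2, 1/4) the normal is vertical, so (1/2, 5/4) and (1/2, -3/4)
# lie at distance 1 along a normal: F and dF/ds share the root s = 1/2 there.
half = Rational(1, 2)
print(R.subs({x: half, y: Rational(5, 4)}), R.subs({x: half, y: Rational(-3, 4)}))
```

Because F is monic of degree 4 in s, R vanishes exactly where F and dF/ds have a common root in s. That is what makes it an implicit equation for the evolved curve.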

Another interesting angle might be to go numerical with a homotopy continuation method, using phcpy or pybertini.
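To get a feel for what phcpy and pybertini automate, here is a toy one-variable homotopy continuation in plain NumPy. It does not use either library. The assumptions are my own: a univariate target polynomial, the start system z**d - 1 whose roots are known, a fixed complex "gamma" constant (the usual trick for keeping paths from colliding), and a Newton-only corrector with fixed steps. Real trackers add predictors and adaptive step sizes.

```python
import numpy as np

def homotopy_roots(coeffs, steps=200, newton_iters=5, gamma=0.6 + 0.8j):
    """Track the roots of z**d - 1 to the roots of the target polynomial."""
    target = np.poly1d(coeffs)
    d = target.order
    start = np.poly1d([1.0] + [0.0] * (d - 1) + [-1.0])   # z**d - 1
    d_target, d_start = target.deriv(), start.deriv()
    z = np.exp(2j * np.pi * np.arange(d) / d)             # known start roots
    for t in np.linspace(0.0, 1.0, steps + 1)[1:]:
        # H(z, t) = (1 - t) * gamma * start(z) + t * target(z)
        for _ in range(newton_iters):                      # Newton corrector
            H = (1 - t) * gamma * start(z) + t * target(z)
            dH = (1 - t) * gamma * d_start(z) + t * d_target(z)
            z = z - H / dH
    return z

roots = homotopy_roots([1.0, -3.0, 2.0])    # z**2 - 3z + 2 = (z - 1)(z - 2)
print(np.sort_complex(roots))
```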

Flappy Bird as a Mixed Integer Program

My birds.

Mixed Integer Programming is a methodology where you can specify convex (usually linear) optimization problems that include integer/boolean variables.

Flappy Bird is a game about a bird avoiding pipes.

We can use mixed integer programming to make a controller for Flappy Bird. Feel free to put this as a real-world application in your grant proposals, people.

While thinking about writing a MIP for controlling a lunar lander game, I realized how amenable to mixed integer modeling flappy bird is. Ben and I put together the demo on Saturday. You can find his sister blog post here.

The bird is mostly in free fall, on parabolic trajectories. These dynamics are linear, so they can be expressed directly as linear constraints. The bird can discretely flap to give itself an upward impulse, which is a boolean force variable at every time step. Avoiding the ground and sky is a simple linear constraint. The bird has no control over its x motion, so that can be rolled out as concrete values. Because of this, we can check which pipes are relevant at time points in the future, and putting the bird in the gap is also a simple linear constraint.
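Those constraints can be written down almost verbatim. This is a minimal sketch, and it makes some substitutions. It uses SciPy’s `scipy.optimize.milp` (HiGHS) in place of cvxpy and Gurobi, so it runs without a commercial solver. It also drops the pipes, uses dt = 1, and keeps only the free-fall dynamics, the boolean flaps, and the sky and ground bounds. Gravity (1), the flap impulse (-14) and the ground height mirror the constants in the controller code below. The starting state is made up.

```python
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

def plan_flaps(y0, vy0, N=20, sky=0.0, ground=512 * 0.79 - 1,
               gravity=1.0, flap_acc=-14.0):
    """Fewest-flaps plan keeping the bird between sky and ground (y points down).

    Variable layout: y[0..N-1], then vy[0..N-1], then flap[0..N-2].
    """
    n = 3 * N - 1
    POS, VEL, FLAP = 0, N, 2 * N               # offsets of each block of variables
    rows, rhs = [], []

    def equality(terms, value):
        row = np.zeros(n)
        for index, coeff in terms:
            row[index] += coeff
        rows.append(row)
        rhs.append(value)

    equality([(POS, 1.0)], y0)                  # initial position
    equality([(VEL, 1.0)], vy0)                 # initial velocity
    for t in range(N - 1):
        # vy[t+1] = vy[t] + gravity + flap_acc * flap[t]
        equality([(VEL + t + 1, 1.0), (VEL + t, -1.0), (FLAP + t, -flap_acc)], gravity)
        # y[t+1] = y[t] + vy[t+1]   (dt = 1)
        equality([(POS + t + 1, 1.0), (POS + t, -1.0), (VEL + t + 1, -1.0)], 0.0)

    cost = np.zeros(n)
    cost[FLAP:] = 1.0                           # minimize the number of flaps
    lower = np.concatenate([np.full(N, sky), np.full(N, -np.inf), np.zeros(N - 1)])
    upper = np.concatenate([np.full(N, ground), np.full(N, np.inf), np.ones(N - 1)])
    integrality = np.concatenate([np.zeros(2 * N), np.ones(N - 1)])   # flaps are 0/1

    res = milp(cost, constraints=LinearConstraint(np.array(rows), rhs, rhs),
               integrality=integrality, bounds=Bounds(lower, upper))
    if not res.success:
        return None
    return res.x[:N], res.x[N:2 * N], np.round(res.x[2 * N:]).astype(bool)

y, vy, flap = plan_flaps(300.0, 0.0)
print(int(flap.sum()), y.max())
```

From y = 300 at rest, pure free fall would reach the ground within the 20-step horizon, so at least one flap is forced.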

There are several different objectives one might want to consider and weight. Perhaps you want to save the poor bird’s energy and minimize the sum of all flaps, cvx.sum(flap). Or perhaps you want to really be sure it doesn’t hit any pipes by maximizing the minimum distance from any pipe. Or perhaps minimize the absolute value of the y velocity, which is a reasonable heuristic for staying in control. All are expressible linearly (the absolute value and the max-min need a few auxiliary variables and constraints). Perhaps you might want a weighted combo of these. All things to fiddle with.

There is a pygame Flappy Bird clone which made this all the more slick. It is well written and easy to understand and modify. Actually figuring out the appropriate bounding boxes for pipe avoidance was finicky. Figuring out the right combo of bird size and pipe size is hard, combined with computer graphics and their goddamn upside-down coordinate system.

We run our solver in a model predictive control (MPC) configuration. Model predictive control is where you roll out a trajectory as an optimization problem and re-solve it at every action step. This turns an open-loop trajectory solve into closed-loop control, at the expense of needing to solve a possibly very complicated problem in real time. This is not always feasible.
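The receding-horizon loop itself is tiny if we replace the MIP solver with brute force. This is a hypothetical toy, not the controller from the post. With an 8-step horizon there are only 2**8 flap sequences, so `plan` simply enumerates them all. It keeps the sequence with the fewest flaps that stays between sky and ground. The loop then applies only the first action and re-plans from the new state.

```python
import itertools

SKY, GROUND = 0, 512 * 0.79 - 1      # y points down, as in the game
GRAVITY, FLAP_ACC = 1, -14

def step(y, vy, flap):
    vy = vy + GRAVITY + FLAP_ACC * flap
    return y + vy, vy

def plan(y, vy, horizon=8):
    """Open-loop plan by brute force: fewest flaps that stays in bounds for `horizon` steps."""
    best = None
    for flaps in itertools.product((0, 1), repeat=horizon):
        yy, vv, ok = y, vy, True
        for f in flaps:
            yy, vv = step(yy, vv, f)
            if not SKY <= yy <= GROUND:
                ok = False
                break
        if ok and (best is None or sum(flaps) < sum(best)):
            best = flaps
    return best

# Receding horizon: re-plan at every step but only apply the first action.
y, vy = 300, 0
heights, actions = [], []
for _ in range(100):
    flaps = plan(y, vy)
    action = flaps[0] if flaps is not None else 1    # arbitrary fallback: flap
    y, vy = step(y, vy, action)
    heights.append(y)
    actions.append(action)
print(min(heights), max(heights), sum(actions))
```

Swapping `plan` for a MIP solve is what lets the horizon grow far beyond what enumeration can handle.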

My favorite MIP modeling tool is cvxpy. It gives you vectorized constraints and slicing, which I love. More tools should aspire to numpy-like interfaces. I’ve got lots of other blog posts using this package, which you can find in the big post list in the sidebar 👀.

The GitHub repo for the entire code is here:

And here’s the guts of the controller:

import cvxpy as cvx
import numpy as np
import matplotlib.pyplot as plt

N = 20 # time steps to look ahead
path = cvx.Variable((N, 2)) # y pos and vel
flap = cvx.Variable(N-1, boolean=True) # whether or not the bird should flap in each step
last_solution = [False, False, False]
last_path = [(0,0),(0,0)]

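# Screen coordinates: y grows downward, so the sky is at y = 0.
# BIRDDIAMETER, PIPEWIDTH, PIPEGAPSIZE and PLAYERX are assumed to come from the game code.
SKY = 0
GROUND = (512*0.79)-1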

def getPipeConstraints(x, y, lowerPipes):
    constraints = []
    for pipe in lowerPipes:
        dist_from_front = pipe['x'] - x - BIRDDIAMETER
        dist_from_back = pipe['x'] - x + PIPEWIDTH
        if (dist_from_front < 0) and (dist_from_back > 0):
            #print(pipe['y'] + BIRDDIAMETER,  pipe['y'] + PIPEGAPSIZE)
            constraints += [y <= (pipe['y'] - BIRDDIAMETER)] # y above lower pipe
            constraints += [y >= (pipe['y'] - PIPEGAPSIZE)] # y below upper pipe
    #if len(constraints) > 0:
    return constraints

def solve(playery, playerVelY, lowerPipes):
    global last_path, last_solution

    pipeVelX = -4
    playerAccY    =   1   # player's downward acceleration
    playerFlapAcc =  -14   # player's speed on flapping

    # unpack variables
    y = path[:,0]

    vy = path[:,1]

    c = [] #constraints
    c += [y <= GROUND, y >= SKY]
    c += [y[0] == playery, vy[0] == playerVelY]

    x = PLAYERX
    xs = [x]
    for t in range(N-1):
        dt = t//10 + 1   # coarser time steps further into the future
        #dt = 1
        x -= dt * pipeVelX
        xs += [x]
        c += [vy[t + 1] ==  vy[t] + playerAccY * dt + playerFlapAcc * flap[t] ]
        c += [y[t + 1] ==  y[t] + vy[t + 1]*dt ]
        c += getPipeConstraints(x, y[t+1], lowerPipes)

    #objective = cvx.Minimize(cvx.sum(flap)) # minimize total fuel use
    objective = cvx.Minimize(cvx.sum(flap) + 10* cvx.sum(cvx.abs(vy))) # fuel use plus a penalty on |vy|

    prob = cvx.Problem(objective, c)
    try:
        prob.solve(verbose = False, solver="GUROBI")
        last_path = list(zip(xs, y.value))
        last_solution = np.round(flap.value).astype(bool)
        return last_solution[0], last_path
    except Exception:
        # Solver failed or returned no solution: follow the previous plan open-loop.
        last_solution = last_solution[1:]
        last_path = [((x-4), y) for (x,y) in last_path[1:]]
        return last_solution[0], last_path

I think it is largely self-explanatory, but here are some notes. The dt = t//10 + 1 thing decreases our time resolution the further we get from the current time step. This extends the time horizon without extra computational cost. Intuitively, modeling accuracy further out in time should matter less. The last_solution stuff handles the case where the optimization solver fails for whatever reason. In that case the bird follows the last trajectory it got, open-loop.
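A quick check of what the variable time step buys. With the post’s N = 20 and dt = t//10 + 1, the first 10 steps are 1 frame long and the remaining 9 are 2 frames long. The 4 px per frame figure is the magnitude of `pipeVelX` in the controller.

```python
N = 20
dts = [t // 10 + 1 for t in range(N - 1)]    # the post's variable time step
print(dts)
print(sum(dts), "frames of lookahead instead of", N - 1)          # 28 instead of 19
print(4 * sum(dts), "pixels of pipe travel at 4 px per frame")    # 112
```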

Bits and Bobbles

  • We changed the dynamics slightly from the Python original to make them easier to model. We found this did not change the feel of the game. The old dynamics were piecewise affine, though, so they are also modelable using mixed integer programming. Generally, check out the papers coming out of the Tedrake group. They are sweet. Total fanboy over here.
  • The controller as-is is not perfect. It fails eventually, and it probably shouldn’t. A bug? Numerical problems? Bad modeling of the pipe collision? A run tends to get through about a hundred pipes before something gets goofy.
  • Since we had access to the source code, we could mimic the dynamics very well. How robust is flappy bird to noise and bad modeling? We could add wind, or inaccurate pipe data.
  • Unions of Convex Regions. Giving the bird some x position control would change the nature of the problem. We could easily cut up the allowable regions of the bird into rectangles and represent the total space as a union of convex regions, which is also MIP representable.
  • Verification – The finite difference scheme used is crude. It is conceivable for the bird to clip a pipe. Since basically we know the closed form of the trajectories, we could verify that the parabolas do not intersect the regions. For funzies, maybe use sum of squares optimization?
  • Realtime MIP. Our solver isn’t quite realtime. Maybe half real time. One might pursue methods to make the mixed integer program faster. This might involve custom branching heuristics, or early stopping. If one can get the solver fast enough, you might run the solver in parallel and only query a new path plan every so often.
  • 3d flappy bird? Let the bird rotate? What about a platformer (Mario) or lunar lander? All are pretty interesting piecewise affine systems.
  • Is this the best way to do this? Yes and no. Other ways to do this might involve doing some machine learning, or hardcoding a controller that monitors the pipe locations and has some simple feedback. You can find some among the forks of FlapPyBird. I have no doubt that you could write these quickly, fiddle with them and get them to work better and faster than this MIP controller. However, for me there is a difference between could work and should work. You can come up with a thousand bizarre schemes that could work. RL algorithms fall in this camp. But I have every reason to believe the MIP controller should work, which makes it easier to troubleshoot.
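The union-of-convex-regions idea from the list above can be sketched with the standard big-M formulation. The two rectangles, the objective, and M = 10 are made up for illustration, and again `scipy.optimize.milp` stands in for cvxpy. There is one binary per region and exactly one region is selected. Each region’s constraints are relaxed by M when its binary is off.

```python
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

# Two rectangles (hypothetical): R1 = [0,1] x [0,3], R2 = [0,3] x [0,1].
# Variables: [x, y, z1, z2], where z_i = 1 means "the point is in rectangle i".
M = 10.0                               # big-M, larger than any coordinate
A = np.array([
    [1, 0, M, 0],    # x <= 1 + M (1 - z1)
    [0, 1, M, 0],    # y <= 3 + M (1 - z1)
    [1, 0, 0, M],    # x <= 3 + M (1 - z2)
    [0, 1, 0, M],    # y <= 1 + M (1 - z2)
    [0, 0, 1, 1],    # z1 + z2 == 1: pick exactly one region
])
upper = np.array([1 + M, 3 + M, 3 + M, 1 + M, 1])
lower = np.array([-np.inf, -np.inf, -np.inf, -np.inf, 1])

res = milp(c=np.array([-1.0, -1.0, 0.0, 0.0]),    # maximize x + y
           constraints=LinearConstraint(A, lower, upper),
           integrality=np.array([0, 0, 1, 1]),
           bounds=Bounds([0, 0, 0, 0], [M, M, 1, 1]))
x, y, z1, z2 = res.x
print(res.status, x + y, round(z1), round(z2))
```

Without the integrality requirement the point could sit in the convex hull of the two rectangles. The binaries are what enforce the non-convex union.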

The Classical Coulomb Gas as a Mixed Integer Quadratic Program

The Coulomb gas is a model of electrostatics where you take the discreteness of charge into account. That is what makes it hard compared to the continuous charge problem. Previously, I’ve used mixed integer programming to find lowest energy states of the Ising model. This is an even more obvious application of mixed integer programming to a physics model.

We ordinarily consider electric charge to be a continuum, but it isn’t. It comes in chunks of the electron charge. Historically, people didn’t even know that for quite a while. For most purposes, it is a reasonable approximation to consider electric charge to be continuous.

If you consider a network of capacitors cooled to the point where there is not enough thermal energy to borrow to get an electron to jump, the charges on the capacitors will be observably discretized. With sufficiently slow cooling to this state, the charges should arrange themselves into the lowest energy state.

The Coulomb gas model is also of interest due to its connections to the XY model, which I’ve taken a stab at with mixed integer programming before. The Coulomb gas models the energy of vortices in that model. I think the connection between the models actually requires a statistical or quantum mechanical context, though, whereas we’ve been looking at classical energy minimization.

We can formulate the classical coulomb gas problem very straightforwardly as a mixed integer quadratic program. We can easily include an externally applied field and a charge conservation constraint if we so desire within the framework.

We write this down in Python using the cvxpy library, which has a built-in free MIQP solver, albeit not a very good one. Commercial solvers are probably quite a bit better.

import cvxpy as cvx
import numpy as np
#grid size
N = 5
# charge variables
q = cvx.Variable( N*N ,integer=True)

# build our grid
x = np.linspace(0,1,N) 
y = np.linspace(0,1,N) 
X, Y = np.meshgrid(x,y, indexing='ij')
x1 = X.reshape(N,N,1,1)
y1 = Y.reshape(N,N,1,1)
x2 = X.reshape(1,1,N,N)
y2 = Y.reshape(1,1,N,N)
eps = 0.1 / N #regularization factor for self energy. convenience mostly
V = 1. / ((x1-x2)**2 + (y1-y2)**2 + eps**2)** ( 1 / 2)
V = V.reshape( (N*N,N*N) )

U_external = 100 * Y.flatten() # a constant electric field in the Y direction 
energy = cvx.quad_form(q,V) + U_external @ q

# charge conservation constraint
prob = cvx.Problem(cvx.Minimize(energy),[cvx.sum(q)==0])
# integer variables with a quadratic objective: needs a solver with mixed-integer QP support
res = prob.solve(verbose=True)


#plotting junk

import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')

ax.plot_surface(X, Y, q.value.reshape((N,N)))
plt.show()
A plot of charge in a constant external electric field.

The results seem reasonable. It makes sense for charge to move in the direction of the electric field. Going to the corners makes sense because then like charges are far apart. So this might be working. Who knows.
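One cheap, solver-free sanity check is exhaustive enumeration on a tiny grid. This sketch rebuilds the same V, eps and external potential as the code above on a 2×2 grid. It then tries every integer charge assignment in {-2, ..., 2} with zero net charge. The grid size and the charge range are my own choices to keep the enumeration small (625 candidates).

```python
import itertools
import numpy as np

# Same construction as the post, shrunk to a 2x2 grid.
N = 2
x = np.linspace(0, 1, N)
y = np.linspace(0, 1, N)
X, Y = np.meshgrid(x, y, indexing='ij')
x1, y1 = X.reshape(N, N, 1, 1), Y.reshape(N, N, 1, 1)
x2, y2 = X.reshape(1, 1, N, N), Y.reshape(1, 1, N, N)
eps = 0.1 / N
V = (1. / np.sqrt((x1 - x2)**2 + (y1 - y2)**2 + eps**2)).reshape(N * N, N * N)
U_external = 100 * Y.flatten()

def energy(q):
    return q @ V @ q + U_external @ q

best_q, best_E = None, np.inf
for q in itertools.product(range(-2, 3), repeat=N * N):
    q = np.array(q)
    if q.sum() != 0:            # charge conservation
        continue
    E = energy(q)
    if E < best_E:
        best_q, best_E = q, E
print(best_q.reshape(N, N), best_E)
```

On this grid the minimum puts +1 on both y = 0 sites and -1 on both y = 1 sites. The large self-energy (1/eps = 20) makes ±2 charges too expensive.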

Interesting places to go with this:

Prof. Vanderbei shows how you can embed an FFT to enable making statements about both the time and frequency domains while keeping the problem sparse. The low N log(N) time/memory complexity of the FFT is reflected in the sparsity of the resulting linear program.

Here’s a sketch of what this might look like. Curiously, looking at the actual number of nonzeros in the problem matrices, there are way too many. I am not sure what is going on. Something is not performing as I expect in the following code.

import cvxpy as cvx
import numpy as np
import scipy.fftpack # import fft, ifft
def swizzle(even, odd):
    # radix-2 butterfly combining the half-length DFTs of the even and odd samples
    assert(even.size == odd.size)
    M = even.size
    s = np.exp(-2.j * np.pi * np.arange(M) / (2 * M)) # twiddles for the full length 2M
    # cvx.multiply is elementwise; `*` between two non-scalars is a matrix product in cvxpy
    return cvx.hstack( [even + cvx.multiply(s, odd), even - cvx.multiply(s, odd)])

def fft(x):
    N = x.size
    #assert(2**int(log2(N)) == N) # power of 2

    if N == 1:
        return x, []
    c = []
    even, ce = fft(x[0::2]) # even-indexed samples
    c += ce
    odd, co = fft(x[1::2]) # odd-indexed samples
    c += co
    z = cvx.Variable(N, complex=True)
    c += [z == swizzle(even,odd)]
    return z, c

N = 256
x = cvx.Variable(N, complex=True)
z, c = fft(x)
v = np.zeros(N) #np.ones(N) #np.random.rand(N)
v[0]= 1
c += [x == v]
prob = cvx.Problem( cvx.Minimize(1), c)
res = prob.solve(verbose=True)
print(scipy.fftpack.fft(v) - z.value)

The equivalent dense DFT:

x = cvx.Variable(N, complex=True)
fred = cvx.Variable(N, complex=True)
c = [fred == np.exp(-2.j * np.pi * np.arange(N).reshape((N,1)) * np.arange(N).reshape((1,N)) / N) @ x]
prob = cvx.Problem( cvx.Minimize(1), c)
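To see where the N log(N) sparsity should come from, here is the radix-2 FFT written as a product of sparse matrices, in SciPy rather than cvxpy. Each recursion level contributes a butterfly with 2N nonzeros and an even/odd permutation with N nonzeros. The factors together therefore hold far fewer nonzeros than the N**2 of the dense DFT above. This is a sketch for counting nonzeros, not a practical way to compute an FFT.

```python
from functools import reduce
import numpy as np
from scipy import sparse

def fft_factors(N):
    """Sparse matrices whose product (left to right) is the N-point DFT matrix.
    Radix-2 decimation in time; N must be a power of 2."""
    if N == 1:
        return [sparse.identity(1, dtype=complex, format='csr')]
    M = N // 2
    I = sparse.identity(M, format='csr')
    D = sparse.diags(np.exp(-2j * np.pi * np.arange(M) / N))   # twiddle factors
    butterfly = sparse.bmat([[I, D], [I, -D]], format='csr')    # [E + D O; E - D O]
    perm = np.concatenate([np.arange(0, N, 2), np.arange(1, N, 2)])
    P = sparse.identity(N, format='csr')[perm]                   # evens first, then odds
    inner = [sparse.kron(sparse.identity(2), F, format='csr') for F in fft_factors(M)]
    return [butterfly] + inner + [P]

N = 64
factors = fft_factors(N)
F = reduce(lambda A, B: A @ B, factors).toarray()
dft = np.exp(-2j * np.pi * np.outer(np.arange(N), np.arange(N)) / N)
print(np.allclose(F, dft))
print(sum(f.nnz for f in factors), "nonzeros in the factors vs", N * N, "dense")
```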

It would be possible to use a frequency domain solution of the interparticle energy rather than the explicit Coulomb law form. Hypothetically, this might increase the sparsity of the problem.

It seems very possible to me to embed a fast multipole method or Barnes-Hut approximation within a MIQP in a similar manner. Introducing explicit charge summary variables for blocks would create a sparse version of the interaction matrix. So that’s fun.

Annihilating My Friend Will with a Python Fluid Simulation, Like the Cur He Is

A color version

As part of my random walk through topics, I was playing with shaders. I switched over to python because I didn’t feel like hacking out a linear solver.

There are a number of different methods for simulating fluids. I had seen Dan Piponi’s talk on YouTube where he describes Jos Stam’s stable fluids, and it made it all seem pretty straightforward. Absolutely PHENOMENAL talk. Check it out! We need to:

  • 1. Apply forces. I applied a gravitational force proportional to the total whiteness (brightness) of the image at each point.
  • 2. Project the velocity to be divergence free. This makes it an incompressible fluid. We also may want to project the velocity to be zero on boundaries. I’ve done a sketchy job of that. This requires solving a Poisson equation for a potential w. A sketch:
    • v_{orig} = v_{incomp} + \nabla w
    • \nabla \cdot v_{incomp}=0
    • \nabla ^2 w = \nabla \cdot v_{orig}. Solve for w
    • v_{incomp}=v_{orig} - \nabla w
  • 3. Advect using interpolation, backwards in time. Use v(x,t+dt) \approx v(x-v(x)*dt,t) rather than v(x,t+dt) \approx v(x,t)+dv(x,t)*dt. This is intuitively related to the fact that backward Euler is more stable than forward Euler. SciPy has a very convenient function for this step, scipy.ndimage.map_coordinates.
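The backwards-advection step fits in a few lines in 1D. This is a minimal sketch on a periodic unit interval with a constant velocity. The bump, the grid size and the velocity are made up. Each grid point is traced back by velocity*dt and the old field is sampled there with `scipy.ndimage.map_coordinates`. The sketch uses `mode='grid-wrap'`, which SciPy’s documentation describes as the plain periodic extension; its `'wrap'` mode instead treats the first and last samples as overlapping.

```python
import numpy as np
from scipy import ndimage

def advect_1d(f, velocity, dt, steps):
    """Semi-Lagrangian advection of f on the periodic interval [0, 1)."""
    N = f.size
    x = np.arange(N) / N                        # grid points
    for _ in range(steps):
        # Trace each grid point backwards along the flow and sample the old field there.
        departure = (x - velocity * dt) * N     # departure points in index units
        f = ndimage.map_coordinates(f, [departure], order=1, mode='grid-wrap')
    return f

N = 100
x = np.arange(N) / N
f0 = np.exp(-((x - 0.3) / 0.05) ** 2)
# velocity * dt * N = 1, so each step should shift the bump by exactly one cell.
f10 = advect_1d(f0, velocity=0.5, dt=0.02, steps=10)
print(np.allclose(f10, np.roll(f0, 10)))
```

With a shift that is not a whole number of cells, the interpolation smears the bump a little on every step. That numerical diffusion is the price of the unconditional stability.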

Given those basic ideas, I was flying very much by the seat of my pants. I wasn’t really following any other codes. I made this to look cool. It is not a scientific calculation. I have no idea what the error is like. With a critical eye, I can definitely spot weird oscillatory artifacts. Maybe a small diffusion term would help?

When you solve for the corrections to the velocity needed to make it incompressible (\nabla \cdot v = 0), add the correction ONLY to the original field. As part of the incompressible solving step, you smooth out the original velocity field somewhat. You probably don’t want that. By adding only the correction to the original field, you maintain the details in the original.

When you discretize a domain, there are vertices, edges, and faces in your discretization. It is useful to think about which of these you should place your field values on (velocity, pressure, electric field, etc.). I take it as a rule of thumb that if you do the discretization naturally, you are more likely to get a good numerical method. For example, I discretized my velocity field in two ways. A very natural way is on the edges of the graph. This is because velocity is really a stand-in for material flux. The x component of the velocity belongs on the x oriented edges of the graph and the y component of velocity on the y oriented edges. If you count edges, this means that they actually live in arrays with different dimensions. Along each direction there is one fewer edge than there are vertices.

This grid is 6×4 of vertices, but the vx edges are 6×3, and the vy edges are 5×4. The boxes are a grid 5×3.

For each box, we want to constrain the sum of velocities coming out to be 0. This is the discretization of the \nabla \cdot v = 0 constraint. I’m basing this on my vague recollections of discrete differential geometry and some other things I’ve seen. That fields sometimes live on the edges of the discretization is very important for gauge fields, if that means anything to you. I did not try it another way, so maybe it is an unnecessary complication.
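Here is the box divergence on the 6×4 example grid, written two ways: with `np.diff`, and with Kronecker-product difference matrices like the ones in the simulation code. The diagonals are passed to `sparse.diags` as a list. The random edge values are placeholders.

```python
import numpy as np
from scipy import sparse

Nx, Ny = 6, 4                                 # vertices, as in the 6x4 example grid
rng = np.random.default_rng(0)
vx = rng.random((Nx, Ny - 1))                 # 6x3 values on the vx edges
vy = rng.random((Nx - 1, Ny))                 # 5x4 values on the vy edges

def staggered_divergence(vx, vy):
    """Net outflow of each of the (Nx-1) x (Ny-1) boxes."""
    return np.diff(vx, axis=0) + np.diff(vy, axis=1)

def build_grad(N):
    # (N-1) x N forward-difference matrix
    return sparse.diags([-np.ones(N - 1), np.ones(N - 1)], [0, 1], shape=(N - 1, N))

gradx = sparse.kron(build_grad(Nx), sparse.identity(Ny - 1))
grady = sparse.kron(sparse.identity(Nx - 1), build_grad(Ny))

div = staggered_divergence(vx, vy)
div_sparse = (gradx @ vx.flatten() + grady @ vy.flatten()).reshape(Nx - 1, Ny - 1)
print(div.shape, np.allclose(div, div_sparse))
```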

Since I needed velocities at the vertices of the grid, I do have a simple interpolation step from the vertices to the edges. So I have velocities being computed at both places. The one that is maintained between iterations lives on the vertices.

At small resolutions the code runs in real time. For the videos I made, it is probably running ~10x slower than real time. I guarantee you can make it better. Good enough for me at the moment. An FFT-based Laplace solver would be fast. Could also go into GPU land? Multigrid? Me dunno.
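For a sense of how simple an FFT-based projection can be, here is a sketch that assumes periodic boundaries. That differs from the zero-Dirichlet sparse solve in the code below. It uses spectral derivatives on a collocated grid rather than the staggered edges. It solves lap(w) = div(v) in Fourier space and subtracts grad(w). The grid size and random field are made up.

```python
import numpy as np

def derivative_symbol(n):
    """i*k for each FFT frequency on a grid of spacing 1 (Nyquist mode zeroed)."""
    k = 2j * np.pi * np.fft.fftfreq(n)
    if n % 2 == 0:
        k[n // 2] = 0.0      # keeps real fields real after differentiation
    return k

def project_periodic(vx, vy):
    """Subtract grad(w), where lap(w) = div(v), from a periodic 2D velocity field."""
    kx = derivative_symbol(vx.shape[0])[:, None]
    ky = derivative_symbol(vx.shape[1])[None, :]
    vx_hat, vy_hat = np.fft.fft2(vx), np.fft.fft2(vy)
    div_hat = kx * vx_hat + ky * vy_hat
    lap = kx**2 + ky**2                               # Fourier symbol of the Laplacian
    w_hat = div_hat / np.where(lap == 0, 1.0, lap)    # where lap == 0, div_hat == 0 too
    vx_new = np.fft.ifft2(vx_hat - kx * w_hat).real
    vy_new = np.fft.ifft2(vy_hat - ky * w_hat).real
    return vx_new, vy_new

def spectral_divergence(vx, vy):
    kx = derivative_symbol(vx.shape[0])[:, None]
    ky = derivative_symbol(vx.shape[1])[None, :]
    return np.fft.ifft2(kx * np.fft.fft2(vx) + ky * np.fft.fft2(vy)).real

rng = np.random.default_rng(0)
vx, vy = rng.standard_normal((32, 24)), rng.standard_normal((32, 24))
px, py = project_periodic(vx, vy)
print(np.abs(spectral_divergence(vx, vy)).max(), np.abs(spectral_divergence(px, py)).max())
```

Each projection costs a handful of FFTs, O(N log N), with no matrix to factor. The catch is the periodic boundary, which is exactly what the alternating projection in the post works around.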

I tried using cvxpy for the incompressibility solve, which gives a pleasant interface and great power for adding constraints, but I wasn’t getting good results. I may have had a bug.

This is some code just to perform the velocity projection step and plot the field. I performed the projection to 0 on the boundaries using an alternating projection method (as discussed in Piponi’s talk), which is very simple and flexible but inefficient. It probably is a lot more appropriate when you have strange changing boundaries. I could have built the K matrix system to do that too.

The input velocity field is spiraling outwards (not divergence free, there is a fluid source in the center)
We project out the divergence free part of that velocity field, and project it such that the velocity does not point out at the boundary. Lookin good.

Prefactorizing the Laplacian matrix (linalg.factorized) vastly sped up each iteration. Makes sense.

Why in god’s name does sparse.kronsum have the argument ordering it does? I had a LOT of trouble with x vs y ordering. np.meshgrid wasn’t working like I thought it should. Images might have a weird convention? What a nightmare. I think it’s ok now? Looks good enough anyway.

And here is the code to make the video. I converted the image sequence to an mp4 using ffmpeg:

ffmpeg -i ./%06d.jpg will.mp4
import numpy as np
import cv2
from scipy import interpolate
from scipy import ndimage
from scipy import sparse
import scipy.sparse.linalg as linalg # import spsolve

#ffmpeg -i ./%06d.jpg will.mp4

### Setup 

dt = 0.01

img = cv2.imread('will.jpg')
# make image smaller to make run faster if you want
#img = cv2.pyrDown(img)
#img = cv2.pyrDown(img)

Nx = img.shape[0]
Ny = img.shape[1] 

v = np.zeros((Nx,Ny,2))

x = np.linspace(0,1,Nx, endpoint=False) 
y = np.linspace(0,1,Ny, endpoint=False) 
X, Y = np.meshgrid(x,y, indexing='ij')

#v[:,:,0] = -Y + 0.5
#v[:,:,1] = X - 0.5

#### Build necessary derivative and interpolation matrices

def build_grad(N):
    # builds N-1 x N finite difference matrix 
    data = [-np.ones(N-1), np.ones(N-1)] # a list: a ragged np.array fails in recent NumPy
    return sparse.diags(data, np.array([0, 1]), shape= (N-1,N))

# gradient operators
gradx = sparse.kron(build_grad(Nx), sparse.identity(Ny-1))
grady = sparse.kron(sparse.identity(Nx-1), build_grad(Ny))

def build_K(N):
    # builds N-1 x N-1 K second derivative matrix
    data = [-np.ones(N-2), 2*np.ones(N-1), -np.ones(N-2)]
    diags =np.array([-1, 0, 1])
    return sparse.diags(data, diags )

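# Laplacian operator. Zero Dirichlet boundary conditions
# Why is this reversed? scipy defines kronsum(A, B) = kron(I, A) + kron(B, I),
# so B (here Kx) acts on the first, slow axis of the C-ordered (Nx-1, Ny-1) grid.
K = sparse.kronsum(build_K(Ny),build_K(Nx))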
Ksolve = linalg.factorized(K)

def build_interp(N):
    data = [np.ones(N-1)/2., np.ones(N-1)/2.]
    diags = np.array([0, 1])
    return sparse.diags(data, diags, shape= (N-1,N))
interpy = sparse.kron(sparse.identity(Nx), build_interp(Ny))
interpx = sparse.kron(build_interp(Nx), sparse.identity(Ny))

def projection_pass(vx,vy):
    # alternating projection? Not necessary. In fact stupid. but easy.
    vx[0,:] = 0
    vx[-1,:] = 0
    vy[:,0] = 0
    vy[:,-1] = 0
    vx[0,:] /= 2.
    vx[-1,:] /= 2.
    vy[:,0] /= 2.
    vy[:,-1] /= 2.

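    div = gradx @ vx.flatten() + grady @ vy.flatten() #calculate divergence

    w = Ksolve(div.flatten())#spsolve(K, div.flatten()) #solve potential
    # the gradient of the potential, on the edges, is the correction to subtract
    return (gradx.T @ w).reshape(Nx,Ny-1), (grady.T @ w).reshape(Nx-1,Ny)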

for i in range(300):
    #while True: #
    v[:,:,0] += np.linalg.norm(img,axis=2) * dt * 0.001 # gravity force

    # interpolate onto edges
    vx = (interpy @ v[:,:,0].flatten()).reshape(Nx,Ny-1)
    vy = (interpx @ v[:,:,1].flatten()).reshape(Nx-1,Ny)
    # project incompressible

    dvx, dvy = projection_pass(vx,vy)

    #interpolate change back to original grid
    v[:,:,0] -= (interpy.T @ dvx.flatten()).reshape(Nx,Ny)
    v[:,:,1] -= (interpx.T @ dvy.flatten()).reshape(Nx,Ny)

    #advect everything
    coords = np.stack( [(X - v[:,:,0]*dt)*Nx, (Y - v[:,:,1]*dt)*Ny], axis=0)
    for j in range(3):
        img[:,:,j] = ndimage.map_coordinates(img[:,:,j], coords, order=5, mode='wrap')
    v[:,:,0] = ndimage.map_coordinates(v[:,:,0], coords, order=5, mode='wrap')
    v[:,:,1] = ndimage.map_coordinates(v[:,:,1], coords, order=5, mode='wrap')


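    cv2.imshow('image', img)            # window name is arbitrary
    cv2.imwrite('%06d.jpg' % i, img)    # frames for ffmpeg -i ./%06d.jpg will.mp4
    k = cv2.waitKey(30) & 0xFF
    if k == ord(' '):                   # space bar stops the simulation
        break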


Code to produce the velocity graphs above.

import cvxpy as cvx
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve
import matplotlib.pyplot as plt

Nx = 50
Ny = 30
# velocities live on the edges
vx = np.zeros((Nx,Ny-1))
vy = np.zeros((Nx-1,Ny))
x = np.linspace(0,1,Nx, endpoint=False) 
y = np.linspace(0,1,Ny, endpoint=False) 
X, Y = np.meshgrid(x,y, indexing='ij')
vx[:,:] = Y[:,1:] - 1 + X[:,1:]
vy[:,:] = -X[1:,:]  + Y[1:,:]

data = [-np.ones(Nx-1), np.ones(Nx-1)] # a list: a ragged np.array fails in recent NumPy
diags = np.array([0, 1])
grad = sparse.diags(data, diags, shape= (Nx-1,Nx))

gradx = sparse.kron(grad, sparse.identity(Ny-1))

data = [-np.ones(Ny-1), np.ones(Ny-1)]
diags = np.array([0, 1])
grad = sparse.diags(data, diags, shape= (Ny-1,Ny))

grady = sparse.kron(sparse.identity(Nx-1), grad)

data = [-np.ones(Nx-2), 2*np.ones(Nx-1), -np.ones(Nx-2)]
diags =np.array([-1, 0, 1])
Kx = sparse.diags(data, diags )

data = [-np.ones(Ny-2), 2*np.ones(Ny-1), -np.ones(Ny-2)]
diags =np.array([-1, 0, 1])
Ky = sparse.diags(data, diags )

K = sparse.kronsum(Ky,Kx)

plt.quiver(X[1:,1:], Y[1:,1:], vx[1:,:] + vx[:-1,:], vy[:,1:] + vy[:,:-1])

for i in range(60):
    div = gradx @ vx.flatten() + grady @ vy.flatten()
    print("div size", np.linalg.norm(div))
    div = div.reshape(Nx-1,Ny-1)

    w = spsolve(K, div.flatten())

    vx -= (gradx.T @ w).reshape(Nx,Ny-1)
    vy -= (grady.T @ w).reshape(Nx-1,Ny)
    # alternating projection? Not necessary. In fact stupid. but easy.
    div = gradx @ vx.flatten() + grady @ vy.flatten()
    print("new div size", np.linalg.norm(div))
    vx[0,:] = 0
    vx[-1,:] = 0
    vy[:,0] = 0
    vy[:,-1] = 0
div = gradx @ vx.flatten() + grady @ vy.flatten()
print("new div size", np.linalg.norm(div))

plt.figure()
plt.quiver(X[1:,1:], Y[1:,1:], vx[1:,:] + vx[:-1,:], vy[:,1:] + vy[:,:-1])
plt.show()

I should give a particle-in-cell code a try.



GregTJ found this post useful and made an even better simulator! Nice.

Proving some Inductive Facts about Lists using Z3 python

Z3 is an SMT solver which has a good rep. Here are some excellent tutorials.

SMT stands for satisfiability modulo theories. The exact nature of the power of these kinds of solvers has been, and still is, hazy to me. I have known for a long time that they can slam sudoku or picross or other puzzles, but what about more infinite or logic-looking things? I think I may always be hazy, as one can learn and discover more and more encoding tricks to get problems and proofs that you previously thought weren’t solvable into the system. It’s very similar to learning how to encode problems for linear programming solvers in that respect.

SMT solvers combine a general smart search procedure with a ton of specialized solvers for particular domains, like linear algebra, polynomials, linear inequalities and more.

The search procedure goes by the name DPLL(T). It is an adjustment of the procedure of SAT solvers, which are very powerful and fast. SAT solvers find an assignment of boolean variables in order to make a complicated boolean expression true, or to eventually find that it can never be made true. SAT solvers work on the principle of guessing and deducing. If a OR b needs to be true and we already know a is false, we can deduce b must be true. When the deduction process stalls, we just guess and then backtrack if it doesn’t work out. This is the same process you use manually in solving Sudoku.
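The guess, deduce and backtrack loop is short enough to write out. This is a minimal DPLL sketch with unit propagation and chronological backtracking only. It has no clause learning, so it is not CDCL. Literals are encoded as signed integers (3 means variable 3 is true, -3 means it is false), which is an assumption borrowed from the DIMACS convention.

```python
def unit_propagate(clauses, assignment):
    """Repeatedly assign literals forced by unit clauses. Returns None on conflict."""
    assignment = dict(assignment)
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            unassigned, satisfied = [], False
            for lit in clause:
                var = abs(lit)
                if var in assignment:
                    if assignment[var] == (lit > 0):
                        satisfied = True
                        break
                else:
                    unassigned.append(lit)
            if satisfied:
                continue
            if not unassigned:
                return None                      # every literal false: conflict
            if len(unassigned) == 1:             # unit clause: deduce
                lit = unassigned[0]
                assignment[abs(lit)] = lit > 0
                changed = True
    return assignment

def dpll(clauses, assignment=None):
    assignment = unit_propagate(clauses, assignment or {})
    if assignment is None:
        return None
    variables = {abs(lit) for clause in clauses for lit in clause}
    free = sorted(variables - set(assignment))
    if not free:
        return assignment                        # everything assigned, no conflict
    var = free[0]
    for guess in (True, False):                  # guess, backtrack on failure
        result = dpll(clauses, {**assignment, var: guess})
        if result is not None:
            return result
    return None

# (a or b) and (not a or c) and (not b or not c) and (not c or a)
clauses = [[1, 2], [-1, 3], [-2, -3], [-3, 1]]
model = dpll(clauses)
print(model)
```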

The modern era of SAT solvers came when a couple of new tricks vastly extended their power. In particular, Conflict Driven Clause Learning (CDCL): when the solver finds itself at a dead end, it figures out the choices that put it there and adds a clause to its formula so that it will never make those choices again.

SMT works by now having the boolean variables of the SAT solver contain inner structure, like the boolean p actually represents the fact x + y < 5. During the search process it can take the pile of booleans that have been set to true and ask a solver (maybe a linear programming solver in this case) whether those facts can all be made true in the underlying theory. This is an extra spice on top of the SAT search.

Something that wasn’t apparent to me at first is how important the theory of uninterpreted functions is to SMT solvers. It really is their bread and butter. This theory is closely tied to unification (ground equalities are handled by congruence closure, a close cousin). Unification is the fairly intuitive process of finding assignments to variables that make a set of equations true. If I ask how to make fred(x,4) = fred(7,y), obviously the answer is y=4, x=7. That is unification. Unification is a completely syntax-driven way to deduce facts. It starts to give you something quite similar to first order logic.
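Syntactic unification for the fred(x,4) = fred(7,y) example takes only a few lines. This is a textbook sketch with an occurs check, not Z3’s internals. Compound terms are tuples (head, args...), constants are plain Python values, and `Var` is a hypothetical class marking variables.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Var:
    name: str

def walk(t, subst):
    """Follow variable bindings until reaching an unbound variable or a non-variable."""
    while isinstance(t, Var) and t in subst:
        t = subst[t]
    return t

def occurs(v, t, subst):
    t = walk(t, subst)
    if t == v:
        return True
    if isinstance(t, tuple):
        return any(occurs(v, arg, subst) for arg in t[1:])
    return False

def unify(a, b, subst=None):
    """Return a substitution making a and b syntactically equal, or None."""
    subst = {} if subst is None else subst
    a, b = walk(a, subst), walk(b, subst)
    if a == b:
        return subst
    if isinstance(a, Var):
        return None if occurs(a, b, subst) else {**subst, a: b}
    if isinstance(b, Var):
        return unify(b, a, subst)
    if isinstance(a, tuple) and isinstance(b, tuple):
        if a[0] != b[0] or len(a) != len(b):
            return None                          # different function symbols
        for x_arg, y_arg in zip(a[1:], b[1:]):
            subst = unify(x_arg, y_arg, subst)
            if subst is None:
                return None
        return subst
    return None                                  # distinct constants

x, y = Var('x'), Var('y')
print(unify(('fred', x, 4), ('fred', 7, y)))
```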

I was also under the impression that quantifiers (\forall, \exists) were available but heavily frowned upon. I don’t think this is quite true. I think they are sort of part of the entire point of an SMT solver now, although yes, they are rather flaky. There are a number of methods to deal with quantifiers. One common one picks a pattern, a subterm of the quantified formula. Every time some term in the problem matches the pattern, it instantiates the quantified variables accordingly and adds that instance of the formula. This is called E-matching.

Here are a couple of tutorials on proving inductive facts in SMT solvers. You kind of have to hold their hand a bit.

SMT solver queries usually have the flavor of finding something, in this case a counterexample. The idea is that you ask for the first counterexample where induction failed: assuming that proposition P is true for n-1, find n such that P is not true. If you can’t find one, then the induction has gone through.

And here is some code where I believe I’m showing that some simple list functions like reverse, append, and length have some simple properties like \forall t. rev (rev(t)) == t .

from z3 import *

# Very useful reference

f = Function('f', IntSort(), IntSort())
s = Solver()
#s.add(f(True) == False, f(False) == True)
x = Int('x')
s.add(ForAll([x], f(x) >= x)) #> and < do not seem to be returning

# Rolling my own list data type
# Z3 has a built in which will probably be better?
s = Solver()

L = Datatype("MyList")
L.declare("Nil")  # the empty list; L.Nil is used below, so it has to be declared
L.declare("Cons", ("hd", IntSort()), ("tail", L))
L = L.create()

t = Const('t', L)
u = Const('u', L)

y = Int('y')

rev = Function('reverse', L, L)
app = Function('append', L, L, L)
leng = Function('length', L, IntSort())

#defining my functions. Micro Haskell, BABY
s.add( leng(L.Nil) == 0 )
s.add(  ForAll([u,y],  leng(L.Cons(y,u)) == 1 + leng(u))) #  patterns = [leng(L.Cons(y,u))] 

s.add(  ForAll([u], app(L.Nil, u) == u)) 
s.add(  ForAll([t, u, y] , app(L.Cons(y,t), u)  ==  L.Cons(y, app(t, u ))))

s.add( rev(L.Nil) == L.Nil)
s.add(  ForAll([y,t],  rev(L.Cons(y,t)) == app(rev(t), L.Cons(y, L.Nil))))

# Each goal asks for an inductive counterexample: a t that violates the property
# although (if t is a Cons cell) its tail satisfies it. This covers both the Nil
# base case and the Cons step, so unsat means the induction goes through.
# push/pop keeps each goal's assertions from leaking into the next one.
print("proving leng(t) >= 0")
s.push()
#s.add( Or(And(t == L.Cons(y,u),  leng(u) >= 0 ), t == L.Nil))
s.add(  Not(leng(t) >= 0 ))
s.add( Implies(t == L.Cons(L.hd(t), L.tail(t)),  leng(L.tail(t)) >= 0 ))
print(s.check())
s.pop()

#s.add( leng(app(L.Nil, u)) == leng(u) )

print("prove length is preserved under app.")
s.push()
s.add( leng(app(t,u)) != leng(t) + leng(u))
s.add( Implies(t == L.Cons(L.hd(t), L.tail(t)),   leng(app(L.tail(t),u)) == leng(L.tail(t)) + leng(u)  ))
print(s.check())
s.pop()

print("reverse preserves length")
s.push()
#Lemma Could place in clause with the above proof of this theorem
s.add( ForAll([t,u], leng(app(t,u)) == leng(t) + leng(u)   )) #subgoal needed
s.add( leng(rev(t)) != leng(t))
s.add( Implies(t == L.Cons(L.hd(t), L.tail(t)),   leng(rev(L.tail(t))) == leng(L.tail(t)) ))
print(s.check())
s.pop()

print("reverse reverse = id")
s.push()
s.add( ForAll( [t,u], rev(app(t,u)) ==  app(rev(u), rev(t)) ) ) #subgoal needed
s.add( rev(rev(t)) != t )
s.add( Implies(t == L.Cons(L.hd(t), L.tail(t)),   rev(rev(L.tail(t))) == L.tail(t) ))
print(s.check())
s.pop()


#s.add(t != L.Nil ) 
#s.add( ForAll([t], rev(L.Cons(y,t)) ==   ) , rev(L.Nil) == L.Nil)
#s.add( leng(L.Cons()) == 0 )
#s.add(  ForAll(y,  leng(L.Cons(y,L.Nil)) == 1 + leng(L.Nil)))
#s.add(  Not( leng(app(t,u))  == leng(t) + leng(u) )) 

# prove length of app + nil is same
#s.add( leng(app(t,L.Nil)) != leng(t))
#s.add( Implies(t == L.Cons(L.hd(t), L.tail(t)),   leng(app(L.tail(t),L.Nil)) == leng(L.tail(t))))

#s.add( app(L.Nil,L.Nil) == L.Nil)
#s.add( app(t,L.Nil) != t)
#s.add( Implies(t == L.Cons(L.hd(t), L.tail(t)),   app(L.tail(t),L.Nil) == L.tail(t)))

#s.add(Or(t == L.Nil , And( t == L.Cons(y, u),   app(u,L.Nil) == u)  ))
#s.add( Implies(t == L.Nil, leng(app(L.Nil, u)) == leng(u) ))
# all of these don't work
#s.add( Implies(u == L.tail(t),  leng(u) >= 0 ))
#s.add( ForAll(u, Implies(t == L.Cons(y,u),  leng(u) >= 0 )))

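# Helper for the induction pattern above, specialized to MyList
# (a constructors argument would be needed to make it datatype-generic).
def induction(t, P):
    # Counterexample to P at t whose tail (if t is a Cons cell) still satisfies P.
    return And(Not(P(t)), Implies(t == L.Cons(L.hd(t), L.tail(t)), P(L.tail(t))))
# e.g. s.push(); s.add(induction(t, lambda x: leng(x) >= 0)); print(s.check()); s.pop()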

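# controller synthesis (rough sketch): look for a gain a of a linear force law
# f(x) = a*x that brings every initial position in [-1, 1] within 0.1 of 0 after
# 9 steps while keeping |force| <= 1. The trajectory is unrolled as expressions in init.
s = Solver()
a = Real('a')        # controller gain to synthesize
init = Real('init')  # universally quantified initial position
dt = 0.1
pos, v = init, RealVal(0)
force_ok = []
for _ in range(9):
    force = a * pos                                 # parametrized force law f(pos) = a*pos
    force_ok.append(And(-1 <= force, force <= 1))   # constraints on force
    pos, v = pos + v * dt, v + force * dt
s.add(ForAll([init], Implies(And(-1 <= init, init <= 1),
                             And(force_ok + [-0.1 <= pos, pos <= 0.1]))))
# print(s.check())  # nonlinear arithmetic under a quantifier: may be slow or return unknown
#s.set("produce-proofs", True)
#s.set("mbqi", False)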

A Basic Branch and Bound Solver in Python using Cvxpy

Branch and bound is a useful problem solving technique. The idea is, if you have a minimization problem you want to solve, maybe there is a way to relax the constraints to an easier problem. If so, the solution of the easier problem is a lower bound on the possible solution of the hard problem. If the solution of the easier problem just so happens to also obey the more constrained hard problem, then it must also be the solution to the hard problem. You can also use the lower bound coming from a relaxed problem to prune your search tree for the hard problem. If even the relaxed problem doesn’t beat the current best found, don’t bother going down that branch.
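As a small, dependency-free illustration of the paradigm (separate from the cvxpy version in this post), here's branch and bound on a 0/1 knapsack. The relaxation lets each x_i range over [0, 1], and for knapsack that LP relaxation happens to be solved exactly by greedily packing items in order of value per unit size, so no LP solver is needed. Since this is a maximization, the relaxation gives an upper bound rather than a lower bound, and the pruning test flips accordingly. The function names and the best-bound-first search order are my own choices for the sketch.

```python
import heapq
import itertools


def fractional_bound(values, sizes, capacity, fixed):
    """LP relaxation of 0/1 knapsack with some items fixed to 0 or 1.

    Returns (bound, x), or (None, None) if the fixed items already overflow.
    Greedy packing by value/size ratio solves this relaxation exactly.
    """
    x = [0.0] * len(values)
    cap = capacity
    for i, b in fixed.items():
        x[i] = float(b)
        cap -= sizes[i] * b
    if cap < 0:
        return None, None
    free = sorted((i for i in range(len(values)) if i not in fixed),
                  key=lambda i: values[i] / sizes[i], reverse=True)
    for i in free:
        take = min(1.0, cap / sizes[i])  # at most one item ends up fractional
        x[i] = take
        cap -= take * sizes[i]
        if cap <= 0:
            break
    return sum(v * xi for v, xi in zip(values, x)), x


def branch_and_bound(values, sizes, capacity):
    best_val, best_x = 0.0, [0] * len(values)  # taking nothing is always feasible
    tie = itertools.count()  # tie breaker so the heap never compares dicts
    bound, x = fractional_bound(values, sizes, capacity, {})
    heap = [(-bound, next(tie), {}, x)]  # max-heap on the relaxed bound
    while heap:
        neg_bound, _, fixed, x = heapq.heappop(heap)
        if -neg_bound <= best_val + 1e-9:
            continue  # even the relaxation can't beat the incumbent: prune
        frac = [i for i, xi in enumerate(x) if 1e-9 < xi < 1 - 1e-9]
        if not frac:
            # The relaxed solution is integral, so it solves this subproblem.
            best_val, best_x = -neg_bound, [round(xi) for xi in x]
            continue
        i = frac[0]  # branch on the fractional item
        for b in (0, 1):
            child = {**fixed, i: b}
            child_bound, child_x = fractional_bound(values, sizes, capacity, child)
            if child_bound is not None and child_bound > best_val + 1e-9:
                heapq.heappush(heap, (-child_bound, next(tie), child, child_x))
    return best_val, best_x


print(branch_and_bound([60, 100, 120], [10, 20, 30], 50))  # (220.0, [0, 1, 1])
```

Comparing against brute force enumeration on small random instances is an easy way to convince yourself the pruning is sound.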

A standard place this paradigm occurs is in mixed integer programming. The relaxation of a binary constraint (either 0 or 1) can be all the values in between (any number between 0 and 1). If this relaxed problem can be expressed in a form amenable to a solver like a linear programming solver, you can use that to power the branch and bound search, also using returned solutions for possible heuristics.

I built a basic version of this that uses cvxpy as the relaxed problem solver. Cvxpy already has much much faster mixed integer solvers baked in (which is useful to make sure mine is returning correct results), but it was an interesting exercise. The real reason I’m toying around is I kind of want the ability to add custom branching heuristics or inspect and maintain the branch and bound search tree, which you’d need to get into the more complicated guts of the solvers bound to cvxpy to get at. Julia might be a better choice.

A somewhat similar (and better) project is which doesn’t use cvxpy explicitly, but does have the branch and bound control in the python layer of the solver. There are also other projects that can use fairly arbitrary solvers like Bonmin

As a toy problem I’m using a knapsack problem where we have objects of different sizes and different values. We want to maximize the value while keeping the total size under the capacity of the bag. This can be phrased linearly like so: \max v \cdot x s.t. \sum_i s_i x_i \le capacity, x_i \in \{0,1\}. The basic heuristic I’m using is to branch on the variable that is closest to 0 or 1 in the relaxed solution (ideally one that is already 0 or 1 even there). The alternative branch hopefully gets pruned fast.

import cvxpy as cvx
import copy
from heapq import *
import numpy as np
import itertools
counter = itertools.count() 

class BBTreeNode():
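    def __init__(self, vars=None, constraints=None, objective=0, bool_vars=None):
        # None defaults avoid sharing one mutable set/list between all instances
        self.vars = vars if vars is not None else set()
        self.constraints = constraints if constraints is not None else []
        self.objective = objective
        self.bool_vars = bool_vars if bool_vars is not None else set()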
        self.children = []
    def buildProblem(self):
        prob = cvx.Problem(cvx.Minimize(self.objective), self.constraints) #i put Minimize, just so you know that I'm assuming it
        return prob
    def is_integral(self):
        return all([abs(v.value - 1) <= 1e-3 or abs(v.value - 0) <= 1e-3 for v in self.bool_vars])
    def branch(self):
        children = []
        for b in [0,1]:
            n1 = copy.deepcopy(self) #yeesh. Not good performance wise, but is simple implementation-wise
            v = n1.heuristic() #dangerous what if they don't do the same one? I need to do it here though because I need access to copied v.
            n1.constraints.append( v == b ) # add in the new binary constraint
            n1.children = []
            n1.bool_vars.remove(v) #remove binary constraint from bool var set
            n1.vars.add(v) #and add it into var set for later inspection of answer
            children.append(n1) # without this, branch() returns nothing and the search never descends
            #self.children.append(n1)   # eventually I might want to keep around the entire search tree. I messed this up though
        return children
    def heuristic(self):
        # a basic heuristic of taking the ones it seems pretty sure about
        return min([(min(1 - v.value, v.value) , i, v) for i, v in enumerate(self.bool_vars)])[2]
    def bbsolve(self):
        root = self
        res = root.buildProblem().solve()
        heap = [(res, next(counter), root)]
        bestres = 1e20 # a big arbitrary initial best objective value
        bestnode = root # initialize bestnode to the root
        nodecount = 0
        while len(heap) > 0: 
            nodecount += 1 # for statistics
            print("Heap Size: ", len(heap))
            _, _, node = heappop(heap)
            prob = node.buildProblem()
            res = prob.solve()
            print("Result: ", res)
            if prob.status not in ["infeasible", "unbounded"]:
                if res > bestres - 1e-3: #even the relaxed problem sucks. forget about this branch then
                    print("Relaxed Problem Stinks. Killing this branch.")
                elif node.is_integral(): #if a valid solution then this is the new best
                        print("New Best Integral solution.")
                        bestres = res
                        bestnode = node
                else: #otherwise, we're unsure if this branch holds promise. Maybe it can't actually achieve this lower bound. So branch into it
                    new_nodes = node.branch()
                    for new_node in new_nodes:
                        heappush(heap, (res, next(counter), new_node ) )  # using counter to avoid possible comparisons between nodes. It tie breaks
        print("Nodes searched: ", nodecount)      
        return bestres, bestnode

# a simple knapsack problem. we'll want to minimize the total cost of having each of these items, with different sizes.
# Use a random problem instance
N = 20
prices = -np.random.rand(N)
sizes = np.random.rand(N)
x = cvx.Variable(N)
constraints = []
constraints += [x <= 1, 0 <= x] #The relaxation of the binary variable constraint
constraints += [sizes @ x <= 5] # total size of knapsack is 5
objective = prices @ x
bool_vars = {x[i] for i in range(N)} 
root = BBTreeNode(constraints = constraints, objective= objective, bool_vars = bool_vars)
res, sol = root.bbsolve()
print(sorted(list([(v.name(), v.value) for v in sol.bool_vars] + [(v.name(), v.value) for v in sol.vars] ) ))

# For comparison let's do the same problem using a built in mixed integer solver.
x = cvx.Variable(N, boolean=True)
constraints = []
constraints += [x <= 1, 0 <= x]
constraints += [sizes @ x <= 5]
objective = prices @ x
prob = cvx.Problem(cvx.Minimize(objective),constraints)
prob.solve()  # needs a mixed integer capable solver installed (e.g. GLPK_MI)
print(x.value)

This at least solves the problem fairly quickly. It needs better heuristics and could be sped up in lots of ways; I was not going out of my way to optimize performance. It takes maybe 5 seconds, whereas the built-in cvxpy solver is almost instantaneous. The output is from my solver (node count and variable values), followed by the built-in solver’s solution.

Nodes searched:  67
[('var0[0]', 0.9999999958228145), ('var0[10]', -1.2718338055950193e-08), ('var0[11]', -1.3726395012104872e-08), ('var0[12]', 0.9999999982326986), ('var0[13]', 0.9999999973744331), ('var0[14]', 0.9999999988156902), ('var0[15]', -1.1908085711772973e-08), ('var0[16]', 0.9999999903780872), ('var0[17]', 0.9999999863334883), ('var0[18]', -1.1481655920777931e-08), ('var0[19]', 0.9999999996667646), ('var0[1]', 0.9999999969549299), ('var0[2]', 0.9999999979596141), ('var0[3]', -9.282428548104736e-09), ('var0[4]', -1.1378022795740783e-08), ('var0[5]', 0.9999999868240312), ('var0[6]', 0.9999999995068807), ('var0[7]', 0.9999999995399617), ('var0[8]', 0.9999999859520627), ('var0[9]', 0.9999999948062767)]
[ 1.00000000e+00  1.00000000e+00  1.00000000e+00 -1.44435650e-12
 -1.88491321e-12  1.00000000e+00  1.00000000e+00  1.00000000e+00
  1.00000000e+00  1.00000000e+00 -7.11338729e-13  1.99240081e-13
  1.00000000e+00  1.00000000e+00  1.00000000e+00 -1.48697107e-12
  1.00000000e+00  1.00000000e+00 -1.75111698e-12  1.00000000e+00]

Edit: I should investigate the Parameter functionality of cvxpy. That would make for a faster version than the one above based on deepcopy. If you made the upper and lower bound vectors on the binary variables parameters, you could restrict the interval to 0 or 1 without rebuilding the problem or copying everything.

#rough sketch
b = cvx.Variable(N) 
u = cvx.Parameter(N) 
u.value = np.ones(N)
l = cvx.Parameter(N) 
l.value = np.zeros(N)
constraints += [b <= u, l <= b]
# change l.value and u.value in search loop.

Mixed Integer Programming & Quantization Error

I thought of another fun use case for mixed integer programming the other day. The quantization part of an analog to digital converter is difficult to analyze with the techniques taught in a standard signals course (linear analysis, spectral techniques, convolution, that sort of thing). The usual approach is to model the quantization error as a kind of random additive noise.

Mixed integer programming does have the ability to directly encode some questions about this quantization, though. We can encode the integer rounding relation directly by constraining the quantized signal to lie within ±1/2 of a quantization interval of the original signal. Then we can run further analysis on the signals and compare them. For example, I wrote down a quick cosine transform. Then I ask for the worst case signal, the one that leads to the most error in the transform of the quantized signal versus the transform of the unquantized signal. My measure of worst case performance was the sum of the differences between the two transforms. I chose this because it is tractable as a mixed integer linear program. Not all reasonable metrics one might want seem to be easily encodable in a mixed integer framework. Some of them amount to maximizing a convex function, which is naughty because it is nonconvex (for example, trying to maximize the L2 error \sum|x-y|^2).
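A solver-free sanity check of why this particular metric is tractable: sum(kx - ky) equals \sum_j c_j (x_j - y_j), where c_j are the column sums of U, so the objective is linear. The rounding constraint bounds each |x_j - y_j| by 1/(2d) independently of the other samples, so this MILP actually decouples sample by sample, and its optimum should be \sum_j |c_j| / (2d), attained with z_j = 0 and x_j = sign(c_j)/(2d). This plain Python sketch is my own check, not part of the original analysis; it uses N = 32, d = 15 and the same cosine matrix as the cvxpy code in this section.

```python
import math

N, d = 32, 15  # signal length and quantization levels per unit, as in the cvxpy code

# U[i][j] = cos(pi * i * j / N), the same cosine matrix as in the cvxpy code
U = [[math.cos(math.pi * i * j / N) for j in range(N)] for i in range(N)]

# sum(U @ x - U @ y) = sum_j c_j * (x_j - y_j), with c_j the column sums of U
c = [sum(U[i][j] for i in range(N)) for j in range(N)]


def objective(x, y):
    """The linear metric sum(kx - ky), written via the column sums c."""
    return sum(cj * (xj - yj) for cj, xj, yj in zip(c, x, y))


# |d*x_j - z_j| <= 1/2 means |x_j - y_j| <= 1/(2d) for each sample separately,
# so each term c_j * (x_j - y_j) is at most |c_j| / (2d).
worst_bound = sum(abs(cj) for cj in c) / (2 * d)

# A signal attaining it: quantize every sample to z_j = 0 (so y_j = 0) and sit
# exactly half a quantization step away, on the side matching the sign of c_j.
x_worst = [math.copysign(1 / (2 * d), cj) for cj in c]
y_worst = [0.0] * N

print(worst_bound, objective(x_worst, y_worst))
```

For this U the column sums work out to 32 for j = 0, 1 for odd j and (numerically) 0 for even j > 0, so the bound is 48/30 = 1.6. That gives a handy number to compare the MILP’s optimal value against.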

In a variant of this, it is also possible to directly encode the digital signal processing in terms of logic gate construction and compare that to the intended analog transform, although this will be a great deal more computationally expensive.

import cvxpy as cvx
import numpy as np
import matplotlib.pyplot as plt
N = 32
d = 15
x = cvx.Variable(N)
z = cvx.Variable(N, integer=True)
y = z / d # quantized signal. ~31 values between -1 and 1

constraints = []
constraints += [-1 <= y,  y <= 1, -1 <= x, x <= 1]

# rounding constraint. z = round(d*x)
constraints += [-0.5 <= d*x - z, d*x - z <= 0.5] 
# an opportunity for the FFT technique of Vanderbei

n = np.outer(np.arange(N), np.arange(N))  # n[i, j] = i*j
U = np.cos( np.pi * n / N)
kx = U @ x  
ky = U @ y

#hmmmm. Yes. Unfortunately I am asking a hard question?
# finding the minimum distortion signal is easy. finding the maximum distortion appears to be hard.
#This is not a convex objective : objective = cvx.Maximize(cvx.sum_squares(kx - ky))
# however, the following linearization does give us a maximally bad signal in a sense.
objective = cvx.Maximize(cvx.sum(kx - ky))

prob = cvx.Problem(objective, constraints)
prob.solve()  # needs a mixed integer capable solver installed (e.g. GLPK_MI)

plt.figure()
plt.title("Original Signal")
plt.plot(x.value, label="analog signal")
plt.plot(y.value, label="quantized signal")
plt.legend()
plt.figure()
plt.title("Error of Transform")
plt.plot(kx.value - ky.value)
plt.figure()
plt.title("Cosine transform")
plt.plot(kx.value, label="original signal")
plt.plot(ky.value, label="quantized signal")
plt.legend()
plt.show()

This is interesting as a relatively straightforward technique for the analysis of quantization errors.

This also might be an interesting place to use the techniques of Vanderbei. He does a neato trick where he partially embeds the FFT algorithm into an optimization problem by adding auxiliary variables. Despite the expense of adding these variables, it greatly increases the sparsity of the constraint matrices, which will probably be a win. I wonder if one might do something similar with a Fast Multipole Method, Barnes-Hut, or a wavelet transform? Seems likely. Would be neat, although I'm not sure what for. I was thinking of simulating the Coulomb gas. That seems like a natural choice. Oooh. I should do that.

Solving the XY Model using Mixed Integer Optimization in Python

There are many problems in physics that take the form of minimizing an energy. Often this energy is taken to be quadratic in the field. The canonical example is electrostatics. The (negative) gradient of the potential \phi gives the electric field E. The energy is given by \int (|\nabla \phi|^2 + \phi \rho) d^3 x. We can encode a finite difference version of this (with boundary conditions!) directly into a convex optimization modelling language like so.

import cvxpy as cvx
import numpy as np
import matplotlib.pyplot as plt
import scipy.linalg
from mpl_toolkits import mplot3d

N = 10

# building a finite difference matrix. It is a rectangle of size (N-1) x N. It maps from the vertices of our grid to the lines in between them, where derivatives live.
col = np.zeros(N)
col[0] = -1
col[1] = 1
delta = scipy.linalg.toeplitz(col, np.zeros(N-1)).T

gradx = np.kron(delta, np.eye(N))
grady = np.kron(np.eye(N), delta)

# a variable for our potential
phi = cvx.Variable((N, N))

# vectorization is useful. It flattens out the x-y 2Dness.
phivec = cvx.vec(phi)
gradxvec = gradx.reshape(-1, N*N)
gradyvec = grady.reshape(-1, N*N)

V = cvx.sum_squares(gradxvec @ phivec) + cvx.sum_squares(gradyvec @ phivec)

constraints = []
# boundary conditions. Dirichlet
constraints += [phi[:,0] == 0, phi[0,:] == 0, phi[:,-1] == 0, phi[-1,:] == 0 ]

# fixed charge density rho
rho = np.zeros((N,N))
rho[N//2,N//2] = 1

# objective is energy
objective = cvx.Minimize(V + cvx.sum(cvx.multiply(rho,phi)))
prob = cvx.Problem(objective, constraints)
res = prob.solve()

# Plotting 
x = np.linspace(-6, 6, N)
y = np.linspace(-6, 6, N)

X, Y = np.meshgrid(x, y)
fig = plt.figure()

ax = plt.axes(projection='3d')
ax.plot_surface(X, Y, phi.value, rstride=1, cstride=1,
                cmap='viridis', edgecolor='none')
The resulting logarithmic potential
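The claim that differentiating this energy gives a linear equation can be checked without any solver. Setting the derivative of the discrete energy (squared forward differences plus \rho \phi, as in the cvxpy code above) with respect to one interior value \phi_p to zero gives \phi_p = (\sum_{q \sim p} \phi_q - \rho_p/2)/4, a discrete Poisson equation (the 1/2 appears because the energy has no 1/2 in front). Sweeping that exact one-variable minimization over the grid is Gauss-Seidel iteration, so it is simultaneously coordinate descent on the energy and an iterative linear solver. A pure Python sketch, assuming the same zero Dirichlet boundary and unit point charge as above; the function names are hypothetical.

```python
def energy(phi, rho):
    """Discrete energy: squared forward differences in both directions plus rho * phi."""
    n = len(phi)
    e = 0.0
    for i in range(n):
        for j in range(n):
            if i + 1 < n:
                e += (phi[i + 1][j] - phi[i][j]) ** 2
            if j + 1 < n:
                e += (phi[i][j + 1] - phi[i][j]) ** 2
            e += rho[i][j] * phi[i][j]
    return e


def gauss_seidel(rho, sweeps=500):
    """Minimize the energy one interior grid value at a time (boundary fixed at 0)."""
    n = len(rho)
    phi = [[0.0] * n for _ in range(n)]
    for _ in range(sweeps):
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                nbrs = phi[i - 1][j] + phi[i + 1][j] + phi[i][j - 1] + phi[i][j + 1]
                # Exact minimizer in phi[i][j] with all other values held fixed:
                # d/dphi_p [sum over its 4 edges (phi_p - phi_q)^2 + rho_p phi_p] = 0
                phi[i][j] = (nbrs - rho[i][j] / 2) / 4
    return phi


N = 10
rho = [[0.0] * N for _ in range(N)]
rho[N // 2][N // 2] = 1.0  # unit point charge, as in the cvxpy example
phi = gauss_seidel(rho)
print(phi[N // 2][N // 2], energy(phi, rho))
```

At convergence every interior point satisfies \sum_q \phi_q - 4\phi_p = \rho_p/2, and any perturbation of an interior value raises the energy, which is the convexity argument in miniature.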

It is rarely noted in physics, but often in the convex optimization world, that the barrier between easy and hard problems is not linear vs. nonlinear; it is actually more like convex vs. nonconvex. Convex problems are those that are bowl shaped, over convex domains. When your problem is convex, you can’t get caught in spurious valleys or corners, hence local methods like gradient descent (and smarter methods) find the global minimum. When you differentiate the energy above, you get the linear Poisson equation \nabla^2 \phi = \rho (up to a constant factor). However, from the perspective of solvability, there is not much difference if we replace the quadratic energy with a convex alternative.

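def sum_abs(x):
    return cvx.sum(cvx.abs(x))

# Some convex alternatives to the quadratic energy (each line replaces V):
V = cvx.sum_squares(gradxvec @ phivec) + cvx.sum_squares(gradyvec @ phivec)  # original quadratic
V = sum_abs(gradxvec @ phivec) + sum_abs(gradyvec @ phivec)  # absolute value
V = cvx.sum(cvx.huber(gradxvec @ phivec)) + cvx.sum(cvx.huber(gradyvec @ phivec))  # Huber
V = cvx.pnorm(gradxvec @ phivec, 3) + cvx.pnorm(gradyvec @ phivec, 3)  # 3-norm

# dead zone: no energy cost while |gradient| <= a, linear growth beyond that
a = 1 
dxphi = gradxvec @ phivec
dyphi = gradyvec @ phivec
V = cvx.sum(cvx.maximum( -a - dxphi, dxphi - a, 0 )) + cvx.sum(cvx.maximum( -a - dyphi, dyphi - a, 0 ))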

Materials do actually have nonlinear permittivity and permeability; this may be useful in modelling that. It is also possible to consider convex relaxations of truly hard nonlinear problems and hope you get echoes of the phenomenology that occurs there.

Another approach is to go mixed integer. Mixed Integer programming allows you to force that some variables take on integer values. There is then a natural relaxation problem where you forget the integer variables have to be integers. Mixed integer programming combines a discrete flavor with the continuous flavor of convex programming. I’ve previously shown how you can use mixed integer programming to find the lowest energy states of the Ising model but today let’s see how to use it for a problem of a more continuous flavor.

As I’ve described previously, in the context of robotics, the non-convex constraint that variables lie on the surface of a circle can be approximated using mixed integer programming. We can mix this fairly trivially with the above to make a global solver for the minimum energy state of the XY model. The XY model is a 2d field theory where the value of the field is constrained to lie on a circle. It is a model of a number of physical systems, such as superconductivity, and is the playground for a number of interesting phenomena, like the Kosterlitz-Thouless phase transition. Our encoding is very similar to the above, except we use two components of the field (phix and phiy) and force the pair to lie on a circle. I’m trying to factor out the circle thing into my library cvxpy-helpers, which is definitely a work in progress.

import cvxpy as cvx
import numpy as np
import matplotlib.pyplot as plt
import scipy.linalg
from mpl_toolkits import mplot3d
from cvxpyhelpers import cvxpyhelpers as mip

N = 6

# building a finite difference matrix. It is a rectangle of size (N-1) x N. It maps from the vertices of our grid to the lines in between them, where derivatives live.
col = np.zeros(N)
col[0] = -1
col[1] = 1
delta = scipy.linalg.toeplitz(col, np.zeros(N-1)).T

gradx = np.kron(delta, np.eye(N))
grady = np.kron(np.eye(N), delta)

# a variable for our potential
phix = cvx.Variable((N, N))
phiy = cvx.Variable((N, N))

# vectorization is useful. It flattens out the x-y 2Dness.
phixvec = cvx.vec(phix)
phiyvec = cvx.vec(phiy)
gradxvec = gradx.reshape(-1, N*N)
gradyvec = grady.reshape(-1, N*N)

def sum_abs(x):
    return cvx.sum(cvx.abs(x))

#V = cvx.sum_squares(gradxvec @ phixvec) + cvx.sum_squares(gradyvec @ phixvec) + cvx.sum_squares(gradxvec @ phiyvec) + cvx.sum_squares(gradyvec @ phiyvec) 
V = sum_abs(gradxvec @ phixvec) + sum_abs(gradyvec @ phixvec) + sum_abs(gradxvec @ phiyvec) + sum_abs(gradyvec @ phiyvec) 
constraints = []
# boundary conditions. Nice and vortexy.
constraints += [phix[:,0] >= 0.9, phiy[0,1:-1] >= 0.9, phix[:,-1] <= -0.9, phiy[-1,1:-1] <= -0.9 ]

for i in range(N):
    for j in range(N):
        x, y, c =
        constraints += c
        constraints += [phix[i,j] == x]
        constraints += [phiy[i,j] == y]

# fixed charge density rho
rho = np.ones((N,N)) * 0.01
rho[N//2,N//2] = 1

# objective is energy
objective = cvx.Minimize(V + cvx.sum(cvx.multiply(rho,phix)))
prob = cvx.Problem(objective, constraints)
print("solving problem")
res = prob.solve(verbose=True, solver=cvx.GLPK_MI)

# Plotting 
x = np.linspace(-6, 6, N)
y = np.linspace(-6, 6, N)

X, Y = np.meshgrid(x, y)
fig = plt.figure()

plt.quiver(X,Y, phix.value, phiy.value)

Now, this isn't really an unmitigated success as is. I switched to an absolute value potential because GLPK_MI needs it to be linear. ECOS_BB works with a quadratic potential, but it was not doing a great job. The commercial solvers (Gurobi, CPLEX, Mosek) are supposed to be a great deal better. Perhaps switching to Julia, with its richer ecosystem, might be a good idea too. I don't really like how the solution of the absolute value potential looks. Also, even at such a small grid size it still takes around a minute to solve. When you think about it, it is exploring a ridiculously massive space and still doing ok. There are hundreds of binary variables in this example. But there is a lot of room for tweaking and I think the approach is intriguing.


  • Can one do steepest descent style analysis for low energy statistical mechanics or quantum field theory?
  • Is the trace of the mixed integer program search tree useful for perturbative analysis? It seems intuitively reasonable that it visits low lying states
  • The Coulomb gas is a very obvious candidate for mixed integer programming. Let the charge variable at each grid point be an integer. Then use the Coulomb potential as a quadratic energy. The Coulomb gas is dual to the XY model. Does this duality show up in the mixed integer formalism?
  • Coulomb Blockade?
  • Nothing special about the circle. It is not unreasonable to make piecewise linear approximations or other convex approximations of the sphere or of Lie groups (circle is U(1) ). This is already discussed in particular about SO(3) which is useful in robotic kinematics and other engineering topics.

Edit: /u/mofo69extreme writes:


By absolute value potential, I mean using |del phi| as compared to a more ordinary quadratic |del phi|^2.

This is where I'm getting confused. As you say later, you are actually using two fields, phi_x and phi_y. So I'm guessing your potential is the "L1 norm"

|del phi| = |del phi_x| + |del phi_y|

? This is the only linear thing I can think of.

I don't feel that the exact specifics of the XY model actually matter all that much.

You should be careful here though. A key point in the XY model is the O(2) symmetry of the potential: you can multiply the vector (phi_x,phi_y) by a 2D rotation matrix and the Hamiltonian is unchanged. You have explicitly broken this symmetry down to Z_4 if your potential is as I have written above. In this case, the results of the famous JKKN paper and this followup by Kadanoff suggest that you'll actually get a phase transition of the so-called "Ashkin-Teller" universality class. These are actually closely related to the Kosterlitz-Thouless transitions of the XY model; the full set of Ashkin-Teller phase transitions actually continuously link the XY transition with that of two decoupled Ising models.

You should still get an interesting phase transition in any case! Just wanted to give some background, as the physics here is extremely rich. The critical exponents you see will be different from the XY model, and you will actually get an ordered Z_4 phase at low temperatures rather than the quasi-long range order seen in the low temperature phase of the XY model. (You should be in the positive h_4 region of the bottom phase diagram of Figure 1 of the linked JKKN paper.)

These are some interesting points and references.
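The symmetry point is easy to check numerically. A global rotation of the field (phi_x, phi_y) rotates every bond difference (del phi_x, del phi_y) by the same angle, so comparing a single bond is enough. In this small sketch (the helper names and the sample difference are arbitrary choices of mine), the sum_abs style bond energy |del phi_x| + |del phi_y| survives a 90 degree rotation but not a 30 degree one, while the quadratic |del phi|^2 is unchanged by both, consistent with the Z_4 vs O(2) remark above.

```python
import math


def rotate(v, angle):
    """Rotate a 2-vector counterclockwise by angle (radians)."""
    c, s = math.cos(angle), math.sin(angle)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])


def l1_energy(dphi):
    """|del phi_x| + |del phi_y| for one bond, the sum_abs style potential."""
    return abs(dphi[0]) + abs(dphi[1])


def l2_energy(dphi):
    """|del phi|^2 for one bond, the ordinary O(2) invariant quadratic."""
    return dphi[0] ** 2 + dphi[1] ** 2


dphi = (0.3, -0.1)  # an arbitrary bond difference of the (phi_x, phi_y) field
for angle in (math.pi / 2, math.pi / 6):
    r = rotate(dphi, angle)
    print(f"{math.degrees(angle):.0f} deg: L1 {l1_energy(dphi):.4f} -> {l1_energy(r):.4f}, "
          f"quadratic {l2_energy(dphi):.4f} -> {l2_energy(r):.4f}")
```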

2D Robot Arm Inverse Kinematics using Mixed Integer Programming in Cvxpy

Mixed Integer programming is crazy powerful. You can with ingenuity encode many problems into it. The following is a simplification of the ideas appearing in . They do 3d robot arms, I do 2d. I also stick to completely linear approximations.

The surface of a circle is not a convex shape. If you include the interior of a circle it is. You can build a good approximation to the circle as a polygon. A polygon is the union of its sides, each of which is a line segment. Line segments are convex sets. Unions of convex sets are encodable using mixed integer programming. What I do is sample N regular positions on the surface of a circle. These are the vertices of my polygon. Then I build boolean indicator variables for which segment we are on. Only one of them can be nonzero: \sum s_i = 1. If we are on a segment, we are allowed to make positions x that interpolate between the endpoints x_i of that segment, x = \lambda_1 x_1 + \lambda_2 x_2, where \lambda_i \ge 0 and \sum \lambda_i = 1. These \lambda are only allowed to be nonzero if we are on a segment that touches that vertex, so we suppress them with the indicator variables: \lambda_i \le s_i + s_{i+1}. That’s the gist of it.
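Here's a solver-free check of that encoding in plain Python, using the same conventions as circle(N) later in this section: vertex i may carry weight only if segment i or i+1 (mod N) is active, so segment k joins vertices k-1 and k. The helper names are hypothetical. It confirms that feasible points lie on the polygon, at a distance between cos(\pi/N) and 1 from the origin, and that putting weight on two opposite vertices (which would give the origin) is ruled out by the indicators.

```python
import math


def polygon_vertices(n):
    """n equally spaced points on the unit circle, as sampled in circle(N)."""
    return [(math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n)) for i in range(n)]


def feasible(s, lam, n, tol=1e-12):
    """Check the polygon-circle MIP constraints for given indicator and weight values."""
    if any(b not in (0, 1) for b in s) or abs(sum(s) - 1) > tol:
        return False  # exactly one segment indicator is on
    if abs(sum(lam) - 1) > tol or any(w < -tol or w > 1 + tol for w in lam):
        return False  # weights form a convex combination
    # weight on vertex i only if segment i or segment i+1 (mod n) is active
    return all(lam[i] <= s[i] + s[(i + 1) % n] + tol for i in range(n))


def point_on_segment(k, t, n):
    """Turn on segment k (joining vertices k-1 and k) and interpolate with weight t."""
    s = [0] * n
    s[k] = 1
    lam = [0.0] * n
    lam[(k - 1) % n] = 1 - t
    lam[k] = t
    verts = polygon_vertices(n)
    x = sum(w * vx for w, (vx, _) in zip(lam, verts))
    y = sum(w * vy for w, (_, vy) in zip(lam, verts))
    return s, lam, (x, y)


s, lam, (x, y) = point_on_segment(3, 0.5, 8)
print(feasible(s, lam, 8), math.hypot(x, y), math.cos(math.pi / 8))
```

Relaxing the indicators to the interval [0, 1] lets the weights spread across more vertices, which is where the convex hull (the filled polygon) comes from.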


Given a point on the circle (basically sines and cosines of an angle) we can build a 2d rotation matrix R from it. Then we can write down the equations connecting subsequent links on the arm. p_{i+1}=p_{i} +Rl. By using global rotations with respect to the world frame, these equations stay linear. That is a subtle point. p and R are variables, whereas l is a constant describing the geometry of the robot arm. If we instead used rotation matrices connecting frame i to i+1 these R matrices would compound nonlinearly.
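A quick numerical check of the global frame trick, as a plain Python sketch: composing relative joint rotations (link i’s orientation is a product of i rotation matrices, which is nonlinear in the R’s) puts the end effector in the same place as giving each link its own world frame angle \theta_i = q_1 + \dots + q_i and summing p_{i+1} = p_i + R(\theta_i) l_i, which is linear in p and R once the link vector l_i is fixed. The joint angles are made up, and I use the standard counterclockwise rotation matrix; the code in this section uses its transpose, which only flips the sign of the angle.

```python
import math


def rot(theta):
    """Standard counterclockwise 2x2 rotation matrix."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def apply(r, v):
    return (r[0][0] * v[0] + r[0][1] * v[1], r[1][0] * v[0] + r[1][1] * v[1])


link_lengths = [0.5, 0.2, 0.3, 0.4]   # same arm as in the cvxpy code
joint_angles = [0.3, -0.7, 1.1, 0.4]  # made-up relative joint angles (radians)

# Relative frames: link i's orientation is R_1 R_2 ... R_i, a product of variables.
p_rel, R = (0.0, 0.0), [[1.0, 0.0], [0.0, 1.0]]
for l, q in zip(link_lengths, joint_angles):
    R = matmul(R, rot(q))
    dx, dy = apply(R, (l, 0.0))
    p_rel = (p_rel[0] + dx, p_rel[1] + dy)

# Global frames: each link gets its own world frame rotation, and
# p_{i+1} = p_i + R_i [l_i, 0] is linear in (p, R) because l_i is a constant.
p_glob, theta = (0.0, 0.0), 0.0
for l, q in zip(link_lengths, joint_angles):
    theta += q  # world frame angle = running sum of relative joint angles
    dx, dy = apply(rot(theta), (l, 0.0))
    p_glob = (p_glob[0] + dx, p_glob[1] + dy)

print(p_rel, p_glob)
```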

All in all, pretty cool!

import cvxpy as cvx
import numpy as np
import matplotlib.pyplot as plt

# builds a N sided polygon approximation of a circle for MIP. It is the union of the segments making up the polygon
# might also be useful to directly encode arcs. for joint constraint limits.
def circle(N):
    x = cvx.Variable()
    y = cvx.Variable()
    l = cvx.Variable(N) #interpolation variables
    segment = cvx.Variable(N,boolean=True) #segment indicator variables, relaxing the boolean constraint gives the convex hull of the polygon
    angles = np.linspace(0, 2*np.pi, N, endpoint=False)
    xs = np.cos(angles) #we're using a VRep
    ys = np.sin(angles)

    constraints = []
    constraints += [x == l @ xs, y == l @ ys] # x and y are convex sums of the corner points
    constraints += [cvx.sum(l) == 1, l <= 1, 0 <= l] #interpolations variables. Between 0 and 1 and sum up to 1
    constraints += [cvx.sum(segment) == 1] # only one indicator variable can be nonzero

    constraints += [l[N-1] <= segment[N-1] + segment[0]] #special wrap around case
    for i in range(N-1):
        constraints += [l[i] <= segment[i] + segment[i+1]] # interpolation variables suppressed
    return x, y, constraints
x, y, constraints = circle(8)
objective = cvx.Maximize(x-0.8*y)
prob = cvx.Problem(objective, constraints)
res = prob.solve(solver=cvx.GLPK_MI, verbose=True)

# build a 2d rotation matrix using circle
def R(N):    
    constraints = []
    c, s, constraint = circle(N) # get cosines and sines from a circle
    constraints += constraint

    r = cvx.Variable((2,2)) # build rotation matrix
    constraints += [r[0,0] == c, r[0,1] == s] 
    constraints += [r[1,0] == -s, r[1,1] == c]
    return r, constraints
    # np.array([[c , s],                [-s, c]])

#robot linkage of differing arm length
link_lengths = [0.5,0.2,0.3,0.4]
pivots = []
Rs = []
N = 8
constraints = []
origin = np.zeros(2)

p1 = origin
for l in link_lengths:
    R1, c = R(N)    
    constraints += c
    Rs.append(R1) # keep the rotation variables so the solution can be plotted afterwards

    p2 = cvx.Variable(2)
    constraints += [p2 == p1 + R1 @ np.array([l,0])] # R1 is global rotation with respect to world frame. This is important. It is what makes the encoding linear.

    p1 = p2


end_position = np.array([-0.5, .7])
constraints += [p2 == end_position]

objective = cvx.Maximize(1)
prob = cvx.Problem(objective, constraints)
res = prob.solve(solver=cvx.GLPK_MI, verbose=True)


print(list(map(lambda r: r.value, Rs)))

p1 = origin
for l, r in zip(link_lengths, Rs):
    p2 = p1 + r.value@np.array([l,0])
    plt.plot([p1[0],p2[0]], [p1[1],p2[1]], marker='o')

    p1 = p2


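plt.show()

# These lines are left over from a different example; v and collision are not defined here.
#plt.plot(x.value, label='x')
#plt.plot(v.value, label= 'v')
#plt.plot(collision.value, label = 'collision bool')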

Casadi – Pretty Damn Slick

Casadi is something I’ve been aware of but hadn’t really explored much. It is a C++ / Python / Matlab library for modelling optimization problems for optimal control, with bindings to IPOpt and other solvers. It can generate C code and has automatic differentiation built in. See below for some examples after I ramble.

I’ve enjoyed cvxpy, but cvxpy is designed specifically for convex problems, and many control problems are not convex.

Casadi gives you a nonlinear modelling language and easy access to IPOpt, an interior point solver that works pretty well (along with some other solvers, many of which are proprietary, however).

While the documentation visually looks very slick I actually found it rather confusing in contents at first. I’m not sure why. Something is off.

You should download the “example pack” folder. Why they don’t have these in html on the webpage is insane to me.

It also has a bunch of helper classes for DAE building and other things. They honestly really put me off. The documentation is confusing enough that I am not convinced they give you much.

The integrator classes give you access to external smart ODE solvers from the Sundials suite. They give you good methods for difficult ODEs and DAEs (differential algebraic equations, which are ODEs with extra algebraic constraints like x^2 + y^2 == 1). It’s not clear to me whether you can plug those into an optimization, other than by a shooting method.

Casadi can also output C which is pretty cool.

I kind of wondered about Casadi vs Sympy. Sympy has lots of general purpose symbolic abilities: symbolic solving, polynomial smarts, even some differential equation understanding. There might be big dividends to using it, but it is a little harder to get going. I feel like there is an empty space for a mathematical modelling language that uses sympy as its underlying representation. I guess monkey patching sympy expressions into casadi expressions might not be so hard. Sympy can also output fast C code. Sympy doesn’t really have any support for sparseness that I know of.

As a side note, it can be useful to put expressions from these other libraries into numpy (object) arrays if you need extended reshaping abilities. The other libraries often stop at matrices, which is odd to me.

Hmm. Casadi actually does have access to mixed integer programs via Bonmin (and commercial solvers). That’s interesting. Check out the Lotka-Volterra MINLP example !topic/casadi-users/8xCHmP7UmpI

The Opti interface makes some of this look better: opti.minimize and opti.subject_to. Yeah, this is more similar to the interfaces I’m used to. It avoids the manual unpacking of the solution and the funky feel of making everything into implicit == 0 expressions.

Here is a simple harmonic oscillator example using the more raw casadi interface. x is position, v is velocity, u is a control force. I’m using a very basic leapfrog-style integration (update x with the old velocity, then v with the new position). You tend to have to stack things into a single vector with vertcat when building the final problem.

from casadi import *
import matplotlib.pyplot as plt

N = 100   # number of time steps
dt = 0.1  # time step

x = SX.sym('x', N)    # position
v = SX.sym('v', N)    # velocity
u = SX.sym('u', N-1)  # control force

constraints = [x[0]-1, v[0]] # expressions that must be zero: start at x = 1, at rest
for i in range(N-1):
    # leapfrog step for x'' = -x + u: update position, then velocity using the new position
    constraints += [x[i+1] - (x[i] + dt * v[i]) ]
    constraints += [v[i+1] - (v[i] - dt * x[i+1] + u[i] * dt)]

cost = sum([x[i]*x[i] for i in range(N)]) + sum([u[i]*u[i] for i in range(N-1)])

# all decision variables stacked into one vector, all constraints into another
nlp = {'x':vertcat(x,v,u), 'f':cost, 'g':vertcat(*constraints)}
S = nlpsol('S', 'ipopt', nlp)
r = S(lbg=0, ubg=0) # lbg = ubg = 0 makes g an equality; can also pass an initial guess x0
x_opt = r['x']
# unpack the stacked solution vector by hand
x_sol = x_opt[:N]
v_sol = x_opt[N:2*N]
u_sol = x_opt[2*N:]
print('x_opt: ', x_opt)
plt.plot(x_sol.full(), label="x")
plt.plot(u_sol.full(), label="u")
plt.legend()
plt.show()
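The update in the constraints above computes the new position first, then the velocity from that new position. This is semi-implicit (symplectic) Euler. It is the same thing as leapfrog if you read v[i] as the velocity at the half step i + 1/2. Here is a quick plain-Python check of why that matters, with u = 0 and the same dt = 0.1. Forward Euler inflates x^2 + v^2 by a factor of 1 + dt^2 every step. The update above instead exactly conserves the slightly modified quantity x^2 + v^2 + dt x v, so the trajectory stays bounded. The helper names are my own.

```python
def symplectic_euler(x, v, dt):
    """The update used above with u = 0: new position first, then velocity from it."""
    x_new = x + dt * v
    v_new = v - dt * x_new
    return x_new, v_new


def forward_euler(x, v, dt):
    """Naive explicit Euler, for comparison."""
    return x + dt * v, v - dt * x


def simulate(step, x=1.0, v=0.0, dt=0.1, n=100):
    for _ in range(n):
        x, v = step(x, v, dt)
    return x, v


dt = 0.1
xs, vs = simulate(symplectic_euler, dt=dt)
xf, vf = simulate(forward_euler, dt=dt)

# Exactly conserved by symplectic_euler (up to rounding): x^2 + v^2 + dt*x*v
modified_energy = xs * xs + vs * vs + dt * xs * vs
euler_energy = xf * xf + vf * vf

print(modified_energy)  # stays at its initial value 1.0, up to rounding
print(euler_energy)     # 1.01**100, about 2.70
```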

Let’s use the Opti interface, which is pretty slick. Here is a basic cartpole.

from casadi import *
import matplotlib.pyplot as plt

N = 100

opti = Opti()

x = opti.variable(N)       # cart position
v = opti.variable(N)       # cart velocity
theta = opti.variable(N)   # pole angle (theta = 0 is hanging straight down)
dtheta = opti.variable(N)  # pole angular velocity
u = opti.variable(N-1)     # control input

opti.subject_to( u <= 1)   # actuator limits
opti.subject_to( -1 <= u)
opti.subject_to( x <= 2)   # track limits
opti.subject_to( -2 <= x)
opti.subject_to(x[0] == 0) # start at rest with the pole hanging down
opti.subject_to(v[0] == 0)
opti.subject_to(theta[0] == 0)
opti.subject_to(dtheta[0] == 0)

dt = 0.05
for i in range(N-1):
    # semi-implicit Euler steps; the cart is pushed directly by u
    opti.subject_to( x[i+1] == x[i] + dt * (v[i]))
    opti.subject_to( v[i+1] == v[i] + dt * u[i])
    opti.subject_to( theta[i+1] == theta[i] + dt * (dtheta[i]))
    # pendulum driven by gravity (-sin) and by the cart's acceleration (u cos)
    opti.subject_to( dtheta[i+1] == dtheta[i] + dt * (u[i] * cos(theta[i+1]) - sin(theta[i+1]) ))

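opti.minimize( sum1(sin(theta)))

opti.solver("ipopt") # optionally opti.solver("ipopt", p_opts, s_opts) with option dicts
sol = opti.solve()
plt.plot(sol.value(x), label="x")
plt.plot(sol.value(u), label="u")
plt.plot(sol.value(theta), label="theta")
plt.legend()
plt.show()

# parameters are symbols whose value you set (and can change) before each solve
p = opti.parameter()
opti.set_value(p, 3)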

Very fast. Very impressive. Relatively readable code. I busted this out in like 15 minutes. Ipopt solves the thing in the blink of an eye (about 0.05s self reported). It might be even faster if I warm start it with a good solution, as I would in online control (which may be feasible at this speed). I can also add the initial condition as a parameter to the problem.

I should try this on an OpenAI Gym environment.

Haskell has bindings to casadi.