Categorical Combinators for Convex Optimization and Model Predictive Control using Cvxpy

We’re gonna pump this well until someone MAKES me stop.

This particular example is something that I’ve been trying to figure out for a long time, and I am pleasantly surprised at how simple it all seems to be. The key difference from my previous abortive attempts is that this time I’m not attempting to do the heavy computational lifting myself.

We can take pointful DSLs and convert them into a point-free, category-theory-inspired interface. In this case the pointful DSL is an excellent one for convex optimization: cvxpy. I have some similar and related posts converting DSLs to categorical form.

A convex optimization problem minimizes a convex objective function subject to constraints that define a convex set, like polytopes or balls. These problems are polynomial-time tractable and shockingly useful. We can make a category out of convex optimization problems. We consider some variables to be “input” and some to be “output”. This choice is somewhat arbitrary, as is the case for many “relation”-feeling things that aren’t really so rigidly oriented.

Convex programming problems do have a natural notion of composition. Check out the last chapter of Rockafellar, where he talks about the convex algebra of bifunctions. Instead of summing over the inner composition variable as in Vect (\sum_j A_{ij} B_{jk}), or existentializing as in Rel (\{ (a,c) \mid \exists b.\, (a,b)\in A, (b,c) \in B \}), we instead minimize over the inner composition variable: \min_y A(x,y) + B(y,z). These are similar operations in that they all produce bound variables.

The identity morphism is just the simple constraint that the input variables equal the output variables, with an objective function of 0. This is an affine constraint, hence convex.

In general, if we ignore the objective part entirely by just setting it to zero, we’re actually working in a very computationally useful subcategory of Rel: ConvexRel, the category of relations that are convex sets. Composition corresponds to an existential operation, which is quickly solvable by convex optimization techniques. In operations research terms, these are feasibility problems rather than optimization problems. Many of the combinators do nothing to the objective.
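As a tiny concrete illustration of the feasibility point (a sketch of my own, with made-up sets; assumes cvxpy is installed): asking whether a pair (a, c) is in the composition of the unit-disk relation with a halfspace relation is a problem with objective 0, existentializing over the middle variable b.

import cvxpy as cvx

a, c = 0.5, 1.0       # candidate endpoints (hypothetical numbers)
b = cvx.Variable()    # the existentially bound middle variable
constraints = [cvx.square(b) <= 1 - a**2,  # (a, b) in the unit disk
               c <= b + 1]                 # (b, c) in a halfspace
prob = cvx.Problem(cvx.Minimize(0), constraints)
prob.solve()
print(prob.status)    # "optimal" here just means feasible: some b relates a to c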

The monoidal product just stacks variables side by side, adds the objectives, and combines the constraints. They really are still independent problems. Writing things in this way opens up a possibility for parallelism. More on that some other day.

We can code this all up in Python with little combinators that return the input, output, objective, and constraint list. We need to hide these in inner lambdas (thunks) so that fresh variables are generated appropriately.
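Here is a minimal sketch of what I mean (my reconstruction in this style, not necessarily the exact code; assumes cvxpy). A morphism is a thunk that, when called, conjures fresh cvxpy variables and returns (input, output, objective, constraints). Composition glues the output of one to the input of the next and sums the objectives, so the solver minimizes over the inner variable; the monoidal product stacks independent problems side by side.

import cvxpy as cvx
import numpy as np

def idd(n):  # identity: input equals output, objective 0
    def res():
        x = cvx.Variable(n)
        y = cvx.Variable(n)
        return x, y, 0, [x == y]
    return res

def compose(f, g):  # minimize over the inner composition variable
    def res():
        x, y, o1, c1 = f()
        y1, z, o2, c2 = g()
        return x, z, o1 + o2, c1 + c2 + [y == y1]
    return res

def par(f, g):  # monoidal product: stack variables, add objectives
    def res():
        x, y, o1, c1 = f()
        x1, y1, o2, c2 = g()
        return cvx.hstack([x, x1]), cvx.hstack([y, y1]), o1 + o2, c1 + c2
    return res

def run(f, x0):  # pin the input and hand the whole thing to the solver
    x, y, o, c = f()
    prob = cvx.Problem(cvx.Minimize(o), c + [x == x0])
    prob.solve()
    return y.value

# e.g. run(idd(2), np.array([1.0, 2.0])) just echoes its input back,
# since the identity is the zero-objective constraint x == y.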

Now for a somewhat more concrete example: model predictive control of an unstable (linearized) pendulum.

Model predictive control is where you solve an optimization problem of the finite-time rollout of a control system online. In other words, you take a measurement of the current state, update the constraints in an optimization problem, ask the solver to solve it, and then apply the force or controls that the solver says are best.

This gives an advantage over the LQR controller in that you can set hard inequality bounds on the total force available, or on the positions where you allow the system to go. You don’t want your system crashing into some wall or falling over some cliff, for example. These really are useful constraints in practice. You can also include possibly time-dependent aspects of the dynamics, helping you model the nonlinear dynamics of your system.

I have more posts on model predictive control here http://www.philipzucker.com/model-predictive-control-of-cartpole-in-openai-gym-using-osqp/ http://www.philipzucker.com/flappy-bird-as-a-mixed-integer-program/

Here we model the unstable point of a pendulum and ask the controller to find forces to balance the pendulum.
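The model code itself isn’t reproduced here, but a hedged sketch in the combinator style might look like this (dt, the force bound, and the cost weights are my own stand-ins; `compose` is from the sketch above). One time step of the linearized pendulum is a morphism from state (theta, thetadot) to the next state, with the force as a fresh internal variable the solver minimizes over; the 10-step lookahead controller is a 10-fold composition.

import cvxpy as cvx

dt = 0.1  # hypothetical step size

def step():
    def res():
        x = cvx.Variable(2)  # (theta, thetadot), linearized about the upright point
        y = cvx.Variable(2)  # next state
        u = cvx.Variable()   # force, a fresh bound variable per step
        dyn = [y[0] == x[0] + dt * x[1],
               y[1] == x[1] + dt * (x[0] + u)]  # sin(theta) ~ theta near the top
        force_limit = [cvx.abs(u) <= 5.0]       # a hard bound LQR can't express
        return x, y, cvx.sum_squares(y) + 0.1 * cvx.square(u), dyn + force_limit
    return res

lookahead = step()
for _ in range(9):
    lookahead = compose(lookahead, step())  # 10 steps total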

We can interpret the controller in GraphCat in order to produce a diagram of the 10-step lookahead controller. This is an advantage of the categorical style, a la compiling to categories.

We can also actually run it in model predictive control configuration in simulation.

And some curves. How bout that.

Bits and Bobbles

LazySets https://github.com/JuliaReach/LazySets.jl

ADMM – It’s a “lens”. I’m pretty sure I know how to do it pointfree. Let’s do it next time.

The minimization can be bubbled out to the top if we are always minimizing. If we mix in maximization, then we can’t, and we’re working on a more difficult problem. This is similar to what happens in Rel when you have relational division, which is a kind of universal quantification \forall. Mixed-quantifier problems in general are very tough. These kinds of problems include games, synthesis, and robustness. More on this some other day.

Convex-Concave programming minimax https://web.stanford.edu/~boyd/papers/pdf/dccp_cdc.pdf https://web.stanford.edu/class/ee364b/lectures/cvxccv.pdf

The minimization operation can be related to the summation operation by the method of steepest descent in some cases. The method of steepest descent approximates a sum or integral by evaluating it at its most dominant position and expanding out from there, hence converting a linear algebra thing into an optimization problem. Examples include the connection between statistical mechanics and thermodynamics, and between classical mechanics and quantum mechanics.
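Schematically (a heuristic sketch, not a derivation): as the “temperature” \epsilon goes to zero, the integral is dominated by the minimizer, so

\min_x f(x) = \lim_{\epsilon \to 0^+} \left[ -\epsilon \log \int e^{-f(x)/\epsilon} \, dx \right]

and the linear-algebra-flavored sum degenerates into an optimization.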

Legendre Transformation ~ Laplace Transformation via steepest descent https://en.wikipedia.org/wiki/Convex_conjugate yada yada, all kinds of good stuff.
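In the same spirit, the convex conjugate is the steepest-descent shadow of a Laplace-transform-like integral:

f^*(p) = \sup_x \left[ px - f(x) \right] = \lim_{\epsilon \to 0^+} \epsilon \log \int e^{(px - f(x))/\epsilon} \, dx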

Intersection is easy. Join/union is harder. Does MIP help?

def meet(f, g):
    # intersection of relations: impose both constraint sets and add objectives
    def res():
        x, y, o, c = f()
        x1, y1, o1, c1 = g()
        return x, y, o + o1, c + c1 + [x == x1, y == y1]
    return res

Quantifier elimination

MIP via adjunction

Model Predictive Control of CartPole in OpenAI Gym using OSQP

A continuation of this post http://www.philipzucker.com/osqp-sparsegrad-fast-model-predictive-control-python-inverted-pendulum/

I was having difficulty getting the model predictive control from a previous post working on an actual cartpole. There are a lot more unknown variables in that case, and other issues (the thing has a tendency to destroy itself). I was kind of hoping it would just work, so I realized that I should get it working in simulation first.

I did not copy the simulation code of the openai cartpole https://github.com/openai/gym/blob/master/gym/envs/classic_control/cartpole.py  which gives some small amount of credence that the MPC might generalize to a real system.

For the sake of honesty, I’m still at the point where my controller freaks out about 1/3 of the time. I do not really understand why.

 

Looks damn good here though huh.

A problem I had for a while was that the length of my pole was off by a factor of 2. It still kind of worked, especially if nearly balanced (although with a lot of oscillation, which in hindsight was a clue that something wasn’t tuned right).

There are some useful techniques for manipulating gym. You can set some parameters from the outside, like starting positions and thresholds. Also, you can mangle your way into continuous force control rather than just left/right commands (wouldn’t it be cool to use integer programming for that? Silly, but cool).

There is still a bunch of trash in here from me playing around with parameters. I like to keep it real (and lazy).

import gym
from mpc import MPC
import numpy as np

env = gym.make('CartPole-v0')
# loosen gym's default episode-ending thresholds
env.env.theta_threshold_radians = np.pi * 2
env.env.x_threshold = 5

start_theta = 0  # -np.pi + 0.4

mpc = MPC(0.5, 0, start_theta, 0)
action = 0
for i_episode in range(1):
    observation = env.reset()
    env.env.state[2] = start_theta - np.pi
    for t in range(500):
        env.render()
        observation, reward, done, info = env.step(action)
        # shift coordinates: the MPC works about the upright position
        a = mpc.update(observation[0] + 0.5, observation[1], observation[2] + np.pi, observation[3])
        # mangle the discrete env into continuous force control
        env.env.force_mag = abs(a)  # min(100, abs(a))
        if a < 0:
            action = 0
        else:
            action = 1
        if done:
            pass
            # print("Episode finished after {} timesteps".format(t+1))
print(mpc.calcQ().reshape((5, 50)))
print(observation)
print(dir(env))

One problem was that originally I had the pole just want to go to pi. But if it swings the other direction, or swings around many times, that is bad and it will freak out. So I changed it to try to go to the nearest multiple of pi instead, which helps.

Fiddling with the size of the regularization does have a significant effect, as does the relative size of the regularization for x, v, f, and omega. I am doing a lot of that search dumbly. I should probably do some kind of automatic search.

Loosening the constraints on v and x seems to help stability.

I think weighting the angle at the end of the episode slightly more helps. That’s why I used linspace for the weight on the angle.

I’ve had a lot of problems with the answer coming back as infeasible from OSQP. I feel like it probably shouldn’t be infeasible, so maybe that is the solver’s problem?

Two things help. First, sometimes the cart does go out of the allowable range: the optimization will probably push right up against the boundaries since that is useful, and since there is some mismatch between the actual dynamics and my model, it will end up outside them. So I heavily loosen the constraints for the first couple of time steps. It takes a couple; 4 seems to work ok. It should want to apply forces during those four steps to get back in range anyhow.

Second, even then it still goes infeasible sometimes and I don’t know why. So in that case, I reduce the required accuracy to hopefully at least get something that makes sense. That is what the eps_rel stuff is about. Maybe it helps; it’s not super clear. I could try a more iterative increasing of the eps?

https://groups.google.com/forum/#!topic/osqp/BzEqWQR2dYY

 

import sparsegrad.forward as forward
import numpy as np
import osqp
import scipy.sparse as sparse


class MPC():
	def __init__(self, x0, v0, theta0, thetadot0):
		self.N = 50
		self.NVars  = 5
		T = 8.0
		self.dt = T/self.N
		dt = self.dt
		self.dtinv = 1./dt
		#Px = sparse.eye(N)
		#sparse.csc_matrix((N, N)) 
		# The different weighting matrices for x, v, theta, thetadot, and the external force
		reg = sparse.eye(self.N)*0.05
		z = sparse.bsr_matrix((self.N, self.N))
		# sparse.diags(np.arange(N)/N)
		pp = sparse.diags(np.linspace(1,7,self.N)) # sparse.eye(self.N)
		P = sparse.block_diag([reg, 10*reg ,pp, reg, 10*reg]) #1*reg,1*reg])
		#P[N,N]=10
		self.P = P
		THETA = 2
		q = np.zeros((self.NVars, self.N))
		q[THETA,:] = np.pi
		q[0,:] = 0.5
		#q[N,0] = -2 * 0.5 * 10
		q = q.flatten()
		q= -P@q
		#u = np.arr

		self.x = np.random.randn(self.N,self.NVars).flatten()
		#x = np.zeros((N,NVars)).flatten()
		#v = np.zeros(N)
		#f = np.zeros(N)


		#print(f(ad.seed(x)).dvalue)




		A, l, u = self.getAlu(self.x, x0, v0, theta0, thetadot0)
		self.m = osqp.OSQP()
		self.m.setup(P=P, q=q, A=A, l=l, u=u , time_limit=0.1, verbose=False)# , eps_rel=1e-2 #  **settings # warm_start=False, eps_prim_inf=1e-1
		self.results = self.m.solve()
		print(self.results.x)
		for i in range(100):
			self.update(x0, v0, theta0, thetadot0)

	def calcQ(self):
		THETA = 2
		q = np.zeros((self.NVars, self.N))
		thetas = np.reshape(self.x, (self.NVars, self.N))[THETA,:]
		thetas = thetas - thetas % (2*np.pi) + np.pi
		q[THETA,:] = thetas[0] #np.pi #thetas
		q[0,:] = 0.5
		#q[N,0] = -2 * 0.5 * 10
		q = q.flatten()
		q = -self.P@q
		return q

	def update(self, x0, v0,theta0, thetadot0):
		A, l, u = self.getAlu(self.x, x0, v0, theta0, thetadot0)
		#print(A.shape)
		#print(len(A))
		q = self.calcQ()
		self.m.update(q=q, Ax=A.data, l=l, u=u)
		self.results = self.m.solve()
		if self.results.x[0] is not None:
			self.x = np.copy(self.results.x)
			self.m.update_settings(eps_rel=1e-3)
		else:
			#self.x += np.random.randn(self.N*self.NVars)*0.1 # help get out of a rut?
			self.m.update_settings(eps_rel=1.1)
			print("failed")
			return 0
		return self.x[4*self.N+1]



	def constraint(self, var, x0, v0, th0, thd0):
		#x[0] -= 1
		#print(x[0])
		g = 9.8
		L = 1.0
		gL = g * L
		m = 1.0 # doesn't matter
		I = L**2 / 3
		Iinv = 1.0/I
		dtinv = self.dtinv
		N = self.N

		x = var[:N]
		v = var[N:2*N]
		theta = var[2*N:3*N]
		thetadot = var[3*N:4*N]
		a = var[4*N:5*N]
		dynvars = (x,v,theta,thetadot)
		xavg, vavg, thetavg, thdotavg = map(lambda z: (z[0:-1]+z[1:])/2, dynvars)
		dx, dv, dthet, dthdot = map(lambda z: (z[1:]-z[0:-1])*dtinv, dynvars)
		vres = dv - a[1:]
		xres = dx - vavg
		torque = (-gL*np.sin(thetavg) + a[1:]*L*np.cos(thetavg))/2
		thetdotres = dthdot - torque*Iinv
		thetres = dthet - thdotavg

		return x[0:1]-x0, v[0:1]-v0, theta[0:1]-th0, thetadot[0:1]-thd0, xres,vres, thetdotres, thetres
		#return x[0:5] - 0.5



	def getAlu(self, x, x0, v0, th0, thd0):
		N = self.N
		gt = np.zeros((2,N))
		gt[0,:] = -0.1 # x is greater than -0.1
		gt[1,:] = -3 # velocity is greater than -3 m/s
		#gt[4,:] = -10
		control_n = max(3, int(0.1 / self.dt)) # I dunno. 4 seems to help
		#print(control_n)

		gt[:,:control_n] = -100
		#gt[1,:2] = -100
		#gt[1,:2] = -15
		#gt[0,:3] = -10
		gt = gt.flatten()
		lt = np.zeros((2,N))
		lt[0,:] = 1 # x is less than 1
		lt[1,:] = 3 # velocity is less than 3 m/s
		#lt[4,:] = 10
		lt[:, :control_n] = 100
		#lt[1,:2] = 100
		#lt[0,:3] = 10
		#lt[1,:2] = 15
		
		lt = lt.flatten()

		z = sparse.bsr_matrix((N, N))
		ineqA = sparse.bmat([[sparse.eye(N),z,z,z,z],[z,sparse.eye(N),z,z,z]]) #.tocsc()
		#print(ineqA.shape)
		#print(ineqA)
		cons = self.constraint(forward.seed_sparse_gradient(x), x0, v0, th0, thd0)
		A = sparse.vstack(map(lambda z: z.dvalue, cons)) #  y.dvalue.tocsc()
		#print(A.shape)
		totval = np.concatenate(tuple(map(lambda z: z.value, cons)))
		temp = A@x - totval


		A = sparse.vstack((A,ineqA)).tocsc()

		#print(tuple(map(lambda z: z.value, cons)))
		#print(temp.shape)
		#print(lt.shape)
		#print(gt.shape)
		u = np.concatenate((temp, lt))
		l = np.concatenate((temp, gt))
		return A, l, u





 

 

Pytorch Trajectory Optimization 3: Plugging in the Hessian

I plugged in the Hessian extraction code for taking Newton steps.

When I profiled it using the oh-so-convenient https://github.com/rkern/line_profiler, I found almost all of my time was spent in the delLy.backward step. For each Hessian I needed to run this B (the bandwidth) times, and each call cost ~0.6ms. The entire run took about 70 iterations to converge and 1000 runs of this backward step, which came out to 0.6 seconds. It is insane, but actually even calculating the band of the Hessian costs considerably more time than inverting it.

So to speed this up, I did a bizarre thing: I replicated the entire system B times. Then I can get the entire Hessian band in a single call to backward. Remarkably, although B ~ 15, this only slowed backward down by 3x. This is a huge savings actually, while obviously inefficient. The entire program has gone down from 1.1s to 0.38s, roughly a 3x improvement. All in all, this puts us at 70/0.38 ~ 185 Hz for a Newton step. Is that good enough? I could trim some more fat. The Fast MPC paper http://web.stanford.edu/~boyd/papers/fast_mpc.html says we need about ~5 iterations to tune up a solution, which would mean running at 40Hz. I think that might be ok.
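To see why comb-like probe vectors recover the whole band, here is a toy standalone sketch (my own example, separate from the code below): for a Hessian whose band is at most B wide, each row overlaps each comb in at most one entry, so B Hessian-vector products determine every band entry.

import torch

n, B = 12, 3
xx = torch.randn(n, requires_grad=True)
loss = torch.sum((xx[1:] - xx[:-1])**2)  # nearest-neighbor coupling: tridiagonal Hessian
g, = torch.autograd.grad(loss, xx, create_graph=True)
cols = []
for i in range(B):
    y = torch.zeros(n)
    y[i::B] = 1.0  # comb vector: 1 at every B-th entry, offset by i
    Hy, = torch.autograd.grad(g @ y, xx, retain_graph=True)  # H @ y via a second backward
    cols.append(Hy)
# within the band, H[j, k] sits at cols[k % B][j]: each comb isolates one entry per row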

Since the Hessian is symmetric, it is possible to use roughly half the calls (~B/2), but then actually extracting the Hessian is not so simple. I haven’t figured out a way to comfortably do such a thing yet. I think I could figure out the first column and then subtract (roughly some kind of Gaussian elimination procedure).

It has helped stability to regularize everything with a surprising amount of weight in the cost. I guess since I anticipate all values being in the range of -10,10, maybe this makes sense.

Now I need to try not using this augmented Lagrangian method and just switching to a straight newton step.

Edit: Ooh. Adding a simple backtracking line search really improves stability.
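For reference, by “backtracking line search” I mean the standard Armijo rule; here is a generic sketch (the function and names are hypothetical helpers, not part of the code below), assuming f evaluates the total cost on numpy arrays:

import numpy as np

def backtrack(f, x, dx, gradL, alpha=0.3, beta=0.5):
    # shrink the Newton step until the sufficient-decrease condition holds
    t = 1.0
    fx = f(x)
    while f(x - t * dx) > fx - alpha * t * np.dot(gradL, dx):
        t *= beta
        if t < 1e-8:  # give up: dx may not be a descent direction
            break
    return t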

import torch
import matplotlib.pyplot as plt
import numpy as np
import torch.optim
from scipy import linalg

N = 100
T = 10.0
dt = T/N
NVars = 4
NControls = 1
batch = (NVars+NControls)*3
def getNewState():
    # we're going to also pack f into here
    x = torch.zeros(batch, N, NVars+NControls, requires_grad=True)
    l = torch.zeros(batch, N-1, NVars, requires_grad=False)  # Lagrange multipliers
    return x, l

def calc_loss(x, l, rho, prox=0):
    # unpack f; it has one less time point than the state
    cost = 0.1*torch.sum(x**2)
    #cost += prox * torch.sum((x - x.detach())**2)
    f = x[:, 1:, :NControls]
    x = x[:, :, NControls:]

    delx = (x[:, 1:, :] - x[:, :-1, :]) / dt
    xbar = (x[:, 1:, :] + x[:, :-1, :]) / 2
    dxdt = torch.zeros(batch, N-1, NVars)
    THETA = 2
    THETADOT = 3
    X = 0
    V = 1
    dxdt[:, :, X] = xbar[:, :, V]
    dxdt[:, :, V] = f[:, :, 0]
    dxdt[:, :, THETA] = xbar[:, :, THETADOT]
    dxdt[:, :, THETADOT] = -torch.sin(xbar[:, :, THETA]) + f[:, :, 0]*torch.cos(xbar[:, :, THETA])

    # midpoint-rule dynamics residual
    xres = delx - dxdt

    lagrange_mult = torch.sum(l * xres)
    penalty = rho*torch.sum(xres**2)

    # weight the angle more toward the end of the episode
    cost += 1.0*torch.sum((x[:, :, THETA]-np.pi)**2 * torch.arange(N) / N)
    cost += 0.5*torch.sum(f**2)
    #xlim = 3
    #cost += 0.1*torch.sum(-torch.log(xbar[:,:,X] + xlim) - torch.log(xlim - xbar[:,:,X]))

    return cost, penalty, lagrange_mult, xres

def getFullHess():  # for experimentation
    pass

def getGradHessBand(loss, B, x):
    delL0, = torch.autograd.grad(loss, x, create_graph=True)
    delL = delL0[:, 1:, :].view(B, -1)  # remove x0
    print("del ", delL[:, :10])
    # comb vectors: row i of y is 1 at every B-th entry, offset by i
    y = torch.zeros(B, N-1, NVars+NControls, requires_grad=False).view(B, -1)
    for i in range(B):
        y[i, i::B] = 1
    print(y.shape)
    print(delL.shape)
    delLy = torch.sum(delL * y)
    # a single backward over the replicated batch yields all B probes at once
    delLy.backward()
    nphess = x.grad[:, 1:, :].view(B, -1).detach().numpy()
    # roll each stripe so the result lines up in banded storage
    for i in range(B):
        nphess[:, i::B] = np.roll(nphess[:, i::B], -i+B//2, axis=0)
    print(nphess[:, :4])
    #hessband = removeX0(nphess[:B//2+1,:])
    return delL.detach().numpy()[0, :], nphess

x, l = getNewState()
rho = 0.1
prox = 0.0
for j in range(10):
    # inner Newton loop at fixed multipliers l and penalty weight rho
    while True:
        try:
            cost, penalty, lagrange_mult, xres = calc_loss(x, l, rho, prox)
            print("hey now")
            total_cost = cost + lagrange_mult + penalty
            gradL, hess = getGradHessBand(total_cost, (NVars+NControls)*3, x)
            gradL = gradL.reshape(-1)

            # easiest thing might be to put lagrange multipliers into x.
            # Alternatively, use second order step in penalty method.
            bandn = (NVars+NControls)*3//2
            print(hess.shape)
            print(gradL.shape)
            dx = linalg.solve_banded((bandn, bandn), hess, gradL)
            x.grad.data.zero_()
            #dx = linalg.solveh_banded(hess[:bandn+1,:], gradL, overwrite_ab=True)
            newton_dec = np.dot(dx, gradL)
            dx = dx.reshape(1, N-1, NVars+NControls)

            with torch.no_grad():
                x[:, 1:, :] -= torch.tensor(dx)
            print(x[:, :5, :])
            costval = cost.detach().numpy()
            if newton_dec < 1e-10*costval:
                break
        except np.linalg.LinAlgError:
            print("LINALGERROR")
            prox += 0.1
    # outer augmented-Lagrangian updates
    with torch.no_grad():
        l += 2 * rho * xres
    rho = rho * 2
#print(x)
#plt.subplot(131)
plt.plot(xres[0,:,0].detach().numpy(), label='Xres')
plt.plot(xres[0,:,1].detach().numpy(), label='Vres')
plt.plot(xres[0,:,2].detach().numpy(), label='Thetares')
plt.plot(xres[0,:,3].detach().numpy(), label='Thetadotres')

plt.legend(loc='upper right')
plt.figure()
#plt.subplot(132)
plt.plot(x[0,:,1].detach().numpy(), label='X')
plt.plot(x[0,:,2].detach().numpy(), label='V')
plt.plot(x[0,:,3].detach().numpy(), label='Theta')
plt.plot(x[0,:,4].detach().numpy(), label='Thetadot')
plt.plot(x[0,:,0].detach().numpy(), label='F')
#plt.plot(cost[0,:].detach().numpy(), label='F')
plt.legend(loc='upper right')
#plt.figure()
#plt.subplot(133)
#plt.plot(costs)


plt.show()

(Figures: results from the replicated version, and the dynamics residuals.)