Model Predictive Control of CartPole in OpenAI Gym using OSQP

A continuation of this post

I was having difficulty getting the model predictive control from a previous post working on an actual cartpole. There are a lot more unknown variables in that case and other issues (the thing has a tendency to destroy itself). I was kind of hoping it would just work. So I realized that I should get it working in simulation.

I did not copy the simulation code of the openai cartpole  which gives some small amount of credence that the MPC might generalize to a real system.

For the sake of honesty, I’m still at the point where my controller freaks out about 1/3 of the time. I do not really understand why.


Looks damn good here though huh.

A problem I had for a while was the Length of my pole was off by a factor of 2. It still kind of worked, especially if nearly balanced (although with a lot of oscillation, which in hindsight was a clue something wasn’t tuned in right).

There are some useful techniques for manipulating gym. You can set some parameters from the outside, like starting positions and thresholds. Also you can mangle your way into continuous force control rather than just left right commands (wouldn’t it be cool to use Integer programming for that? Silly, but cool).

There is still a bunch of trash in here from me playing around with parameters. I like to keep it real (and lazy).

import gym
from mpc import MPC
import numpy as np
env = gym.make('CartPole-v0')
env.env.theta_threshold_radians = np.pi * 2
env.env.x_threshold = 5 

start_theta = 0#-np.pi + 0.4# + 0.1#2 * np.pi #np.pi+0.4

mpc = MPC(0.5,0,start_theta,0) 
action = 0
for i_episode in range(1):
    observation = env.reset()
    env.env.state[2] = start_theta - np.pi
    for t in range(500):
        #action = env.action_space.sample()
        observation, reward, done, info = env.step(action)
        a = mpc.update(observation[0] + 0.5, observation[1], observation[2]+np.pi, observation[3])
        env.env.force_mag = abs(a) #min(100, abs(a))
        if a < 0:
        	action = 0
        	action = 1
        if done:
            #print("Episode finished after {} timesteps".format(t+1))

One problem was that originally I had the pole just want to go to pi. But if it swings the other direction or many swings, that is bad and it will freak out. So I changed it to try to go the the current nearest multiple of pi, which helps.

Fiddling with the size of the regulation does have a significant effect and the relative size of regulation for x, v, f, omega. I am doing a lot of that search dumbly. I should probably do some kind of automatic.

Loosening the constraints on v and x seems to help stability.

I think weighting the angle at the end of the episode slightly more helps. That’s why I used linspace for the weight on the angle.

I’ve had a lot of problem with the answer coming back as infeasible from OSQP. I feel like it probably shouldn’t be and that is the solver’s problem?

Two things help: sometimes the cart does go out of the allowable range. The optimization probably will try to go all the way to the boundaries since it is useful. And since there is some mismatch between the actual dynamics and my model, it will go outside. So I heavily reduce the constraints for the first couple time steps. It takes a couple. 4 seems to work ok. It should want to apply forces during those four steps to get it back in range anyhow.

Even then it still goes infeasible sometimes and I don’t know why. So in that case, I reduce the required accuracy to hopefully at least get something that makes sense. That is what the eps_rel stuff is about. Maybe it helps. Not super clear. I could try a more iterative increasing of the eps?!topic/osqp/BzEqWQR2dYY


import sparsegrad.forward as forward
import numpy as np
import osqp
import scipy.sparse as sparse

class MPC():
	def __init__(self, x0, v0, theta0, thetadot0):
		self.N = 50
		self.NVars  = 5
		T = 8.0
		self.dt = T/self.N
		dt = self.dt
		self.dtinv = 1./dt
		#Px = sparse.eye(N)
		#sparse.csc_matrix((N, N)) 
		# The three deifferent weigthing matrices for x, v, and external force
		reg = sparse.eye(self.N)*0.05
		z = sparse.bsr_matrix((self.N, self.N))
		# sparse.diags(np.arange(N)/N)
		pp = sparse.diags(np.linspace(1,7,self.N)) # sparse.eye(self.N)
		P = sparse.block_diag([reg, 10*reg ,pp, reg, 10*reg]) #1*reg,1*reg])
		self.P = P
		THETA = 2
		q = np.zeros((self.NVars, self.N))
		q[THETA,:] = np.pi
		q[0,:] = 0.5
		#q[N,0] = -2 * 0.5 * 10
		q = q.flatten()
		q= -P@q
		#u = np.arr

		self.x = np.random.randn(self.N,self.NVars).flatten()
		#x = np.zeros((N,NVars)).flatten()
		#v = np.zeros(N)
		#f = np.zeros(N)


		A, l, u = self.getAlu(self.x, x0, v0, theta0, thetadot0)
		self.m = osqp.OSQP()
		self.m.setup(P=P, q=q, A=A, l=l, u=u , time_limit=0.1, verbose=False)# , eps_rel=1e-2 #  **settings # warm_start=False, eps_prim_inf=1e-1
		self.results = self.m.solve()
		for i in range(100):
			self.update(x0, v0, theta0, thetadot0)

	def calcQ(self):
		THETA = 2
		q = np.zeros((self.NVars, self.N))
		thetas = np.reshape(self.x, (self.NVars, self.N))[THETA,:]
		thetas = thetas - thetas % (2*np.pi) + np.pi
		q[THETA,:] = thetas[0] #np.pi #thetas
		q[0,:] = 0.5
		#q[N,0] = -2 * 0.5 * 10
		q = q.flatten()
		q = -self.P@q
		return q

	def update(self, x0, v0,theta0, thetadot0):
		A, l, u = self.getAlu(self.x, x0, v0, theta0, thetadot0)
		q = self.calcQ()
		self.m.update(q=q,, l=l, u=u)
		self.results = self.m.solve()
		if self.results.x[0] is not None:
			self.x = np.copy(self.results.x)
			#self.x += np.random.randn(self.N*self.NVars)*0.1 # help get out of a rut?
			return 0
		return self.x[4*self.N+1]

	def constraint(self, var, x0, v0, th0, thd0):
		#x[0] -= 1
		g = 9.8
		L = 1.0
		gL = g * L
		m = 1.0 # doesn't matter
		I = L**2 / 3
		Iinv = 1.0/I
		dtinv = self.dtinv
		N = self.N

		x = var[:N]
		v = var[N:2*N]
		theta = var[2*N:3*N]
		thetadot = var[3*N:4*N]
		a = var[4*N:5*N]
		dynvars = (x,v,theta,thetadot)
		xavg, vavg, thetavg, thdotavg = map(lambda z: (z[0:-1]+z[1:])/2, dynvars)
		dx, dv, dthet, dthdot = map(lambda z: (z[1:]-z[0:-1])*dtinv, dynvars)
		vres = dv - a[1:]
		xres = dx - vavg
		torque = (-gL*np.sin(thetavg) + a[1:]*L*np.cos(thetavg))/2
		thetdotres = dthdot - torque*Iinv
		thetres = dthet - thdotavg

		return x[0:1]-x0, v[0:1]-v0, theta[0:1]-th0, thetadot[0:1]-thd0, xres,vres, thetdotres, thetres
		#return x[0:5] - 0.5

	def getAlu(self, x, x0, v0, th0, thd0):
		N = self.N
		gt = np.zeros((2,N))
		gt[0,:] = -0.1 # 0.15 # x is greaer than 0.15
		gt[1,:] = -3 #-1 #veclotu is gt -1m/s
		#gt[4,:] = -10
		control_n = max(3, int(0.1 / self.dt)) # I dunno. 4 seems to help

		gt[:,:control_n] = -100
		#gt[1,:2] = -100
		#gt[1,:2] = -15
		#gt[0,:3] = -10
		gt = gt.flatten()
		lt = np.zeros((2,N))
		lt[0,:] = 1 #0.75 # x less than 0.75
		lt[1,:] = 3 #1 # velocity less than 1m/s
		#lt[4,:] = 10
		lt[:,:	control_n] = 100
		#lt[1,:2] = 100
		#lt[0,:3] = 10
		#lt[1,:2] = 15
		lt = lt.flatten()

		z = sparse.bsr_matrix((N, N))
		ineqA = sparse.bmat([[sparse.eye(N),z,z,z,z],[z,sparse.eye(N),z,z,z]]) #.tocsc()
		cons = self.constraint(forward.seed_sparse_gradient(x), x0, v0, th0, thd0)
		A = sparse.vstack(map(lambda z: z.dvalue, cons)) #  y.dvalue.tocsc()
		totval = np.concatenate(tuple(map(lambda z: z.value, cons)))
		temp = A@x - totval

		A = sparse.vstack((A,ineqA)).tocsc()

		#print(tuple(map(lambda z: z.value, cons)))
		u = np.concatenate((temp, lt))
		l = np.concatenate((temp, gt))
		return A, l, u




We have been fighting a problem for weeks. The Serial port was just not reliable, it had sporadic. The problem ended up being a surprising thing, we were using threading to receive the messages nd checking for limit switches. It is not entirely clear why but this was totally screwing up the serial port update in an unpredictable manner. Yikes. What a disaster.

After that though smoooooooth sailing.

With a slight adaptation of the previous Openai gym LQR cartpole code and a little fiddling with parameters we have a VERY stable balancer. We removed the back reaction of the pole dynamics on the cart itself for simplicity. This should be accurate when the pole vastly.

We did find that the motor is exactly velocity control in steady state with a linear response. There is a zero point offset (you need to ask for 100 out of 2046 before you get any movement at all).

We’ll see where we can get with the Lyapunov control next time.

from sabretooth_command import CartCommand
from cart_controller import CartController
from encoder_analyzer import EncoderAnalyzer
import time
import numpy as np
import scipy.linalg as linalg
lqr = linalg.solve_continuous_are

ports = list(
for p in ports:
    if "Sabertooth" in p.description:
       sabreport = p.device
       ardPort = p.device

print("Initilizing Commander")
comm = CartCommand(port= sabreport) #"/dev/ttyACM1")
print("Initilizing Analyzer")
analyzer = EncoderAnalyzer(port=ardPort) #"/dev/ttyACM0")
print("Initializing Controller.")
cart = CartController(comm, analyzer)
print("Starting Zero Routine")

gravity = 9.8
mass_pole = 0.1
length = 0.5

moment_of_inertia = (1./3.) * mass_pole * length**2

A = np.array([
    [0,0,length * mass_pole * gravity / (2 * moment_of_inertia) ,0]
B = np.array([0,1,0,length * mass_pole / (2 * moment_of_inertia)]).reshape((4,1))
Q = np.diag([1.0, 1.0, 1.0, 0.01])
R = np.array([[0.001]])

P = lqr(A,B,Q,R)
Rinv = np.linalg.inv(R)
K =,, P))
def ulqr(x):
	x1 = np.copy(x)
	x1[2] = np.sin(x1[2] + np.pi)
	return, x1)

command_speed = 0
last_time = time.time()
while True:
	observation = cart.analyzer.getState()
	x,x_dot,theta,theta_dot = observation
	a = ulqr(np.array([(x-500)/1000,x_dot/1000,theta-0.01,theta_dot]))
	t = time.time() 
	dt = t - last_time
	last_time = t
	command_speed += 1 * a[0] * dt
	#command_speed -= (x - 500) * dt * 0.001 * 0.1
	#command_speed -= x_dot * dt * 0.001 * 0.5
	cart.setSpeedMmPerS(1000 *command_speed)
	print("theta {}\ttheta_dot {}\taction {}\tspeed {}".format(theta, theta_dot, a, command_speed))