Flappy Bird as a Mixed Integer Program

Mixed Integer Programming is a methodology where you can specify convex (usually linear) optimization problems that include integer/boolean variables.

Flappy Bird is a game about a bird avoiding pipes.

We can use mixed integer programming to make a controller for Flappy Bird. Feel free to put this as a real-world application in your grant proposals, people.

While thinking about writing a MIP for controlling a lunar lander game, I realized how amenable to mixed integer modeling flappy bird is. Ben and I put together the demo on Saturday. You can find his sister blog post here.

The bird is mostly in free fall, on parabolic trajectories. This is a linear dynamic, so it can directly be expressed as a linear constraint. It can discretely flap to give itself an upward impulse. This is a boolean force variable at every time step. Avoiding the ground and sky is a simple linear constraint. The bird has no control over its x motion, so that can be rolled out as concrete values. Because of this, we can check what pipes are relevant at time points in the future and putting the bird in the gap is also a simple linear constraint.

There are several different objectives one might want to consider and weight. Perhaps you want to save the poor birds energy and minimize the sum of all flaps cvx.sum(flap). Or perhaps you want to really be sure it doesn’t hit any pipes by maximizing the minimum distance from any pipe. Or perhaps minimize the absolute value of the y velocity, which is a reasonable heuristic for staying in control. All are expressible as linear constraints. Perhaps you might want a weighted combo of these. All things to fiddle with.

There is a pygame flappy bird clone which made this all the much more slick. It is well written and easy to understand and modify. Actually figuring out the appropriate bounding boxes for pipe avoidance was finicky. Figuring out the right combo of bird size and pipe size is hard, combined with computer graphics and their goddamn upside down coordinate system.

We run our solver in a model predictive control configuration. Model predictive control is where you roll out a trajectory as an optimization problem and resolve it at every action step. This turns an open loop trajectory solve into a closed loop control, at the expense of needing to solve a perhaps very complicated problem in real time. This is not always feasible.

My favorite mip modeling tool is cvxpy. It gives you vectorized constraints and slicing, which I love. More tools should aspire to achieve numpy-like interfaces. I’ve got lots of other blog posts using this package which you can find in my big post list the side-bar 👀.

The github repo for the entire code is here: https://github.com/philzook58/FlapPyBird-MPC

And here’s the guts of the controller:

I think it is largely self explanatory but here are some notes. The dt = t//10 + 1 thing is about decreasing our time resolution the further out from the current time step. This increases the time horizon without the extra computation cost. Intuitively modeling accuracy further out in time should matter less. The last_solution stuff is for in case the optimization solver fails for whatever reason, in which case it’ll follow open-loop the last trajectory it got.

Bits and Bobbles

• We changed the dynamics slightly from the python original to make it easier to model. We found this did not change the feel of the game. The old dynamics were piecewise affine though, so are also modelable using mixed integer programming. http://groups.csail.mit.edu/robotics-center/public_papers/Marcucci18.pdf . Generally check out the papers coming out of the Tedrake group. They are sweet. Total fanboy over here.
• The controller as is is not perfect. It fails eventually, and it probably shouldn’t. A bug? Numerical problems? Bad modeling of the pipe collision? A run tends to get through about a hundred pipes before something gets goofy.
• Since we had access to the source code, we could mimic the dynamics very well. How robust is flappy bird to noise and bad modeling? We could add wind, or inaccurate pipe data.
• Unions of Convex Region. Giving the flappy bird some x position control would change the nature of the problem. We could easily cut up the allowable regions of the bird into rectangles, and represent the total space as a union of convex regions, which is also MIP representable.
• Verification – The finite difference scheme used is crude. It is conceivable for the bird to clip a pipe. Since basically we know the closed form of the trajectories, we could verify that the parabolas do not intersect the regions. For funzies, maybe use sum of squares optimization?
• Realtime MIP. Our solver isn’t quite realtime. Maybe half real time. One might pursue methods to make the mixed integer program faster. This might involve custom branching heuristics, or early stopping. If one can get the solver fast enough, you might run the solver in parallel and only query a new path plan every so often.
• 3d flappy bird? Let the bird rotate? What about a platformer (Mario) or lunar lander? All are pretty interesting piecewise affine systems.
• Is this the best way to do this? Yes and no. Other ways to do this might involve doing some machine learning, or hardcoding a controller that monitors the pipe locations and has some simple feedback. You can find some among the forks of FlapPyBird. I have no doubt that you could write these quickly, fiddle with them and get them to work better and faster than this MIP controller. However, for me there is a difference between could work and should work. You can come up with a thousand bizarre schemes that could work. RL algorithms fall in this camp. But I have every reason to believe the MIP controller should work, which makes it easier to troubleshoot.

Casadi is something I’ve been aware of and not really explored much. It is a C++ / python / matlab library for modelling optimization problems for optimal control with bindings to IPOpt and other solvers. It can produce C code and has differentiation stuff. See below for some examples after I ramble.

I’ve enjoyed cvxpy, but cvxpy is designed specifically for convex problems, of which many control problems are not.

Casadi gives you a nonlinear modelling language and easy access to IPOpt, an interior point solver that works pretty good (along with some other solvers, many of which are proprietary however).

While the documentation visually looks very slick I actually found it rather confusing in contents at first. I’m not sure why. Something is off.

It also has a bunch of helper classes for DAE building and other things. They honestly really put me off. The documentation is confusing enough that I am not convinced they give you much.

The integrator classes give you access to external smart ode solvers from the Sundials suite. They give you good methods for difficult odes and dae (differential algebraic equations, which are ODEs with weird constraints like x^1 + y^1 == 1) Not clear to me if you can plug those in to an optimization, other than by a shooting method.

Casadi can also output C which is pretty cool.

I kind of wondered about Casadi vs Sympy. Sympy has lot’s of general purpose symbolic abilities. Symbolic solving, polynomial smarts, even some differential equation understanding. There might be big dividends to using it. But it is a little harder to get going. I feel like there is an empty space for a mathemtical modelling language that uses sympy as it’s underlying representation. I guess monkey patching sympy expressions into casadi expression might not be so hard. Sympy can also output fast C code. Sympy doesn’t really have any support for sparseness that I know of.

As a side note, It can be useful to put these other languages into numpy if you need extended reshaping abilities. The other languages often stop at matrices, which is odd to me.

Hmm. Casadi actually does have access to mixed integer programs via bonmin (and commercial solvers). That’s interesting. Check out lotka volterra minlp example

The optim interface makes some of this look better. optim.minimize and subject_to. Yeah, this is more similar to the interfaces I’m used to. It avoids the manual unpacking of the solution and the funky feel of making everything into implicit == 0 expressions.

Here is a simple harmonic oscillator example using the more raw casadi interface. x is positive, v is velocity, u is a control force. I’m using a very basic leap frog integration. You tend to have to stack things into a single vector with vertcat when building the final problem.

Let’s use the opti interface, which is pretty slick. Here is a basic cartpole https://web.casadi.org/blog/opti/

Very fast. Very impressive. Relatively readable code. I busted this out in like 15 minutes. IPopt solves the thing in the blink of an eye (about 0.05s self reported). Might be even faster if I warm start it with a good solution, as I would in online control (which may be feasible at this speed). Can add the initial condition as a parameter to the problem

I should try this on an openai gym.

Model Predictive Control of CartPole in OpenAI Gym using OSQP

A continuation of this post http://www.philipzucker.com/osqp-sparsegrad-fast-model-predictive-control-python-inverted-pendulum/

I was having difficulty getting the model predictive control from a previous post working on an actual cartpole. There are a lot more unknown variables in that case and other issues (the thing has a tendency to destroy itself). I was kind of hoping it would just work. So I realized that I should get it working in simulation.

I did not copy the simulation code of the openai cartpole https://github.com/openai/gym/blob/master/gym/envs/classic_control/cartpole.py  which gives some small amount of credence that the MPC might generalize to a real system.

For the sake of honesty, I’m still at the point where my controller freaks out about 1/3 of the time. I do not really understand why.

Looks damn good here though huh.

A problem I had for a while was the Length of my pole was off by a factor of 2. It still kind of worked, especially if nearly balanced (although with a lot of oscillation, which in hindsight was a clue something wasn’t tuned in right).

There are some useful techniques for manipulating gym. You can set some parameters from the outside, like starting positions and thresholds. Also you can mangle your way into continuous force control rather than just left right commands (wouldn’t it be cool to use Integer programming for that? Silly, but cool).

There is still a bunch of trash in here from me playing around with parameters. I like to keep it real (and lazy).

One problem was that originally I had the pole just want to go to pi. But if it swings the other direction or many swings, that is bad and it will freak out. So I changed it to try to go the the current nearest multiple of pi, which helps.

Fiddling with the size of the regulation does have a significant effect and the relative size of regulation for x, v, f, omega. I am doing a lot of that search dumbly. I should probably do some kind of automatic.

Loosening the constraints on v and x seems to help stability.

I think weighting the angle at the end of the episode slightly more helps. That’s why I used linspace for the weight on the angle.

I’ve had a lot of problem with the answer coming back as infeasible from OSQP. I feel like it probably shouldn’t be and that is the solver’s problem?

Two things help: sometimes the cart does go out of the allowable range. The optimization probably will try to go all the way to the boundaries since it is useful. And since there is some mismatch between the actual dynamics and my model, it will go outside. So I heavily reduce the constraints for the first couple time steps. It takes a couple. 4 seems to work ok. It should want to apply forces during those four steps to get it back in range anyhow.

Even then it still goes infeasible sometimes and I don’t know why. So in that case, I reduce the required accuracy to hopefully at least get something that makes sense. That is what the eps_rel stuff is about. Maybe it helps. Not super clear. I could try a more iterative increasing of the eps?

OSQP and Sparsegrad: Fast Model Predictive Control in Python for an inverted pendulum

OSQP is a quadratic programming solver that seems to be designed for control problems. It has the ability to warm start, which should make it faster.

I heard about it in this Julia talk

They have some really cool technology over there in Julia land.

The problem is setup as a sequential quadratic program. I have a quadratic cost to try to bring the pendulum to an upright position.

The equations of motion are roughly

$\ddot{\theta}I=-mgL\sin(\theta)+mfL\cos(\theta)$

$\ddot{x}=f$

$I=\frac{1}{3}mL^2$

We don’t include the back reaction of the pole on the cart. It is basically irrelevant for our systems and overly complicated for no reason. The interesting bit is the nonlinear dynamics and influence of the cart acceleration.

I write down obeying the equations of motion as a linear constraint between timesteps. I use sparsegrad to get a sparse Jacobian of the equations of motion. The upper and lower (l and u) bounds are set equal to make an equality constraint.

Our setup has a finite track about a meter long. Our motors basically are velocity control and they can go about +-1m/s. Both of these can be encapsulated as linear constraints on the position and velocity variables. $l \le Ax \le u$

$\phi(x)=0$

$\phi(x) \approx \phi(x_0) + \partial \phi(x_0) (x - x_0)$

$A= \partial \phi(x_0)$

$l=u=\partial \phi(x_0) x_0 - \phi_0(x_0)$

Similarly for finding the quadratic cost function in terms of the setpoint $x_s$. $\frac{1}{2}x^T P x + q^Tx= \frac{1}{2}(x-x_s)P(x-x_s)$ Expanding this out we get

$q = - Px_s$

I run for multiple iterations to hopefully converge to a solution (it does). The initial linearization might be quite poor, but it gets better.

The OSQP program seems to be able to run in under 1ms. Nice! Initial tests of our system seem to be running at about 100Hz.

Could probably find a way to make those A matrices faster than constructing them entirely anew every time. We’ll see if we need that.

I very inelegantly smash all the variables into x. OSQP and scipy sparse don’t support multiIndex objects well, best as I can figure. Vectors should be single dimensioned and matrices 2 dimensioned.

Still to be seen if it works on hardware. I was already having infeasibility problems. Adjusting the regularization on P seemed to help.

PyTorch Trajectory Optimization Part 2: Work in Progress

I actually have been plotting the trajectories, which is insane that I wasn’t already doing in part 1. There was clearly funky behavior.

Alternating the Lagrange multiplier steps and the state variable steps seems to have helped with convergence. Adding a cost to the dynamical residuals seems to have helped clean them up also.

I should attempt some kind of analysis rather than making shit up. Assuming quadratic costs (and dynamics), the problem is tractable. The training procedure is basically a dynamical system.

Changed the code a bit to use more variables. Actually trying the cart pole problem now. The results seem plausible. A noisy but balanced dynamical residual around zero. And the force appears to flip it’s direction as the pole crosses the horizontal.

Polyak’s step length

The idea is that if you know the optimal value you’re trying to achieve, that gives you a scale of gradient to work with. Not as good as a hessian maybe, but it’s somethin’. If you use a gradient step of $x + (f-f*)\frac{\nabla f}{|\nabla f|^2}$ it at least has the same units as x and not f/x. In some simple models of f, this might be exactly the step size you’d need. If you know you’re far away from optimal, you should be taking some big step sizes.

The Polyak step length has not been useful so far. Interesting idea though.

Problems:

1. The step size is ad hoc.
2. Lagrange multiplier technique does not seem to work
3. Takes a lot of steps
4. diverges
5. seems to not be getting an actual solution
6. Takes a lot of iterations

On the table: Better integration scheme. Hermite collocation?

Be more careful with scaling, what are the units?

mutiplier smoothing. Temporal derivative of lagrange multiplier in cost?

alternate more complete solving steps

huber on theta position cost. Square too harsh? Punishes swing away too much?

more bullshit optimization strats as alternatives to grad descent

weight sooner more than later. Care more about earlier times since want to do model predictive control

Just solve eq of motion don’t worry about control as simpler problem

Pole up balancing

logarithm squeezing method – nope

The lambda * x model of lagrange mulitplier. Leads to oscillation

Damping term?

This learning rate is more like a discretization time step than a decay parameter. Well the product of both actually.

Heat equation model. Kind of relaxing everything into place

______________________________

Switched to using pytorch optimizers. Adam seems to work the best. Maybe 5x as fast convergence as my gradient descent. Adagrad and Adadelta aren’t quite as good. Should still try momentum. Have to reset the initial conditions after every iteration. A better way? Maybe pass x0 in to calc_loss separately?

Switched over to using the method of multipliers http://www.cs.cmu.edu/~pradeepr/convexopt/Lecture_Slides/Augmented-lagrangian.pdf

The idea is to increase the quadratic constraint cost slowly over time, while adjusting a Lagrange mutiplier term to compensate also. Seems to be working better. The scheduling of the increase is still fairly ad hoc.

The left is residuals of obeying the equations of motion, the middle is the force and trajectories themselves and the right is cost vs iteration time. Not entirely clear that a residual of 0.04 is sufficient. Integrated over time this could be an overly optimistic error of 0.2 ish I’d guess. That is on the edge of making me uncomfortable. Increase rho more? Also that force schedule seems funky and overly complex. Still, improvement from before. Feels like we’re cookin’ with gas

CartPole WORKIN’ BOYEEE

We have been fighting a problem for weeks. The Serial port was just not reliable, it had sporadic. The problem ended up being a surprising thing, we were using threading to receive the messages nd checking for limit switches. It is not entirely clear why but this was totally screwing up the serial port update in an unpredictable manner. Yikes. What a disaster.

After that though smoooooooth sailing.

With a slight adaptation of the previous Openai gym LQR cartpole code and a little fiddling with parameters we have a VERY stable balancer. We removed the back reaction of the pole dynamics on the cart itself for simplicity. This should be accurate when the pole vastly.

We did find that the motor is exactly velocity control in steady state with a linear response. There is a zero point offset (you need to ask for 100 out of 2046 before you get any movement at all).

We’ll see where we can get with the Lyapunov control next time.