Some Notes on Drake: A Robotic Control ToolBox

Dumping these notes out there as is. Some material is out of date. Hope they’re useful.

What is Drake?

From the Drake Website (

“Drake (“dragon” in Middle English) is a C++ toolbox started by the Robot Locomotion Group at the MIT Computer Science and Artificial Intelligence Lab (CSAIL). The development team has now grown significantly, with core development led by the Toyota Research Institute. It is a collection of tools for analyzing the dynamics of our robots and building control systems for them, with a heavy emphasis on optimization-based design/analysis.

While there are an increasing number of simulation tools available for robotics, most of them function like a black box: commands go in, sensors come out. Drake aims to simulate even very complex dynamics of robots (e.g. including friction, contact, aerodynamics, …), but always with an emphasis on exposing the structure in the governing equations (sparsity, analytical gradients, polynomial structure, uncertainty quantification, …) and making this information available for advanced planning, control, and analysis algorithms. Drake provides interfaces to high-level languages (MATLAB, Python, …) to enable rapid-prototyping of new algorithms, and also aims to provide solid open-source implementations for many state-of-the-art algorithms. Finally, we hope Drake provides many compelling examples that can help people get started and provide much needed benchmarks. We are excited to accept user contributions to improve the coverage.”

Drake is a powerful toolkit for the control of dynamical systems, and I hope I lower some of the barrier to entry with this post. Be forewarned, Drake changes quickly with time, and some of the following may be out of date (especially the rigid body trees) or ill advised. Use your judgement.

Getting Drake

Drake may be installed from binaries or source. Both may be gotten here:

Using Bazel

Drake uses Bazel as a build tool. Bazel is an open-source build and test tool similar to Make, Maven, and Gradle.

There are three commands that you need to know:

  • bazel build
  • bazel run
  • bazel query

Query is very useful for investigating available binaries within the examples folder and elsewhere.

The notation // is used to refer to the build’s main directory. This corresponds to the drake folder.
... is the notation for everything in the subdirectory


  • bazel build //... will build everything in the project.
  • To query all the binaries available for tools run bazel query //tools/...

Using Pydrake

Drake can be run natively in C++, or by using its MATLAB, python, or Julia bindings. This manual will be focusing on using Drake in python. By default pydrake will not be in the path. You can put pydrake into the path after building by running the following line or adding it to your .bashrc

export PYTHONPATH=~/drake/drake-build/install/lib/python2.7/site-packages:${PYTHONPATH}

The following line will import everything in Drake into the python namespace.

from pydrake.all import *

Drake is only compatible with python 2.7. If your default system install is python3, you may need to explicitly run the command python2.
The following message may be thrown if you inadvertently use python3.

Traceback (most recent call last):
File "", line 2, in 
from pydrake.all import


ImportError: dynamic module does not define module export function (PyInit__common_py)

Pydrake itself has generated documentation available here:

This is an extremely useful resource.

A very useful tool for exploring and confirming the Drake functionality and syntax is the python REPL. From the python REPL or within your code, it is useful to inspect an object using either help(obj) or print(dir(obj)) which will print a list of all the properties available on your object.

Drake resources


Besides the source code itself, the most accurate and up to date information is available in searchable form on the Doxygen page. This is a useful reference, but can be overwhelming.

Underactuated Robotics textbook and examples

Drake has a textbook being built alongside it by Russ Tedrake.

In particular the python source used to generate some of the examples in the book is worth examining.

The full course is available online both on Edx and more recent versions on Youtube.

Examples directory.

A useful source of use cases for drake can be found in the examples directory within the Drake project. As of October 2018 the contents include:

  • Cartpole – The cartpole is a classic control system consisting of a pendulum attached to a linearly actuated cart. In this directory you can find examples
  • Pendulum
  • Double Pendulum
  • Acrobot – The Acrobot is a double pendulum system actuated at the shoulder
  • Kinova Jaco Arm – A commercially available robotic arm
  • Kuka Iiwa Arm – A commerically available robotic arm
  • Particles
  • Bouncing Ball
  • Contact Model – Contains bowling pins, a gripper, and sliding bricks demonstrating Drakes ability to simulate contact dynamics
  • Rimless Wheel – A very simple model of a walking robot
  • Compass Gait – A slightly less simple model of a walking robot
  • Atlas – Examples concerning the Atlas humanoid robot
  • zmp
  • quadcopter
  • and others

Additional usage examples for pydrake can be found in the drake/bindings/pydrake/test folder.

Periscope Tutorials

There is a set of Jupyter notebook based tutorials for some basic Drake functionality in python.

Drake Concepts


For dynamic simulation, Drake exposes a Lego block interface for building complex systems out of smaller pieces, a paradigm similar to Simulink and other modeling software.
Objects possess input and output ports. One wires up input ports to output ports to build composite systems.

To build a simple forward simulation, construct a builder object. Then add all subsystems to the builder object. Explicitly connected input and output ports together. One possible cause of crashes may be leaving ports unconnected.

Once the entire system has been built, a Simulator object can be constructed from it. You may select an integration scheme and set initial conditions by getting a context from the Simulator object. The context holds state information.

There following excerpt from shows how to define a simple system and simulate it.

from import VectorSystem

# Define the system.
class SimpleContinuousTimeSystem(VectorSystem):
def __init__(self):
0, # Zero inputs.
1) # One output.
self._DeclareContinuousState(1) # One state variable.

# xdot(t) = -x(t) + x^3(t)
def _DoCalcVectorTimeDerivatives(self, context, u, x, xdot):
xdot[:] = -x + x**3

# y(t) = x(t)
def _DoCalcVectorOutput(self, context, u, x, y):
y[:] = x

import matplotlib.pyplot as plt
from pydrake.all import (DiagramBuilder, SignalLogger, Simulator)
from continuous_time_system import *

# Create a simple block diagram containing our system.
builder = DiagramBuilder()
system = builder.AddSystem(SimpleContinuousTimeSystem())
logger = builder.AddSystem(SignalLogger(1))
builder.Connect(system.get_output_port(0), logger.get_input_port(0))
diagram = builder.Build()

# Create the simulator.
simulator = Simulator(diagram)

# Set the initial conditions, x(0).
state = simulator.get_mutable_context().get_mutable_continuous_state_vector()

# Simulate for 10 seconds.

# Plot the results.


Rigid Body Trees

Edit : I think the Rigid Body Trees interface is deprecated. Use Multi Body Trees.

There are two complementary perspectives to take of the degrees of freedom of a robot, intrinsic coordinates and extrinsic coordinates. The intrinsic coordinates have one variable per degree of freedom of the robot. A common example is a set of joint angles. The dynamics are simply expressed in the intrinsic coordinates and can derived using Lagrangian mechanics. The extrinsic coordinates specify the spatial locations and orientation of frames attached to the robot. These are called frames. These spatial coordinates may be constrained in a relationship by a rigid mounting or gearing, so there may be more extrinsic frames available than intrinsic coordinates. Extrinsic coordinates are particularly pertinent for discussions of geometry, contact, and external forces.

Drake uses the URDF (Universal Robot Description Format) format. This is a common robot format originating in the ROS community for which you can find models of many commercial robots online. It is an XML based format with visualization, inertial, and collision tags.

This is an example URDF for a pendulum cart system.

Example: Pendulum URDF








The RigidBodyTree is a Drake class that describes both the intrinsic and extrinsic properties of a linkage. RigidBodyTrees may be built from a URDF file, some of which are packaged inside of Drake. For example, the Jaco arm URDF is available packaged with Drake.

jaco_urdf = FindResourceOrThrow(
tree = RigidBodyTree(jaco_urdf, FloatingBaseType.kFixed)

The full properties of the RigidBodyTree class are available at , but it will be useful to highlight some of the most commonly used functionality.

You can probe the RigidBodyTree for useful properties about the linkage, for example the number of intrinsic coordinates, or the number of bodies in the tree.


For many operations, one needs to perform the forward dynamics to build a kinematic cache that needs to be supplied later.

q = np.zeros(tree.get_num_positions())
v = np.zeros(tree.get_num_velocities())
cache = tree.doKinematics(q,v)

Drake describes the dynamics in terms of the intrinsic coordinates. The robot manipulator equations have the common form (See )

M(q)\ddot{q} + C(q,\dot{q})\dot{q} = \tau_g(q) + Bu

where q is the state vector, M is the inertia matrix, C captures Coriolis forces, and \tau_g is the gravity vector. The matrix B maps control inputs u into generalized forces.

External forces on the tree are described as wrenches. Wrenches are an object that combines forces and torques. In Drake, they are specified by 6 dimensional vectors. From the Drake docs:
“A column vector consisting of one wrench (spatial force) = [r X f; f], where f is a force (translational force) applied at a point P and r is the position vector from a point O (called the “moment center”) to point P.”

Drake will also compute the quantities in the manipulator equation. For example, to compute the term C(q,v,f_{ext}) in the manipulator equations with no externally applied wrenches.

bodies = [tree.get_body(j) for j in range(tree.get_num_bodies())]
no_wrench = { body : np.zeros(6) for body in bodies}
print(tree.dynamicsBiasTerm(cache, no_wrench))

Drake can be asked to compute the other terms in the manipulator equation as well. Drake can compute the center of mass of the entire tree.


Drake provides a couple of useful mappings between the intrinsic and extrinsic coordinates.

First, given a set of internal coordinates, Drake can transform these into frames, which may be expressed in yet another coordinate system.

The extrinsic frames can be expressed as a function of the intrinsic coordinates

X_i = f_i(x_j)

tree.relativeTransform(cache, 0, 9)

Returns a 4\times 4 transformation matrix between body 0 and 9 of the tree. The upper 3\times 3 block corresponds to a rotation matrix, and the last column a translation vector of the frame.

The Jacobian of this mapping,

J_{ij} = \frac{\partial f_i}{\partial x_j}

is useful for translating externally described small displacements, velocities, forces, and torques, into the equivalent terms for the intrinsic coordinates.

The time derivative or differential of a frame will possess a linear and angular velocity. These come packed in a 6 dimensional vector called the Twist.

The geometric Jacobian function returns the Jacobian in a sparse format. It returns a tuple of a m\times 6 matrix and a 1\times m vector of indices to which coordinates these correspond.
The function takes the id of three different frames. In this example, it computes the differential of the transformation from frame 0 to frame 9 expressed in frame 3.

tree.geometricJacobian(cache, 0, 9, 3)

The dense matrix can be constructed by

jsparse, inds = tree.geometricJacobian(cache, 0, 9, 0)
jdense = np.zeros((tree.get_num_positions(), 6))
jdense[inds, :] = jsparse


A Mathematical Introduction to Robotic Manipulation –


The Drake visualizer may be found in the tools folder. The Drake visualizer needs to be running before any applications that needs to communicate with it are started. To get the drake visualizer running run the following command from the drake directory

bazel run //tools:drake_visualizer

May need to run commands here to make LCM work and visualizer not crash immediately

sudo ifconfig lo multicast
sudo route add -net netmask dev lo

Kinova ROS package

The Kinova Jaco arm is easily communicated to through the Kinova ROS package

The package as of October 2018 supports Ubuntu 14.04 and ROS Indigo. The package includes bindings to the Jaco SDK for reading sensor data and giving motion commands. The package supports a variety of control modes, including torque control.

To update functionality to ubuntu 16.04 and ROS Kinetic you may wish to pull from this external branch

or try the beta branch

The Gazebo simulator seems to have a great deal of trouble.
One suggestion is that you have to add a small nonzero size parameter to your URDF files.

The topics made available by the ROS Package are


ROS Intercommunication

The communication stack used by Drake is Lightweight Communications and Marshalling (LCM). For interconnection to the rest of the ROS ecosystem, this adds a layer of friction.

One option is to use the lcm_to_ros project

This is a generator for building republishers of messages going between ROS and LCM

Another option is to build custom subscribers and publishers as in the following example.

Example: Connecting Drake Visualizer to External Jaco Arm

In another terminal turn on the Jaco driver

roslaunch kinova_bringup kinova_robot.launch kinova_robotType:=j2n6s300

Also get the drake visualizer running from the drake directory before running the script

bazel run //tools:drake_visualizer

import rospy
from pydrake.all import *
import numpy as np
from sensor_msgs.msg import JointState
jaco_urdf = FindResourceOrThrow("drake/manipulation/models/jaco_description/urdf/j2n6s300.urdf")
tree = RigidBodyTree(jaco_urdf , FloatingBaseType.kFixed)

builder = DiagramBuilder()

lc = DrakeLcm()
vis = DrakeVisualizer(tree, lc)
robot = builder.AddSystem(RigidBodyPlant(tree))
publisher = builder.AddSystem(vis)
builder.Connect(robot.get_output_port(0), publisher.get_input_port(0))

diagram = builder.Build()
simulator = Simulator(diagram)
context = simulator.get_mutable_context()
state = context.get_mutable_continuous_state_vector()

q = None
v = None
f = None
def callback(data):
        global q, v, f
        q = np.array(data.position)
        v = np.array(data.velocity)
        f = np.array(data.effort)
        # Some of the coordinates are offset between the Kinova Ros package and the Drake model
        q[1] = np.pi + q[1]
        q[2] = np.pi + q[2]

rospy.init_node('listener', anonymous=True)
rospy.Subscriber("/j2n6s300_driver/out/joint_state", JointState, callback)

rate = rospy.Rate(4)
while not rospy.is_shutdown():
        state = context.get_mutable_continuous_state_vector()
The Drake Visualizer

Example: Crumpling Jaco

A simple example of the simulation capabilities is the simulation of an unactuated Jaco arm.

import rospy
from pydrake.all import *
import numpy as np
from sensor_msgs.msg import JointState
jaco_urdf = FindResourceOrThrow("drake/manipulation/models/jaco_description/urdf/j2n6s300.urdf")
tree = RigidBodyTree(jaco_urdf , FloatingBaseType.kFixed)

builder = DiagramBuilder()

lc = DrakeLcm()
vis = DrakeVisualizer(tree, lc)
robot = builder.AddSystem(RigidBodyPlant(tree))
publisher = builder.AddSystem(vis)
builder.Connect(robot.get_output_port(0), publisher.get_input_port(0))
force = builder.AddSystem(ConstantVectorSource(np.zeros(9)))
builder.Connect(force.get_output_port(0), robot.get_input_port(0))

diagram = builder.Build()
simulator = Simulator(diagram)
context = simulator.get_mutable_context()
state = context.get_mutable_continuous_state_vector()

Optimization Solvers

Drake provides a common interface to many optimization solvers. Because of the tight integration, Drake has the capability to directly build the equations of motion for a system into a form the solver can comprehend.

The Mathematical Program class provides a high level interface to the different solvers. This class can be constructed as an object on it’s own or as returned by other helper classes such as trajectory optimization builders.

Drake provides a symbolic expression modeling language in which to describe constraints and costs.

from pydrake.all import *
import numpy as np

m = MathematicalProgram()

xs = m.NewContinuousVariables(2, 'x')
c = np.array([-1,-1])
m.AddCost( c[0]*xs[0] + c[1]*xs[1] )

m.AddLinearConstraint(xs[0] == 0)
m.AddLinearConstraint(3*xs[1] + xs[0] <= -1)


There are a large variety of solvers out there for problems of different structure.

A Mathematical Program is generally of the form

\min f(x) \   s.t.
g(x) \ge 0
h(x) = 0

One of the oldest and most studied class of these programs are known as Linear Programs which has linear cost and constraints. This mathematical program has very efficient solvers available for it.

A large class of optimization problems that are tractable are known as Convex Optimization problems. The cost functions must be bowl shaped (convex), the inequality constraints must define convex regions and the equality constraints must be affine (linear + offset). For this class, gradient descent roughly works and refinements of gradient descent like Newton's method work even better. Convexity implies that greedy local moves are also acceptable global moves and there are no local minima or tendril regions to get caught in.

The common reference for convex optimization is the textbook by Boyd and Vandenberghe freely available at There is also an accompanying video course available online.

Subclasses of Convex programming may have solvers tuned to them. Important subclasses include:

  • Linear Programming - Linear objective, linear equality, and linear inequality constraints.
  • Quadratic Programming - Linear Programming + quadratic objective
  • Second Order Cone Programming
  • Semidefinite Programming - Optimization allowing for the constraint of a SemiDefinite matrix. This means the matrix is constrained to have all nonnegative eigenvalues or equivalently the quadratic form it defines q^T Xq is non-negative for all possible vectors q.
  • Sum of Squares Programming - Optimization over polynomials with the constraint that the polynomial can be written as a sum of squares, a manifestly positive form.

Many problems cannot be put into this form. If the inherent nature of the problem at hand requires it, you may choose to use a nonlinear programming solver or Mixed Integer Programming Solver.

The solvers Ipopt and Snopt are nonlinear programming solvers. Ipopt is an open source and Snopt is proprietary. These solvers use local convex approximations to the problem to heuristically drive the solution to a local minimum.

Mixed Integer Linear Programming is Linear Programming with additional ability to require variables to take on integer values. This additional constraint type takes the problem from polynomial time solvable to NP-complete. A surprisingly large number of discrete and continuous optimization problems can be encoded into this framework. Internally, these solvers use linear programming solvers to guide the discrete search.

Part of the art and fun of the subject comes from manipulating your problem into a form amenable to powerful available solvers and theory.

Hans Mittelmann at the University of Arizona maintains benchmarks for a variety of optimization tools. A rule of thumb is that commercial solvers perform better than open source solvers.

Available Solvers in Drake

  • Mosek - Mosek is a proprietary optimization solver package offering solvers for
    • Linear.
    • Conic quadratic.
    • Semi-definite (Positive semi-definite matrix variables).
    • Quadratic and quadratically constrained.
    • General convex nonlinear.
    • Mixed integer linear, conic and quadratic.
  • Gurobi - Gurobi is a proprietary optimization solver package offering solvers for
    • linear programming solver (LP)
    • mixed-integer linear programming solver (MILP)
    • mixed-integer quadratic programming solver (MIQP)
    • quadratic programming solver (QP)
    • quadratically constrained programming solver (QCP)
    • mixed-integer quadratically constrained programming solver (MIQCP)
  • Snopt - Snopt is a nonlinear optimization solver using sequential convex optimization (SQP)
  • Ipopt - Ipopt is a open source nonlinear optimization solver using sequential convex optimization (SQP)
  • Operator Splitting Quadratic Program (OSQP) - Open source quadratic programming package
  • ik
  • LCP
  • dReal - An SMT solver for reals

Automatic Differentiation

Automatic Differentiation is the capability to have derivatives computed alongside code that computes the values. It is largely based upon application of the chain rule. There are two modes, forward and reverse mode.

Forward mode is the simplest to describe. Functions can be overloaded so that they take a dual number, a value combined in a tuple with it's derivative information. As the value of a function is computed, the Jacobian of that function is matrix multiplied on the derivative concurrently.

Drake exposes automatic differentiation capability for manual use

from pydrake.all import *
import numpy as np

# second parameter initializes forward mode derivative information
x = AutoDiffXd(4.0, np.array([1,1]))
f = x * x * x

More importantly Drake uses automatic differentiation internally to marshal symbolic problems into forms acceptable to external solvers and to calculate the various Jacobians we've already seen.

Trajectory Optimization

Trajectory optimization is a framework in which one uses Mathematical programming to solve optimal trajectory problems. The input to the system
is considered to be a decision variable in a Mathematical programming problem.

The combination of dynamical system modeling, automatic differentiation, and bindings to mathematical programming solvers makes Drake an excellent platform for trajectory optimization.

In Direct Collocation, both the path and the input variables are discretized along time. The trajectories are described by cubics and force curves are described by piecewise linear. One way of performing direct collocation is to take the path position at a discrete number of time points and make a decision variable for each. The discretized equations of motion become constraints that neighboring time points have to obey. One may then inject any other desired requirements (staying inside some safe region for example) as additional constraints.

The Drake docs state:
DirectCollocation implements the approach to trajectory optimization as described in C. R. Hargraves and S. W. Paris. Direct trajectory optimization using nonlinear programming and collocation. J Guidance, 10(4):338-342, July-August 1987. It assumes a first-order hold on the input trajectory and a cubic spline representation of the state trajectory, and adds dynamic constraints (and running costs) to the midpoints as well as the knot points in order to achieve a 3rd order integration accuracy.

Drake provides a useful interface for talking about trajectories. For both describing the cost function and the constraint functions, you want them to apply at all time steps. You can ask drake for variables representing the position or forces at all time steps. Drake will then build the appropriate mathematical program applying the cost of constraint at all time steps.

ctx = robot.CreateDefaultContext()
dircol = DirectCollocation(robot, ctx, 10, 0.01, 0.1) # timesteps, minimum time step, max timestep

state = dircol.state
u = dircol.input
t = dircol.time

dircol.AddConstraintToAllKnotPoints(u[0] >= 0)

You may want to select the initial and final state specifically to specify goals and initial conditions

init = dircol.initial_state
final = dircol.final_state


Example: Trajectory Optimization of a Pendulum

This example comes from the Underactuated Robotics textbook source

import math
import numpy as np
import matplotlib.pyplot as plt

from pydrake.examples.pendulum import (PendulumPlant, PendulumState)
from pydrake.all import (DirectCollocation, PiecewisePolynomial,
#from visualizer import PendulumVisualizer

plant = PendulumPlant()
context = plant.CreateDefaultContext()

dircol = DirectCollocation(plant, context, num_time_samples=21,
                            minimum_timestep=0.2, maximum_timestep=0.5)


torque_limit = 3.0  # N*m.
u = dircol.input()
dircol.AddConstraintToAllKnotPoints(-torque_limit <= u[0])
dircol.AddConstraintToAllKnotPoints(u[0] <= torque_limit)

initial_state = PendulumState()
# More elegant version is blocked on drake #8315:
# dircol.AddLinearConstraint(dircol.initial_state()
#                               == initial_state.get_value())

final_state = PendulumState()
# dircol.AddLinearConstraint(dircol.final_state() == final_state.get_value())

R = 10  # Cost on input "effort".

initial_x_trajectory = \
        PiecewisePolynomial.FirstOrderHold([0., 4.],
result = dircol.Solve() 
assert(result == SolutionResult.kSolutionFound) 
x_trajectory = dircol.ReconstructStateTrajectory() 
#vis = PendulumVisualizer() 
#ani = vis.animate(x_trajectory, repeat=True) 
x_knots = np.hstack([x_trajectory.value(t) for t in                                          
                      x_trajectory.end_time(), 100)]) 
plt.plot(x_knots[0, :], x_knots[1, :]) 

Example Application: Estimating end-effector force from Jaco motor torques

End effector forces become part of the equations of motion.

The geometric Jacobian transforms the extrinsic force into intrinsic coordinates. It is in general a rectangular matrix, since the number of extrinsic coordinates does not need to match the number of intrinsic coordinates.

A force f applied to the end effector appears linearly in the manipulator equations as J^Tf. This will be canceled by the torques of the motors during static or quasi-static movement. Hence, we can can determinethe external force from the motor torques if we assume it is the only external force at play.

The pseudo-inverse is the best possible solution to an overdetermined set of linear equations, in the least squares sense. We use this operation due to the non square nature of the Jacobian.

\min (Jx - X)^2 \rightarrow J^T Jx = J^T X

The following script prints both the end effector force as supplied by the Jaco SDK and the force as computed by Drake from the internal motor torques.

import rospy
from pydrake.all import *
import numpy as np
from sensor_msgs.msg import JointState
from geometry_msgs.msg import WrenchStamped
urdf_tree = FindResourceOrThrow("drake/manipulation/models/jaco_description/urdf/j2n6s300.urdf")
tree = RigidBodyTree( urdf_tree,
bodies = [tree.get_body(j) for j in range(tree.get_num_bodies())]
no_wrench = { body : np.zeros(6)  for body in bodies}

body_n = 9 #end_effector body number

def end_effector_force(q, v, u):
        cache = tree.doKinematics(q,v)
        # returns the external torque term for only gravity and no other external wrenches
        Cgrav = tree.dynamicsBiasTerm(cache, no_wrench )
        C = u - Cgrav # subtract off gravity of robot itself
        print(u - Cgrav)
        j = tree.geometricJacobian(cache, 0 , body_n, 0) # returns a tuple of 6 by n matrix in j[0] and the indices to which it applied in j[1]
        jjt =[0] ,j[0].T) # 
        Wext = np.linalg.solve(jjt,[0],  C[j[1]])) # C[j[1]] is cryptic numpy sub indexing for densifying the Jacobian
        return Wext[3:]

q = None
v = None
f = None
fext = np.zeros(3)

def callback(data):
        global q, v, f, initu, start
        q = np.array(data.position)
        v = np.array(data.velocity)
        q[1] = np.pi + q[1] #The drake conventions and the kinova conventions for angles are off
        q[2] = np.pi + q[2]
        f = np.array(data.effort)

def wrenchcallback(data):
        global fext
        fext[0] = data.wrench.force.x
        fext[1] = data.wrench.force.y
        fext[2] = data.wrench.force.z

rospy.init_node('listener', anonymous=True)
rospy.Subscriber("/j2n6s300_driver/out/joint_state", JointState, callback)
rospy.Subscriber("/j2n6s300_driver/out/tool_wrench", WrenchStamped, wrenchcallback)

rate = rospy.Rate(1)
while not rospy.is_shutdown():

Why I (as of June 22 2019) think Haskell is the best general purpose language (as of June 22 2019)

Me and my close friends have been interested in starting a project together and I suggested we use Haskell. I do not think the suggestion was received well or perhaps in seriousness. I have a tendency to joke about almost everything and have put forward that we use many interesting but not practical languages in the same tone that I suggest Haskell. This was a tactical mistake. I find myself in despair at the idea I can’t convince my personal friends, who are curious and intellectual people, to use Haskell on a fresh start web project we have complete control over. What hope do I have in the world at large? This brain dump post is meant for them. My suggestion to use Haskell is not just me being an asshole, although that does make it more fun for me. I will now try to explain in all seriousness and in all the honesty that I can muster what my opinions on languages are and why I have them.

Pragmatically can you start a new project using a language you don’t know? This is a problem. A project always has some intrinsic difficulty. Not all projects will survive an extra layer of unnecessary speedbump. But when and how are you supposed to learn new languages? Never is one answer. I disagree with this answer. In this case we have the leg up that I do know Haskell. Perhaps this is a downside in that it will be extra frustrating? It is also easy for me to ask, as using Haskell is not a burden for me, I have already sunk the cost, but a massive learning burden for others.

Is Haskell actually practical for a web application? Short answer: yes. Expect pain though. If your web application is so simple you could rip it out in 100 lines of python, this is such a simple project that it is a good opportunity to learn something new. If it will become large and complex, then I believe Haskell does shine, keeping complexity under control. I base this upon the professional experience making a web application using a Haskell-Purescript stack. For honesty, it wasn’t all good. I recall ripping my hair out threading shit through monad stacks to where it need to go. Yet on the whole, it kept the project sane. I also believe this based on the word of mouth that I believe but could be just cultish ramblings.

I believe that truly dominating properties of Haskell appear in large complex projects. This is difficult to prove in any other way except empirically and the experiments will be so wildly uncontrolled in terms of the project and people involved that no conclusions can truly be drawn. And yet I have faith, and think that personal experience validates this opinion to myself at least. We have to live this life even though truth does not exist. Choices and opinions must be made.

For programs that are going to be a single manageable file and written in one night, it doesn’t matter much what you use in terms of being choked on your own code. At this scale, I still think Haskell is enjoyable and interesting though. Haskell was my doorway into the world of computer science as I now understand it. I hope there are more doorways.

Things about me that may be different from you. Decide for yourself if these aspects of me make our opinions fundamentally incompatible.

  • I do have a talent and a like for practically oriented mathematical topics (computational methods, linear algebra, formal methods, calculus, geometric algebra, projective geometry, optimization, etc.). I actually have very little taste at all for mathematical topics that I see no purpose to.
  • I do have some desire and taste for esoterica for its own purpose. I cannot exactly characterize what makes some topics acceptable to me and others not.
  • A hard learning curve is not necessarily a downside for me. I enjoy the challenge if it is overcomable and worth it on the other side.
  • I like weird and different. That is a positive for me, but a negative for many. I might just be a millennial hipster idiot.
  • I would LOVE to find a language I think is better than Haskell. I would LOVE to abandon Haskell. Perhaps this already makes me odd. Perhaps I think many people don’t consider the differences between languages to be worth making the switch and the wasted knowledge. The people with this opinion may or may not have tried enough languages.
  • I have “drank the koolaid”. I do read what comes out of the Haskell and functional programming community and have a tendency to believe things without strong empirical backing.
  • I have been more deeply entwined with Haskell than any other language. Perhaps if I had reached a level of familiarity in another language I’d be as fervent about that one? I don’t believe this is the case.
  • While I desire performance, I consciously curb this desire. I am strongly in favor of cleanly written, clear, principled code. This is of course a weighted judgement. I will probably use a 100x performance gain for a 2x decrease in clarity or reusability. This is a result of the problem domains and scale that have interested me in the past and that I anticipate in the future. I STRONGLY believe the world at large over values performance at the expense of other good qualities of code. Or at least over optimizes early.

A Biased List of Pros and Cons

What do I find bad about C++.

  • The feature set is huge and difficult to understand
  • Extreme amounts of legacy features that will be even recommended if you read legacy documentation. Kitchen sink.
  • The language appears to be almost entirely built out of footguns.
  • The syntax is too verbose.
  • I have a distaste for mutation.
  • I have a distaste for object oriented programming

What do I find good about C++

  • It is commonly used. Large community.
  • It is possible for it to be very fast
  • Kitchen Sink. You can find almost any feature you want here.
  • The high level goals of the mind-leaders sound good.
  • Very interesting projects are written in C++. HPC things. Scientific computation.
  • Template metaprogramming seems very powerful, but arcane.

What I find bad about Java

  • The syntax puts me off as being incredibly verbose
  • Extreme object oriented focus
  • Corporate/Enterprise feel. I am an iconoclast and see myself as a reasonable but slightly rebellious character. Java in my mind brings images of cubicles and white walls. Perhaps this is not fair.

What do I find good about Java

  • ?

I’m joking. Sorry Java. But I also kind of mean it. Yes, there are positive aspects to Java.

What I like about python

  • Very commonly used and understood. Perhaps the lingua franca
  • Incredible library ecosystem
  • Numpy and scipy in particular are marvels of the modern age.
  • Syntax is basically imperative pseudo-code
  • I am personally very familiar with it
  • python is the easiest to use language I know.

What I dislike about python

  • Python has no natural tendency for correctness due to the free wheeling dynamically typed character. This is patched up with testing, opt-in type systems.
  • I don’t know how to grow as a pythonista. The skill curve flattens out. For some this may be a positive.
  • The main way of building new data types is the class system. I think this is ungainly, overly verbose, and not always a good conceptual fit.
  • Despite being among the most succinct of common imperative languages, I still think it ends up being too verbose.
  • It is slow. This is a negative, although not high on my priorities.

What is bad about Haskell

  • Very difficult learning curve. Let’s get real. Haskell is a very confusing programming language to get started in for common programming tasks.
  • functional programming is weirder than imperative programming.
  • The monad paradigm, even once learned is ungainly. Tracking multiple effects is a pain that does not exist in most languages.
  • The pain is up front. It is easy to get a sketch of what you want ripped out in python faster than in Haskell (for me). If you want a web server, command line tool, optimization problem, curve fitter, I can rip all of these out faster in python than I can in Haskell. As a psychological thing, this feels awful. For a small scale project, unless toying with Haskell itself or one of its domain expertises like implementing DSLs, python is the easier and correct choice. Python is a scripting language. I’d make the switch at two screens worth of code.
  • I think laziness is confusing and easy to shoot yourself with.
  • Haskell is not the fastest language, although faster than python.
  • Concern for there not being jobs and interestingly on the converse side, no people to hire. There is a catch-22 there. There is a set of people that would KILL for a Haskell job.
  • There are vocal smug assholes who use Haskell and push it. I am smug, I hope only mildly an asshole.

What I find good about Haskell

  • The number one reason is that there is something ephemeral that I just like. I am not naturally inclined to analyze such things. When I like a movie or don’t like it, it just happens, and if forced to explain why, I’ll bullshit.
  • Errors and bugs are to a shocking degree caught at compile time. More than any other language I have experience with does Haskell code run correctly without hidden bugs if it compiles. I am not claiming this is absolute but it is all the more incredible the degree to which it is true since it didn’t literally have to be this way. I have done major rewiring of a data structure in a project. The compiler guided my hand to exactly the positions that needed to be adjusted. Could C++ do this? Yes, to some degree. I believe that the tendencies of C++ programming make it less satisfactory on this point.
  • Types are an incredible design tool. I find designing the types of my program to be an extremely enjoyable and helpful activity. Much more so than box and wire class and interface diagrams. A good function name and a type signature basically entirely constrains the behavior of a function. Types can be quickly and completely be given to the compiler and machine enforced.
  • The pain that Monads cause are pains you should be feeling. They are the pain of explicitness which I 70% choose over the pain of not knowing what the fuck a function might do, and not enabling the compiler to enforce that. If something is capable of mutating state, it should say so in goddamn huge purple letters.
  • Haskell is more than fast enough. It isn’t that even people don’t care. The Haskell community at large cares a lot more for performance than I do, and I reap the dividends. The people in charge of the compiler and the main libraries are goddamn wizards who’s work I get to benefit from. This is true of all languages perhaps.
  • Laziness is very cool. At the beginning I thought it was INCREDIBLY awesome and inconceivable that I could manipulate an infinite list.
  • The way of specifying new data types is so succinct and so incredible. Pattern matching is SO GODDAMN good for some tasks.
  • Haskell has a paradigm of small combinators. It is common to build a sequence of very small natural functions for the domain and then build larger and larger things out of them. This is good for reusability and conceptual clearness.
  • Extreme preference for Immutability. As part of keeping what you must keep in your head or remember small while programming, immutability is an insane win. You think you know what you want now. You know you could just tweak this variable here, make a special variable over here. You can reason about how to make this all work and be correct now. You will not in a month. Your coworkers will mess it all up too.
  • Haskell code is generic by default. This allows the same code to be reused in many situations
  • The standard typeclass hierarchy is extremely well thought out and powerful. To some degree it is unnecessary in other languages. The difference between Functor, Applicative, Monad, and Traversable makes little sense in languages with unconstrained mutations and effects.
  • Haskell paradigms are inspired by mathematics, and I have great faith in mathematics. The concepts behind Haskell feel closer to discoveries rather than inventions. Imperative programming speaks in a language formed for the accidental nature of the machines we have. Functional programming is a language closer to mathematics, which I believe is closer to how the human mind works, and closer to what the problem at hand actually is.
  • Complexity scales. It is my belief, perhaps unverified, that as a project grows larger, the insane miserable churn and mental overhead grows slower in a Haskell project than in other languages. I do not have strong empirical evidence to this assertion. Word of mouth (of Haskellers).
  • The ceiling on Haskell is extremely high. You can continue to learn and get better, gaining more and more power. I do not currently see the end of this.
  • When I do reach for some new library, I am very often impressed by how thoughtfully built it is. Haskell itself is INCREDIBLY thoughtfully built.
  • The haskell community is very excited and they have many things to teach. There are many Haskellers out there who are very welcoming, kind, and intelligent.
  • Haskell does have some cache. I am not immune to wanting to seem smart. If the world thought that only idiots use Haskell, that would offput me some. That the world things that only impractical ivory tower smug weenies use Haskell does offput me, although I perhaps embrace it belligerently.
  • The Haskell library ecosystem is strong. Less strong than python, but much better than some of the other languages that intrigue my eye. There is functionality somewhere for most common tasks.
  • Haskell is used industrially. I live in an echo chamber, but by and large the industrial users of Haskell sing its praises.

For context, this is my programming journey:

I first learned BASIC in middle school. I wrote a computer game in Visual Basic. I toyed with the TI-83. I went to a summer camp in high school where I learned the extreme rudiments of C++. Did a small web business with some friends. Did really bad work with big dreams. I took a fairly poor Java course in high school. I learned MATLAB in college. I doinked around with Arduino level C projects. I learned python in grad school I think. I think I got far more proficient at python than any other language before. At this point, I was on board for object oriented programming, albeit not at a deep level (design patterns or ilk), just as light organizational principle. Did some Android projects, really didn’t like Java there. Did a small business with my friend and got deep in Javascript. As our web application got bigger, this really start to hurt. Errors all over the place. Unknown junk in objects. Who has access to what? We tried our darndest to read and follow best practices as we could find them, but it felt like we sinking in quicksand. More and more effort for more and more bugs and less and less progress. It was around this era that I first even heard of Haskell. I was intrigued immediately for some reason, maybe for just how weird it was. It took probably 2 years of forgetting about it and going back to tutorials every 6 months. I didn’t necessarily KNOW that this was a thing that I wanted, I just found it interesting. Currently I am fairly proficient at some things in Haskell (unfortunately I am more proficient at esoterica than more practical things). I have had a professional job writing Haskell, and it seems like my like of Haskell is working out very well for me professionally. Passion in all forms is powerful.

My history may not be as convincing as someone who spent 20 years as a professional C++ dev and then switched, but I have at least experienced different paradigms and languages and found Haskell the most to my liking.

Random Thoughts

I am now trying to push myself into comfort in Julia, Rust, Agda, Coq, OCaml, all languages I feel show promise for different reasons. To my knowledge Haskell is a better choice than these as a general purpose tool for pragmatic reasons. Haskell’s library ecosystem is strong and performance is good. These are points against agda and coq. Julia has a focus on scientific programming.

Rust might be a good compromise. I consider it a trojan horse for useful programming language features that I associate with functional languages. It claims to performance and being an acceptable systems level language, which appeals to some. The syntax does not scare anyone off. The library ecosystem seems good and the community strong. I did find myself missing Haskell features in Rust though, am personally much less familiar with it. I think the design of Rust weights more to performance than is warranted in most applications and has less descriptive and abstraction power than Haskell, qualities that I prioritize. This opinion is not strongly held.

What makes Haskells types any better or worse than C? At the beginning many of the features of Haskell seem like magic. But as time has worn on, I can see how the features can be emulated with just some slight syntactic and conceptual overhead in other languages. This slight overhead is enough though. A language is more than just it’s syntax. It is also its idioms. It is also they way it makes people think. Language is almost a prerequisite for thought. You cannot even conceive of the ways there are to express yourself before learning.

What exactly makes a language “good”? This is a question poorly phrased enough to have no answer. Excel can be an excellent language for many tasks. Easy to learn. Very powerful. Yet, it is not considered a good general purpose programming language. Library ecosystem is extremely important. Specialized languages are often the best choice for special problem domains, at the expense of learning them, or eventually finding incompatibility of what you want from what they designed for.

What makes abstractions “good”. Why do I have queasiness about object oriented-programmming. Java, I think basically. I, overeagerly have gone down the road of trying to design deep subclass hierarchies, which is not OO at it’s best. Zebra is a Quadruped is an Animal is Alive kind of stuff. I believe object oriented in an interesting principle. I hear about SmallTalk and Common Lisp doing object oriented right and I am duly intrigued. There has been some recent work in Haskell about how to do objects in a way aesthetically compatible with Haskell. I think object oriented has been over used and abused. I think it is a poor conceptual fit for many situations. I think it tends to make non reusable code. I think the form that it takes in C++ and Java development is arcane horseshit.

I deserve almost no opinion about Java or C++, having not done sufficient that much in them. Yet, I must state my opinions, take them as you will, for I do in fact hold them strongly. I have worked on a network simulator and a robotics framework in C++, but begrudgingly. I have done a very small amount of Java development for a personal project and some Processing sketches. My coworker was a 10 year professional Java dev before switching to Scala then Haskell. He despises Java now. Highly anecdotal, and he is a similar iconoclastic character like me. Nevertheless, this also informs my opinion. I have been reading Bjarne Stroustrup’s book (his stated goals and what he claims C++ achieves are admirable and one can’t argue he hasn’t changed the world) and actually find C++ rather interesting, especially in the sense that many projects that I find interesting are written in C++, I just don’t want to myself work in the language.

Haskell love:

Haskell Criticism (perhaps warranted) Ah Hacker News. Always a sunny worldview.

Hacker news discussion of this post:

A Basic Branch and Bound Solver in Python using Cvxpy

Branch and bound is a useful problem solving technique. The idea is, if you have a minimization problem you want to solve, maybe there is a way to relax the constraints to an easier problem. If so, the solution of the easier problem is a lower bound on the possible solution of the hard problem. If the solution of the easier problem just so happens to also obey the more constrained hard problem, then it must also be the solution to the hard problem. You can also use the lower bound coming from a relaxed problem to prune your search tree for the hard problem. If even the relaxed problem doesn’t beat the current best found, don’t bother going down that branch.

A standard place this paradigm occurs is in mixed integer programming. The relaxation of a binary constraint (either 0 or 1) can be all the values in between (any number between 0 and 1). If this relaxed problem can be expressed in a form amenable to a solver like a linear programming solver, you can use that to power the branch and bound search, also using returned solutions for possible heuristics.

I built a basic version of this that uses cvxpy as the relaxed problem solver. Cvxpy already has much much faster mixed integer solvers baked in (which is useful to make sure mine is returning correct results), but it was an interesting exercise. The real reason I’m toying around is I kind of want the ability to add custom branching heuristics or inspect and maintain the branch and bound search tree, which you’d need to get into the more complicated guts of the solvers bound to cvxpy to get at. Julia might be a better choice.

A somewhat similar (and better) project is which doesn’t use cvxpy explicitly, but does have the branch and bound control in the python layer of the solver. There are also other projects that can use fairly arbitrary solvers like Bonmin

As a toy problem I’m using a knapsack problem where we have objects of different sizes and different values. We want to maximize the value while keeping the total size under the capacity of the bag. This can be phrased linearly like so: \max v \cdot x s.t. \sum_i s_i x_i<= capacity , x \in {0,1}. The basic heuristic I’m using is to branch on variables that are either 0 or 1 in even the relaxed solution. The alternative branch hopefully gets pruned fast.

import cvxpy as cvx
import copy
from heapq import *
import numpy as np
import itertools
counter = itertools.count() 

class BBTreeNode():
    def __init__(self, vars = set(), constraints = [], objective=0, bool_vars=set()):
        self.vars = vars
        self.constraints = constraints
        self.objective = objective
        self.bool_vars = bool_vars
        self.children = []
    def buildProblem(self):
        prob = cvx.Problem(cvx.Minimize(self.objective), self.constraints) #i put Minimize, just so you know that I'm assuming it
        return prob
    def is_integral(self):
        return all([abs(v.value - 1) <= 1e-3 or abs(v.value - 0) <= 1e-3 for v in self.bool_vars])
    def branch(self):
        children = []
        for b in [0,1]:
                n1 = copy.deepcopy(self) #yeesh. Not good performance wise, but is simple implementation-wise
                v = n1.heuristic() #dangerous what if they don't do the same one? I need to do it here though because I need access to copied v.
                n1.constraints.append( v == b ) # add in the new binary constraint
                n1.children = []
                n1.bool_vars.remove(v) #remove binary constraint from bool var set
                n1.vars.add(v) #and add it into var set for later inspection of answer
                #self.children.append(n1)   # eventually I might want to keep around the entire search tree. I messed this up though
        return children
    def heuristic(self):
        # a basic heuristic of taking the ones it seems pretty sure about
        return min([(min(1 - v.value, v.value) , i, v) for i, v in enumerate(self.bool_vars)])[2]
    def bbsolve(self):
        root = self
        res = root.buildProblem().solve()
        heap = [(res, next(counter), root)]
        bestres = 1e20 # a big arbitrary initial best objective value
        bestnode = root # initialize bestnode to the root
        nodecount = 0
        while len(heap) > 0: 
            nodecount += 1 # for statistics
            print("Heap Size: ", len(heap))
            _, _, node = heappop(heap)
            prob = node.buildProblem()
            res = prob.solve()
            print("Result: ", res)
            if prob.status not in ["infeasible", "unbounded"]:
                if res > bestres - 1e-3: #even the relaxed problem sucks. forget about this branch then
                    print("Relaxed Problem Stinks. Killing this branch.")
                elif node.is_integral(): #if a valid solution then this is the new best
                        print("New Best Integral solution.")
                        bestres = res
                        bestnode = node
                else: #otherwise, we're unsure if this branch holds promise. Maybe it can't actually achieve this lower bound. So branch into it
                    new_nodes = node.branch()
                    for new_node in new_nodes:
                        heappush(heap, (res, next(counter), new_node ) )  # using counter to avoid possible comparisons between nodes. It tie breaks
        print("Nodes searched: ", nodecount)      
        return bestres, bestnode

# a simple knapsack problem. we'll want to minimize the total cost of having each of these items, with different sizes.
# Use a random problem instance
N = 20
prices = -np.random.rand(N)
sizes = np.random.rand(N)
x = cvx.Variable(N)
constraints = []
constraints += [x <= 1, 0 <= x] #The relaxation of the binary variable constraint
constraints += [sizes*x <= 5] # total size of knapsack is 5
objective = prices * x
bool_vars = {x[i] for i in range(N)} 
root = BBTreeNode(constraints = constraints, objective= objective, bool_vars = bool_vars)
res, sol = root.bbsolve()
print(sorted(list([(, v.value) for v in sol.bool_vars] + [(, v.value) for v in sol.vars] ) ))

# For comparison let's do the same problem using a built in mixed integer solver.
x = cvx.Variable(N, boolean=True)
constraints = []
constraints += [x <= 1, 0 <= x]
constraints += [sizes*x <= 5]
objective = prices * x
prob = cvx.Problem(cvx.Minimize(objective),constraints)

This is at least solving the problem fairly quickly. It needs better heuristics and to be sped up, which is possible in lots of ways. I was not trying to avoid all performance optimizations. It takes maybe 5 seconds, whereas the cvxpy solver is almost instantaneous.

Nodes searched:  67
[('var0[0]', 0.9999999958228145), ('var0[10]', -1.2718338055950193e-08), ('var0[11]', -1.3726395012104872e-08), ('var0[12]', 0.9999999982326986), ('var0[13]', 0.9999999973744331), ('var0[14]', 0.9999999988156902), ('var0[15]', -1.1908085711772973e-08), ('var0[16]', 0.9999999903780872), ('var0[17]', 0.9999999863334883), ('var0[18]', -1.1481655920777931e-08), ('var0[19]', 0.9999999996667646), ('var0[1]', 0.9999999969549299), ('var0[2]', 0.9999999979596141), ('var0[3]', -9.282428548104736e-09), ('var0[4]', -1.1378022795740783e-08), ('var0[5]', 0.9999999868240312), ('var0[6]', 0.9999999995068807), ('var0[7]', 0.9999999995399617), ('var0[8]', 0.9999999859520627), ('var0[9]', 0.9999999948062767)]
[ 1.00000000e+00  1.00000000e+00  1.00000000e+00 -1.44435650e-12
 -1.88491321e-12  1.00000000e+00  1.00000000e+00  1.00000000e+00
  1.00000000e+00  1.00000000e+00 -7.11338729e-13  1.99240081e-13
  1.00000000e+00  1.00000000e+00  1.00000000e+00 -1.48697107e-12
  1.00000000e+00  1.00000000e+00 -1.75111698e-12  1.00000000e+00]

Edit : I should investigate the Parameter functionality of cvxpy. That would make a make faster version than the one above based on deepcopy. If you made the upper and lower vectors on the binary variables parameters, you could restrict the interval to 0/1 without rebuilding the problem or copying everything.

#rough sketch
b = cvx.Variable(N) 
u = cvx.Parameter(N) 
u.value = np.ones(N)
l = cvx.Parameter(N) 
l.value = np.zeros(N)
constraints += [b <= u, l <= b]
# change l.value and u.value in search loop.

Mixed Integer Programming & Quantization Error

I though of another fun use case of mixed integer programming the other day. The quantization part of a digital to analog converter is difficult to analyze by the techniques taught in a standard signals course (linear analysis, spectral techniques, convolution that sort of thing). The way it is usually done is via assuming the quantization error is a kind of randomized additive noise.

Mixed Integer programming does have the ability to directly encode some questions about this quantization though. We can directly encode the integer rounding relations by putting the constraint that the quantized signal is exactly +-1/2 a quantization interval away from the original signal. Then we can run further analysis on the signals and compare them. For example, I wrote down a quick cosine transform. Then I ask for the worst case signal that leads to the most error on the quantized transform versus the transform of the unquantized signal. My measure of worst case performance was the sum of the difference of the two transforms. I chose this because it is tractable as a mixed integer linear program. Not all reasonable metrics one might want will be easily encodable in a mixed integer framework it seems. Some of them are maximizing over a convex function, which is naughty. (for example trying to maximize the L2 error \sum|x-y|^2 )

In a variant of this, it is also possible to directly encode the digital signal process in terms of logic gate construction and compare that to the intended analog transform, although this will be a great deal more computational expensive.

import cvxpy as cvx
import numpy as np
import matplotlib.pyplot as plt
N = 32
d = 15
x = cvx.Variable(N)
z = cvx.Variable(N, integer=True)
y = z / d # quantized signal. ~31 values between -1 and 1

constraints = []
constraints += [-1 <= y,  y <= 1, -1 <= x, x <= 1]

# roudning constraint. z = round(127*x)
constraints += [-0.5 <= d*x - z, d*x - z <= 0.5] 
# an oppurtnitu for the FFT technique of Vanderbei

n = np.kron(np.arange(N), np.arange(N)).reshape((N,N))
U = np.cos( np.pi * n / N)
kx = U @ x  
ky = U @ y

#hmmmm. Yes. Unfrotunately  I am asking a hard question?
# finding the minimum distortion signal is easy. finding the maximum distortion appears to be hard.
#This is not a convex objective : objective = cvx.Maximize(cvx.sum_squares(kx - ky))
# however, the following linearization does give us a maximally bad signal in a sense.
objective = cvx.Maximize(cvx.sum(kx - ky))

prob = cvx.Problem(objective, constraints)
plt.title("Original Signal")
plt.plot(x.value, label="analog signal")
plt.plot(y.value, label="quantized signal")
plt.title("Error of Transform")
plt.plot(kx.value - ky.value)
plt.title("Cosine transform")
plt.plot(kx.value, label="original signal")
plt.plot(ky.value, label="quantized signal")

This is interesting as a relatively straightforward technique for the analysis of quantization errors.

This also might be an interesting place to use the techniques of Vanderbei . He does a neato trick where he partially embeds the FFT algorithm into an optimization problem by adding auxiliary variables. Despite the expense of adding these variables, it greatly increases the sparsity of the constraint matrices, which will probably be a win. I wonder if one might do something similar with a Fast Multipole Method , Barnes Hut, or Wavelet transform? Seems likely. Would be neat, although I'm not sure what for. I was thinking of simulating the coulomb gas. That seems like a natural choice. Oooh. I should do that.

Solving the XY Model using Mixed Integer Optimization in Python

There are many problems in physics that take the form of minimizing the energy. Often this energy is taken to be quadratic in the field. The canonical example is electrostatics. The derivative of the potential \phi gives the electric field E. The energy is given as \int (|\nabla \phi|^2 + \phi \rho) d^3 x . We can encode a finite difference version of this (with boundary conditions!) directly into a convex optimization modelling language like so.

import cvxpy as cvx
import numpy as np
import matplotlib.pyplot as plt
import scipy.linalg
from mpl_toolkits import mplot3d

N = 10

# building a finite difference matrix. It is rectangle of size N x (N-1). It maps from the vertices of our grid to the lines in between them, where derivatives live.
col = np.zeros(N)
col[0] = -1
col[1] = 1
delta = scipy.linalg.toeplitz(col, np.zeros(N-1)).T

gradx = np.kron(delta, np.eye(N))
grady = np.kron(np.eye(N), delta)

# a variable for our potential
phi = cvx.Variable((N, N))

# vectorization is useful. It flattens out the x-y 2Dness.
phivec = cvx.vec(phi)
gradxvec = gradx.reshape(-1, N*N)
gradyvec = grady.reshape(-1, N*N)

V = cvx.sum_squares(gradxvec * phivec) + cvx.sum_squares(gradyvec * phivec)

constraints = []
# boundary conditions. Dirichlet
constraints += [phi[:,0] == 0, phi[0,:] == 0, phi[:,-1] == 0, phi[-1,:] == 0 ]

# fixed charge density rho
rho = np.zeros((N,N))
rho[N//2,N//2] = 1

# objective is energy
objective = cvx.Minimize(V + cvx.sum(cvx.multiply(rho,phi)))
prob = cvx.Problem(objective, constraints)
res = prob.solve()

# Plotting 
x = np.linspace(-6, 6, N)
y = np.linspace(-6, 6, N)

X, Y = np.meshgrid(x, y)
fig = plt.figure()

ax = plt.axes(projection='3d')
ax.plot_surface(X, Y, phi.value, rstride=1, cstride=1,
                cmap='viridis', edgecolor='none')
The resulting logarithm potential

It is noted rarely in physics, but often in the convex optimization world that the barrier between easy and hard problems is not linear vs. nonlinear, it is actually more like convex vs. nonconvex. Convex problems are those that are bowl shaped, on round domains. When your problem is convex, you can’t get caught in valleys or on corners, hence greedy local methods like gradient descent and smarter methods work to find the global minimum. When you differentiate the energy above, it results in the linear Laplace equations \nabla^2 \phi = \rho. However, from the perspective of solvability, there is not much difference if we replace the quadratic energy with a convex alternative.

def sum_abs(x):
    return cvx.sum(cvx.abs(x))
V = cvx.sum_squares(gradxvec * phivec) + cvx.sum_squares(gradyvec * phivec)
V = cvx.sum(cvx.huber(gradxvec * phivec)) + cvx.sum(cvx.huber(gradyvec * phivec))
V = cvx.pnorm(gradxvec * phivec, 3) + cvx.pnorm(gradyvec * phivec, 3)

a = 1 
dxphi = gradxvec * phivec
dyphi = gradyvec * phivec
V = cvx.sum(cvx.maximum( -a - dxphi, dxphi - a, 0 )) + cvx.sum(cvx.maximum( -a - dyphi, dyphi - a, 0 ))

Materials do actually have non-linear permittivity and permeability, this may be useful in modelling that. It is also possible to consider the convex relaxation of truly hard nonlinear problems and hope you get the echoes of the phenomenology that occurs there.

Another approach is to go mixed integer. Mixed Integer programming allows you to force that some variables take on integer values. There is then a natural relaxation problem where you forget the integer variables have to be integers. Mixed integer programming combines a discrete flavor with the continuous flavor of convex programming. I’ve previously shown how you can use mixed integer programming to find the lowest energy states of the Ising model but today let’s see how to use it for a problem of a more continuous flavor.

As I’ve described previously, in the context of robotics, the non-convex constraint that variables lie on the surface of a circle can be approximated using mixed integer programming. We can mix this fairly trivially with the above to make a global solver for the minimum energy state of the XY model. The XY model is a 2d field theory where the value of the field is constrained to lie on a circle. It is a model of a number of physical systems, such as superconductivity, and is the playground for a number of interesting phenomenon, like the Kosterlitz-Thouless phase transition. Our encoding is very similar to the above except we make two copies of the field phi and we then force them to lie on a circle. I’m trying to factor out the circle thing into my library cvxpy-helpers, which is definitely a work in progress.

import cvxpy as cvx
import numpy as np
import matplotlib.pyplot as plt
import scipy.linalg
from mpl_toolkits import mplot3d
from cvxpyhelpers import cvxpyhelpers as mip

N = 6

# building a finite difference matrix. It is rectangle of size Nx(N-1). It maps from the vertices of our grid to the lines in between them, where derivatives live.
col = np.zeros(N)
col[0] = -1
col[1] = 1
delta = scipy.linalg.toeplitz(col, np.zeros(N-1)).T

gradx = np.kron(delta, np.eye(N))
grady = np.kron(np.eye(N), delta)

# a variable for our potential
phix = cvx.Variable((N, N))
phiy = cvx.Variable((N, N))

# vectorization is useful. It flattens out the x-y 2Dness.
phixvec = cvx.vec(phix)
phiyvec = cvx.vec(phiy)
gradxvec = gradx.reshape(-1, N*N)
gradyvec = grady.reshape(-1, N*N)

def sum_abs(x):
    return cvx.sum(cvx.abs(x))

#V = cvx.sum_squares(gradxvec * phixvec) + cvx.sum_squares(gradyvec * phixvec) + cvx.sum_squares(gradxvec * phiyvec) + cvx.sum_squares(gradyvec * phiyvec) 
V = sum_abs(gradxvec * phixvec) + sum_abs(gradyvec * phixvec) + sum_abs(gradxvec * phiyvec) + sum_abs(gradyvec * phiyvec) 
constraints = []
# coundary conditions. Nice and vortexy.
constraints += [phix[:,0] >= 0.9, phiy[0,1:-1] >= 0.9, phix[:,-1] <= -0.9, phiy[-1,1:-1] <= -0.9 ]

for i in range(N):
    for j in range(N):
        x, y, c =
        constraints += c
        constraints += [phix[i,j] == x]
        constraints += [phiy[i,j] == y]

# fixed charge density rho
rho = np.ones((N,N)) * 0.01
rho[N//2,N//2] = 1

# objective is energy
objective = cvx.Minimize(V + cvx.sum(cvx.multiply(rho,phix)))
prob = cvx.Problem(objective, constraints)
print("solving problem")
res = prob.solve(verbose=True, solver=cvx.GLPK_MI)

# Plotting 
x = np.linspace(-6, 6, N)
y = np.linspace(-6, 6, N)

X, Y = np.meshgrid(x, y)
fig = plt.figure()

plt.quiver(X,Y, phix.value, phiy.value)

Now, this isn't really an unmitigated success as is. I switched to an absolute value potential because GLPK_MI needs it to be linear. ECOS_BB works with a quadratic potential, but it was not doing a great job. The commercial solvers (Gurobi, CPlex, Mosek) are supposed to be a great deal better. Perhaps switching to Julia, with it's richer ecosystem might be a good idea too. I don't really like how the solution of the absolute value potential looks. Also, even at such a small grid size it still takes around a minute to solve. When you think about it, it is exploring a ridiculously massive space and still doing ok. There are hundreds of binary variables in this example. But there is a lot of room for tweaking and I think the approach is intriguing.


  • Can one do steepest descent style analysis for low energy statistical mechanics or quantum field theory?
  • Is the trace of the mixed integer program search tree useful for perturbative analysis? It seems intuitively reasonable that it visits low lying states
  • The Coulomb gas is a very obvious candidate for mixed integer programming. Let the charge variables on each grid point = integers. Then use the coulomb potential as a quadratic energy. The coulomb gas is dual to the XY model. Does this exhibit itself in the mixed integer formalism?
  • Coulomb Blockade?
  • Nothing special about the circle. It is not unreasonable to make piecewise linear approximations or other convex approximations of the sphere or of Lie groups (circle is U(1) ). This is already discussed in particular about SO(3) which is useful in robotic kinematics and other engineering topics.

Edit: /u/mofo69extreme writes:


By absolute value potential, I mean using |del phi| as compared to a more ordinary quadratic |del phi|2.

This is where I'm getting confused. As you say later, you are actually using two fields, phi_x and phi_y. So I'm guessing your potential is the "L1 norm"

|del phi| = |del phi_x| + |del phi_y|

? This is the only linear thing I can think of.

I don't feel that the exact specifics of the XY model actually matter all the much.

You should be careful here though. A key point in the XY model is the O(2) symmetry of the potential: you can multiply the vector (phi_x,phi_y) by a 2D rotation matrix and the Hamiltonian is unchanged. You have explicitly broken this symmetry down to Z_4 if your potential is as I have written above. In this case, the results of the famous JKKN paper and this followup by Kadanoff suggest that you'll actually get a phase transition of the so-called "Ashkin-Teller" universality class. These are actually closely related to the Kosterlitz-Thouless transitions of the XY model; the full set of Ashkin-Teller phase transitions actually continuously link the XY transition with that of two decoupled Ising models.

You should still get an interesting phase transition in any case! Just wanted to give some background, as the physics here is extremely rich. The critical exponents you see will be different from the XY model, and you will actually get an ordered Z_4 phase at low temperatures rather than the quasi-long range order seen in the low temperature phase of the XY model. (You should be in the positive h_4 region of the bottom phase diagram of Figure 1 of the linked JKKN paper.)"

These are some interesting points and references.