System Identification of a Pendulum with scikit-learn

System identification is the name of the problem of finding the dynamics of a physical system. It is an old topic. For some interesting material on system identification, check out Steve Brunton’s video here

We’ve been building a raspberry pi controlled pendulum to be controlled from the internet and the problem came up of trying to get a simulation to match the physical pendulum.

Look at ‘er go. Energy shaping for the win.

We weighed the pendulum and calculated the torque due to gravity mg\frac{l}{2} \sin(\theta) (you can think of it as the full force of gravity acting on the level arm of the center of the pole \frac{l}{2}\sin(\theta)) and moment of inertia of a rod about it’s end \frac{1}{3}m L^2.

However, It is difficult to estimate the torque supplied by the motor. Motors have surprisingly complicated behavior. It is also difficult from first principles to estimate damping or friction terms.

There are a couple different experimental stratagems for a pendulum. One thing we tried was setting the pendulum on it’s side and setting the motor duty cycle to different values. From this you can fit a parabola to those curves and get a acceleration constant for the different motor settings. Experimentally speaking, it seemed roughly linear acceleration to motor PWM duty cycle.

Another stratagem is to take resonance curves for the pendulum. Try exciting it with different sinusoidal torques at a sweep of frequencies. From this curve you can recover a resonance frequency and damping coefficients.

These all make sense as kind of ersatz methods. We’re taking our intuitive understanding of the system and other results from simpler or related systems and combining them together.

An interesting alternative approach to the above is to drive the pendulum with a random torque and then fit a parameterized model of the equations of motion to the observed acceleration. The model should include at the least the gravity \beta_1 sin(\theta) term, motor torque term \beta_2 u, and a damping terms \beta_3 \dot{\theta}. A simple start is \alpha = \beta_1 sin(\theta) + \beta_2 u + \beta_3 \dot{\theta} . This is a linear model with respect to the coefficients \beta_i and can be solved by least squares.

I’ve come to appreciate sci-kit learn for fitting. It doesn’t have the hottest most high falutin’ fads, but it’s got a lot of good algorithms in it that just work, are damn easy to use, and are easy to swap different possibilities out of there. Even though I know how to more manually set up a least squares system or solve a LASSO problem via cvxpy, it makes it really easy and clean. I’ve started reaching for it for fast attacks on fitting problems.

We mocked out our interface to behave similarly to an OpenAI gym interface. Because of this, the observations already have the cosine and sine terms that might be of interest and the angular velocity value that would be used for a simple damping term \beta \dot{\theta}.

import gym
import time
import numpy as np
env = gym.make('pendulum-v0')
observation = env.reset()

action = 0
dt = 0.05
obs = []
rews = []
actions = []
for i in range(1000):

    # A random walk for actions.
    # we need the actions to be slow changing enough to see trends
    # but fast enough to see interesting behavior
    # tune this by hand 
    action += np.random.randn() * dt
    action = max( min(action, 2 ), -2) 
    observation, reward, done, info = env.step([action])

obs = np.array(obs) # obs includes thetadot, cos(theta), sin(theta). A good start.
actions = np.array(actions) # the pwm value used

# data to predict alpha from. Each row is a data point from one time step.
X = np.hstack( (obs[:-1, :] , actions[:-1].reshape(-1,1)) ) 

alphas = (obs[1:,2] - obs[:-1,2]  ) / dt  #angular acceleration

# feel free to swap in LASSO or other regressors
from sklearn.linear_model import LinearRegression 

# fit the observed angular acceleration as a function of X
reg = LinearRegression().fit(X, alphas)
print(f"intercept : {reg.intercept_},  coeffs : {reg.coef_} ")

The number that came out for gravity term matched the number calculated from first principles by within 10%. Not bad!

A thing that is nice about this approach is that one is able to add terms into the dynamics for which we don’t have good intuitive models to compare to like your good ole Physics I Coulombic friction term F \propto -sign(v)  or nonlinearities.

A Smattering of Physics in Sympy

Sympy is fun. I’ve been enjoying trying out some simple physics problems and seeing what kind of fun angles sympy brings to the table. It does pretty good on concrete problems, not so good at abstract derivations.


Ah such fond memories! In high school, I was taught by Ric Thompson “the big four”.

x_f = x_i + v_i t + \frac{1}{2} a t^2

v_f = v_i + a t

v_f^2 = v_i^2 + 2 a d

d = \frac{v_f + v_i}{2} t

The equations are of course, overcomplete. They are all implied by \frac{d^2}{dt^2}x = a, but even with only algebra then second two are derivable from the first two.

Of course a natural way of deriving the equations is to solve one equation for a variable and substitute it into the other equation. sympy makes this pretty dang easy.

from sympy import *
t,a,d,vf,vi = symbols("t a d vf vi")
e1 = Eq(d , vi * t + 1/2 * a * t ** 2)
tsub = solve(Eq(vf , vi + a * t),t)[0]
print(tsub) # This is assuming a is nonzero though.
(vf - vi)/a
Eq(d, 0.5*vf**2/a - 0.5*vi**2/a)

However, there is a more automated approach.

It turns out that a decent chunk of physics equations are or can be well approximated by a system of polynomial equations. There are systematic methods that are guaranteed to solve the problem (albeit maybe not in the lifetime of the universe).

A grobner basis is an equivalent set of polynomial equations that has useful properties. For some simple purposes, all you need to know is that if you give the variables you want to eliminate first, the Groebner basis will contain equations without those variable. Here we specify t as one to eliminate, so we get an equation without t in it

G = groebner(  [vi * t + 1/2 * a * t ** 2 - d,  
                vi + a * t - vf] , 
                 t,vf,d,a,vi  )
for e in G:
-2.0*d + 1.0*t*vf + 1.0*t*vi
1.0*a*t - 1.0*vf + 1.0*vi
-2.0*a*d + 1.0*vf**2 - 1.0*vi**2

I’ve actually been pleasantly surprised at how many physics problems reduce ultimately to systems of polynomial constraints. Energy and momentum conservation are polynomial constraints (classical feynman diagrams kind of). Special relativity questions can be reduced to polynomial constraints using the proper time.

#elephant problem
# elephants give birth at 21 months. On a rocket at velocity v
# how long T until you see it give birth? 
tau , t1, t2, x1, v, c, T = symbols("tau t1 t2 c1 v c T")

eqs = [
    tau**2 - (t1**2 - x1**2 / c**2), # proper time
    x1 - v * t1, # distance away
    c * t2 - x1, # time for light to travel back
    T - t1 - t2, # total time
    tau - 21 # proper time is 21 months

groebner(eqs, tau , t1, t2, x1, v, T)

Lagrangian Mechanics

The Structure and Interpretation of Classical Mechanics is an interesting book.

It points out that notation we use is extremely imprecise and implicit. This is a source of great confusion.

A great benefit of programming up such examples is that it makes explicit (sometimes painfully so) steps that were implicit before.

In the Euler Lagrange equation, first partially differentiates considering q and \dot{q} to be independent parameters. Then a substitution is makde for a function x(t) and then we procede with a differentiation with respect to time.

# simple harmonic oscillator lagrangian style
m, k = symbols("m k", real = True, positive=True)
v, q = symbols("v q")
K = Rational(1,2) * m * v ** 2 #kinetic energy
V = Rational(1,2) * k * q ** 2 # potential energy
L =  K - V  #Lagrangian
F = diff(L,q) # force
p = diff(L,v) # momentum

x_ = Function("x")
t = symbols("t")

x = x_(t)

subst = { v : diff(x,t),
         q : x} # replacement to turn q into a function x

# euler-lagrange equations of motion
eq = F.subs( subst ) - diff( p.subs(subst)  , t )
dsolve(eq) # general solution cosine and sines

Here’s an analogous thing for a pendulum

#simple harmonic oscillator lagrangian style
m, g, L = symbols("m g L", real = True, positive=True)
theta, w = symbols("theta omega")
K = Rational(1,2) * m * (w * L) ** 2 #kinetic energy
V = - Rational(1,2) * m * g * L * cos(theta) # potential energy. angle is defined as 0 = hanging down
L =  K - V  #Lagrangian
F = diff(L,theta) # force
p = diff(L,w) # momentum

x_ = Function("theta")
t = symbols("t")

x = x_(t)

subst = { w : diff(x,t),
         theta : x} # replacement to turn q into a function x

# euler-lagrange equations of motion
eq = F.subs( subst ) - diff( p.subs(subst)  , t )

Another place where an implicit stated substitution is absolutely vital is in the Legendre transform going from the Lagrangian to the Hamiltonian.

# legendre transformation to hamiltonian
p = symbols( "p" )
H_ = p * v - L # hamiltonian but we haven't solved out v yet
v_of_pq = solve(diff(H_, v), v)[0] # set derivative to 0 to solve for v.
H = simplify(H_.subs(v, v_of_pq )) # substitue back in. Here is the actual hamiltonian

Statistical Mechanics

Sympy can do Gaussian integrals! How convenient. It can also do power series expansions. And differentiate. So it takes the drudgery out of some simple calculations

# ideal gas partition function
beta, m, V, N, kb, T  = symbols("beta m V N k_b T", real=True, positive=True)
p = symbols("p", real=True)
Z = integrate( exp( - beta * Rational(1,2) * p ** 2 / m ), (p,-oo,oo))**(3*N) * V**N #partition function
def avg_energy(Z):
    return - diff(ln(Z), beta).subs(beta, 1/ kb / T)
print(avg_energy(Z)) #
F = (-ln(Z) / beta).subs(beta, 1 / kb / T) #helmholtz free energy
S = diff(F , T) # sentropy is derivative of helmholtz wrt T
S # the functional dependence on T and V I think is correct
P = -diff(F , V) # pressure is - derivative of V
# Neato
# hamrmonic oscillator partition function
beta, m, k = symbols("beta m k ", real=True, positive=True)
p, x = symbols("p x", real=True)
E = R(1,2) * p ** 2 / m  + R(1,2) * k * x ** 2
Z = integrate( integrate( exp( - beta * E ), (p,-oo,oo)) , (x,-oo, oo))**N 

Perturbation theory of the partition function of an anharmonic oscillator. Pretty easy. It is interesting to note that this is the very simplest schematic of how perturbation theory can be approached for quantum field theory.

# pertubration theory of anharmonic oscillator
beta, m, k, g = symbols("beta m k g ", real=True, positive=True)
p, x = symbols("p x", real=True)
E = Rational(1,2) * ( p ** 2 / m  +  k * x ** 2) + g * x ** 4
series(exp( - beta * E ), g).removeO()
Z = integrate( integrate( series(exp( - beta * E ), g, n=2).removeO(), (p,-oo,oo)) , (x,-oo, oo))
simplify(diff(-ln(Z),beta)) #E
simplify(diff(-ln(Z),k)/beta) #<x**> 

Other things that might be interesing : 2 oscillators, A chain of oscillators, virial expansion

Thermo and Legendre Tranformations

Thermodynamics is a poorly communicated topic. Which variables remain in expressions and what things are held constant when differentiating are crucial and yet poorly communicated and the notation is trash. Sympy helps make some things explicit. It’s fun.

u,s,t,p,v,n,r = symbols("u s t p v n r")

du,ds,dt,dp,dv = symbols("du ds dt dp dv")
# taylor series in stuff?

e1 = p * v - n * r * t
e2 = u - Rational(3 , 2) * n * r * t

state = [  (u,du), (s,ds), (t,dt) , (p,dp) , (v,dv) ]

def differential(e):
    return sum( [ diff(e,x) * dx  for x,dx in state]   )

de1 = differential(e1 )
de2 = differential(e2 )

e3 = du - (t * ds - p * dv)

eqs = [e1,e2,de1,de2,e3]
G = groebner( eqs, u , du,  t, dt, p, dp, v, dv,  ds )
for e in G:
R = Rational
U,S,T,P,V,N, k = symbols("U S T P V N k")

cv = R(3,2) * N * k
e1 = U - cv * T
e2 = P * V - N * k * T
e3 = S - cv * ln(T) + N * k * ln(V)

elim = [P,T]
Ps = solve([e1,e2,e3], P)
es = [ e.subs(Ps) for e in [e1,e2,e3] ]
Ts = solve(e3, T)[0]
es = [  e.subs(T,Ts) for e in es ]
Usv = solve(es[0],U)[0]
psv = diff(Usv,V)
tsv = diff( Usv , S )

#solve(es[0], V)

Hsv = Usv + P * V  # enthalpy and legendre trnasformation
Vps = solve(diff(Hsv, V) , V) 
H =  Hsv.subs(V, Vps[0]) 

There are so many other things!

What about a Van Der Waals equation? Optics (geometrical and wave, paraxial ~ Schrodinger, fourier optics), GR (exterior derivatives ) , Quantum (wave matching problems. What can we do about hydrogen? WKB, QHE) rutherford scattering, Weiss mean field, canonical transformations, Rotations. Clebsh-Gordon coefficients

Computational Category Theory in Python III: Monoids, Groups, and Preorders

Parts 1 and 2 are found here and here

From one perspective, categories are just another algebraic structure, like groups, monoids and rings. They are these abstract things that have some abstract equational axioms and operations. They are the next stop on our magnificent category journey.

A monoid is a thing that has an associative operation with a unit. Addition and 0 make numbers a monoid. Multiplication and 1 are a separate monoid for numbers. Concatenation and empty lists make lists a monoid. Union and empty set make sets a monoid. We can encode this in python like so:

class PlusIntMonoid(int):
def mplus(self,b):
return self + b
def mzero():
return 0
class TimesIntMonoid(int):
def mplus(self,b):
return self * b
def mzero():
return 1
class ListMonoid(list):
def mplus(self,b):
return self + b
def mzero():
return []
class UnionMonoid(set):
def mplus(self,b):
return self.union(b)
def mzero():
return set()
ListMonoid([1,2]).mplus(ListMonoid([1,2])) # [1,2,1,2]
UnionMonoid({1,2}).mplus(UnionMonoid({1,4})) # {1,2,4}
TimesIntMonoid(3).mplus(TimesIntMonoid.mzero()) # 3
view raw hosted with ❤ by GitHub

What does this have to do with categories? Well, if some thing is a category, it obeys the axioms that define what it means to be a category. It has morphisms and objects. The morphisms compose if head meets tail on an object. There are always identity morphism.

The morphisms in a category with 1 object automatically obey the monoid axioms. In this case, the category axioms imply the monoid axioms. Everything composes because there is only one object. It’s a kind of degenerate case where we are not using the partiality of the composition operator. There is automatically a unit for composition because the identity morphism is a unit. Composition is already required to be associative. Boom. The thing is a monoid.

Continuing with our representation from previous posts, we make a python class for each category. An instance of this class is a morphism in this category. If you ask for the domain or codomain of any morphism, you always get back () because it is a single object category. Compare these classes with the above classes.

class PlusIntCat(int):
def compose(self,b):
return self + b
def idd():
return 0
def dom(self):
return () # always return (), the only object
def cod(self):
return ()
class TimesIntCat(int):
def compose(self,b):
return self * b
def idd():
return 1
def dom(self):
return ()
def cod(self):
return ()
class ListCat(int):
def compose(self,b):
return self + b
def idd():
return []
def dom(self):
return ()
def cod(self):
return ()
class UnionSetCat(set):
def compose(self,b):
return self.union(b)
def idd(self,b):
return set()
def dom(self):
return ()
def cod(self):
return ()
PlusIntCat(3).compose(PlusIntCat.idd()) # 3
view raw hosted with ❤ by GitHub

Some monoids are also groups if there is a natural inverse operation. The integers are a group under addition where the negative gives you the inverse. Some aren’t though. The natural numbers (0,1,2…) aren’t a group under addition though.

Similarly groups can be thought of as a category with one object, with the additional requirement that every morphism is invertible, that there is always a f^{-1} such that f \circ f^{-1} = id.

Sympy has groups in it. We can make a wrapper of that functionality that looks like a categorical interface. To match our pattern of using python classes to represent categories, it is convenient to do the slightly uncommon thing of making a class definition generator function fp_group_cat. Every time you call this function, it makes a different class and a different category. I have only here wrapped the finitely presented group functionality, but there are also free groups, permutation groups, and named groups available in sympy.

from sympy.combinatorics.free_groups import free_group, vfree_group, xfree_group
from sympy.combinatorics.fp_groups import FpGroup, CosetTable, coset_enumeration_r
def fp_group_cat(G, catname):
# A Category generator that turns a finitely presented group into categorical python class
Cat = type(catname, (), vars(G))
def cat_init(self,a):
self.f = a
Cat.__init__ = cat_init
Cat.compose = lambda self,b : G.reduce(self.f * b.f)
Cat.dom = lambda : ()
Cat.cod = lambda : ()
Cat.idd = lambda : Cat(G.identity)
return Cat
F, a, b = free_group("a, b")
G = FpGroup(F, [a**2, b**3, (a*b)**4])
MyCat = fp_group_cat(G, "MyCat")
view raw hosted with ❤ by GitHub

Many objects, at most one arrow per pair: Preorders

We can simplify the power of a category in a different direction. Instead of having only 1 object, we’ll have few arrows.

A category with many objects but at most a single morphism between a pair of them obeys the axioms of a preorder. In categorical terminology this is sometimes called a thin category Any actual order like like \le on numbers is also a preorder, but preorders have slightly weaker requirements. Here is a categorical representation of the ordering on integers (although really the same implementation will work for any python type that implements <= and == )

class IntOrderCat():
def __init__(self, dom, cod):
assert(dom <= cod)
self.cod = cod
self.dom = dom
self.f = ()
def idd(n):
return IntOrderCat(n,n)
def compose(f,g):
assert( f.dom == g.cod )
return IntOrderCat( g.dom, f.cod )
def __repr__(self):
return f"[{self.dom} <= {self.cod}]"
# our convention for the order of composition feels counterintuitive here.
IntOrderCat(3,5).compose(IntOrderCat(2,3)) # [2 <= 5]
IntOrderCat.idd(3) # [3 <= 3]
view raw hosted with ❤ by GitHub

An example of a partial order is the subset relationship, which we can represent using python sets. This is an important but perhaps confusing example. Haven’t we already defined FinSet? Yes, but these are different categories. In FinSet, morphisms are functions. In SubSetCat a morphisms is the subset relationship (of which there can either be one or not one). They just plain are not the same thing even though there are sets in the mix for both. The situation is made even more confusing by the fact that the subset relationship can be talked about indirectly inside FinSet using monic morphisms, which have as their image the subset of interest.

class SubSetCat():
def __init__(self,dom,cod):
assert( dom.issubset(cod))
self.cod = cod
self.dom = dom
def compose(f,g):
assert(f.dom == g.cod)
return SubSetCat(g.dom, f.cod)
def idd(s):
return SubSetCat(s,s)
def __repr__(self):
return f"[{self.dom} <= {self.cod}]"
SubSetCat( {1,2,3} , {1,2,3,7} ).compose(SubSetCat( {1,2} , {1,2,3} )) # [{1, 2} <= {1, 2, 3, 7}]
view raw hosted with ❤ by GitHub

Preorders are related to directed acyclic graphs (DAG), the directed graphs that have no loops. If you give me a DAG, there is a preorder that is generated by that DAG. Exercise for the reader (AKA I’m lazy): Can you turn a Networkx DAG into a category?


This is nice and all just to explain categories in terms of some perhaps more familiar concepts. It feels a little ho-hum to me. We are not getting really any benefit from the concept of a category from this post. However, the examples of monoids, groups and preorders are always something you should think about when presented when a new categorical concept, because it probably reduces to something more familiar in this case. In addition, mappings to/from these simple objects to more complicated categories can be very interesting.

The methods of computational group theory are intriguing. It seems like some of them should extend to category theory. See this book by RFC Walters for example A very interesting book in other ways too. (Thanks to Evan Patterson for the tip)

Next time I think we’ll talk about finite categories and the finite Yoneda lemma.

Artwork courtesy of David

Edit: Hacker News discussion:

Computational Category Theory in Python II: Numpy for FinVect

Linear algebra seems to be the place where any energy you put in to learning it seems to pay off massively in understanding other subjects and applications. It is the beating heart of numerical computing. I can’t find the words to overstate the importance of linear algebra.

Here’s some examples:

  • Least Squares Fitting – Goddamn is this one useful.
  • Quadratic optimization problems
  • Partial Differential Equations – Heat equations, electricity and magnetism, elasticity, fluid flow. Differential equations can be approximated as finite difference matrices acting on vectors representing the functions you’re solving for.
  • Linear Dynamical systems – Solving, frequency analysis, control, estimation, stability
  • Signals – Filtering, Fourier transforms
  • Quantum mechanics – Eigenvalues for energy, evolving in time, perturbation theory
  • Probability – Transition matrices, eigenvectors for steady state distributions.
  • Multidimensional Gaussian integrals – A canonical model in quantum mechanics and probability because they are solvable in closed form. Gaussian integrals are linear algebra in disguise. Their solution is describable in terms of the matrices and vectors in the exponent. More on this another day.

Where does category theory come in to this?

On one side, exploring what categorical constructions mean concretely and computationally in linear algebra land helps explain the category theory. I personally feel very comfortable with linear algebra. Matrices make me feel good and safe and warm and fuzzy. You may or may not feel the same way depending on your background.

In particular, understanding what the categorical notion of a pullback means in the context of matrices is the first time the concept clicked for me thanks to discussions with James Fairbanks and Evan Patterson.

But the other direction is important too. A categorical interface to numpy has the promise of making certain problems easier to express and solve. It gives new tools for thought and programming. The thing that seems the most enticing to me about the categorical approach to linear algebra is that it gives you a flexible language to discuss gluing together rectangular subpieces of a numerical linear algebra problem and it gives a high level algebra for manipulating this gluing. Down this road seems to be an actionable, applicable, computational, constructible example of open systems.

Given how important linear algebra is, given that I’ve been tinkering and solving problems (PDEs, fitting problems, control problems, boundary value problems, probabilistic dynamics, yada yada ) using numpy/scipy for 10 years now and given that I actually have a natural reluctance towards inscrutable mathematics for its own sake, I hope that lends some credence to when I say that there really is something here with this category theory business.

It frankly boggles my mind that these implementations aren’t available somewhere already! GAH!

Uh oh. I’m foaming. I need to take my pills now.


The objects in the category FinVect are the vector spaces. We can represent a vector space by its dimensionality n (an integer). The morphisms are linear maps which are represented by numpy matrices. ndarray.shape basically tells you what are the domain and codomain of the morphism. We can get a lot of mileage by subclassing ndarray to make our FinVect morphisms. Composition is matrix multiplication (which is associative) and identity morphisms are identity matrices. We’ve checked our category theory boxes.

import numpy as np
import scipy
import scipy.linalg
class FinVect(np.ndarray):
def __new__(cls, input_array, info=None):
# Input array is an already formed ndarray instance
# We first cast to be our class type
obj = np.asarray(input_array).view(cls)
assert(len(obj.shape) == 2) # should be matrix
# add the new attribute to the created instance
# Finally, we must return the newly created object:
return obj
def dom(self):
''' returns the domain of the morphism. This is the number of columns of the matrix'''
return self.shape[1]
def cod(self):
''' returns the codomain of the morphism. This is the numer of rows of the matrix'''
return self.shape[0]
def idd(n):
''' The identity morphism is the identity matrix. Isn't that nice? '''
return FinVect(np.eye(n))
def compose(f,g):
''' Morphism composition is matrix multiplication. Isn't that nice?'''
return f @ g
view raw hosted with ❤ by GitHub

A part of the flavor of category theory comes from taking the focus away from the objects and putting focus on the morphisms.

One does not typically speak of the elements of a set, or subsets of a set in category theory. One takes the slight indirection of using the map whose image is that subset or the element in question when/if you need to talk about such things.

This actually makes a lot of sense from the perspective of numerical linear algebra. Matrices are concrete representations of linear maps. But also sometimes we use them as data structures for collections of vectors. When one wants to describe a vector subspace concretely, you can describe it either as the range of a matrix or the nullspace of a matrix. This is indeed describing a subset in terms of a mapping. In the case of the range, we are describing the subspace as all possible linear combinations of the columns \lambda_1 c_1 + \lambda_2 c_2 + ... . It is a matrix mapping from the space of parameters \lambda to the subspace (1 dimension for each generator vector / column). In the case of the nullspace it is a matrix mapping from the subspace to the space of constraints (1 dimension for each equation / row).

The injectivity or surjectivity of a matrix is easily detectable as a question about its rank. These kinds of morphisms are called monomorphisms and epimorphisms respectively. They are characterized by whether you can “divide” out by the morphism on the left or on the right. In linear algebra terms, whether there is a left or right inverse to these possibly rectangular, possibly ill-posed matrices. I personally can never remember which is which (surf/ing, left/right, mono/epi) without careful thought, but then again I’m an ape.

def monic(self):
''' Is mapping injective.
In other words, maps every incoming vector to distinct outgoing vector.
In other words, are the columns independent.
In other words, does `self @ g == self @ f` imply `g == f` forall g,f '''
return np.linalg.matrix_rank(self) == self.dom
def epic(self):
''' is mapping surjective? '''
return np.linalg.matrix_rank(self) == self.cod
view raw hosted with ❤ by GitHub

Some categorical constructions are very simple structural transformation that correspond to merely stacking matrices, shuffling elements, or taking transposes. The product and coproduct are examples of this. The product is an operation that takes in 2 objects and returns a new object, two projections \pi_1 \pi_2 and a function implementing the universal property that constructs f from f_1 f_2.

The diagram for a product

Here is the corresponding python program. The matrix e (called f in the diagram. Sorry about mixed conventions. ) is the unique morphism that makes those triangles commute, which is checked numerically in the assert statements.

def product(a,b):
''' The product takes in two object (dimensions) a,b and outputs a new dimension d , two projection morphsisms
and a universal morphism.
The "product" dimension is the sum of the individual dimensions (funky, huh?)
d = a + b # new object
p1 = FinVect(np.hstack( [np.eye(a) , np.zeros((a,b))] ))
p2 = FinVect(np.hstack( [np.zeros((b,a)), np.eye(b) ] ))
def univ(f,g):
assert(f.dom == g.dom) # look at diagram. The domains of f and g must match
e = np.vstack(f,g)
# postconditions. True by construction.
assert( np.allclose(p1 @ e , f )) # triangle condition 1
assert( np.allclose(p2 @ e , g ) ) # triangle condition 2
return e
return d, p1, p2, univ
view raw hosted with ❤ by GitHub

The coproduct proceeds very similarly. Give it a shot. The coproduct is more similar to the product in FinVect than it is in FinSet.

The initial and terminal objects are 0 dimensional vector spaces. Again, these are more similar to each other in FinVect than they are in FinSet. A matrix with one dimension as 0 really is unique. You have no choice for entries.

def terminal():
''' terminal object has unique morphism to it '''
return 0, lambda x : FinVect(np.ones((0, x)))
def initial():
''' the initial object has a unique morphism from it
The initial and final object are the same in FinVect'''
return 0, lambda x : FinVect(np.ones((x, 0)))
view raw hosted with ❤ by GitHub

Where the real meat and potatoes lives is in the pullback, pushout, equalizer, and co-equalizer. These are the categorical constructions that hold equation solving capabilities. There is a really nice explanation of the concept of a pullback and the other above constructions here .

Vector subspaces can be described as the range of the matrix or the nullspace of a matrix. These representations are dual to each other in some sense. RN=0. Converting from one representation to the other is a nontrivial operation.

In addition to thinking of these constructions as solving equations, you can also think of a pullback/equalizer as converting a nullspace representation of a subspace into a range representation of a subspace and the coequalizer/pushout as converting the range representation into a nullspace representation.

The actual heart of the computation lies in the scipy routine null_space and orth. Under the hood these use an SVD decomposition, which seems like the most reasonable numerical approach to questions about nullspaces. (An aside: nullspaces are not a very numerical question. The dimensionality of a nullspace of a collection of vectors is pretty sensitive to perturbations. This may or may not be an issue. Not sure. )

Let’s talk about how to implement the pullback. The input is the two morphisms f and g. The output is an object P, two projections p1 p2, and a universal property function that given q1 q2 constructs u. This all seems very similar to the product. The extra feature is that the squares are required to commute, which corresponds to the equation f p_1 = g p_2 and is checked in assert statements in the code. This is the equation that is being solved. Computationally this is done by embedding this equation into a nullspace calculation \begin{bmatrix} F & -G \end{bmatrix} \begin{bmatrix} x  \\ y \end{bmatrix} = 0. The universal morphism is calculated by projecting q1 and q2 onto the calculated orthogonal basis for the nullspace. Because q1 and q2 are required to be in a commuting square with f and g by hypothesis, their columns live in the nullspace of the FG stacked matrix. There is extra discussion with James and Evan and some nice derivations as mentioned before here

def pullback(f,g):
assert( f.cod == g.cod ) # Most have common codomain
# horizontally stack the two operations.
null = scipy.linalg.null_space( np.hstack([f,-g]) )
d = null.shape[1] # dimension object of corner of pullback
p1 = FinVect(null[:f.dom, :])
p2 = FinVect(null[f.dom:, :])
def univ(q1,q2):
# preconditions
assert(q1.dom == q2.dom )
assert( np.allclose(f @ q1 , g @ q2 ) ) # given a new square. This means u,v have to inject into the nullspace
u = null.T @ np.vstack([q1,q2]) # calculate universal morphism == p1 @ u + p2 @ v
# postcondition
assert( np.allclose(p1 @ u, q1 )) # verify triangle 1
assert( np.allclose(p2 @ u, q2 ) ) # verify triangle 2
return u
# postcondition
assert( np.allclose( f @ p1, g @ p2 ) ) # These projections form a commutative square.
return d, p1, p2, univ
view raw hosted with ❤ by GitHub

The equalizer, coequalizer, and pushout can all be calculated similarly. A nice exercise for the reader (AKA I’m lazy)!


I think there are already some things here for you to chew on. Certainly a lot of polish and filling out of the combinators is required.

I am acutely aware that I haven’t shown any of this being used. So I’ve only shown the side where the construction helps teach us category theory and not entirely fulfilled the lofty promises I set in the intro. I only have finite endurance. I’m sure the other direction, where this helps us formulate problems, will show up on this blog at some point. For what I’m thinking, it will be something like this post but in a different pullback-y style. Mix together FinSet and FinVect. Something something decorated cospans?

One important thing is we really need to extend this to affine maps rather than linear maps (affine maps allow an offset y = Ax + b. This is important for applications. The canonical linear algebra problem is Ax=b which we haven’t yet shown how to represent.

One common approach to embed the affine case in the linear case is to use homogenous coordinates.

Alternatively, we could make a new python class FinAff that just holds the b vector as a separate field. Which approach will be more elegant is not clear to me at the moment.

Another very enticing implementation on the horizon is a nice wrapper for compositionally calculating gaussian integrals + linear delta functions. Gaussian integrals + delta functions are solved by basically a minimization problem over the exponent. I believe this can be formulated by describing the linear subspace we are in as a span over the input and output variables, associating a quadratic form with the vertex of the span. You’ll see.


Blah Blah Blah

Whenever I write a post, I just let it flow, because I am entranced by the sound of my own keyboard clackin’. But it would deeply surprise me if you are as equally entranced, so I take sections out that are just musings and not really on the main point. So let’s toss em down here if you’re interested.

I like to draw little schematic matrices sometimes so I can visually see with dimensions match with which dimensions.

Making the objects just the dimension is a structural approach and you could make other choices. It may also make sense to not necessarily identify two vector spaces of the same dimensionality. It is nonsensical to consider a vector of dog’s nose qualities to be interchangeable with a vector of rocket ship just because they both have dimensionality 7.

High Level Linear Algebra

Linear algebra already has some powerful high level abstractions in common use.

Numpy indexing and broadcasting can sometimes be a little cryptic, but it is also very, very powerful. You gain both concision and speed.

Matrix notation is the most commonly used “pointfree” notation in the world. Indexful expressions can be very useful, but the calculus of matrices lets us use intuition built about algebraic manipulation of ordinary numbers to manipulate large systems of equations in a high level way. There are simple rules governing matrix inverse, transpose, addition, multiplication, identity.

Another powerful notion in linear algebra is that of block matrices. Block matrices are the standard high level notation to talk about subpieces of a numerical linear algebra problem. For example, you might be solving the heat equation on two hunks of metal attached at a joint. It is natural to consider this system in block form with the off diagonal blocks corresponding to the coupling of the two hunks.

One does not typically speak of the elements of a set, or subsets of a set in category theory. One takes the slight indirection of using the map whose image is that subset or the element in question when/if you need to talk about such things. This pays off in a couple ways. There is a nice minimalism in that you don’t need a whole new entity (python class, data structure, what have you) when you already have morphisms lying around. More importantly though the algebraic properties of what it means to be an element or subset are more clearly stated and manipulated in this form. On the flipside, given that we often return to subset or element based thinking when we’re confused or explaining something to a beginner shows that I think it is a somewhat difficult game to play.

The analogy is that a beginner will often write for loops for a numpy calculation that an expert knows how to write more concisely and efficiently using broadcasting and vectorization. And sometimes the expert just can’t figure out how to vectorize some complicated construction and defeatedly writes the dirty feeling for loop.

What about in a language where the for loops are fast, like Julia? Then isn’t the for loop version just plain better, since any beginner can read and write it and it runs fast too? Yes, I think learning some high level notation or interface is a harder sell here. Nevertheless, there is utility. High level formulations enable optimizing compilers to do fancier things. They open up opportunities for parallelism. They aid reasoning about code. See query optimization for databases. Succinctness is surprising virtue in and of itself.

Aaron Hsu (who is an APL beast) said something that has stuck with me. APL has a reputation for being incredibly unscrutable. It uses characters you can’t type, each of which is a complex operation on arrays. It is the epitome of concision. A single word in APL is an entire subroutine. A single sentence is a program. He says that being able to fit your entire huge program on a single screen puts you in a different domain of memory and mindspace. That it is worth the inscrutability because once trained, you can hold everything in your extended mind at once. Sometimes I feel when I’m writing stuff down on paper that it is an extension of my mind, that it is part of my short term memory. So too the computer screen. I’m not on board for APL yet, but food for thought ya know?

Differences between the pure mathematical perspective of Linear Algebra, and the Applied/Numerical Linear Alegbra

I think there a couple conceptual points of disconnect between the purely mathematical conception of vector spaces and the applied numerical perspective.

First off, the numerical world is by and large focused on full rank square matrices. The canonical problem is solving the matrix equation Ax=b for the unknown vector x. If the matrix isn’t full rank or square, we find some way to make it square and full rank.

The mathematical world is more fixated on the concept of a vector subspace, which is a set of vectors.

It is actually extremely remarkable and I invite you for a moment to contemplate that a vector subspace over the real numbers is a very very big set. Completely infinite. And yet it is tractable because it is generated by only a finite number of vectors, which we can concretely manipulate.

Ok. Here’s another thing. I am perfectly willing to pretend unless I’m being extra careful that machine floats are real numbers. This makes some mathematicians vomit blood. I’ve seen it. Cody gave me quite a look. Floats upon closer inspection are not at all the mathematical real numbers though. They’re countable first off.

From a mathematical perspective, many people are interested in precise vector arithmetic, which requires going to somewhat unusual fields. Finite fields are discrete mathematical objects that just so happen to actually have a division operation like the rationals or reals. Quite the miracle. In pure mathematics they more often do linear algebra over these things rather than the rationals or reals.

The finite basis theorem. This was brought up in conversation as a basic result in linear algebra. I’m not sure I’d ever even heard of it. It is so far from my conceptualization of these things.

Monoidal Products

The direct sum of matrices is represented by taking the block diagonal. It is a monoidal product on FinVect. Monoidal products are binary operations on morphisms in a category that play nice with it in certain ways. For example, the direct sum of two identity matrices is also an identity matrix.

The kronecker product is another useful piece of FinVect. It is a second monoidal product on the catgeory FinVect It is useful for probability and quantum mechanics. When you take the pair of the pieces of state to make a combined state, you

    def par(f,g):
        ''' One choice of monoidal product is the direct sum '''
        r, c = f.shape
        rg, cg = g.shape
        return Vect(np.block( [ [f       ,           np.zeros((r,cg))]  ,
                                [np.zeros((rg,c))  , g              ]]  ))
    def par2(f,g):
        '''  another choice is the kroncker product'''
        return np.kron(f,g)

We think about row vectors as being matrices where the number of columns is 1 or column vectors as being matrices where the number of rows is 1. This can be considered as a mapping from/to the 1 dimensional vector. These morphisms are points.

The traditional focus of category theory in linear algebra has been on the kronecker product, string diagrams as quantum circuits/ penrose notation, and applications to quantum mechanics.

However, the direct sum structure and the limit/co-limit structures of FinVect are very interesting and more applicable to everyday engineering. I associate bringing more focus to this angle with John Baez’s group and his collaborators.

Anyway, that is enough blithering.

Computational Category Theory in Python I: Dictionaries for FinSet

Category theory is a mathematical theory with reputation for being very abstract.

Category theory is an algebraic theory of functions. It has the flavor of connecting up little pipes and ports that is reminiscent of dataflow languages or circuits, but with some hearty mathematical underpinnings.

So is this really applicable to programming at all? Yes, I think so.

Here’s one argument. Libraries present an interface to their users. One of the measures of the goodness or badness of an interface is how often you are inclined to peek under the hood to get it to do the thing that you need. Designing these interfaces is hard. Category theory has taken off as a field because it has been found to be a useful and uniform interface to a surprising variety of very different mathematics. I submit that it is at least plausible that software interfaces designed with tasteful mimicry of category theory may achieve similar uniformity across disparate software domains. This is epitomized for me in Conal Elliott’s Compiling to Categories.

I think it is easy to have the miscomprehension that a fancy language like Haskell or Agda is necessary to even begin writing software that encapsulates category theory based ideas, but this is simply not the case. I’ve been under this misapprehension before.

It just so happens that category theory is especially useful in those languages for explaining some programming patterns especially those concerning polymorphism. See Bartosz Milewski’s Category theory for Programmers.

But this is not the only way to use category theory.

There’s a really delightful book by Rydeheard and Burstall called Computational Category Theory. The first time I looked at it, I couldn’t make heads or tails of it, going on the double uphill battle of category theory and Standard ML. But looking at it now, it seems extremely straightforward and well presented. It’s a cookbook of how to build category theoretic interfaces for software.

So I think it is interesting to perform some translation of its concepts and style into python, the lingua franca of computing today.

In particular, there is a dual opportunity to both build a unified interface between some of the most commonly used powerful libraries in the python ecosystem and also use these implementations to help explain categorical concepts in concrete detail. I hope to have the attention span to do this following:

A very simple category is that of finite sets. The objects in the category can be represented by python sets. The morphisms can be represented by python dictionaries. Nothing abstract here. We can rip and tear these things apart any which way we please.

The manipulations are made even more pleasant by the python features of set and dictionary comprehension which will mimic the definitions you’ll find on the wikipedia page for these constructions quite nicely.

Composition is defined as making a new dictionary by feeding the output of the first dictionary into the second. The identity dictionary over a set is one that has the same values as keys. The definition of products and coproducts (disjoint union) are probably not too surprising.

One really interesting thing about the Rydeheard and Burstall presentation is noticing what are the inputs to these constructions and what are the outputs. Do you need to hand it objects? morphisms? How many? How can we represent the universal property? We do so by outputting functions that construct the required universal morphisms. They describe this is a kind of skolemization . The constructive programmatic presentation of the things is incredibly helpful to my understanding, and I hope it is to yours as well.

Here is a python class for FinSet. I’ve implemented a couple of interesting constructions, such as pullbacks and detecting monomorphisms and epimorphisms.

I’m launching you into the a deep end here if you have never seen category theory before (although goddamn does it get deeper). Do not be surprised if this doesn’t make that much sense. Try reading Rydeheard and Burstall chapter 3 and 4 first or other resources.

from collections import Counter
class FinSet():
def init(self, dom ,cod , f):
''' In order to specify a morphism, we need to give a python set that is the domain, a python set
that is the codomain, and a dictionary f that encodes a function between these two sets.
We could by assumption just use f.keys() implicitly as the domain, however the codomain is not inferable from just f.
In other categories that domain might not be either, so we chose to require both symmettrically.
assert( dom == set(f.keys())) # f has value for everything in domain
assert( all( [y in cod for y in f.value()] )) # f has only values in codomain
self.cod = cod
self.dom = dom
self.f = f
def __getitem__(self,i):
# a convenient overloading.
return self.f[i]
def compose(f,g):
''' Composition is function composition. Dictionary comprehension syntax for the win! '''
return FinSet( g.dom, f.cod, { x : f[g[x]] for x in g.dom })
def idd(dom):
''' The identity morphism on an object dom. A function mapping every x to itself'''
return FinSet(dom, dom, { x : x for x in dom})
def __equal__(f,g):
assert(f.dom == g.dom) # I choose to say the question of equality only makes sense if the arrows are parallel.
assert(f.cod == g.cod) # ie. they have the same object at head and tail
return f.f == g.f
def terminal(dom):
''' The terminal object is an object such that for any other object, there is a unique morphism
to the terminal object
This function returns the object itself {()} and the universal morphism from dom to that object'''
return {()} , FinSet(dom, {()} , {x : () for x in dom} )
def initial(cod):
''' The initial object is an object such that for any other object, there is a unique morphsm
from the initial object to that object.
It is the dual of the terminal object.
In FinSet, the initial object is the empty set set({}). The mapping is then an empty dictionary dict({})'''
return set({}) , FinSet(set({}), cod, dict({}))
def monic(self):
''' Returns bool of whether mapping is injective.
In other words, maps every incoming element to unique outgoing element.
In other words, does `self @ g == self @ f` imply `g == f` forall g,f
Counter class counters occurences'''
codomain_vals = self.f.values()
counts = Counter(codomain_vals).values() #
return all([count == 1 for count in counts]) # no elements map to same element
def epic(self):
''' is mapping surjective? In other words does the image of the map f cover the entire codomain '''
codomain_vals = self.f.keys()
return set(codomain_vals) == self.cod # cover the codomain
def product(a,b): # takes a sepcific product
ab = { (x,y) for x in a for y in b }
p1 = FinSet( ab, a, { (x,y) : x for (x,y) in ab } )
p2 = FinSet( ab, b, { (x,y) : y for (x,y) in ab } )
return ab, p1, p2 , lambda f,g: FinSet( f.dom, ab, { x : (f[x],g[x]) for x in f.dom } ) # assert f.dom == g.dom, f.cod == a, g.cod == b
def coproduct(a,b):
ab = { (0,x) for x in a }.union({ (1,y) for y in b })
i1 = FinSet( a, ab, { x : (0,x) for x in a } )
i2 = FinSet( b, ab, { y : (1,y) for y in b } )
def fanin(f,g):
return { (tag,x) : (f[x] if tag == 0 else g[x]) for (tag,x) in ab }
return ab, i1, i2, fanin
def equalizer(f,g):
''' The equalizer is a construction that allows one to talk about the solution to an equation in a categorical manner
An equation is f(x) = g(x). It has two mappings f and g that we want to somehow be the same. The solution to this equation
should be a subset of the shared domain of f and g. Subsets are described from within FinSet by maps that map into the
assert(f.dom == g.dom)
assert(f.cod == g.cod)
e = { x for x in f.dom if f[x] == g[x] }
return e, FinSet(e, f.dom, {x : x for x in e})
def pullback(f,g): # solutions to f(x) = g(y)
assert(f.cod == g.cod)
e = {(x,y) for x in f.dom for y in g.dom if f[x] == g[y]} # subset of (f.dom, g.dom) that solves equation
p1 = FinSet( e, f.dom, { (x,y) : x for (x,y) in e } ) # projection 1
p2 = FinSet( e, g.dom, { (x,y) : y for (x,y) in e } ) # projection 2
def univ(q1,q2):
''' Universal property: Given any other commuting square of f @ q1 == g @ q2, there is a unique morphism
that injects into e such that certain triangles commute. It's best to look at the diagram'''
assert(q1.cod == p1.cod) # q1 points to the head of p1
assert(q2.cod == p2.cod) # q2 points to the head of p2
assert(q1.dom == q2.dom) # tails of q1 and q2 are the same
assert(f @ q1 == g @ q2) # commuting square condition
return FinSet( q1.dom, e , { z : ( q1[z] , q2[z] ) for z in q1.dom } )
return e, p1, p2, univ
view raw hosted with ❤ by GitHub

Here’s some fun exercises (Ok. Truth time. It’s because I got lazy). Try to implement exponential and pushout for this category.

Rough Ideas on Categorical Combinators for Model Checking Petri Nets using Cvxpy

Petri nets are a framework for modeling dynamical systems that is very intuitive to some people. The vanilla version is that there are discrete tokens at nodes on a graph representing resources of some kind and tokens can be combined according to the firing of transition rules into new tokens in some other location.

This is a natural generalization of chemical reaction kinetics, where tokens are particular kinds of atoms that need to come together. It also is a useful notion for computer systems, where tokens represent some computational resource.

To me, this becomes rather similar to a flow problem or circuit problem. Tokens feel a bit like charge transitions are a bit like current (although not necessarily conservative). In a circuit, one can have such a small current that the particulate nature of electric current in terms of electrons is important. This happens for shot noise or for coulomb blockade for example.

If the number of tokens is very large, it seems intuitively sensible to me that one can well approximate the behavior by relaxing to a continuum. Circuits have discrete electrons and yet are very well approximated by ohm’s laws and the like. Populations are made of individuals, and yet in the thousands their dynamics are well described by differential equations.

It seems to me that mixed integer programming is a natural fit for this problem. Mixed integer programming has had its implementations and theory heavily refined for over 70 years so now very general purpose and performant off the shelf solvers are available. The way mixed integer programs are solved is by considering their quickly solved continuous relaxation (allowing fractional tokens and fractional transitions more akin to continuous electrical circuit flow) and using this information to systematically inform a discrete search process. This seems to me a reasonable starting approximation. Another name for petri nets is Vector Addition Systems, which has more of the matrix-y flavor.

We can encode a bounded model checking for reachability of a petri net into a mixed integer program fairly easily. We use 2-index variables, the first of which will be used for time step. We again turn to the general purpose functional way of encoding pointful dsls into pointfree ones as I have done here and here. The key point is that you need to be careful where you generate fresh variables. This is basically copy catting my posts here.

I’m like 70% sure what I did here makes sense, but I’m pretty sure the general idea makes sense with some fiddling.

T = 10 # total number of time steps as a global parameter
class PetriCat():
def compose(f,g):
def res():
a, b , fcon = f()
b1, c, gcon = g()
return a, c, fcon + gcon + [b == b1]
def idd():
def res()
x = cvx.Variable((T-1,1), integer = True)
return x, x, [x >= 0]
def par(f,g):
def res():
a, b , fcon = f()
c, d , gcon = g()
return cvx.vstack([a,c]), cvx.vstack([b,d]), fcon + gcon
return res
def weighted_block(wi, wo, wint):
def res():
(Na, Ni) = wi.shape # number inputs, actions
(Na1,No) = wo.shape
(Na2, Nint) = wint.shape
assert(Na == Na1)
assert(Na == Na2)
action = cvx.Variable((T-1, Na), integer=True)
internal = cvx.Variable((T, Nint), integer =True)
flowin = action @ wi
flowout = action @ wo
return flowin, flowout, [internal[1:,:] == internal[:-1,:] + action @ wint, action >= 0, internal >= 0]
return res
def run(f):
a, b, fcon = f()
prob = cvx.Problem(cvx.Minimize(1), fcon)
return a, b
# We need some way of specifying the initial and final states of things, Just more parameters to constructor functions I think
view raw hosted with ❤ by GitHub

The big piece is the weighted_block function. It let’s you build a combinator with an internal state and input and output flow variables. You give matrices with entries for every possible transition. Whether transitions occurred between t and t+1 is indicated by integer variables. There is also possible accumulation of tokens at nodes, which also requires integer variables. Perhaps we’d want to expose the token state of the nodes to the outside too?

Weighted block schematically looks something like this

We can also get out a graphical representation of the net by reinterpreting our program into GraphCat. This is part of the power of the categorical interface.

When I talked to Zach about this, he seemed skeptical that MIP solvers wouldn’t eat die a horrible death if you threw a moderately large petri net into them. Hard to say without trying.


There is an interesting analogy to be found with quantum field theory in that if you lift up to considering distributions of tokens, it looks like an occupation number representation. See Baez.

If you relax the integer constraint and the positivity constraints, this really is a finite difference formulation for capacitor circuits. The internal states would then be the charge of the capacitor. Would the positivity constraint be useful for diodes?

I wonder how relevant the chunky nature of petri nets might be for considering superconducting circuits, which have quantization of quantities from quantum mechanical effects.

Cvxpy let’s you describe convex regions. We can use this to implement a the convex subcategory of Rel which you can ask reachability questions. Relational division won’t work probably. I wonder if there is something fun there with respect the the integerizing operation and galois connections.

Edit: I should have googled a bit first. mathemtical programming tecchniques for petri net reachability. So it has been tried, and it sounds like the results weren’t insanely bad.

Categorical Combinators for Graphviz in Python

Graphviz is a graph visualization tool In Conal Elliott’s Compiling to categories, compiling code to its corresponding graphviz representation was one very compelling example. These graphs are very similar to the corresponding string diagram of the monoidal category expression, but they also look like boolean circuit diagrams. Besides in Conal Elliott’s Haskell implementation, there is an implementation in the Julia Catlab.jl project

I’ve basically implemented a toy version of a similar thing in python. It lets you do things like this

plus = GraphCat.block("+", ["a","b"], ["c"])
I = GraphCat.idd()
p1 = plus
p2 = p1 @ (p1 * p1)
p3 = p1 @ (p2 * p2)
p4 = p1 @ (p3 * p3)
d =
d.format = "png"
view raw hosted with ❤ by GitHub

Why python?

  • Python is the lingua franca of computing these days. Many people encounter it, even people whose main thing isn’t computers.
  • The python ecosystem is nutso good.
  • Julia is kind of an uphill battle for me. I’m fighting the battle ( ) , but I already know python pretty well. I can rip this out and move on.

What I did was implement some wrappers around the graphviz python package. It exposes a not very feature rich stateful interface. It is pretty nice that it prints the graphs inline in jupyter notebooks though.

The code is written in a style very similar (and hopefully overloadable with) to that of z3py relation algebra. . There is a fairly general cookbook method here for categorifying dsl. Since graphviz does not directly expose fresh node creation as far as I can tell, I made my own using a random number generator. The actual combinators are graphviz object processors, so we build up a giant functional chain of processors and then actually execute it with run. The inputs and outputs are represented by lists of node names. The is some design space to consider here.

I also had to use class based wrappers Based on the precedent set by the python 3 matrix multiplication operator @, I think it is a requirement that this also be used for category composition. id is a keyword or something in python unfortunately. For monoidal product, I feel like overloading power ** looks nice even if it is a nonsensical analogy, * is also not too bad. I went with * for now.

The graphviz graphs aren’t quite string diagrams. They make no promise to preserve the ordering of your operations, but they seem to tend to.

import random
import graphviz
class GraphCat():
GC = GraphCat
def __init__(self,res): # just hold the graphviz processor function
self.res = res
def fresh(n): # makes random numbers for fresh node labels
return list([str(random.randint(0,1e9)) for i in range(n)])
def idd(): # identity morhism. 1 input 1 output.
def res(g):
#nodes = GC.fresh(n)
node = GC.fresh(1)[0]
#for node in nodes:
g.node(node, shape="point")
return [node], [node]
return GraphCat(res)
def compose(g, f): # compose morphisms. Makes graphviz edges between node labels generated by g and f
f = f.res
g = g.res
def res(G):
a, b = f(G)
b1, c = g(G)
assert(len(b) == len(b1))
return a, c
return GraphCat(res)
def par(f, g): #monoidal product. Puts f and g in parallel
f = f.res
g = g.res
def res(G):
a, b = f(G)
c, d = g(G)
return a + c, b + d
return GraphCat(res)
def dump(): # dump deletes this edge with an empty circle
def res(G):
node = GC.fresh(1)[0]
G.node(node, shape="point", fillcolor="white")
return [node], []
return GraphCat(res)
def dup(): # duplicate this edge
def res(G):
node = GC.fresh(1)[0]
G.node(node, shape="point", fillcolor="green")
return [node], [node, node]
return GraphCat(res)
def converse(f): # turn input to output of this combinator
f = f.res
def res(G):
a, b = f(G)
return b, a
return GraphCat(res)
def named_simple(name): # a simple labeled 1 input 1 output object
def res(G):
node = GC.fresh(1)[0]
return [node], [node]
return GraphCat(res)
def block(name, inputs, outputs): # an object with labelled ports
inputs_label = [f"<{inp}> {inp}" for inp in inputs]
outputs_label = [f"<{outp}> {outp}" for outp in outputs]
#return # use graphviz to build block with label and n ports
def res(G):
node = GC.fresh(1)[0]
inputs1 = [f"{node}:{inp}" for inp in inputs]
outputs1 = [f"{node}:{outp}" for outp in outputs]
grphstring = "{ {" + " | ".join(inputs_label) + "} | " + name + " | {" + "|".join(outputs_label) + "} }"
G.node(node,grphstring, shape="record")
return inputs1, outputs1
return GC(res)
def fst(): # project out first par of tuple. semnatically equal to (id * dump)
return GraphCat.block("fst", ["a","b"], ["a1"])
def snd(): # dump * id
return GraphCat.block("fst", ["a","b"], ["b1"])
def run(self): # will run the object on a fresh graphviz object
dot = graphviz.Digraph()
return dot
def __matmul__(self, rhs): # matrix multiplication is a natural analog of composition
return GC.compose(self, rhs)
def __mul__(self, rhs): # monoidal product
return GC.par(self, rhs)
def __add__(self,rhs): # hmm. What should addition be? Join?
def const(label): # inject a constant in. A labelled node with 1 outport and no input
def res(G):
node = GraphCat.fresh(1)[0]
G.node(node, str(label))
return [], [node]
return GC(res)
def cup():
def res(G):
nodes = GraphCat.fresh(2)
for node in nodes:
G.node(node, shape="point")
return [], nodes
return GraphCat(res)
def cap():
return GraphCat.converse(GraphCat.cup())
view raw hosted with ❤ by GitHub

Here’s some example usage

cup = GraphCat.cup()
cap = GraphCat.cap()
d =  cap @ (I * I) @ cup  #(I * cap) @ (I * I * I) @ (cup * I)
d = plus @ (GC.const(1) * GC.const(2))
d =
GC = GraphCat
f = GraphCat.named_simple("f")
g = GraphCat.named_simple("g")
I = GraphCat.idd()
dump = GC.dump()
dup = GC.dup()
diagram = ((f * I) @ dup @ g @ (dump * I)  @ (I * f) @ (f * f)) * g

Class based overloading is the main paradigm of overloading in python. You overload a program into different categories, by making a program take in the appropriate category class as a parameter.

# by passing in different category classes, we can make polymorphic functions
# They have to have a uniform interface though, which is hard to constrain in python.
def polymorphic_prog(M):
    idd = M.idd()
    dump = M.dump()
    dup = M.dup()
    return (idd * dump) @ dup

For example something like this ought to work. Then you can get the graph of some matrix computation to go along with it’s numpy implementation

class FinVect(np.ndarray):

    def compose(f,g):
        return f @ g
    def idd(n):
        return np.eye(n)
    def par(f,g):
        return np.kron(f,g)
    def __mult__(self,rhs):
        return np.kron(f,g)
# and so on. 

Maybe outputting tikz is promising?

Stupid is as Stupid Does: Floating Point in Z3Py

Floating points are nice and all. You can get pretty far pretending they are actually numbers. But they don’t obey some mathematical properties that feel pretty obvious. A classic to glance through is “What Every Computer Scientist Should Know About Floating-Point Arithmetic”

We can check some properties with z3py. Here are a couple simple properties that succeed for mathematical integers and reals, but fail for floating point

from z3 import *
def numbery_proofs(sort):
x = Const("x", sort)
y = Const("y", sort)
z = Const("z", sort)
prove(x + y == y + x) #Commutativity
prove(((x + y) + z) == ((x + (y + z)))) #associativity
print("Additive Identity")
prove(x + 0 == x) # 0 additive identity
print("Multiplicative Identity")
prove(1 * x == x)
prove(Or(x > 0, x < 0, x == 0)) #trichotomy
prove(x * (y + z) == x * y + x * z)
print("Square positive")
prove(x * x >= 0)
print("Ints -----------------")
print("Reals ---------------")
''' Output
Ints -----------------
Additive Identity
Multiplicative Identity
Square positive
Reals ---------------
Additive Identity
Multiplicative Identity
Square positive
[z = -1.9375*(2**8),
y = 1.0009765625*(2**14),
x = 1.5224609375*(2**4)]
Additive Identity
[x = -0.0]
Multiplicative Identity
[x = NaN]
[z = -1.0029296875*(2**-2),
y = 1.01171875*(2**-10),
x = -1.4833984375*(2**8)]
Square positive
[x = NaN]
view raw hosted with ❤ by GitHub

I recently saw on twitter a reference to a Sylvie Boldo paper “Stupid is as Stupid Does: Taking the Square Root of the Square of a Floating-Point Number”.

In it, she uses FlocQ and Coq to prove a somewhat surprising result that the naive formula \sqrt{x^2} = |x| actually is correct for the right rounding mode of floating point, something I wouldn’t have guessed.

Z3 confirms for Float16. I can’t get Float32 to come back after even a day on a fairly beefy computer. If I use FPSort(ebits,sbits) rather than a standard size, it just comes back unknown, so i can’t really see where the cutoff size is. This does not bode well for checking properties of floating point in z3 in general. I think a brute force for loop check of 32 bit float properties is feasible. I might even be pretty fast. To some degree, if z3 is taking forever to find a counterexample, I wonder to what to degree the property is probably true.

If anyone has suggestions, I’m all ears.

from z3 import *
x = FP("x", Float16())
rm = RNE() # Rounding Nearest Ties to Even
x2 = fpMul(rm, x, x)
absx1 = fpSqrt(rm, x2)
absx2 = fpAbs(x)
s = Solver()
s.add( 0.01 <= x, x <= 100)
s.add( absx1 != absx2)
res = s.check()
if res == sat:
m = s.model()
view raw hosted with ❤ by GitHub

A Sketch of Categorical Relation Algebra Combinators in Z3Py

I’ve discussed relation algebra before. Relations are sets of tuples. There, I implemented the relations naively using lists for sets. This is very simple, and very clean especially with list comprehension syntax. It is however horrifically inefficient, and we could only deal with finitely enumerable domains. The easiest path to fixing these problems is to cash out to an external solver, in this case z3.

There are many beautifully implemented solvers out there and equally beautiful DSL/modeling languages. Examples in mind include sympy, cvxpy, and z3. These modeling languages require you to instantiate variable objects and build expressions out of them and then hand it off to the solver. This is a reasonable interface, but there are advantages to a more categorical/point-free style DSL.

Point-free languages are ones that do not include binding forms that introduce bound/dummy variables. Examples of binding forms like this are \lambda \sum \max \min \int \sup \lim \forall \exists. One problem lies in the fact that the names of bound variables don’t matter, and that they end up accidentally smashing into each other. You may have experienced this in physics or math class as the dummy indices or dummy variable problem causing you to screw up your calculation of some cross product identity or some complicated tensor sum. These are surprisingly subtle problems, very difficult to diagnose and get right. de Bruijn indices is a technique for giving the bound variables canonical names, but it sucks to implement in its own way. When you make a DSL point free, it is a joy to manipulate, optimize, and search. I think this may be the core of why category theory is good language for mathematics and programming.

Point-free style also tends to have significant economy of size, for better or worse. Sometimes it is better to have an expression very dense in information. This is important if you are about the algebraically manipulate an expression with paper and pencil. Every manipulation requires a great deal of mind numbing copying as you proceed line by line, so it can be excruciating if your notation has a lot of unnecessary bulk.

Relations are like functions. The two pieces of the tuple can be roughly thought of as the “input” and the “output”. Relations are only loosely directional though. Part of the point of relations is that the converse (inverse) of a relation is easy to define.

When we are implement relations, we have a choice. Do we want the relation to produce its variables, accept its variable, or accept one and produce the other? There are advantages to each. When relations were [(a,b)], a -> b -> Bool, and a -> [b] converting between these forms was a rather painful enumeration process. The sting of converting between them is taken out by the fact that the conversion is no longer a very computationally expensive process, since we’re working at the modeling layer.

When you’re converting a pointful DSL to pointfree DSL, you have to be careful where you instantiate fresh variables or else you’ll end up with secret relations that you didn’t intend. Every instantiation of id needs to be using fresh variables for example. You don’t want the different id talking to each other. Sometimes achieving this involves a little currying and/or thunking.

There is a pattern that I have notice when I’m using modeling languages. When you have a function or relation on variables, there are constraints produced that you have to keep a record of. The pythonic way is to have a Model or Solver object, and then have that objects mutate an internal record of the set of constraints. I don’t particularly enjoy this style though. It feels like too much boiler plate.

In Haskell, I would use something like a Writer monad to automatically record the constraints that are occurring. Monads are not really all that pleasant even in Haskell, and especially a no go in python without “do” syntax.

However, because we are going point free it is no extra cost at all to include this pipework along for the ride in the composition operation.

Here are implementations of the identity and composition for three different styles. Style 1 is fully receptive, style 2 is mixed (function feeling) and style 3 is fully productive of variables.

Fair warning, I’m being sketchy here. I haven’t really tried this stuff out.

def rid1(x,y): # a receptive relations. It receives variables
return x == y
def compose1(f, sort, g): # annoying but we need to supply the sort of the inner variable
def res(x,z):
y = FreshConst(sort)
return Exists([y], And(f(x,y), g(y,z)))
return res
def rid2(x): # a functional relation. It receives a variable then produces one.
y = FreshConst(x.sort())
return y, x == y
def compose2(f,g):
def res(x):
y, cf = f(x)
z, cg = g(y)
return z, Exists([y], And(cf,cg) )
def rid3(sort): # a type indexed generator of relations. Annoying we have to supply the type of the variable
def res(): # a productive relation
x = FreshConst(sort)
y = FreshConst(sort)
return x, y, x == y
return res
def compose3(f,g):
def res():
x, yf, cf = f()
yg, z, cg = g()
return x, z, Exists([yf,yg], And(cf, cg, yf == yg))
return res
view raw hosted with ❤ by GitHub

z3 is a simply typed language. You can get away with some polymorphism at the python level (for example the == dispatches correctly accord to the object) but sometimes you need to manually specify the sort of the variables. Given these types, the different styles are interconvertible

def lift12(sorty, f):
def res(x):
y = FreshConst(sorty)
c = f(x,y)
return y, c
return res
def lift23(sortx, f):
def res():
x = FreshConst(sortx)
y, c = f(x)
return x, y, c
return res
def lift31(f):
def r(x,y):
x1, y1, c = f()
return x1, y1, And(c, x1 == x, y1 == y)
return res
view raw hosted with ❤ by GitHub

We can create the general cadre of relation algebra operators. Here is a somewhat incomplete list

trivial = BoolVal(True)
def top1(x,y): # top is the full relation,
return trivial
def bottom1(x,y):
return BoolVal(False)
def top2(sorty):
def res(x):
y = FreshConst(sorty)
return y, trivial
return res
def top3(sortx, sorty):
def res():
x = FreshConst(sortx)
y = FreshConst(sorty)
return x, y, trivial
return res
def converse1(r):
return lambda x,y: r(y,x)
def meet1(p,q):
return lambda x,y : p(x,y) & q(x,y)
def join1(p,q):
return lambda x,y : p(x,y) | q(x,y)
# product and sum types
def fst1(x,y): # proj(0)
return y == x.sort().accessor(0,0)(x)
def snd1(x,y): # proj(1)
return y == x.sort().accessor(0,1)(x)
def left1(x,y):
return y == y.sort().constructor(0)(x)
def right1(x,y):
return y == y.sort().constructor(1)(x)
def inj1(n):
return lambda x, y: return y == y.sort().constructor(n)(x)
def proj1(n):
return lambda x, y: return y == x.sort().accessor(0,n)(x)
def fan(p,q):
def res(x,y):
a = y.sort().accessor(0,0)(y)
b = y.sort().accessor(0,1)(y)
return And(p(x,a), q(x,b))
return res
def dup1(x,(y1,y2)): # alternatively we may not want to internalize the tuple into z3.
return And(x == y1 , x == y2)
def convert_tuple((x,y), xy): # convert between internal z3 tuple and python tuple.
return xy == xy.constructor(0)(x,y)
#def split():
#def rdiv # relation division is so important, and yet I'm always too mentally exhausted to try and straighten it out
def itern(n, sortx, p): # unroll
if n == 0:
return rid1(sortx)
return compose(starn(n-1, sortx, p), p)
def starn(n, sortx, p): # unroll and join
if n == 0:
return rid1(sortx)
return join(rid, compose(starn(n-1, sortx, p))) # 1 + x * p
# more specialized operations than general puyrpose structural operators
def lte1(x,y):
return x <= y
def sum1(x,y): # I'm being hazy about what x is here exactly
return x[0] + x[1] == y
def npsum(x,y):
return np.sum(x) == y # you can make numpy arrays of z3 variables. Some vectorized functions work.
def mul1(x,y):
return x[0] * x[1] == y
def and1((x,y), z):
return z == And(x,y)
def If1((b,t,e),y):
return If(b, t,e) == y
view raw hosted with ❤ by GitHub

Questions about relation algebra expressions are often phrased in term of relational inclusion. You can construct a relation algebra expression, use the rsub in the appropriate way and ask z3’s prove function if it is true.

# relational properties
def rsub(p,q, sortx, sorty):
x = FreshConst(sortx)
y = FreshConst(sorty)
return ForAll([x,y].Implies(p(x,y) , q(x,y) ))
def req(p,q,sortx, sorty):
return And(rsub(p,q,sortx,sorty), rsub(q,p,sortx,sorty)
def symmetric1(sortx, sorty, r):
x = FreshConst(sortx)
y = FreshConst(sorty)
return ForAll([x,y], r(x,y) == r(y,x))
def reflexive1(sortx, r):
x = FreshConst(sortx)
return ForAll([x],r(x,x))
def transitive1(sortx,sorty,sortz, r):
x = FreshConst(sx)
y = FreshConst(sy)
ForAll([x,y], Implies(r(x,y) & r(y,z) , r(x,z))
def strict1(r,sortx):
x = FreshConst(sortx)
return Not(r(x,x))
view raw hosted with ❤ by GitHub

Z3 has solvers for

  • Combinatorial Relations
  • Linear Relations
  • Polyhedral Relations
  • Polynomial Relations
  • Interval Relations – A point I was confused on. I thought interval relations were not interesting. But I was interpetting the term incorrectly. I was thinking of relations on AxB that are constrained to take the form of a product of intervals. In this case, the choice of A has no effect on the possible B whatsoever, so it feels non relational. However, there is also I_A x I_B , relations over the intervals of A and B. This is much closer to what is actually being discussed in interval arithmetic.

Applications we can use this for:

  • Graph Problems. The Edges can be thought of as a relation between vertices. Relation composition Using the starn operator is a way to ask for paths through the graph.
  • Linear Relations – To some degree this might supplant my efforts on linear relations. Z3 is fully capable of understanding linear relations.
  • Safety and liveness of control systems. Again. a transition relation is natural here. It is conceivable that the state space can be heterogenous in time, which is the interesting power of the categorical style. I feel like traditional control systems usually maintain the same state space throughout.
  • Program verification
  • Games? Nash equilibria?

Other Thoughts

  • Maybe we are just building a shitty version of alloy.
  • What about uninterpeted relations? What about higher order relations? What about reflecting into a z3 ADT for a relational language. Then we could do relational program synthesis. This is one style, just hand everything off to smt.
  • I should try to comply with python conventions, in particular numpy and pandas conventions. @ should be composition for example, since relation composition has a lot of flavor of matrix composition. I should overload a lot of operators, but then I’d need to wrap in a class 🙁
  • Z3 has special support for some relations. How does that play in?
  • As long as you only use composition, there is a chaining of existentials, which really isn’t so bad.
  • What we’ve done here is basically analogous/identical to what John Wiegley did compiling to the category of z3. Slightly different in that he only allowed for existential composition rather than relational division.
  • We can reduced the burden on z3 if we know the constructive proof objects that witness our various operations. Z3 is gonna do better if we can tell it exactly which y witness the composition of operators, or clues to which branch of an Or it should use.
  • It’s a bummer, but when you use quantifiers, you don’t see countermodels? Maybe there is some hook where you can, or in the dump of the proof object.
  • What about recursion schemes? The exact nature of z3 to handle unbounded problems is fuzzy to me. It does have the support to define recursive functions. Also explicit induction predicates can go through sometimes. Maybe look at the Cata I made in fancy relaion algebra post
  • I think most proof assistants have implementations of relation algebra available. I find you can do a surprising amount in z3.

Stupid Z3Py Tricks: Verifying Sorting Networks off of Wikipedia

Sorting networks are a circuit flavored take on sorting. Although you can build circuits for any size input, any particular circuit works for a fixed sized input. They are like an unrolling of the loops or recursion of more familiar sorting algorithms. They come up also in the context of parallel and gpu sorting

Here’s an interesting thing. We can go to Wikipedia and get a little python snippet for the comparison order of a Batcher odd-even mergesort. Kind of a confusing algorithm. Why does it even work? Is it even right? It’s written in some kind of funky, indexful generator style.

Source: Wikipedia
#direct from
def oddeven_merge(lo: int, hi: int, r: int):
step = r * 2
if step < hi - lo:
yield from oddeven_merge(lo, hi, step)
yield from oddeven_merge(lo + r, hi, step)
yield from [(i, i + r) for i in range(lo + r, hi - r, step)]
yield (lo, lo + r)
def oddeven_merge_sort_range(lo: int, hi: int):
""" sort the part of x with indices between lo and hi.
Note: endpoints (lo and hi) are included.
if (hi - lo) >= 1:
# if there is more than one element, split the input
# down the middle and first sort the first and second
# half, followed by merging them.
mid = lo + ((hi - lo) // 2)
yield from oddeven_merge_sort_range(lo, mid)
yield from oddeven_merge_sort_range(mid + 1, hi)
yield from oddeven_merge(lo, hi, 1)
def oddeven_merge_sort(length: int):
""" "length" is the length of the list to be sorted.
Returns a list of pairs of indices starting with 0 """
yield from oddeven_merge_sort_range(0, length - 1)
def compare_and_swap(x, a, b) -> None:
if x[a] > x[b]:
x[a], x[b] = x[b], x[a]
view raw hosted with ❤ by GitHub

Well we can confirm this relatively straightforwardly using z3 by replacing the implementation of compare_and_swap with its z3 equivalent. We then ask z3 .

def compare_and_swap_z3(x,y):
x1, y2 = FreshInt(), FreshInt()
c = If(x <= y, And(x1 == x, y1 == y) , And(x1 == y, y1 == x) )
return x1, y1, c
N = 64 # size of sorting network
s = Solver()
a = [Int(f"x_{i}") for i in range(N)] #build initial array in z3 variables
pairs_to_compare = list(oddeven_merge_sort(N)) #get sequence of compare and swaps to use
a_orig = a.copy()
for i,j in pairs_to_compare:
x = a[i]
y = a[j]
x1,y1,c = compare_and_swap_z3(x,y)
a[i] = x1
a[j] = y1
def sorted_list(x): # list is sorted
return And([x <= y for x,y in zip(x , x[1:])])
def in_list(x,a): # x is in the list of a
return Or([x == y for y in a])
def sub_list(a, b): # all elements of a appear in b
return And([in_list(x,a) for x in b ])
def same_elems(a,b): # a contains the same elements as b
return And(sub_list(a,b), sub_list(b,a))
s.add(Not(And(sorted_pred(a), same_elems(a_orig,a) )))
view raw hosted with ❤ by GitHub

This comes back unsat, hence there are no inputs or executions that do not come back sorted. If I delete some elements from pair_to_compare, it comes back sat, showing that it doesn’t always sort.

The trick here is that the circuit is fixed size, so we have no need for induction, one of the main things z3 is rather finicky at.

It’s somewhat interesting to note that the output of odd_even_merge is a sequence of instructions, we can think of this as interpreting a very small 1 instruction programming language.

We can also confirm similarly a simple odd-even bubblesort and other similar algorithms.

def even_comp(l):
for x in [(i, i + 1) for i in range(0,l-1,2)]:
yield x
def odd_comp(l):
for x in [(i, i + 1) for i in range(1,l-1,2)]:
yield x
def even_odd(l, n):
for j in range(n):
for x in even_comp(l):
yield x
for x in odd_comp(l):
yield x
def bubble(l):
for x in even_odd(l,l//2):
yield x
view raw hosted with ❤ by GitHub

Q: What about using uninterpreted sorts rather than integers? Integers is pretty convincing to me.

same_elems is slightly weaker than a permutation predicate. Wasn’t super obvious to me the best way to do a permutation predicate in z3. Would I want to internalize the array?

Edit: Upon further thought, actually the sort IS a nice predicate for permutation. How do we compute if two things are permutations of each other? By sorting them and forcing a zipped equality. Alternatively count the number of each element (a piece of bucket sort). Since this sort is done by composing swaps, it is somewhat intrinsically a permutation

As a bummer though, I think randomized testing on arrays would be equally or perhaps more convincing of the correctness of the algorithm. Oh well.