MetaOCaml style Partial Evaluation in Coq

MetaOCaml is a very cool system for partial evaluation. I’m very jealous.

If one chooses to ignore the proof aspects of Coq for a moment, it becomes a bizarre OCaml metaprogramming system on insane steroids. Coq has very powerful evaluation mechanisms built in. Why not use these to perform partial evaluation?

Let’s put some legs on this pig. Artwork courtesy of David

We had a really fun project at work where we did partial evaluation in Coq and I’ve been tinkering around with how to make the techniques we eventually stumbled onto there less ad hoc feeling.

A problem I encountered is that it is somewhat difficult to get controlled evaluation in Coq. The fastest evaluation tactics vm_compute and native_compute do not let you protect values from unfolding. There is a construct that is protected though. Axioms added to Coq cannot be unfolded by construction, but can be extracted. So I think a useful mantra here is axioms ~ code.

Require Import Extraction.
Axiom PCode : Type -> Type.
Extract Constant PCode "'a" => "'a".

Axiom block : forall {a : Type}, a -> PCode a.
Extract Inlined Constant block => "".

It is useful to mark in the type which things you expect to run and which you expect to block execution. You can mark everything that is opaque with the type PCode. block is a useful combinator. It is not a quote combinator however, as it will allow evaluation underneath it. Nothing, however, will be able to inspect a blocked piece of code. It’s amusing that the blocking of computation of axioms is exactly what people dislike, but here it is the feature we desire. PCode is short for partial code, indicating that it is possible for evaluation to occur within it. We’ll see a Code that is more similar to MetaOCaml’s later.

It is a touch fishy to extract block as nothing "" rather than an identity function (fun x -> x). It is rather cute though, and I suspect that you won’t find block occurring in a higher order context, although I could easily be wrong (ahhh the sweet dark freedom of ignoring correctness). If this makes you queasy, you can put the function in, which is likely to be compiled away, especially if you use the flambda switch. Not as pretty an output though.

We can play a similar extraction game with two HOAS-ish combinators.

Axiom ocaml_lam : forall {a b: Type}, (PCode a -> PCode b) -> PCode (a -> b).
Extract Inlined Constant ocaml_lam => "".

Axiom ocaml_app : forall {a b : Type},  PCode (a -> b) -> PCode a -> PCode b.
Extract Inlined Constant ocaml_app => "".

Here are some examples of other primitives we might add. The extra imports make extraction turn nat into the native OCaml int type, which is nice. It is not made very clear in the Coq manual that you should use these libraries to get good extraction of some standard types (perhaps I missed it or should make a pull request). You can find the full set of such things here: https://github.com/coq/coq/tree/master/theories/extraction

From Coq.extraction Require Import ExtrOcamlBasic ExtrOcamlNatInt.
Axiom ocaml_add : PCode nat -> PCode nat -> PCode nat.
Extract Inlined Constant ocaml_add => "(+)".
Axiom ocaml_mul : PCode nat -> PCode nat -> PCode nat.
Extract Inlined Constant ocaml_mul => "(*)".

If we had instead used the Coq definition of nat addition, it wouldn’t be protected during vm_compute, even if we wrapped it in block. It would unfold plus into its recursive definition, which is not what you want for extraction. We want to extract (+) to native OCaml (+).

You can add in other primitives as you see fit. Some things can get by merely using block, such as lifting literals.

Here is a very simplistic unrolling of a power function with a compile time known exponent, following Kiselyov’s lead.

Fixpoint pow1 (n : nat) (x : PCode nat) : PCode nat :=
  match n with
  | O => block 1
  | S O => x
  | S n' => ocaml_mul x (pow1 n' x)
  end.

Definition pow2 (n : nat) : PCode (nat -> nat) := ocaml_lam (fun x => pow1 n x).

Definition compilepow : PCode (nat -> nat) := Eval native_compute in pow2 4.
Extraction compilepow.
(*

(** val compilepow : (int -> int) pCode **)

let compilepow =
   (fun x -> (*) x ((*) x ((*) x x)))

*)

What about if you want a quasiquoting interface though? Well here is one suggestion. The same code should become either PCode or more ordinary Coq values depending on whether you decide to quote it or not. So you want overloadable syntax. This can be achieved via a typeclass.

(* No, I don't really know what Symantics means. Symbolic semantics? It's an Oleg-ism.
*)
Class Symantics (repr : Type -> Type) :=
  {
  lnat : nat -> repr nat;
  lbool : bool -> repr bool;
  lam : forall {a b}, (repr a -> repr b) -> repr (a -> b);
  app : forall {a b},  repr (a -> b) -> repr a -> repr b;
  add : repr nat -> repr nat -> repr nat;
  mul : repr nat -> repr nat -> repr nat
  }.
(* A simple do nothing newtype wrapper for the typeclass *)
Record R a := { unR : a }.
Arguments Build_R {a}.
Arguments unR {a}.
(* Would Definition R (a:Type) := a. be okay? *)

Instance regularsym : Symantics R :=
  {|
  lnat := Build_R;
  lbool := Build_R;
  lam := fun a b f => Build_R (fun x => unR (f (Build_R (a:= a) x)));
  app := fun _ _ f x => Build_R ((unR f) (unR x));
  add := fun x y => Build_R ((unR x) + (unR y));
  mul := fun x y => Build_R ((unR x) * (unR y));
  |}.


Instance codesym : Symantics PCode := 
  {|
  lnat := block;
  lbool := block;
  lam := fun a b => ocaml_lam (a := a) (b := b);
  app := fun a b => ocaml_app (a := a) (b := b);
  add := ocaml_add;
  mul := ocaml_mul
  |}.

Now we’ve overloaded the meaning of the base combinators. The type PCode vs R labels which “mode” of evaluation we’re in, “mode” being which typeclass instance we’re using. Here are two combinators for quasiquoting that were somewhat surprising to me, but so far seem to be working. quote takes a value of type a being evaluated in “Code mode” and makes it a value of type Code a being evaluated in “R mode”. And splice sort of undoes that. I would have used the MetaOCaml syntax, but using periods in the notation seemed to make Coq unhappy.

Definition Code : Type -> Type := fun a => R (PCode a).

Definition quote {a}  : PCode a -> Code  a := Build_R.
Definition splice {a} : Code a  -> PCode a := unR.

Declare Scope quote_scope.
Notation "<' x '>" := (quote x) : quote_scope.
Notation "<, x ,>" := (splice x) : quote_scope.

Notation "n + m" := (add n m) : quote_scope.
Notation "n * m" := (mul n m) : quote_scope.

Now you can take the same version of code, add quote/splice annotations and get a partially evaluated version. The thing doesn’t type check if you don’t add the appropriate annotations.

Open Scope quote_scope.

Fixpoint pow1' (n : nat) (x : Code nat) : Code nat :=
  match n with
  | O => quote (lnat 1)
  | S O => x
  | S n' => <' <, x ,> * <, pow1' n' x ,> '>
  end.

Definition pow2' (n : nat) : Code (nat -> nat) := <' lam (fun x => <, pow1' n <' x '> ,> ) '>.

Definition compilepow' : Code (nat -> nat) := Eval native_compute in pow2' 4.
Extraction compilepow'.

(* Same as before basically.
(** val compilepow' : (int -> int) code **)

let compilepow' =
   (fun x -> (*) x ((*) x ((*) x x)))
*)

Coolio.

Increasingly Scattered Thoughts

With more elbow grease is this actually workable? Do we actually save anything over explicit language modelling with data types? Are things actually hygienic and playing nice? Not sure.

We could also give notations to lam and the other combinators. Idiom brackets https://wiki.haskell.org/Idiom_brackets might be nice for app. I am a little queasy going overboard on notation. I generally speaking hate it when people do stuff like this.

Monads have something to do with partial evaluation. Moggi’s original paper on monads seems to have partial evaluation in mind.

(* This is moggi's let. *)
Axiom ocaml_bind : forall {a b}, PCode a -> (a -> PCode b) -> PCode b.
Extract Inlined Constant ocaml_bind => "(fun x f -> f x)".

Doing fix: playing nice with Coq’s fix restrictions is going to be a pain. Maybe just gas it up?

match statements might also suck to do in a DSL of the shown style. I suppose you’ll have to deal with everything via typeclass-dispatched recursors / pattern matchers. Maybe one notation per data type? Or mostly stick to if then else and booleans. 🙁

Quote and splice can also be overloaded with another typeclass such that they just interpret completely into R, or into an appropriate purely functionally defined Gallina monad emulating the effects of interest. This would be helpful for verification and development purposes, as then the entire program can be reasoned about and evaluated inside Coq.

Extracting arrays, mutable refs, for loops. All seems possible with some small inlined indirections that hopefully compile away. I’ve been finding godbolt.org interesting to look at to see what flambda can and can’t do.

Do I need to explicitly model a World token or an IO monad, or is the Code paradigm already sufficiently careful about order of operations and such?

Some snippets


Axiom World : Type.
Axiom ref : Type -> Type.
Extract Constant ref "'a" => "'a ref".
(* make_ref     =>     "ref" *)
Axiom get_ref : forall a, ref a -> World -> a * World.
Extract Constant get_ref => "fun r _ -> (!r  ,())".
Axiom set_ref : forall a, ref a -> a -> World -> unit * World.
Extract Constant set_ref => "fun r x _ -> let () = r := x in (() , ())".


Axiom Array : Type -> Type.
Extract Constant Array "'a" => "'a array".

Axiom make : forall {a : Type}, Code nat -> Code a -> Code World -> Code (Array a  *  World).

Extract Constant make => "fun i def _ -> ( Array.make i def , ())".
Axiom get : forall a, Array a -> nat -> World -> a * World.
Extract Constant get => "fun r i _ -> (r.(i)  ,())".
Axiom set : forall a, Array a -> nat -> a -> World -> unit * World.
Extract Constant set => "fun r i x _ -> let () = r.(i) <- x in (() , ())".

MetaOCaml is super cool. The quote/splice way of building up the exact expressions you want feels nice, and having the type system differentiate between Code and static values is very useful conceptually. It’s another instance where I feel like the types really aid the design process and clarify thinking. The types give you a compile time guarantee of what will and won’t happen.

There are other systems that do compile time stuff. Types themselves are compile time. Some languages have const types, which is pretty similar. Templates are also code generators. Macros.

Why Coq vs MetaOCaml?

  • MetaOCaml doesn’t have critical mass. Its OCaml switch lags behind the mainline. Coq seems more actively developed.
  • Possible verification and more powerful types (at your peril. Some may not extract nicely)
  • One can go beyond purely generative metaprogramming since Ltac (and other techniques) can inspect terms.
  • Typeclasses
  • Can target more platforms: Haskell, Scheme, possibly C, FPGAs?

However, MetaOCaml does present a much more ergonomic, consistent, well-founded interface for what it does.

One needs to have some protected structure in Coq that represents a syntax tree of your intended OCaml expression. One natural choice would be a data type to represent this AST.

You also want access to possibly impure abilities of OCaml like mutation, errors, loops, arrays, and unbounded recursion that don’t have direct equivalents in base Gallina. You can model the purely functional versions of these things, but you don’t per se want to extract the purely functional versions if you’re seeking the ultimate speed.

Why Finally Tagless Style? Anything you can do finally taglessly you can do in initial style.

  • Positivity restrictions make some things difficult to express in Coq data types. You can turn these restrictions off, at your peril. Raw axiomatic fixpoints and HOAS without PHOAS become easier
  • Ultimately we need to build both a data type and an interpreter. A little bit using finally tagless cuts out the middle man. Why have a whole extra set of things to write?
  • Finally tagless style is open. You can add new capabilities without having to rewrite everything everywhere

Downsides

  • More confusing
  • Optimizations are harder
  • Is the verification story shot?

I guess ultimately there might not be a great reason. I just wandered into it. I was doing Kiselyov stuff, so other Kiselyov stuff was on the brain. I could make a DSL with Quote and Splice constructors.

Partial Evaluation Links

Typed Template Haskell gives you similar capabilities if that is more of your jam https://www.philipzucker.com/a-little-bloop-on-typed-template-haskell/

The MetaOcaml book – Kiselyov http://okmij.org/ftp/meta-programming/tutorial/index.html

http://okmij.org/ftp/tagless-final/JFP.pdf Finally Tagless, Partially Evaluated is a paper I come back to, both because the subject matter is interesting and seems to hold insights, and because it is quite confusing and long. I think there are entangled objectives occurring: chronologically this may be an early exposition of finally tagless style, for which it is not the most pedagogical reference.

Jason Gross, Chlipala, and others have worked on Coq partial evaluation. Once you go into plugin territory it’s a different game though. https://people.csail.mit.edu/jgross/personal-website/papers/2020-rewriting-popl-draft.pdf

Partial Evaluation book – Jones Sestoft Gomard https://www.itu.dk/people/sestoft/pebook/jonesgomardsestoft-a4.pdf

Nada Amin, Tiark Rompf. Two names to know https://scala-lms.github.io/ scala partial evaluation system https://www.youtube.com/watch?v=QuJ-cEvH_oI

Strymonas – a staged streaming library https://strymonas.github.io/

Algebraic staged parsing – https://www.cl.cam.ac.uk/~nk480/parsing.pdf

Nielson and Nielson – Two level functional languages book

https://dl.acm.org/doi/10.1145/141478.141483 – Improving binding times without explicit CPS-conversion. This Bondorf paper is often cited as the “why CPS helps partial evaluation” paper.

Modal logic. Davies and Pfenning. Was a hip topic. Their modal logic is something a bit like MetaOCaml. The “next” stage has some relationship to the Next modal operator. MetaOCaml as a proof language for intuitionistic modal logic.

Partial evaluation vs optimizing compilers. It is known that CPS tends to allow the internals of compilers to make more optimizations. The obvious optimizations performed by a compiler often correspond to simple partial evaluations. Perhaps to get a feeling for where GHC gets blocked, playing around with an explicit partial evaluation system is useful.

An unrolled power in Julia. I suspect it is unlikely that you want to use this technique to achieve performance goals. The Julia compiler itself is probably smarter than you unless you’ve got some real secret sauce.

Rando (or ARE THEY!?!) continuation links

continuations and partial evaluation are like jam and peanut butter.

I’ve been digging into the continuation literature a bit

William Byrd call/cc tutorial https://www.youtube.com/watch?v=2GfFlfToBCo

Kenichi Asai – Delimited continuations for everyone https://www.youtube.com/watch?v=QNM-njddhIw

Which of the many Danvy papers is most relevant?

  • Defunctionalization and refunctionalization – Defunctionalize the continuation, see Jimmy’s talk https://dl.acm.org/doi/abs/10.1145/773184.773202
  • http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.6.2739&rep=rep1&type=pdf Continuation based partial evaluations 1995
  • https://dl.acm.org/doi/pdf/10.1145/91556.91622 1990 abstracting control
  • Abstract machines = Evaluators https://dl.acm.org/doi/pdf/10.1145/888251.888254 Functional correspondence 2003
  • https://www.researchgate.net/profile/Olivier_Danvy/publication/226671340_The_essence_of_eta-expansion_in_partial_evaluation/links/00b7d5399ecf37a658000000/The-essence-of-eta-expansion-in-partial-evaluation.pdf Essence of Eta expansion (1995) Reference of “the trick”
  • Representing control (1992) https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.46.84&rep=rep1&type=pdf Explains plotkin translation of CPS carefully. How to get other operators

Names to look out for: Dybvig, Felleisen, Oleg, Filinski, Asai, Danvy, Sabry

https://github.com/rain-1/continuations-study-group/wiki/Reading-List Great reading list

What are the most interesting Oleg sections?

CPS. Really this is converting a syntax tree of lambda calculus to one of another type. This other type can be lowered back down to lambda calculus.

Control constructs can fill in holes in the CPS translation.

Evaluation context. Contexts are terms with a single hole. Variables can also be used to show holes, so therein lies some confusion.

Abstract Machines

Ben pointed out that Node is in continuation passing style

To what degree are monads and continuations related? http://hjemmesider.diku.dk/~andrzej/papers/RM-abstract.html Mother of all monads.

Certainly error handling and escape are possible with continuations.

Call-cc is as if the compiler converts to cps, and then call-cc grabs the continuation for you

call-cc allows you to kind of pull a program inside out. It’s weird.

Compiling to continuations book

Lisp in Small Pieces

CPSing a value x ~ \f -> f x.

https://homepages.inf.ed.ac.uk/wadler/papers/papers-we-love/reynolds-discoveries.pdf – The discoveries of continuations – Reynolds. Interesting bit of history about the discovery in the 60s/70s

Walk on Spheres Method in Julia

I saw a cool tweet (and corresponding conference paper) by Keenan Crane

http://www.cs.cmu.edu/~kmcrane/Projects/MonteCarloGeometryProcessing/index.html

I was vaguely aware that one can use a Monte Carlo method to solve the boundary value Laplace equation \nabla^2 \phi = 0 , but I don’t think I had seen the walk on spheres variant of it before. I think Crane’s point is how similar all this is to stuff graphics people already do and do well. It’s a super cool paper. Check it out.

Conceptually, I think it is plausible that the Laplace equation and a Monte Carlo walk are related because the static diffusion equation \nabla^2 n = 0 from Fick’s law ultimately comes from the Brownian motion of little guys wobbling about from a microscopic perspective.

Slightly more abstractly, both linear differential equations and random walks can be described by matrices: a finite difference matrix (for concreteness) K and a transition matrix of jump probabilities T. The differential equation is discretized to Kx=b and the stationary probability distribution obeys Tp=b, where b holds the sources and sinks at the boundary.

The mean value property of the Laplace equation allows one to speed this process up. Instead of taking a ton of tiny steps, you can just walk out by randomly sampling on the surface of big spheres. en.wikipedia.org/wiki/Walk-on-spheres_method. Alternatively you can think of it as: eventually every random walk exits a sphere, and when it does, it is at a random spot on that sphere.

So here’s the procedure. Pick a point you want the value of \phi at. Make the biggest sphere you can that stays in the domain. Pick a random point on that sphere. If that point is on the boundary, record the boundary value there; otherwise iterate. Do this many, many times; the average of the boundary values you recorded is the value of \phi.

This seems like a good example for Julia use. It would be somewhat difficult to code this up efficiently in python using vectorized numpy primitives. Maybe in the future we could try to parallelize it or run it on the GPU? Monte Carlo methods like these are quite parallelizable.

The solution of the 1-d Laplace equation is absolutely trivial. If the second derivative is 0, then \phi = a + b x. This line is found by fitting it to the two endpoint values.

So we’re gonna get a line out

using LinearAlgebra
using Plots # the plot/plot! calls below assume Plots.jl
avg = 0
phi0 = 0
phi1 = 10
x_0 = 0.75
function monte_run(x)
    while true
            l = rand(Bool) # go left?
            if (l && x <= 0.5) # finish at left edge 0
                return phi0
            elseif (!l && x >= 0.5) # finish at right edge 1
                return phi1
            else
                if x <= 0.5 # move away from 0
                    x += x
                else
                    x -= 1 - x # move away from 1
                end
            end
    end
end

monte_runs = [monte_run(x) for run_num =1:100, x=0:0.05:1 ]
import Statistics
avgs = vec(Statistics.mean( monte_runs , dims=1))
stddevs = vec(Statistics.std(monte_runs, dims=1)) ./ sqrt(size(monte_runs)[1]) # something like this right?

plot(0:0.05:1, avgs, yerror=stddevs)
plot!(0:0.05:1,  (0:0.05:1) * 10 )

And indeed we do.

You can do a very similar thing in 2d. Here I use the boundary values on a disc corresponding to x^2 - y^2 (which is a simple exact solution of the Laplace equation).



function monte_run_2d(phi_b, x)
    while true
            r = norm(x)
            if r > 0.95 # good enough
                return phi_b(x)
            else
                dr = 1.0 - r #assuming big radius of 1
                θ = 2 * pi * rand(Float64) #
                x[1] += dr * cos(θ)
                x[2] += dr * sin(θ)
            end
    end
end


monte_run_2d( x -> x[1],  [0.0 0.0] )


monte_runs = [monte_run_2d(x -> x[1]^2 - x[2]^2 ,  [x 0.0] ) for run_num =1:1000, x=0:0.05:1 ]

import Statistics
avgs = vec(Statistics.mean( monte_runs , dims=1))
stddevs = vec(Statistics.std(monte_runs, dims=1)) ./ sqrt(size(monte_runs)[1]) # something like this right?
plot(0:0.05:1, avgs, yerror=stddevs)
plot!(0:0.05:1,  (0:0.05:1) .^2 )

There’s more notes and derivations in my notebook here https://github.com/philzook58/thoughtbooks/blob/master/monte_carlo_integrate.ipynb

A Smattering of Physics in Sympy

Sympy is fun. I’ve been enjoying trying out some simple physics problems and seeing what kind of fun angles sympy brings to the table. It does pretty well on concrete problems, not so well on abstract derivations.

Kinematics

Ah such fond memories! In high school, I was taught by Ric Thompson “the big four”.

x_f = x_i + v_i t + \frac{1}{2} a t^2

v_f = v_i + a t

v_f^2 = v_i^2 + 2 a d

d = \frac{v_f + v_i}{2} t

The equations are, of course, overcomplete. They are all implied by \frac{d^2}{dt^2}x = a, but even with only algebra the second two are derivable from the first two.

Of course a natural way of deriving the equations is to solve one equation for a variable and substitute it into the other equation. sympy makes this pretty dang easy.

from sympy import *
init_printing()
t,a,d,vf,vi = symbols("t a d vf vi")
e1 = Eq(d , vi * t + 1/2 * a * t ** 2)
tsub = solve(Eq(vf , vi + a * t),t)[0]
print(tsub) # This is assuming a is nonzero though.
expand(simplify(e1.subs(t,tsub)))
(vf - vi)/a
Eq(d, 0.5*vf**2/a - 0.5*vi**2/a)

However, there is a more automated approach.

It turns out that a decent chunk of physics equations are or can be well approximated by a system of polynomial equations. There are systematic methods that are guaranteed to solve the problem (albeit maybe not in the lifetime of the universe).

A Groebner basis is an equivalent set of polynomial equations that has useful properties. For some simple purposes, all you need to know is that if you give the variables you want to eliminate first, the Groebner basis will contain equations without those variables. Here we specify t as one to eliminate, so we get an equation without t in it.


G = groebner(  [vi * t + 1/2 * a * t ** 2 - d,  
                vi + a * t - vf] , 
                 t,vf,d,a,vi  )
for e in G:
    print(e)
-2.0*d + 1.0*t*vf + 1.0*t*vi
1.0*a*t - 1.0*vf + 1.0*vi
-2.0*a*d + 1.0*vf**2 - 1.0*vi**2

I’ve actually been pleasantly surprised at how many physics problems reduce ultimately to systems of polynomial constraints. Energy and momentum conservation are polynomial constraints (classical Feynman diagrams, kind of). Special relativity questions can be reduced to polynomial constraints using the proper time.

#elephant problem
# elephants give birth at 21 months. On a rocket at velocity v
# how long T until you see it give birth? 
tau , t1, t2, x1, v, c, T = symbols("tau t1 t2 x1 v c T")

eqs = [
    tau**2 - (t1**2 - x1**2 / c**2), # proper time
    x1 - v * t1, # distance away
    c * t2 - x1, # time for light to travel back
    T - t1 - t2, # total time
    tau - 21 # proper time is 21 months
    
]

groebner(eqs, tau , t1, t2, x1, v, T)

Lagrangian Mechanics

The Structure and Interpretation of Classical Mechanics is an interesting book.

It points out that notation we use is extremely imprecise and implicit. This is a source of great confusion.

A great benefit of programming up such examples is that it makes explicit (sometimes painfully so) steps that were implicit before.

In the Euler-Lagrange equation, one first partially differentiates considering q and \dot{q} to be independent parameters. Then a substitution is made for a function x(t), and then we proceed with a differentiation with respect to time.

# simple harmonic oscillator lagrangian style
m, k = symbols("m k", real = True, positive=True)
v, q = symbols("v q")
K = Rational(1,2) * m * v ** 2 #kinetic energy
V = Rational(1,2) * k * q ** 2 # potential energy
L =  K - V  #Lagrangian
F = diff(L,q) # force
p = diff(L,v) # momentum

x_ = Function("x")
t = symbols("t")

x = x_(t)

subst = { v : diff(x,t),
         q : x} # replacement to turn q into a function x

# euler-lagrange equations of motion
eq = F.subs( subst ) - diff( p.subs(subst)  , t )
dsolve(eq) # general solution cosine and sines

Here’s an analogous thing for a pendulum

# pendulum, lagrangian style
m, g, L = symbols("m g L", real = True, positive=True)
theta, w = symbols("theta omega")
K = Rational(1,2) * m * (w * L) ** 2 #kinetic energy
V = - Rational(1,2) * m * g * L * cos(theta) # potential energy. angle is defined as 0 = hanging down
L =  K - V  #Lagrangian
F = diff(L,theta) # force
p = diff(L,w) # momentum
F
p

x_ = Function("theta")
t = symbols("t")

x = x_(t)

subst = { w : diff(x,t),
         theta : x} # replacement to turn q into a function x

# euler-lagrange equations of motion
eq = F.subs( subst ) - diff( p.subs(subst)  , t )
eq
#dsolve(eq) 

Another place where an implicitly stated substitution is absolutely vital is in the Legendre transform going from the Lagrangian to the Hamiltonian.

# legendre transformation to hamiltonian
p = symbols( "p" )
H_ = p * v - L # hamiltonian but we haven't solved out v yet
v_of_pq = solve(diff(H_, v), v)[0] # set derivative to 0 to solve for v.
H = simplify(H_.subs(v, v_of_pq )) # substitute back in. Here is the actual hamiltonian
H

Statistical Mechanics

Sympy can do Gaussian integrals! How convenient. It can also do power series expansions. And differentiate. So it takes the drudgery out of some simple calculations

# ideal gas partition function
beta, m, V, N, kb, T  = symbols("beta m V N k_b T", real=True, positive=True)
p = symbols("p", real=True)
Z = integrate( exp( - beta * Rational(1,2) * p ** 2 / m ), (p,-oo,oo))**(3*N) * V**N #partition function
def avg_energy(Z):
    return - diff(ln(Z), beta).subs(beta, 1/ kb / T)
print(avg_energy(Z)) #
F = (-ln(Z) / beta).subs(beta, 1 / kb / T) #helmholtz free energy
S = -diff(F , T) # entropy is minus the derivative of helmholtz free energy wrt T
S # the functional dependence on T and V I think is correct
P = -diff(F , V) # pressure is - derivative of V
P
# Neato
# harmonic oscillator partition function
beta, m, k = symbols("beta m k ", real=True, positive=True)
p, x = symbols("p x", real=True)
E = Rational(1,2) * p ** 2 / m  + Rational(1,2) * k * x ** 2
Z = integrate( integrate( exp( - beta * E ), (p,-oo,oo)) , (x,-oo, oo))**N 
diff(-ln(Z),beta)

Perturbation theory of the partition function of an anharmonic oscillator. Pretty easy. It is interesting to note that this is the very simplest schematic of how perturbation theory can be approached for quantum field theory.

# perturbation theory of anharmonic oscillator
beta, m, k, g = symbols("beta m k g ", real=True, positive=True)
p, x = symbols("p x", real=True)
E = Rational(1,2) * ( p ** 2 / m  +  k * x ** 2) + g * x ** 4
series(exp( - beta * E ), g).removeO()
Z = integrate( integrate( series(exp( - beta * E ), g, n=2).removeO(), (p,-oo,oo)) , (x,-oo, oo))
simplify(diff(-ln(Z),beta)) #E
simplify(diff(-ln(Z),k)/beta) #<x**2> 

Other things that might be interesting: 2 oscillators, a chain of oscillators, virial expansion

Thermo and Legendre Transformations

Thermodynamics is a poorly communicated topic. Which variables remain in expressions and what things are held constant when differentiating are crucial and yet poorly communicated and the notation is trash. Sympy helps make some things explicit. It’s fun.

u,s,t,p,v,n,r = symbols("u s t p v n r")

du,ds,dt,dp,dv = symbols("du ds dt dp dv")
# taylor series in stuff?

e1 = p * v - n * r * t
e2 = u - Rational(3 , 2) * n * r * t

state = [  (u,du), (s,ds), (t,dt) , (p,dp) , (v,dv) ]

def differential(e):
    return sum( [ diff(e,x) * dx  for x,dx in state]   )


de1 = differential(e1 )
de2 = differential(e2 )

e3 = du - (t * ds - p * dv)

eqs = [e1,e2,de1,de2,e3]
print(eqs)
G = groebner( eqs, u , du,  t, dt, p, dp, v, dv,  ds )
for e in G:
    print(e)
R = Rational
U,S,T,P,V,N, k = symbols("U S T P V N k")

cv = R(3,2) * N * k
e1 = U - cv * T
e2 = P * V - N * k * T
e3 = S - cv * ln(T) + N * k * ln(V)

elim = [P,T]
Ps = solve([e1,e2,e3], P)
Ps
es = [ e.subs(Ps) for e in [e1,e2,e3] ]
Ts = solve(e3, T)[0]
es = [  e.subs(T,Ts) for e in es ]
Usv = solve(es[0],U)[0]
psv = diff(Usv,V)
tsv = diff( Usv , S )

#solve(es[0], V)

Hsv = Usv + P * V  # enthalpy and legendre transformation
Vps = solve(diff(Hsv, V) , V) 
H =  Hsv.subs(V, Vps[0]) 
simplify(H)

There are so many other things!

What about a Van der Waals equation? Optics (geometrical and wave, paraxial ~ Schrodinger, Fourier optics), GR (exterior derivatives), Quantum (wave matching problems. What can we do about hydrogen? WKB, QHE), Rutherford scattering, Weiss mean field, canonical transformations, rotations, Clebsch-Gordan coefficients

Computational Category Theory in Python III: Monoids, Groups, and Preorders

Parts 1 and 2 are found here and here

From one perspective, categories are just another algebraic structure, like groups, monoids and rings. They are these abstract things that have some abstract equational axioms and operations. They are the next stop on our magnificent category journey.

A monoid is a thing that has an associative operation with a unit. Addition and 0 make numbers a monoid. Multiplication and 1 are a separate monoid for numbers. Concatenation and empty lists make lists a monoid. Union and empty set make sets a monoid. We can encode this in python like so:
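
A minimal sketch of one way to write that down (the class and method names here are just one illustrative choice, not a canonical interface): each monoid is a little class carrying a unit element e and an associative operation op.

class IntAddMonoid():
    e = 0                      # the unit
    @staticmethod
    def op(a, b):              # the associative operation
        return a + b

class ListMonoid():
    e = []
    @staticmethod
    def op(a, b):
        return a + b           # concatenation

class SetUnionMonoid():
    e = frozenset()
    @staticmethod
    def op(a, b):
        return a | b           # union

assert IntAddMonoid.op(2, IntAddMonoid.e) == 2
assert ListMonoid.op([1], [2, 3]) == [1, 2, 3]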

What does this have to do with categories? Well, if some thing is a category, it obeys the axioms that define what it means to be a category. It has morphisms and objects. The morphisms compose if head meets tail on an object. There are always identity morphisms.

The morphisms in a category with 1 object automatically obey the monoid axioms. In this case, the category axioms imply the monoid axioms. Everything composes because there is only one object. It’s a kind of degenerate case where we are not using the partiality of the composition operator. There is automatically a unit for composition because the identity morphism is a unit. Composition is already required to be associative. Boom. The thing is a monoid.

Continuing with our representation from previous posts, we make a python class for each category. An instance of this class is a morphism in this category. If you ask for the domain or codomain of any morphism, you always get back () because it is a single object category. Compare these classes with the above classes.
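
As a sketch of what I mean (again the names are just my own choices): integer addition packaged as a single-object category. Every morphism wraps an integer, composition adds them, and the identity morphism wraps 0.

class IntAddCat():
    def __init__(self, f):
        self.f = f
    def dom(self):
        return ()              # the one and only object
    def cod(self):
        return ()
    def compose(self, other):
        return IntAddCat(self.f + other.f)   # composition is the monoid operation
    @classmethod
    def idd(cls):
        return cls(0)          # the identity morphism is the monoid unit

assert IntAddCat(2).compose(IntAddCat(3)).f == 5
assert IntAddCat(7).compose(IntAddCat.idd()).f == 7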

Some monoids are also groups if there is a natural inverse operation. The integers are a group under addition, where negation gives you the inverse. Some aren’t though: the natural numbers (0,1,2…) aren’t a group under addition.

Similarly groups can be thought of as a category with one object, with the additional requirement that every morphism is invertible, that there is always a f^{-1} such that f \circ f^{-1} = id.

Sympy has groups in it. We can make a wrapper of that functionality that looks like a categorical interface. To match our pattern of using python classes to represent categories, it is convenient to do the slightly uncommon thing of making a class definition generator function fp_group_cat. Every time you call this function, it makes a different class and a different category. I have only here wrapped the finitely presented group functionality, but there are also free groups, permutation groups, and named groups available in sympy.
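
I won’t try to reproduce the finitely presented group wrapper here. As a stand-in, here is a sketch of the class-generator pattern using sympy’s permutation groups instead (the names perm_group_cat and PermCat are mine); the FpGroup version follows the same single-object shape, with the extra wrinkle that deciding equality of morphisms needs the word problem solved.

from sympy.combinatorics import Permutation, PermutationGroup

def perm_group_cat(G, name="PermCat"):
    ''' Generate a fresh single-object category class whose morphisms are elements of G. '''
    class PermCat():
        def __init__(self, p):
            assert G.contains(p)           # morphisms are exactly the group elements
            self.p = p
        def dom(self):
            return ()
        def cod(self):
            return ()
        def compose(self, other):
            return PermCat(self.p * other.p)
        @classmethod
        def idd(cls):
            return cls(Permutation(list(range(G.degree))))
    PermCat.__name__ = name
    return PermCat

S3Cat = perm_group_cat(PermutationGroup([Permutation([1, 0, 2]), Permutation([0, 2, 1])]), "S3Cat")
f = S3Cat(Permutation([1, 0, 2])).compose(S3Cat(Permutation([0, 2, 1])))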

Many objects, at most one arrow per pair: Preorders

We can simplify the power of a category in a different direction. Instead of having only 1 object, we’ll have few arrows.

A category with many objects but at most a single morphism between a pair of them obeys the axioms of a preorder. In categorical terminology this is sometimes called a thin category. Any actual order like \le on numbers is also a preorder, but preorders have slightly weaker requirements. Here is a categorical representation of the ordering on integers (although really the same implementation will work for any python type that implements <= and ==).
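
A sketch of the idea (any python type supporting <= and == could stand in for int):

class IntOrderCat():
    def __init__(self, dom, cod):
        assert dom <= cod                  # a morphism exists exactly when dom <= cod
        self.dom = dom
        self.cod = cod
    def compose(self, other):
        assert self.cod == other.dom       # composition is transitivity of <=
        return IntOrderCat(self.dom, other.cod)
    @classmethod
    def idd(cls, x):
        return cls(x, x)                   # reflexivity gives the identity morphism

f = IntOrderCat(2, 5).compose(IntOrderCat(5, 7))   # the unique morphism 2 -> 7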

An example of a partial order is the subset relationship, which we can represent using python sets. This is an important but perhaps confusing example. Haven’t we already defined FinSet? Yes, but these are different categories. In FinSet, morphisms are functions. In SubSetCat a morphism is the subset relationship (of which there can either be one or not one). They just plain are not the same thing even though there are sets in the mix for both. The situation is made even more confusing by the fact that the subset relationship can be talked about indirectly inside FinSet using monic morphisms, which have as their image the subset of interest.
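
In the same style, a sketch of the subset category (the class name is mine): a morphism is nothing but the fact that one set is contained in another.

class SubSetCat():
    def __init__(self, dom, cod):
        assert dom.issubset(cod)           # the morphism is the containment itself
        self.dom = dom
        self.cod = cod
    def compose(self, other):
        assert self.cod == other.dom
        return SubSetCat(self.dom, other.cod)
    @classmethod
    def idd(cls, s):
        return cls(s, s)

f = SubSetCat({1, 2}, {1, 2, 3}).compose(SubSetCat({1, 2, 3}, {1, 2, 3, 7}))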

Preorders are related to directed acyclic graphs (DAG), the directed graphs that have no loops. If you give me a DAG, there is a preorder that is generated by that DAG. Exercise for the reader (AKA I’m lazy): Can you turn a Networkx DAG into a category?

Thoughts

This is nice and all just to explain categories in terms of some perhaps more familiar concepts. It feels a little ho-hum to me. We are not really getting any benefit from the concept of a category in this post. However, the examples of monoids, groups, and preorders are always something you should think about when presented with a new categorical concept, because it probably reduces to something more familiar in these cases. In addition, mappings to/from these simple objects to more complicated categories can be very interesting.

The methods of computational group theory are intriguing. It seems like some of them should extend to category theory. See this book by RFC Walters for example https://www.cambridge.org/core/books/categories-and-computer-science/203EBBEE29BEADB035C9DD80191E67B1 A very interesting book in other ways too. (Thanks to Evan Patterson for the tip)

Next time I think we’ll talk about finite categories and the finite Yoneda lemma.

Artwork courtesy of David

Edit: Hacker News discussion: https://news.ycombinator.com/item?id=23058551

Computational Category Theory in Python II: Numpy for FinVect

Linear algebra seems to be the place where any energy you put into learning it pays off massively in understanding other subjects and applications. It is the beating heart of numerical computing. I can’t find the words to overstate the importance of linear algebra.

Here’s some examples:

  • Least Squares Fitting – Goddamn is this one useful.
  • Quadratic optimization problems
  • Partial Differential Equations – Heat equations, electricity and magnetism, elasticity, fluid flow. Differential equations can be approximated as finite difference matrices acting on vectors representing the functions you’re solving for.
  • Linear Dynamical systems – Solving, frequency analysis, control, estimation, stability
  • Signals – Filtering, Fourier transforms
  • Quantum mechanics – Eigenvalues for energy, evolving in time, perturbation theory
  • Probability – Transition matrices, eigenvectors for steady state distributions.
  • Multidimensional Gaussian integrals – A canonical model in quantum mechanics and probability because they are solvable in closed form. Gaussian integrals are linear algebra in disguise. Their solution is describable in terms of the matrices and vectors in the exponent. More on this another day.

Where does category theory come in to this?

On one side, exploring what categorical constructions mean concretely and computationally in linear algebra land helps explain the category theory. I personally feel very comfortable with linear algebra. Matrices make me feel good and safe and warm and fuzzy. You may or may not feel the same way depending on your background.

In particular, understanding what the categorical notion of a pullback means in the context of matrices is the first time the concept clicked for me thanks to discussions with James Fairbanks and Evan Patterson.

But the other direction is important too. A categorical interface to numpy has the promise of making certain problems easier to express and solve. It gives new tools for thought and programming. The thing that seems the most enticing to me about the categorical approach to linear algebra is that it gives you a flexible language to discuss gluing together rectangular subpieces of a numerical linear algebra problem and it gives a high level algebra for manipulating this gluing. Down this road seems to be an actionable, applicable, computational, constructible example of open systems.

Given how important linear algebra is, given that I’ve been tinkering and solving problems (PDEs, fitting problems, control problems, boundary value problems, probabilistic dynamics, yada yada ) using numpy/scipy for 10 years now and given that I actually have a natural reluctance towards inscrutable mathematics for its own sake, I hope that lends some credence to when I say that there really is something here with this category theory business.

It frankly boggles my mind that these implementations aren’t available somewhere already! GAH!

Uh oh. I’m foaming. I need to take my pills now.

FinVect

The objects in the category FinVect are the vector spaces. We can represent a vector space by its dimensionality n (an integer). The morphisms are linear maps, which are represented by numpy matrices. ndarray.shape basically tells you what the domain and codomain of the morphism are. We can get a lot of mileage by subclassing ndarray to make our FinVect morphisms. Composition is matrix multiplication (which is associative) and identity morphisms are identity matrices. We’ve checked our category theory boxes.
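
As a rough sketch of that setup (not the full listing from the post; the class and method names are just one choice):

import numpy as np

class Vect(np.ndarray):
    ''' A morphism in FinVect: a matrix, with dom/cod read off from its shape. '''
    def __new__(cls, arr):
        return np.asarray(arr, dtype=float).view(cls)
    @property
    def dom(self):
        return self.shape[1]           # dimension of the source space
    @property
    def cod(self):
        return self.shape[0]           # dimension of the target space
    def compose(self, other):
        assert self.dom == other.cod   # head must meet tail
        return self @ other            # composition is matrix multiplication
    @staticmethod
    def idd(n):
        return Vect(np.eye(n))         # identity morphism on the n dimensional space

f = Vect([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])   # a morphism 2 -> 3
assert np.allclose(f.compose(Vect.idd(2)), f)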

A part of the flavor of category theory comes from taking the focus away from the objects and putting focus on the morphisms.

One does not typically speak of the elements of a set, or subsets of a set in category theory. One takes the slight indirection of using the map whose image is that subset or the element in question when/if you need to talk about such things.

This actually makes a lot of sense from the perspective of numerical linear algebra. Matrices are concrete representations of linear maps. But also sometimes we use them as data structures for collections of vectors. When one wants to describe a vector subspace concretely, you can describe it either as the range of a matrix or the nullspace of a matrix. This is indeed describing a subset in terms of a mapping. In the case of the range, we are describing the subspace as all possible linear combinations of the columns \lambda_1 c_1 + \lambda_2 c_2 + ... . It is a matrix mapping from the space of parameters \lambda to the subspace (1 dimension for each generator vector / column). In the case of the nullspace it is a matrix mapping from the subspace to the space of constraints (1 dimension for each equation / row).
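
A tiny concrete illustration of the two descriptions, using scipy (this is just me illustrating the point, not code from the post):

import numpy as np
from scipy.linalg import null_space

# The plane x + y + z = 0 in R^3, described two ways.
R = np.array([[1.0, 1.0, 1.0]])   # nullspace style: the rows are constraints
N = null_space(R)                 # range style: the columns generate the subspace
assert N.shape == (3, 2)          # a map from a 2d parameter space into R^3
assert np.allclose(R @ N, 0)      # the two representations are compatible: R N = 0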

The injectivity or surjectivity of a matrix is easily detectable as a question about its rank. These kinds of morphisms are called monomorphisms and epimorphisms respectively. They are characterized by whether you can “divide” out by the morphism on the left or on the right. In linear algebra terms, whether there is a left or right inverse to these possibly rectangular, possibly ill-posed matrices. I personally can never remember which is which (surj/inj, left/right, mono/epi) without careful thought, but then again I’m an ape.

Some categorical constructions are very simple structural transformations that correspond to merely stacking matrices, shuffling elements, or taking transposes. The product and coproduct are examples of this. The product is an operation that takes in 2 objects and returns a new object, two projections \pi_1 \pi_2, and a function implementing the universal property that constructs f from f_1 f_2.

The diagram for a product

Here is the corresponding python program. The matrix e (called f in the diagram; sorry about the mixed conventions) is the unique morphism that makes those triangles commute, which is checked numerically in the assert statements.
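
Since that listing isn’t reproduced here, here is a sketch of the shape of the construction in plain numpy (the function name is mine): the product object is the sum of the dimensions, the projections slice it apart, and the universal morphism stacks f1 and f2.

import numpy as np

def fin_vect_product(n, m):
    p1 = np.hstack([np.eye(n), np.zeros((n, m))])   # projection onto the first factor
    p2 = np.hstack([np.zeros((m, n)), np.eye(m)])   # projection onto the second factor
    def univ(f1, f2):
        return np.vstack([f1, f2])                  # the unique e with p1 e = f1 and p2 e = f2
    return n + m, p1, p2, univ

nm, p1, p2, univ = fin_vect_product(2, 3)
f1, f2 = np.random.rand(2, 4), np.random.rand(3, 4)
e = univ(f1, f2)
assert np.allclose(p1 @ e, f1) and np.allclose(p2 @ e, f2)   # the triangles commute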

The coproduct proceeds very similarly. Give it a shot. The coproduct is more similar to the product in FinVect than it is in FinSet.

The initial and terminal objects are 0 dimensional vector spaces. Again, these are more similar to each other in FinVect than they are in FinSet. A matrix with one dimension as 0 really is unique. You have no choice for entries.

Where the real meat and potatoes lives is in the pullback, pushout, equalizer, and coequalizer. These are the categorical constructions that hold equation solving capabilities. There is a really nice explanation of the concept of a pullback and the other above constructions here.

Vector subspaces can be described as the range of the matrix or the nullspace of a matrix. These representations are dual to each other in some sense. RN=0. Converting from one representation to the other is a nontrivial operation.

In addition to thinking of these constructions as solving equations, you can also think of a pullback/equalizer as converting a nullspace representation of a subspace into a range representation of a subspace and the coequalizer/pushout as converting the range representation into a nullspace representation.

The actual heart of the computation lies in the scipy routines null_space and orth. Under the hood these use an SVD decomposition, which seems like the most reasonable numerical approach to questions about nullspaces. (An aside: nullspaces are not a very numerical question. The dimensionality of a nullspace of a collection of vectors is pretty sensitive to perturbations. This may or may not be an issue. Not sure.)

Let’s talk about how to implement the pullback. The input is the two morphisms f and g. The output is an object P, two projections p1 p2, and a universal property function that given q1 q2 constructs u. This all seems very similar to the product. The extra feature is that the squares are required to commute, which corresponds to the equation f p_1 = g p_2 and is checked in assert statements in the code. This is the equation that is being solved. Computationally this is done by embedding this equation into a nullspace calculation \begin{bmatrix} F & -G \end{bmatrix} \begin{bmatrix} x  \\ y \end{bmatrix} = 0. The universal morphism is calculated by projecting q1 and q2 onto the calculated orthogonal basis for the nullspace. Because q1 and q2 are required to be in a commuting square with f and g by hypothesis, their columns live in the nullspace of the FG stacked matrix. There is extra discussion with James and Evan and some nice derivations as mentioned before here
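
A sketch of that computation in bare numpy/scipy terms (my own function name, and glossing over the wrapper class):

import numpy as np
from scipy.linalg import null_space

def pullback(f, g):
    ''' f : a -> c and g : b -> c given as matrices of shape (c, a) and (c, b). '''
    assert f.shape[0] == g.shape[0]
    a = f.shape[1]
    N = null_space(np.hstack([f, -g]))      # orthonormal basis for solutions of f x = g y
    p1, p2 = N[:a, :], N[a:, :]             # the two projections out of the pullback object
    def univ(q1, q2):
        assert np.allclose(f @ q1, g @ q2)  # q1, q2 must already form a commuting square
        return N.T @ np.vstack([q1, q2])    # project onto the orthonormal nullspace basis
    return N.shape[1], p1, p2, univ

f, g = np.random.rand(3, 4), np.random.rand(3, 5)
P, p1, p2, univ = pullback(f, g)
assert np.allclose(f @ p1, g @ p2)          # the pullback square commutes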

The equalizer, coequalizer, and pushout can all be calculated similarly. A nice exercise for the reader (AKA I’m lazy)!

Thoughts

I think there are already some things here for you to chew on. Certainly a lot of polish and filling out of the combinators is required.

I am acutely aware that I haven’t shown any of this being used. So I’ve only shown the side where the construction helps teach us category theory and not entirely fulfilled the lofty promises I set in the intro. I only have finite endurance. I’m sure the other direction, where this helps us formulate problems, will show up on this blog at some point. For what I’m thinking, it will be something like this post http://www.philipzucker.com/linear-relation-algebra-of-circuits-with-hmatrix/ but in a different pullback-y style. Mix together FinSet and FinVect. Something something decorated cospans? https://arxiv.org/abs/1609.05382

One important thing is we really need to extend this to affine maps rather than linear maps (affine maps allow an offset y = Ax + b). This is important for applications. The canonical linear algebra problem is Ax=b, which we haven’t yet shown how to represent.

One common approach to embed the affine case in the linear case is to use homogeneous coordinates. https://en.wikipedia.org/wiki/Homogeneous_coordinates.

Alternatively, we could make a new python class FinAff that just holds the b vector as a separate field. Which approach will be more elegant is not clear to me at the moment.

Another very enticing implementation on the horizon is a nice wrapper for compositionally calculating Gaussian integrals + linear delta functions. Gaussian integrals + delta functions are solved by basically a minimization problem over the exponent. I believe this can be formulated by describing the linear subspace we are in as a span over the input and output variables, associating a quadratic form with the vertex of the span. You’ll see.

Blah Blah Blah

Whenever I write a post, I just let it flow, because I am entranced by the sound of my own keyboard clackin’. But it would deeply surprise me if you are as equally entranced, so I take sections out that are just musings and not really on the main point. So let’s toss em down here if you’re interested.

I like to draw little schematic matrices sometimes so I can visually see which dimensions match with which dimensions.

Making the objects just the dimension is a structural approach and you could make other choices. It may also make sense to not necessarily identify two vector spaces of the same dimensionality. It is nonsensical to consider a vector of dog’s nose qualities to be interchangeable with a vector describing a rocket ship just because they both have dimensionality 7.

High Level Linear Algebra

Linear algebra already has some powerful high level abstractions in common use.

Numpy indexing and broadcasting can sometimes be a little cryptic, but it is also very, very powerful. You gain both concision and speed.

Matrix notation is the most commonly used “pointfree” notation in the world. Indexful expressions can be very useful, but the calculus of matrices lets us use intuition built about algebraic manipulation of ordinary numbers to manipulate large systems of equations in a high level way. There are simple rules governing matrix inverse, transpose, addition, multiplication, identity.

Another powerful notion in linear algebra is that of block matrices. Block matrices are the standard high level notation to talk about subpieces of a numerical linear algebra problem. For example, you might be solving the heat equation on two hunks of metal attached at a joint. It is natural to consider this system in block form with the off diagonal blocks corresponding to the coupling of the two hunks. https://en.wikipedia.org/wiki/Domain_decomposition_methods

One does not typically speak of the elements of a set, or subsets of a set in category theory. One takes the slight indirection of using the map whose image is that subset or the element in question when/if you need to talk about such things. This pays off in a couple ways. There is a nice minimalism in that you don’t need a whole new entity (python class, data structure, what have you) when you already have morphisms lying around. More importantly though the algebraic properties of what it means to be an element or subset are more clearly stated and manipulated in this form. On the flipside, given that we often return to subset or element based thinking when we’re confused or explaining something to a beginner shows that I think it is a somewhat difficult game to play.

The analogy is that a beginner will often write for loops for a numpy calculation that an expert knows how to write more concisely and efficiently using broadcasting and vectorization. And sometimes the expert just can’t figure out how to vectorize some complicated construction and defeatedly writes the dirty feeling for loop.

What about in a language where the for loops are fast, like Julia? Then isn’t the for loop version just plain better, since any beginner can read and write it and it runs fast too? Yes, I think learning some high level notation or interface is a harder sell here. Nevertheless, there is utility. High level formulations enable optimizing compilers to do fancier things. They open up opportunities for parallelism. They aid reasoning about code. See query optimization for databases. Succinctness is a surprising virtue in and of itself.

Aaron Hsu (who is an APL beast) said something that has stuck with me. APL has a reputation for being incredibly inscrutable. It uses characters you can’t type, each of which is a complex operation on arrays. It is the epitome of concision. A single word in APL is an entire subroutine. A single sentence is a program. He says that being able to fit your entire huge program on a single screen puts you in a different domain of memory and mindspace. That it is worth the inscrutability because once trained, you can hold everything in your extended mind at once. Sometimes I feel when I’m writing stuff down on paper that it is an extension of my mind, that it is part of my short term memory. So too the computer screen. I’m not on board for APL yet, but food for thought ya know?

Differences between the pure mathematical perspective of Linear Algebra, and the Applied/Numerical Linear Algebra

I think there are a couple of conceptual points of disconnect between the purely mathematical conception of vector spaces and the applied numerical perspective.

First off, the numerical world is by and large focused on full rank square matrices. The canonical problem is solving the matrix equation Ax=b for the unknown vector x. If the matrix isn’t full rank or square, we find some way to make it square and full rank.

The mathematical world is more fixated on the concept of a vector subspace, which is a set of vectors.

It is actually extremely remarkable and I invite you for a moment to contemplate that a vector subspace over the real numbers is a very very big set. Completely infinite. And yet it is tractable because it is generated by only a finite number of vectors, which we can concretely manipulate.

Ok. Here’s another thing. I am perfectly willing to pretend unless I’m being extra careful that machine floats are real numbers. This makes some mathematicians vomit blood. I’ve seen it. Cody gave me quite a look. Floats upon closer inspection are not at all the mathematical real numbers though. They’re countable first off.

From a mathematical perspective, many people are interested in precise vector arithmetic, which requires going to somewhat unusual fields. Finite fields are discrete mathematical objects that just so happen to actually have a division operation like the rationals or reals. Quite the miracle. In pure mathematics they more often do linear algebra over these things rather than the rationals or reals.

The finite basis theorem. This was brought up in conversation as a basic result in linear algebra. I’m not sure I’d ever even heard of it. It is so far from my conceptualization of these things.

Monoidal Products

The direct sum of matrices is represented by taking the block diagonal. It is a monoidal product on FinVect. Monoidal products are binary operations on morphisms in a category that play nice with it in certain ways. For example, the direct sum of two identity matrices is also an identity matrix.

The kronecker product is another useful piece of FinVect. It is a second monoidal product on the category FinVect. It is useful for probability and quantum mechanics. When you take the pair of the pieces of state to make a combined state, you take the kronecker product of their state spaces.

    def par(f,g):
        ''' One choice of monoidal product is the direct sum '''
        r, c = f.shape
        rg, cg = g.shape
        return Vect(np.block( [ [f       ,           np.zeros((r,cg))]  ,
                                [np.zeros((rg,c))  , g              ]]  ))
    def par2(f,g):
        '''  another choice is the kronecker product'''
        return np.kron(f,g)

We think about row vectors as being matrices where the number of rows is 1, and column vectors as being matrices where the number of columns is 1. These can be considered as mappings to or from the 1 dimensional vector space. These morphisms are points.

The traditional focus of category theory in linear algebra has been on the kronecker product, string diagrams as quantum circuits / Penrose notation, and applications to quantum mechanics.

However, the direct sum structure and the limit/co-limit structures of FinVect are very interesting and more applicable to everyday engineering. I associate bringing more focus to this angle with John Baez’s group and his collaborators.

Anyway, that is enough blithering.

Computational Category Theory in Python I: Dictionaries for FinSet

Category theory is a mathematical theory with a reputation for being very abstract.

Category theory is an algebraic theory of functions. It has the flavor of connecting up little pipes and ports that is reminiscent of dataflow languages or circuits, but with some hearty mathematical underpinnings.

So is this really applicable to programming at all? Yes, I think so.

Here’s one argument. Libraries present an interface to their users. One of the measures of the goodness or badness of an interface is how often you are inclined to peek under the hood to get it to do the thing that you need. Designing these interfaces is hard. Category theory has taken off as a field because it has been found to be a useful and uniform interface to a surprising variety of very different mathematics. I submit that it is at least plausible that software interfaces designed with tasteful mimicry of category theory may achieve similar uniformity across disparate software domains. This is epitomized for me in Conal Elliott’s Compiling to Categories. http://conal.net/papers/compiling-to-categories/

I think it is easy to have the miscomprehension that a fancy language like Haskell or Agda is necessary to even begin writing software that encapsulates category theory based ideas, but this is simply not the case. I’ve been under this misapprehension before.

It just so happens that category theory is especially useful in those languages for explaining some programming patterns especially those concerning polymorphism. See Bartosz Milewski’s Category theory for Programmers.

But this is not the only way to use category theory.

There’s a really delightful book by Rydeheard and Burstall called Computational Category Theory. The first time I looked at it, I couldn’t make heads or tails of it, going on the double uphill battle of category theory and Standard ML. But looking at it now, it seems extremely straightforward and well presented. It’s a cookbook of how to build category theoretic interfaces for software.

So I think it is interesting to perform some translation of its concepts and style into python, the lingua franca of computing today.

In particular, there is a dual opportunity to both build a unified interface between some of the most commonly used powerful libraries in the python ecosystem and also use these implementations to help explain categorical concepts in concrete detail. I hope to have the attention span to follow through on this.

A very simple category is that of finite sets. The objects in the category can be represented by python sets. The morphisms can be represented by python dictionaries. Nothing abstract here. We can rip and tear these things apart any which way we please.

The manipulations are made even more pleasant by the python features of set and dictionary comprehension which will mimic the definitions you’ll find on the wikipedia page for these constructions quite nicely.

Composition is defined as making a new dictionary by feeding the output of the first dictionary into the second. The identity dictionary over a set is one that has the same values as keys. The definition of products and coproducts (disjoint union) are probably not too surprising.
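
A small sketch of that dictionary plumbing (one possible phrasing, not the original listing):

def compose(f, g):
    return {x: g[f[x]] for x in f}           # feed the output of f into g

def identity(s):
    return {x: x for x in s}                 # same values as keys

def product(A, B):
    AxB = {(a, b) for a in A for b in B}
    p1 = {(a, b): a for (a, b) in AxB}       # the two projections
    p2 = {(a, b): b for (a, b) in AxB}
    return AxB, p1, p2

def coproduct(A, B):
    AplusB = {(0, a) for a in A} | {(1, b) for b in B}   # tagged (disjoint) union
    i1 = {a: (0, a) for a in A}
    i2 = {b: (1, b) for b in B}
    return AplusB, i1, i2

assert compose({1: "a"}, {"a": True}) == {1: True}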

One really interesting thing about the Rydeheard and Burstall presentation is noticing what are the inputs to these constructions and what are the outputs. Do you need to hand it objects? Morphisms? How many? How can we represent the universal property? We do so by outputting functions that construct the required universal morphisms. They describe this as a kind of skolemization. The constructive, programmatic presentation of these things is incredibly helpful to my understanding, and I hope it is to yours as well.

Here is a python class for FinSet. I’ve implemented a couple of interesting constructions, such as pullbacks and detecting monomorphisms and epimorphisms.
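As a minimal sketch in that spirit (bare functions on dictionaries rather than the full class, with my own helper names; note that a morphism really ought to carry its codomain around, so I pass it explicitly for the epi check):

def is_mono(f):
    # injective: no two domain elements hit the same value
    return len(set(f.values())) == len(f)

def is_epi(f, codomain):
    # surjective onto the stated codomain
    return set(f.values()) == codomain

def pullback(f, g):
    # f : A -> C and g : B -> C; the pullback is the subset of A x B where they agree
    P = {(a, b) for a in f for b in g if f[a] == g[b]}
    p1 = {(a, b): a for (a, b) in P}
    p2 = {(a, b): b for (a, b) in P}
    return P, p1, p2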

I’m launching you into the deep end here if you have never seen category theory before (although goddamn does it get deeper). Do not be surprised if this doesn’t make that much sense. Try reading Rydeheard and Burstall chapters 3 and 4 first, or other resources.

Here are some fun exercises (Ok. Truth time. It’s because I got lazy). Try to implement exponentials and pushouts for this category.

Uniform Continuity is Kind of Like a Lens

A really interesting topic is exact real arithmetic. It turns out, there are systematic ways of calculating numerical results with arbitrarily fine accuracy.

In practice this is not used much as it is complicated and slow.

There are deep waters here.

The problem is made rather difficult by the fact that you can’t compute a real number exactly; you have to, in some sense, compute better and better finite approximations.

One way of doing this is to compute a stream of arbitrarily good approximations. If someone needs a better approximation than you’ve already given, they pop the next one off.
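A toy version of this in Python (exact rationals via fractions and a Newton iteration; this is not how a real exact-real library does it, just the flavor of a pull-based stream):

from fractions import Fraction

def sqrt2_approximations():
    # yield an endless stream of ever-better rational approximations to sqrt(2)
    x = Fraction(1)
    while True:
        yield x
        x = (x + 2 / x) / 2

approx = sqrt2_approximations()
next(approx)   # 1
next(approx)   # 3/2
next(approx)   # 17/12, and so on: pop more off whenever a better answer is needed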

Streams give you some inverted control flow. They allow the results to pull on the input, going against the grain of the ordinary direction of computation. If you are interested in a final result of a certain accuracy, they seem somewhat inefficient. You have to search for the right amount to pull the incoming streams, and the intermediate computations may not be helpful.

Haskell chews infinite lists up for breakfast, so it’s a convenient place for such things https://wiki.haskell.org/Exact_real_arithmetic https://hackage.haskell.org/package/exact-real

A related but slightly different set of methods comes in the form of interval arithmetic. Interval arithmetic also gives precise statements of accuracy, maintaining bounds on the accuracy as a number is carried along the computation.

Interval arithmetic is very much like forward mode differentiation. In forward mode differentiation, you compute on dual numbers (x,dx) and carry along the derivatives as you go.

type ForwardMode x dx y dy = (x,dx) -> (y,dy)
type IntervalFun x delx y dely = (x,delx) -> (y, dely)

Conceptually, differentiation and these validated bounds are connected as well. They are both telling you something about how the function is behaving nearby. The derivative is mostly meaningful at exactly the point it is evaluated. It is extremely local. The verified bounds being carried along are sort of a very principled finite difference approximation.

But reverse mode differentiation is often where it is at. This is the algorithm that drives deep learning. Reverse mode differentiation can be modeled functionally as a kind of lens. http://www.philipzucker.com/reverse-mode-differentiation-is-kind-of-like-a-lens-ii/ . The thing that makes reverse mode confusing is the backward pass. This is also inverted control flow, where the output pushes information back to the input. The Lens structure does this too.

type Lens s t a b = s -> (a, b -> t)

It carries a function that goes in the reverse direction, and these reverse functions are composed in the opposite direction of ordinary control flow. They are the “setters” in the ordinary usage of the Lens, but they are the backproppers for differentiation.

By analogy one might try

type RealF x delta y epsilon = Lens x delta y epsilon = x -> (y, epsilon -> delta)

There is something pleasing here compared to interval arithmetic in that the output epsilon drives the input delta. The second function is kind of a Skolemized \delta(\epsilon) from the definition of continuity.

Although it kind of makes sense, there is something unsatisfying about this. How do you compute the x -> y? You already need to know the accuracy before you can even produce this function.

So it seems to me that actually a better definition is

type RealF x delta y epsilon = Lens epsilon y delta x  = epsilon -> (delta, x -> y)

This type surprised me and is rather nice in many respects. It lets you actually calculate x -> y, has that lazy pull-based feel without infinite streams, and has delta as a function of epsilon.

I have heard, although I don’t understand why, that uniform continuity is the more constructive definition (see constructive analysis by Bridger) https://en.wikipedia.org/wiki/Uniform_continuity This definition seems to match that.

In addition we are able to use approximations of the actual function if we know the accuracy it needs to be computed to. For example, given we know we need 0.01 accuracy of the output, we know we only need 0.009 accuracy in the input and we only need the x term of a Taylor series of sine (the total inaccuracy of the input and the inaccuracy of our approximation of sine combine to give total inaccuracy of output). If we know the needed accuracy allows it, we can work with fast floating point operations. If we need better we can switch over to mpfr, etc.
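Here’s a rough Python rendering of that idea. The numbers and the error bookkeeping are illustrative only, not a verified analysis: given a requested output accuracy, return how accurate the input needs to be, together with a cheap-enough approximation.

import math

def real_sin(eps):
    # epsilon -> (delta, x -> y): pick the input accuracy we demand and
    # an approximation of sine that suffices for the requested output accuracy
    delta = eps / 2                     # budget half the error to the input
    if eps > 1e-2:
        approx = lambda x: x            # just the x term of the Taylor series
    else:
        approx = lambda x: math.sin(x)  # fall back to the library routine
    return delta, approx

delta, f = real_sin(0.01)               # ask for 0.01 accuracy in the output
y = f(0.3)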

This seems nice for MetaOcaml staging or other compile time macro techniques. If the epsilon required is known at compile time, it makes sense to me that one could use MetaOcaml to produce fast unrolled code. In addition, if you know the needed accuracy you can switch between methods and avoid the runtime overhead. The stream based approach seems to have a lot of context switching and perhaps unnecessary intermediate computations. It isn’t as bad as it seems, since these intermediate computations are usually necessary to compute anyhow, but still.

We can play the same monoidal category games with these lenses as ever. We can use dup, par, add, mul, sin, cos etc. and wire things up in diagrams and what have you.

This might be a nice type for use in a theorem prover. The Lens type combined with the appropriate properties that the intervals go to zero and stay consistent for arbitrary epsilon seems like enough? { Realf | something something something}

Relation to Backwards error analysis?

Does this have nice properties like backprop when on high dimensional inputs? That’s where backprop really shines: high- to low-dimensional functions.

Computing Syzygy Modules in Sympy

Reading about the methods of computational algebra is really compelling to me because there are some domains that seem like natural fits for them.

I used to have no idea that multivariate polynomial equations had guaranteed methods that in some sense solve those systems. It’s pretty cool.

However, when I was poring over the two Cox, Little, and O’Shea volumes, the chapter on modules made my eyes glaze over. Who ordered that? From my perspective, modules are vector spaces where you cripple the ability to divide by scalars. Fair enough, but the language is extremely confusing and off-putting. Syzygy? Free resolution? Everything described as homomorphisms and exact sequences? Forget it. Why do this? This is too abstract.

So I’ve been on the lookout for some application to motivate them. And I think I have at least one: capacitor-inductor circuits.

A pure resistive circuit can be treated by linear algebra. The voltages and currents are connected by linear relations. http://www.philipzucker.com/linear-relation-algebra-of-circuits-with-hmatrix/

The common way to describe inductor-capacitor circuits is phasor analysis, where the resistances become impedances, which have a frequency parameter in them. I’m not entirely convinced that it isn’t acceptable to just use linear algebra over rational functions of the frequency, but I have some reason to believe more carefulness regarding division may bear fruit. I suspect that carefulness with division corresponds to the always-sticky issues of boundary conditions.

On a slightly different front, I was very impressed by Jan Willems’ Open Dynamical Systems. https://homes.esat.kuleuven.be/~sistawww/smc/jwillems/Articles/JournalArticles/2007.1.pdf In it, he talks about differential equations as describing sets of possible trajectories of systems. He uses module theory as a way to manipulate those sets, and conditions from module theory to describe interesting qualitative features like controllability of those systems.

He sticks to the tools of Hermite and Smith forms of matrices, as he is mostly interested in single-variable polynomials as the ring in question. Here are my issues:

  1. I’m not really familiar with these forms
  2. I can’t find good implementations of these. Perhaps here https://desr.readthedocs.io/en/latest/index.html (Differential Equation Symmetry Reduction), which seems like an interesting project for other reasons as well. Maybe I’m a fool, but I’d like to stick to python for the moment.
  3. I also have an inkling that modules over multivariate polynomials will come in handy for playing around with band theory (or partial differential equations for that matter). Maybe something interesting to be said regarding topological materials?

It seems like Groebner basis techniques should acceptably solve these systems as well. Converting between the analog of range and nullspace representations, as I did in my previous post, corresponds to syzygy calculations in the terminology of modules.

Sympy does supply a Groebner basis algorithm, but not much beyond that. The AGCA module that should supply calculations over modules is mostly a lie. The documentation lists many functions that are not implemented. Which is too bad.

However, you can hack syzygy calculation into a Groebner basis calculation. I started poring over chapter 5 of Using Algebraic Geometry again, and it really has everything you need.

When one converts a set of polynomials to a Groebner basis, one is getting an equivalent set of polynomials with excellent properties. A Groebner basis is an analog of reduced echelon form of a matrix (these rows are equivalent to the old rows), and Buchberger’s algorithm is an analog of gaussian elimination. https://mattpap.github.io/masters-thesis/html/src/groebner.html#special-case-1-gauss-algorithm . You can find a decomposition of a polynomial in your ideal by a multivariate division algorithm with respect to the Groebner basis. This is the analog of the ability of an upper triangular matrix to be easily inverted.

You can perform a number of tricks by adding in dummy variables to the Groebner basis algorithm. The first thing you can do with this sort of trick is track how to write the Groebner basis in terms of the original basis. This is the analog of working with an augmented matrix during gaussian elimination. https://en.wikipedia.org/wiki/Augmented_matrix

I found this Maple documentation helpful in this regard (although formatted horrifically)

https://www.maplesoft.com/support/help/Maple/view.aspx?path=Groebner%2fBasis_details

We want to track a matrix A that writes the Groebner basis vector G in terms of the original vector of polynomials F; treating these as row vectors, G = FA. We do it by attaching to each generator f of F a fresh marker variable m, replacing f with f + m. The coefficients on the markers in the extended Groebner basis then correspond to the matrix A. Think about it.

The matrix for the other direction, F = GB, can be found via the reduction (multivariate division) algorithm with respect to the Groebner basis. This is pretty straightforward given that sympy implements reduction for us.
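Here is a hedged sympy sketch of both directions on a toy pair of generators (my own example, not the post’s). It assumes the extended basis elements stay linear in the markers and each has a nonzero marker-free part; the ordering and bookkeeping details may need tweaking.

import sympy as sp

x, y = sp.symbols('x y')
F = [x**2 + y**2 - 4, x*y - 1]
ms = sp.symbols('m0 m1')                       # one fresh marker per generator

# augmented computation: Groebner basis of the marked generators, markers ordered last
Gext = sp.groebner([f + m for f, m in zip(F, ms)], x, y, *ms, order='lex')

# marker-free parts give a Groebner basis of F; marker coefficients give A, with G = F A
G = [p.subs({m: 0 for m in ms}) for p in Gext.exprs]
A = sp.Matrix([[sp.expand(p).coeff(m) for p in Gext.exprs] for m in ms])

# the other direction, F = G B, via multivariate division: f = sum_j q_j * g_j
cols = []
for f in F:
    q, r = sp.reduced(f, G, x, y, order='lex')
    assert r == 0                              # f lies in the ideal, so no remainder
    cols.append(q)
B = sp.Matrix(cols).T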

From these we determine that

G = GBA
F = FAB

Finding the syzygies of a set of generators is the analog of finding a nullspace of a matrix. A syzygy is a set of coefficients to “dot” onto the generators and get zero. In linear algebra talk, they are sort of orthogonal to the generator set.

The ability to find a nullspace gives you a lot of juice. One can phrase many problems, including solving an Ax=b system of equations, as a nullspace finding problem.

Proposition 3.3 of Using Algebraic Geometry tells us how to calculate the generators of the syzygy module of a Groebner basis. It’s a little strange. The S-polynomial of two generators from the basis reduces to zero by the basis. Writing that reduction out as an explicit combination of the basis elements gives a very interesting identity on the generators (a syzygy), and it turns out that these actually generate all possible syzygies. This is still not obvious to me, but the book does explain it.

Proposition 3.8 of Using Algebraic Geometry tells us how to get the syzygies of the original generators given the previous information. We map the syzygies of G back to syzygies of F and also append the columns of I - AB.

The columns of I - AB are syzygies of F: F(I - AB) = F - FAB = F - F = 0, using the equation F = FAB from above.

I’m still trying to figure out how to do calculations on modules proper. I think it can be done by using dummy variables to turn module vectors into single expressions. There is an exercise in Using Algebraic Geometry that mentions this.

Groebner basis reference suggestions:

Categorical Combinators for Convex Optimization and Model Predictive Control using Cvxpy

We’re gonna pump this well until someone MAKES me stop.

This particular example is something that I’ve been trying to figure out for a long time, and I am pleasantly surprised at how simple it all seems to be. The key difference with my previous abortive attempts is that I’m not attempting the heavy computational lifting myself.

We can take pointful DSLs and convert them into a point-free, category-theory-inspired interface. In this case, a very excellent pointful DSL for convex optimization: cvxpy. I have some similar and related posts converting DSLs to categorical form.

A convex optimization problem optimizes a convex objective function with constraints that define a convex set like polytopes or balls. They are polynomial time tractable and shockingly useful. We can make a category out of convex optimization problems. We consider some variables to be “input” and some to be “output”. This choice is somewhat arbitrary as is the case for many “relation” feeling things that aren’t really so rigidly oriented.

Convex programming problems do have a natural notion of composition. Check out the last chapter of Rockafellar, where he talks about the convex algebra of bifunctions. Instead of summing over the inner composition variable as in Vect, \sum_j A_{ij}B_{jk}, or existentially quantifying as in Rel, \{ (a,c) | \exists b. (a,b)\in A, (b,c) \in B \}, we instead minimize over the inner composition variable, \min_y A(x,y) + B(y,z). These are similar operations in that they all produce bound variables.

The identity morphism is just the simple constraint that the input variables equal the output variables, with an objective function of 0. This is an affine constraint, hence convex.

In general, if we ignore the objective part entirely by just setting it to zero, we’re actually working in a very computationally useful subcategory of Rel, ConvexRel, the category of relations which are convex sets. Composition corresponds to an existential operation, which is quickly solvable by convex optimization techniques. In operations research terms, these are feasibility problems rather than optimization problems. Many of the combinators do nothing to the objective.

The monoidal product just stacks variables side by side and adds the objectives and combines the constraints. They really are still independent problems. Writing things in this way opens up a possibility for parallelism. More on that some other day.

We can code this all up in python with little combinators that return the input, output, objective, and constraint list. We need to hide these in inner lambdas for appropriate fresh generation of variables.
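Here’s a rough sketch of what such combinators can look like (the names idd, compose, par, run are mine; the post’s own code may differ in detail). Each “morphism” is a thunk so that fresh cvxpy variables get generated every time it is used.

import cvxpy as cvx

def idd(n):
    # identity morphism: input equals output, zero objective
    def res():
        x = cvx.Variable(n)
        y = cvx.Variable(n)
        return x, y, 0, [x == y]
    return res

def compose(g, f):
    # g after f: glue f's output to g's input, minimizing over the hidden middle variable
    def res():
        x, y, o1, c1 = f()
        y2, z, o2, c2 = g()
        return x, z, o1 + o2, c1 + c2 + [y == y2]
    return res

def par(f, g):
    # monoidal product: stack the two problems side by side
    def res():
        x1, y1, o1, c1 = f()
        x2, y2, o2, c2 = g()
        return cvx.hstack([x1, x2]), cvx.hstack([y1, y2]), o1 + o2, c1 + c2
    return res

def run(f, x0):
    # pin the input to a concrete value and hand the whole thing to cvxpy
    x, y, o, c = f()
    cvx.Problem(cvx.Minimize(o), c + [x == x0]).solve()
    return y.value

The meet combinator down in the Bits and Bobbles has the same shape.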

Now for a somewhat more concrete example: Model Predictive control of an unstable (linearized) pendulum.

Model predictive control is where you solve an optimization problem of the finite time rollout of a control system online. In other words, you take measurement of the current state, update the constraint in an optimization problem, ask the solver to solve it, and then apply the force or controls that the solver says is the best.

This gives an advantage over the LQR controller in that you can set hard inequality bounds on the total force available, or on the positions where you allow the thing to go. You don’t want your system crashing into some wall or falling over some cliff, for example. These really are useful constraints in practice. You can also include possibly time-dependent aspects of the dynamics of your system, which may help you model nonlinear dynamics.

I have more posts on model predictive control here http://www.philipzucker.com/model-predictive-control-of-cartpole-in-openai-gym-using-osqp/ http://www.philipzucker.com/flappy-bird-as-a-mixed-integer-program/

Here we model the unstable point of a pendulum and ask the controller to find forces to balance the pendulum.
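A rough sketch of that setup in plain cvxpy, without the combinators: the constants and the crude Euler discretization of a linearized pendulum here are illustrative, not the exact model used in the post.

import cvxpy as cvx

N = 10                        # lookahead horizon
dt = 0.1
theta = cvx.Variable(N + 1)   # pendulum angle, linearized about the upright point
omega = cvx.Variable(N + 1)   # angular velocity
u = cvx.Variable(N)           # applied force/torque

theta0, omega0 = 0.3, 0.0     # the current measured state
constraints = [theta[0] == theta0, omega[0] == omega0]
for k in range(N):
    constraints += [
        theta[k + 1] == theta[k] + dt * omega[k],
        omega[k + 1] == omega[k] + dt * (theta[k] + u[k]),   # unstable linear dynamics
        cvx.abs(u[k]) <= 1.0,                                # hard bound on the force
    ]

objective = cvx.Minimize(cvx.sum_squares(theta) + 0.1 * cvx.sum_squares(u))
cvx.Problem(objective, constraints).solve()
print(u.value[0])             # apply the first control, then re-measure and re-solve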

We can interpret the controller in GraphCat in order to produce a diagram of the 10-step lookahead controller. This is an advantage of the categorical style, a la compiling to categories.

We can also actually run it in model predictive control configuration in simulation.

And some curves. How bout that.

Bits and Bobbles

LazySets https://github.com/JuliaReach/LazySets.jl

ADMM – It’s a “lens”. I’m pretty sure I know how to do it pointfree. Let’s do it next time.

The minimization can be bubbled out to the top if we are always minimizing. If we mix in maximization, then we can’t, and we’re working on a more difficult problem. This is similar to what happens in Rel when you have relational division, which is a kind of universal quantification \forall. Mixed-quantifier problems in general are very tough. These kinds of problems include games, synthesis, and robustness. More on this some other day.

Convex-Concave programming minimax https://web.stanford.edu/~boyd/papers/pdf/dccp_cdc.pdf https://web.stanford.edu/class/ee364b/lectures/cvxccv.pdf

The minimization operation can be related to the summation operation by the method of steepest descent in some cases. The method of steepest descent approximates a sum or integral by evaluating it at its most dominant point and expanding out from there, hence converting a linear algebra thing into an optimization problem. Examples include the connection between statistical mechanics and thermodynamics, and between classical mechanics and quantum mechanics.

Legendre Transformation ~ Laplace Transformation via steepest descent https://en.wikipedia.org/wiki/Convex_conjugate yada yada, all kinds of good stuff.

Intersection is easy. Join/union is harder. Does MIP help?

def meet(f, g):
   def res():
      # intersection: take both problems, identify their inputs and outputs,
      # and impose both sets of constraints
      x, y, o, c = f()
      x1, y1, o1, c1 = g()
      return x, y, o + o1, c + c1 + [x == x1, y == y1]
   return res

Quantifier elimination

MIP via adjunction

Naive Synthesis of Sorting Networks using Z3Py

As a simple extension of verifying the sorting networks from before, we can synthesize optimally small sorting networks. The “program” of the sorting network is specified by a list of tuples of the positions we wish to compare and swap, in order. We just generate all possible sequences of comparison operations and ask z3 to verify each one. If z3 says it verifies, we’re done.

Here are some definitions for running the thing

and here is a simple generating thing for all possible pairs.
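A hedged sketch of the approach (my own function names, not necessarily the post’s): symbolically apply the candidate comparator sequence, then ask z3 whether a counterexample input exists.

import itertools
from z3 import Ints, If, And, Not, Solver, unsat

def compare_swap(xs, i, j):
    # one comparator: put the smaller element at position i, the larger at j
    xs = list(xs)
    lo = If(xs[i] <= xs[j], xs[i], xs[j])
    hi = If(xs[i] <= xs[j], xs[j], xs[i])
    xs[i], xs[j] = lo, hi
    return xs

def verifies(network, n):
    # does this comparator sequence sort every input of length n?
    out = list(Ints(' '.join('x%d' % i for i in range(n))))
    for (i, j) in network:
        out = compare_swap(out, i, j)
    s = Solver()
    s.add(Not(And([out[k] <= out[k + 1] for k in range(n - 1)])))
    return s.check() == unsat             # no counterexample input exists

def synthesize(n):
    # brute force: try every sequence of comparators of increasing length
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    for length in itertools.count(1):
        for network in itertools.product(pairs, repeat=length):
            if verifies(network, n):
                return network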

As is, this is astoundingly slow. Truly, truly abysmally slow. The combinatorics of searching through this space really naively are abysmal. I doubt you’re going to get more than a network of size 6 out of this as is.

Some possible optimizations: early pruning of bad networks with testing, and avoiding ever looking at obviously bad networks. A randomized search might be faster if one doesn’t care about optimality. We could also ask z3 to produce networks.

For more on program synthesis, check out Nadia Polikarpova’s sick course here.

https://github.com/nadia-polikarpova/cse291-program-synthesis