Computing Syzygy Modules in Sympy

Reading about the methods of computational algebra is really compelling to me because there are domains that seem like natural fits for them.

I used to have no idea that multivariate polynomial equations had guaranteed methods that in some sense solve those systems. It’s pretty cool.

However, when I was poring over the two Cox, Little, and O’Shea volumes, the chapter on modules made my eyes glaze over. Who ordered that? From my perspective, modules are vector spaces where you cripple the ability to divide scalars. Fair enough, but the language is extremely confusing and off-putting. Syzygy? Free resolution? Everything described as homomorphisms and exact sequences? Forget it. Why do this? This is too abstract.

So I’ve been on the lookout for some application to motivate them. And I think I have at least one: capacitor-inductor circuits.

A pure resistive circuit can be treated by linear algebra. The voltages and currents are connected by linear relations. http://www.philipzucker.com/linear-relation-algebra-of-circuits-with-hmatrix/

The common way to describe inductor-capacitor circuits is by using phasor analysis, where the resistances become impedances which have a frequency parameter in them. I’m not entirely convinced that it isn’t acceptable to just use linear algebra over rational functions of the frequency, but I have some reason to believe more carefulness regarding division may bear fruit. I suspect that carefulness with division corresponds to the always sticky issues of boundary conditions.

On a slightly different front, I was very impressed by Jan Willems’ Open Dynamical Systems. https://homes.esat.kuleuven.be/~sistawww/smc/jwillems/Articles/JournalArticles/2007.1.pdf In it, he talks about differential equations as describing sets of possible trajectories of systems. He uses module theory as a way to manipulate those sets, and conditions from module theory to describe interesting qualitative features like controllability of those systems.

He sticks to the tools of Hermite and Smith forms of matrices, as he is mostly interested in single variable polynomials as the ring in question. Here are my issues:

  1. I’m not really familiar with these forms
  2. I can’t find good implementations of these. Perhaps here https://desr.readthedocs.io/en/latest/index.html (Differential Equation Symmetry Reduction), which seems like an interesting project for other reasons as well. Maybe I’m a fool, but I’d like to stick to python for the moment.
  3. I also have an inkling that modules over multivariate polynomials will come in handy for playing around with band theory (or partial differential equations for that matter). Maybe something interesting to be said regarding topological materials?

It seems like Groebner basis techniques should acceptably solve these systems as well. Converting between the analog of range and nullspace representations, as I did in my previous post, corresponds to syzygy calculations in the terminology of modules.

Sympy does supply a Groebner basis algorithm, but not much beyond that. The AGCA module that should supply calculations over modules is mostly a lie. The documentation lists many functions that are not implemented. Which is too bad.

However, you can hack syzygy calculations into a Groebner basis calculation. I started poring over chapter 5 of Using Algebraic Geometry again, and it really has everything you need.

When one converts a set of polynomials to a Groebner basis, one is getting an equivalent set of polynomials with excellent properties. A Groebner basis is an analog of the reduced echelon form of a matrix (the new rows are equivalent to the old rows), and Buchberger’s algorithm is an analog of Gaussian elimination. https://mattpap.github.io/masters-thesis/html/src/groebner.html#special-case-1-gauss-algorithm . You can find a decomposition of a polynomial in your ideal by a multivariate division algorithm with respect to the Groebner basis. This is the analog of an upper triangular matrix being easy to solve against.
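To make that concrete, here is a tiny sanity check of my own (not from the original post) using sympy's groebner and reduced: a polynomial built out of the generators of an ideal reduces to zero remainder against the Groebner basis.

import sympy as sy

x, y = sy.symbols("x y")
F = [x**2 + y**2 - 1, x - y]
G = sy.groebner(F, x, y, order="lex")
p = (x**2 + y**2 - 1) + y * (x - y)                # an element of the ideal by construction
coeffs, remainder = sy.reduced(p, G.exprs, x, y, order="lex")
assert remainder == 0                               # membership in the ideal shows up as a zero remainder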

You can perform a number of tricks by adding in dummy variables to the Groebner basis algorithm. The first thing you can do with this sort of trick is track how to write the Groebner basis in terms of the original basis. This is the analog of working with an augmented matrix during Gaussian elimination. https://en.wikipedia.org/wiki/Augmented_matrix

I found this Maple documentation helpful in this regard (although formatted horrifically)

https://www.maplesoft.com/support/help/Maple/view.aspx?path=Groebner%2fBasis_details

We want to track a matrix A that writes the Groebner basis vector G in terms of the original vector of polynomials F: G = AF. We do it by attaching to each generator f of F a fresh marker variable m, replacing f with f + m. Then the coefficients on the markers in the extended Groebner basis correspond to the matrix A. Think about it.

The matrix for the other direction can be found via the reduction algorithm with respect to the Groebner basis: F = BG. This is pretty straightforward given that sympy implements reduction for us.

From these we determine that

G = ABG
F = BAF

Finding the syzygies of a set of generators is the analog of finding the nullspace of a matrix. A syzygy is a set of coefficients to “dot” onto the generators and get zero. For example, for the generators (x, y) the vector (y, -x) is a syzygy, since y*x + (-x)*y = 0. In linear algebra talk, syzygies are sort of orthogonal to the generator set.

The ability to find a nullspace gives you a lot of juice. One can phrase many problems, including solving an Ax=b system of equations, as a nullspace finding problem.
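As a small illustration of that claim (my own example, not from the post), a solution of Ax = b can be read off from a nullspace vector of the augmented matrix [A | -b] whose last entry is scaled to 1.

import sympy as sy

A = sy.Matrix([[1, 2], [3, 4]])
b = sy.Matrix([5, 6])
aug = A.row_join(-b)           # the augmented matrix [A | -b]
ns = aug.nullspace()[0]        # a generator of its nullspace
xsol = (ns / ns[-1])[:-1, :]   # scale so the last entry is 1, then drop it
assert sy.simplify(A * xsol - b) == sy.zeros(2, 1)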

Proposition 3.3 of Using Algebraic Geometry tells us how to calculate the generators of the syzygy module of a Groebner basis. It’s a little strange. The S-polynomial of two generators from the basis reduces to zero with respect to the basis. Writing out “S-polynomial minus its division expansion = 0” gives a very interesting identity on the generators (a syzygy), and it turns out that these actually generate all possible syzygies. This is still not obvious to me, but the book does explain it.

import sympy as sy

def spoly(f, g, *gens):
    ltf = sy.LT(f, gens)
    ltg = sy.LT(g, gens)
    lcm = sy.lcm(ltf, ltg)
    s = lcm / ltf * f - lcm / ltg * g
    return s

# Groebner tracking. Maintaining the relation of the Groebner basis to the original basis.
def extended_groebner(F, *gens):
    n = len(F)
    markers = [sy.Dummy() for i in range(n)]
    Fext = [f + a for a, f in zip(markers, F)]
    gen_ext = list(gens) + markers
    Gext = sy.groebner(Fext, gen_ext)
    A = [[g.coeff(m) for m in markers] for g in Gext]
    G = [sy.simplify(g - sum([m * aa for m, aa in zip(markers, a)])) for a, g in zip(A, Gext)]  # remove dummy parts
    assert sy.simplify(sy.Matrix(G) - sy.Matrix(A) * sy.Matrix(F)).is_zero
    # maybe assert the Buchberger criterion
    return G, A

def reduce_basis(F, G, *gens):
    B, rems = list(zip(*[sy.reduced(f, G, gens) for f in F]))
    print(B)
    print(rems)
    assert all([r == 0 for r in rems])  # assuming G is a Groebner basis
    assert sy.simplify(sy.Matrix(F) - sy.Matrix(B) * sy.Matrix(G)).is_zero
    return B

# Generators for the syzygies of G. Schreyer's Theorem. Cox, Little, O'Shea, Theorem 3.2, chapter 5.
def syzygy(G, *gens):  # assuming G is a Groebner basis
    n = len(G)
    basis = []
    I = sy.eye(n)
    for i, gi in enumerate(G):
        for j, gj in enumerate(G):
            if i < j:
                s = spoly(gi, gj, *gens)
                print(s)
                a, r = sy.reduced(s, G, gens)
                assert r == 0  # should be a Groebner basis
                lti = sy.LT(gi, gens)
                ltj = sy.LT(gj, gens)
                lcm = sy.lcm(lti, ltj)
                ei = I[:, i]
                ej = I[:, j]
                basis.append(lcm / lti * ei - lcm / ltj * ej - sy.Matrix(a))
                assert sy.simplify(sy.Matrix(G).T * basis[-1]).is_zero  # should all null out on G
    return basis

x, y, z, s = sy.symbols("x y z s")
F = [x + y + z, x*y + y*z + z*x, x*y*z - 1]
G, A = extended_groebner(F, x, y, z)
B = reduce_basis(F, G, x, y, z)
Gsyz = syzygy(G, x, y, z)

Proposition 3.8 of Using Algebraic Geometry tells us how to get the syzygies of the original generators F given the previous information. We map the syzygies of G back to syzygies of F (if s dotted onto G gives zero, then A^T s dotted onto F gives s·(AF) = s·G = 0), and we also append the rows of I - BA.

The rows of I - BA are syzygies of F because (I - BA)F = F - BAF = F - F = 0, using the relation F = BAF from above.
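Here is a sketch of my own (not from the post, so treat the helper name and details as my assumptions) that bolts Proposition 3.8 onto the pieces computed in the snippet above; it assumes F, G, A, B, and Gsyz are the objects produced there.

import sympy as sy

def syzygies_of_F(F, A, B, Gsyz):
    # Hypothetical helper (mine): generators for Syz(F) from Syz(G) plus the rows of I - BA.
    n = len(F)
    A = sy.Matrix(A)   # G = A*F
    B = sy.Matrix(B)   # F = B*G
    # Map a syzygy s of G back to one of F: F.T*(A.T*s) = (A*F).T*s = G.T*s = 0
    mapped = [A.T * s for s in Gsyz]
    # Rows of I - B*A are syzygies of F since (I - B*A)*F = F - B*A*F = 0
    extra = [(sy.eye(n) - B * A).row(i).T for i in range(n)]
    syz = mapped + extra
    for s in syz:
        assert sy.expand((sy.Matrix(F).T * s)[0]) == 0   # sanity check: each candidate really kills F
    return syz

Fsyz = syzygies_of_F(F, A, B, Gsyz)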

I’m still trying to figure out how to do calculations on modules proper. I think it can be done by using dummy variables to turn module vectors into single expressions. There is an exercise in Using Algebraic Geometry that mentions this.

def matrix_to_eqs(m):
    nrows, ncols = m.shape
    gens = [sy.Dummy() for i in range(ncols)]
    eqs = m @ sy.Matrix(gens)
    return eqs, gens

def eqs_to_matrix(eqns, gens):
    return sy.Matrix([[eq.coeff(g) for g in gens] for eq in eqns])
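A quick round-trip check of my own (not in the post), using the symbols defined in the earlier snippet:

m = sy.Matrix([[x, y], [y, z]])
eqs, gens = matrix_to_eqs(m)
assert eqs_to_matrix(eqs, gens) == m   # coefficients on the dummies recover the original matrix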

Groebner basis reference suggestions:

Categorical Combinators for Convex Optimization and Model Predictive Control using Cvxpy

We’re gonna pump this well until someone MAKES me stop.

This particular example is something that I’ve been trying to figure out for a long time, and I am pleasantly surprised at how simple it all seems to be. The key difference with my previous abortive attempts is that I’m not attempting the heavy computational lifting myself.

We can take pointful DSLs and convert them into a point-free, category theory inspired interface. In this case the pointful DSL is a very excellent one for convex optimization: cvxpy. I have some similar and related posts on converting DSLs to categorical form.

A convex optimization problem optimizes a convex objective function with constraints that define a convex set like polytopes or balls. They are polynomial time tractable and shockingly useful. We can make a category out of convex optimization problems. We consider some variables to be “input” and some to be “output”. This choice is somewhat arbitrary as is the case for many “relation” feeling things that aren’t really so rigidly oriented.

Convex programming problems do have a natural notion of composition. Check out the last chapter of Rockafellar, where he talks about the convex algebra of bifunctions. Instead of summing over the inner composition variable like in Vect, \sum_j A_{ij}B_{jk}, or existentializing like in Rel, \{ (a,c) | \exists b. (a,b)\in A, (b,c) \in B \}, we instead minimize over the inner composition variable, \min_y A(x,y) + B(y,z). These are similar operations in that they all produce bound variables.
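As a tiny worked example of this composition rule (my own, not from Rockafellar or the post): taking A(x,y) = (x-y)^2 and B(y,z) = (y-z)^2, the composite is

\min_y (x-y)^2 + (y-z)^2 = \frac{1}{2}(x-z)^2

with the minimizing inner variable sitting at y = (x+z)/2, and the result is again a convex bifunction of (x,z).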

The identity morphism is just the simple constraint that the input variables equal the output variables, with an objective function of 0. This is an affine constraint, hence convex.

In general, if we ignore the objective part entirely by just setting it to zero, we’re actually working in a very computationally useful subcategory of Rel, ConvexRel, the category of relations which are convex sets. Composition corresponds to an existential operation, which is quickly solvable by convex optimization techniques. In operations research terms, these are feasibility problems rather than optimization problems. Many of the combinators do nothing to the objective.

The monoidal product just stacks variables side by side and adds the objectives and combines the constraints. They really are still independent problems. Writing things in this way opens up a possibility for parallelism. More on that some other day.

We can code this all up in python with little combinators that return the input, output, objective, constraintlist. We need to hide these in inner lambdas for appropriate fresh generation of variables.

import cvxpy as cvx
import matplotlib.pyplot as plt

class ConvexCat():
    def __init__(self, res):
        self.res = res
    def idd(n):
        def res():
            x = cvx.Variable(n)
            return x, x, 0, []
        return ConvexCat(res)
    def par(f, g):
        def res():
            x, y, o1, c1 = f()
            z, w, o2, c2 = g()
            a = cvx.hstack((x, z))
            b = cvx.hstack((y, w))
            return a, b, o1 + o2, c1 + c2 + [a == a, b == b]  # sigh. Don't ask. Alright. FINE. cvxpy needs hstacks registered to populate them with values
        return ConvexCat(res)
    def compose(g, f):
        def res():
            x, y, o1, c1 = f()
            y1, z, o2, c2 = g()
            return x, z, o1 + o2, c1 + c2 + [y == y1]
        return ConvexCat(res)
    def dup(n):
        def res():
            x = cvx.Variable(n)
            return x, cvx.vstack([x, x]), 0, []
        return ConvexCat(res)
    def __call__(self):
        return self.res()
    def run(self):
        x, y, o, c = self.res()
        prob = cvx.Problem(cvx.Minimize(o), c)
        sol = prob.solve()
        return sol, x, y, o
    def __matmul__(self, g):
        return self.compose(g)
    def __mul__(self, g):
        return self.par(g)
    def const(v):
        def res():
            x = cvx.Variable(1)  # hmm. cvxpy doesn't like 0d variables. That's too bad
            return x, x, 0, [x == v]
        return ConvexCat(res)
    def converse(f):
        def res():
            a, b, o, c = f()
            return b, a, o, c
        return ConvexCat(res)
    def add(n):
        def res():
            x = cvx.Variable(n)
            return x, cvx.sum(x), 0, []
        return ConvexCat(res)
    def smul(s, n):
        def res():
            x = cvx.Variable(n)
            return x, s * x, 0, []
        return ConvexCat(res)
    def pos(n):
        def res():
            x = cvx.Variable(n, nonneg=True)  # constrained to the nonnegative orthant
            return x, x, 0, []
        return ConvexCat(res)
    # And many many more!
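Here's a quick usage sketch of my own (not from the post) wiring two constants into an adder with the combinators above; after solving, the output expression should evaluate to 1 + 2 = 3.

three = ConvexCat.add(2) @ (ConvexCat.const(1) * ConvexCat.const(2))
sol, inp, out, obj = three.run()
print(out.value)   # expect roughly 3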

Now for a somewhat more concrete example: Model Predictive control of an unstable (linearized) pendulum.

Model predictive control is where you solve an optimization problem of the finite time rollout of a control system online. In other words, you take a measurement of the current state, update the constraints in an optimization problem, ask the solver to solve it, and then apply the force or controls that the solver says are best.

This has an advantage over the LQR controller in that you can set hard inequality bounds on the total force available, or on the positions where you wish to allow the thing to go. You don’t want your system crashing into some wall or falling over some cliff, for example. These really are useful constraints in practice. You can also include time dependent aspects of the dynamics of your system, which can help you model nonlinear dynamics.

I have more posts on model predictive control here http://www.philipzucker.com/model-predictive-control-of-cartpole-in-openai-gym-using-osqp/ http://www.philipzucker.com/flappy-bird-as-a-mixed-integer-program/

Here we model the unstable point of a pendulum and ask the controller to find forces to balance the pendulum.

def controller(Cat, pend_step, pos, vel):
    acc = Cat.idd(3)
    for i in range(10):
        acc = acc @ pend_step
    init = Cat.const(pos) * Cat.const(vel) * Cat.idd(1)
    return acc @ init

We can interpret the controller in GraphCat in order to produce a diagram of the 10 step lookahead controller. This is an advantage of the categorical style, a la compiling to categories.

from graphcat import *
GC = GraphCat
pend_step = GC.block("pend_step", ["x(t)", "v(t)", "f(t)"],["x(t+1)", "v(t+1)", "f(t+1)"])
pos = 0.5
vel = 0
prob = controller(GraphCat, pend_step, pos, vel)
prob.run()

We can also actually run it in model predictive control configuration in simulation.

CC = ConvexCat
idd = CC.idd

def pend_step_res():
    dt = 0.1
    x = cvx.Variable(2)
    xnext = cvx.Variable(2)
    f = cvx.Variable(1)
    pos = x[0]
    vel = x[1]
    posnext = xnext[0]
    velnext = xnext[1]
    c = []
    c += [posnext == pos + vel*dt]  # position update
    c += [velnext == vel + (pos + f)*dt]  # velocity update
    c += [-1 <= f, f <= 1]  # force constraint
    c += [-1 <= posnext, posnext <= 1]  # safety constraint
    c += [-1 <= pos, pos <= 1]
    obj = pos**2 + posnext**2  # + 0.1 * f**2
    z = cvx.Variable(1)
    c += [z == 0]
    return cvx.hstack((x, f)), cvx.hstack((xnext, z)), obj, c

pend_step = ConvexCat(pend_step_res)

pos = 0.5
vel = 0
poss = []
vels = []
fs = []
dt = 0.1
'''
# we could get this to run faster if we used cvxpy Parameters instead of recompiling the controller every time.
# some other day
p_pos = cvx.Parameter(1)
p_vel = cvx.Parameter(1)
p_pos.value = pos
p_vel.value = vel
'''
for j in range(30):
    prob = controller(ConvexCat, pend_step, pos, vel)
    _, x, y, _ = prob.run()
    print(x.value[2])
    #print(dir(x))
    f = x.value[2]
    #print(f)
    # actually update the state
    pos = pos + vel * dt
    vel = vel + (pos + f)*dt
    poss.append(pos)
    vels.append(vel)
    fs.append(f)

plt.plot(poss, label="pos")
plt.plot(vels, label="vel")
plt.plot(fs, label="force")
plt.legend()
And some curves. How bout that.

Bits and Bobbles

LazySets https://github.com/JuliaReach/LazySets.jl

ADMM – It’s a “lens”. I’m pretty sure I know how to do it pointfree. Let’s do it next time.

The minimization can be bubbled out to the top if we are always minimizing. If we mix in maximization, then we can’t, and we’re working on a more difficult problem. This is similar to what happens in Rel when you have relational division, which is a kind of universal quantification \forall. Mixed quantifier problems in general are very tough. These kinds of problems include games, synthesis, and robustness. More on this some other day.

Convex-Concave programming minimax https://web.stanford.edu/~boyd/papers/pdf/dccp_cdc.pdf https://web.stanford.edu/class/ee364b/lectures/cvxccv.pdf

The minimization operation can be related to the summation operation by the method of steepest descent in some cases. The method of steepest descent approximates a sum or integral by evaluating it at its most dominant position and expanding out from there, hence it converts a linear algebra thing into an optimization problem. Examples include the connection between statistical mechanics and thermodynamics and between classical mechanics and quantum mechanics.

Legendre Transformation ~ Laplace Transformation via steepest descent https://en.wikipedia.org/wiki/Convex_conjugate yada yada, all kinds of good stuff.

Intersection is easy. Join/union is harder. Does MIP help?

def meet(f,g):
   def res():
      x,y,o,c = f()
      x1,y1,o1,c1 = g()
      return x,y,o+o1, c+ c1 + [x ==  x1, y == y1]
   return res

Quantifier elimination

MIP via adjunction

Naive Synthesis of Sorting Networks using Z3Py

As a simple extension of verifying the sorting networks from before, we can synthesize optimally small sorting networks. The “program” of the sorting network is specified by a list of tuples of the elements we wish to compare and swap in order. We just generate all possible sequences of comparison operations and ask z3 to try verifying. If z3 says it verifies, we’re done.

Here are some definitions for running the thing

from z3 import *

def compare_and_swap_z3(x, y):
    x1, y1 = FreshInt(), FreshInt()
    c = If(x <= y, And(x1 == x, y1 == y), And(x1 == y, y1 == x))
    return x1, y1, c

# predicates of interest
def sorted_list(x):  # list is sorted
    return And([x <= y for x, y in zip(x, x[1:])])
def in_list(x, a):  # x is in the list a
    return Or([x == y for y in a])
def sub_list(a, b):  # every element of b appears in a
    return And([in_list(x, a) for x in b])
def same_elems(a, b):  # a contains the same elements as b
    return And(sub_list(a, b), sub_list(b, a))

def verify_network(pairs_to_compare, N):
    s = Solver()
    a = [Int(f"x_{i}") for i in range(N)]  # build initial array in z3 variables
    #a_orig = a.copy() # keep around copy for correctness predicate
    for i, j in pairs_to_compare:
        x = a[i]
        y = a[j]
        x1, y1, c = compare_and_swap_z3(x, y)
        a[i] = x1
        a[j] = y1
        s.add(c)
    #s.add(Not(And(sorted_list(a), same_elems(a_orig,a))))
    s.add(Not(sorted_list(a)))
    if s.check() == unsat:
        print("proved")
        return True
    else:
        return False

N = 8
verify_network(list(oddeven_merge_sort(N)), N)  # oddeven_merge_sort comes from the earlier sorting network post

And here is a simple generator of all possible pairs of comparisons and the networks built from them.

def all_swaps(m):  # all pairs of integers from 0 to m-1
    return [[(i, j)] for i in range(m) for j in range(i+1, m)]

# All lists of length n of swaps on m wires
def all_networks(m, n):
    if n == 0:
        return []
    elif n == 1:
        return all_swaps(m)
    else:
        return [c + swap for c in all_networks(m, n-1) for swap in all_swaps(m)]

def synthesize(N):
    for n in range(N**2):  # we can definitely do it in O(N**2) gates.
        print(f"trying network size: {n}")
        for pairs_to_compare in all_networks(N, n):
            if verify_network(pairs_to_compare, N):
                return pairs_to_compare

synthesize(4)

As is, this is astoundingly slow. Truly, truly abysmally slow. The combinatorics of really naively searching through this space are abysmal. I doubt you’re going to get more than a network of size 6 out of this as is.

Some possible optimizations: early pruning of bad networks with testing, and avoiding ever looking at obviously bad networks. A randomized search might be faster if one doesn’t care about optimality. We could also ask z3 to produce networks.
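Here's a hedged sketch of my own of the testing-based pruning idea: run a handful of random concrete inputs through the candidate network in plain python, and skip the z3 call if any of them come out unsorted. synthesize could call this before handing a candidate to verify_network.

import random

def quick_reject(pairs_to_compare, N, trials=20):
    for _ in range(trials):
        a = [random.randint(0, 10) for _ in range(N)]
        for i, j in pairs_to_compare:
            if a[i] > a[j]:
                a[i], a[j] = a[j], a[i]
        if any(x > y for x, y in zip(a, a[1:])):
            return True   # concrete counterexample found: definitely not a sorting network
    return False          # passed the random tests; still needs z3 to actually verify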

For more on program synthesis, check out Nadia Polikarpova’s sick course here.

https://github.com/nadia-polikarpova/cse291-program-synthesis

Rough Ideas on Categorical Combinators for Model Checking Petri Nets using Cvxpy

Petri nets are a framework for modeling dynamical systems that is very intuitive to some people. The vanilla version is that there are discrete tokens at nodes on a graph representing resources of some kind and tokens can be combined according to the firing of transition rules into new tokens in some other location.

This is a natural generalization of chemical reaction kinetics, where tokens are particular kinds of atoms that need to come together. It also is a useful notion for computer systems, where tokens represent some computational resource.

To me, this becomes rather similar to a flow problem or circuit problem. Tokens feel a bit like charge; transitions are a bit like current (although not necessarily conservative). In a circuit, one can have such a small current that the particulate nature of electric current in terms of electrons is important. This happens for shot noise or for Coulomb blockade, for example.

If the number of tokens is very large, it seems intuitively sensible to me that one can well approximate the behavior by relaxing to a continuum. Circuits have discrete electrons and yet are very well approximated by ohm’s laws and the like. Populations are made of individuals, and yet in the thousands their dynamics are well described by differential equations.

It seems to me that mixed integer programming is a natural fit for this problem. Mixed integer programming has had its implementations and theory heavily refined for over 70 years so now very general purpose and performant off the shelf solvers are available. The way mixed integer programs are solved is by considering their quickly solved continuous relaxation (allowing fractional tokens and fractional transitions more akin to continuous electrical circuit flow) and using this information to systematically inform a discrete search process. This seems to me a reasonable starting approximation. Another name for petri nets is Vector Addition Systems, which has more of the matrix-y flavor.

We can encode a bounded model checking for reachability of a Petri net into a mixed integer program fairly easily. We use 2-index variables, the first of which will be used for the time step. We again turn to the general purpose functional way of encoding pointful DSLs into pointfree ones as I have done here and here. The key point is that you need to be careful where you generate fresh variables. This is basically copycatting my posts here. http://www.philipzucker.com/categorical-combinators-for-graphviz-in-python/ http://www.philipzucker.com/a-sketch-of-categorical-relation-algebra-combinators-in-z3py/

I’m like 70% sure what I did here makes sense, but I’m pretty sure the general idea makes sense with some fiddling.

import cvxpy as cvx

T = 10  # total number of time steps as a global parameter

class PetriCat():
    def compose(f, g):
        def res():
            a, b, fcon = f()
            b1, c, gcon = g()
            return a, c, fcon + gcon + [b == b1]
        return res
    def idd():
        def res():
            x = cvx.Variable((T-1, 1), integer=True)
            return x, x, [x >= 0]
        return res
    def par(f, g):
        def res():
            a, b, fcon = f()
            c, d, gcon = g()
            return cvx.vstack([a, c]), cvx.vstack([b, d]), fcon + gcon
        return res
    def weighted_block(wi, wo, wint):
        def res():
            (Na, Ni) = wi.shape  # number of actions, inputs
            (Na1, No) = wo.shape
            (Na2, Nint) = wint.shape
            assert Na == Na1
            assert Na == Na2
            action = cvx.Variable((T-1, Na), integer=True)
            internal = cvx.Variable((T, Nint), integer=True)
            flowin = action @ wi
            flowout = action @ wo
            return flowin, flowout, [internal[1:, :] == internal[:-1, :] + action @ wint, action >= 0, internal >= 0]
        return res
    def run(f):
        a, b, fcon = f()
        prob = cvx.Problem(cvx.Minimize(1), fcon)
        prob.solve()
        return a, b
    # We need some way of specifying the initial and final states of things. Just more parameters to the constructor functions, I think.

The big piece is the weighted_block function. It lets you build a combinator with an internal state and input and output flow variables. You give matrices with entries for every possible transition. Whether a transition occurred between t and t+1 is indicated by integer variables. There is also possible accumulation of tokens at nodes, which also requires integer variables. Perhaps we’d want to expose the token state of the nodes to the outside too?
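To make the shapes concrete, here is a small example of my own (not from the post, and built against the rough sketch above, so take it with a grain of salt): a one-place buffer with two transitions, one that consumes a token from the input flow and stores it, and one that releases a stored token to the output flow. Each matrix has one row per transition.

import numpy as np

wi = np.array([[1], [0]])      # transition 0 consumes one token of input flow
wo = np.array([[0], [1]])      # transition 1 produces one token of output flow
wint = np.array([[+1], [-1]])  # the internal place gains a token on transition 0, loses one on transition 1
buffer_block = PetriCat.weighted_block(wi, wo, wint)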

Weighted block schematically looks something like a box with input flow wires on one side, output flow wires on the other, and an internal store of tokens inside.

We can also get out a graphical representation of the net by reinterpreting our program into GraphCat. This is part of the power of the categorical interface. http://www.philipzucker.com/categorical-combinators-for-graphviz-in-python/

When I talked to Zach about this, he was skeptical, thinking that MIP solvers would die a horrible death if you threw a moderately large Petri net at them. Hard to say without trying.

Thoughts

There is an interesting analogy to be found with quantum field theory in that if you lift up to considering distributions of tokens, it looks like an occupation number representation. See Baez. http://math.ucr.edu/home/baez/stoch_stable.pdf

If you relax the integer constraint and the positivity constraints, this really is a finite difference formulation for capacitor circuits. The internal states would then be the charge of the capacitor. Would the positivity constraint be useful for diodes?

I wonder how relevant the chunky nature of petri nets might be for considering superconducting circuits, which have quantization of quantities from quantum mechanical effects.

Cvxpy lets you describe convex regions. We can use this to implement the convex subcategory of Rel, ConvexRel, of which you can ask reachability questions. Relational division won’t work probably. I wonder if there is something fun there with respect to the integerizing operation and Galois connections.

Edit: I should have googled a bit first. https://www.sciencedirect.com/science/article/pii/S0377221705009240 – mathematical programming techniques for Petri net reachability. So it has been tried, and it sounds like the results weren’t insanely bad.

Categorical Combinators for Graphviz in Python

Graphviz is a graph visualization tool https://www.graphviz.org/. In Conal Elliott’s Compiling to categories http://conal.net/papers/compiling-to-categories/, compiling code to its corresponding graphviz representation was one very compelling example. These graphs are very similar to the corresponding string diagram of the monoidal category expression, but they also look like boolean circuit diagrams. Besides Conal Elliott’s Haskell implementation, there is an implementation in the Julia Catlab.jl project https://epatters.github.io/Catlab.jl/stable/

I’ve basically implemented a toy version of a similar thing in python. It lets you do things like this

plus = GraphCat.block("+", ["a","b"], ["c"])
I = GraphCat.idd()
p1 = plus
p2 = p1 @ (p1 * p1)
p3 = p1 @ (p2 * p2)
p4 = p1 @ (p3 * p3)
d = p4.run()
d.format = "png"
d.render("adders")

Why python?

  • Python is the lingua franca of computing these days. Many people encounter it, even people whose main thing isn’t computers.
  • The python ecosystem is nutso good.
  • Julia is kind of an uphill battle for me. I’m fighting the battle ( https://github.com/philzook58/Rel.jl ) , but I already know python pretty well. I can rip this out and move on.

What I did was implement some wrappers around the graphviz python package. It exposes a not very feature rich stateful interface. It is pretty nice that it prints the graphs inline in jupyter notebooks though.

The code is written in a style very similar (and hopefully overloadable with) to that of z3py relation algebra. http://www.philipzucker.com/a-sketch-of-categorical-relation-algebra-combinators-in-z3py/ . There is a fairly general cookbook method here for categorifying dsl. Since graphviz does not directly expose fresh node creation as far as I can tell, I made my own using a random number generator. The actual combinators are graphviz object processors, so we build up a giant functional chain of processors and then actually execute it with run. The inputs and outputs are represented by lists of node names. The is some design space to consider here.

I also had to use class based wrappers. Based on the precedent set by the python 3 matrix multiplication operator @, I think it is a requirement that this also be used for category composition. id is already a builtin in python, unfortunately. For monoidal product, I feel like overloading power ** looks nice even if it is a nonsensical analogy; * is also not too bad. I went with * for now.

The graphviz graphs aren’t quite string diagrams. They make no promise to preserve the ordering of your operations, but they seem to tend to.

import random
import graphviz

class GraphCat():
    def __init__(self, res):  # just hold the graphviz processor function
        self.res = res
    def fresh(n):  # makes random numbers for fresh node labels
        return list([str(random.randint(0, 10**9)) for i in range(n)])
    def idd():  # identity morphism. 1 input 1 output.
        def res(g):
            #nodes = GC.fresh(n)
            node = GC.fresh(1)[0]
            #for node in nodes:
            g.node(node, shape="point")
            return [node], [node]
        return GraphCat(res)
    def compose(g, f):  # compose morphisms. Makes graphviz edges between node labels generated by g and f
        f = f.res
        g = g.res
        def res(G):
            a, b = f(G)
            b1, c = g(G)
            assert len(b) == len(b1)
            G.edges(zip(b, b1))
            return a, c
        return GraphCat(res)
    def par(f, g):  # monoidal product. Puts f and g in parallel
        f = f.res
        g = g.res
        def res(G):
            a, b = f(G)
            c, d = g(G)
            return a + c, b + d
        return GraphCat(res)
    def dump():  # dump deletes this edge with an empty circle
        def res(G):
            node = GC.fresh(1)[0]
            G.node(node, shape="point", fillcolor="white")
            return [node], []
        return GraphCat(res)
    def dup():  # duplicate this edge
        def res(G):
            node = GC.fresh(1)[0]
            G.node(node, shape="point", fillcolor="green")
            return [node], [node, node]
        return GraphCat(res)
    def converse(f):  # turn input to output of this combinator
        f = f.res
        def res(G):
            a, b = f(G)
            return b, a
        return GraphCat(res)
    def named_simple(name):  # a simple labeled 1 input 1 output object
        def res(G):
            node = GC.fresh(1)[0]
            G.node(node, name)
            return [node], [node]
        return GraphCat(res)
    def block(name, inputs, outputs):  # an object with labelled ports
        inputs_label = [f"<{inp}> {inp}" for inp in inputs]
        outputs_label = [f"<{outp}> {outp}" for outp in outputs]
        # use graphviz to build a block with a label and n ports
        def res(G):
            node = GC.fresh(1)[0]
            inputs1 = [f"{node}:{inp}" for inp in inputs]
            outputs1 = [f"{node}:{outp}" for outp in outputs]
            grphstring = "{ {" + " | ".join(inputs_label) + "} | " + name + " | {" + "|".join(outputs_label) + "} }"
            G.node(node, grphstring, shape="record")
            return inputs1, outputs1
        return GC(res)
    def fst():  # project out first part of tuple. semantically equal to (id * dump)
        return GraphCat.block("fst", ["a", "b"], ["a1"])
    def snd():  # dump * id
        return GraphCat.block("snd", ["a", "b"], ["b1"])
    def run(self):  # will run the object on a fresh graphviz object
        dot = graphviz.Digraph()
        self.res(dot)
        return dot
    def __matmul__(self, rhs):  # matrix multiplication is a natural analog of composition
        return GC.compose(self, rhs)
    def __mul__(self, rhs):  # monoidal product
        return GC.par(self, rhs)
    def __add__(self, rhs):  # hmm. What should addition be? Join?
        pass
    def const(label):  # inject a constant in. A labelled node with 1 outport and no input
        def res(G):
            node = GraphCat.fresh(1)[0]
            G.node(node, str(label))
            return [], [node]
        return GC(res)
    def cup():
        def res(G):
            nodes = GraphCat.fresh(2)
            for node in nodes:
                G.node(node, shape="point")
            G.edge(nodes[0], nodes[1])
            return [], nodes
        return GraphCat(res)
    def cap():
        return GraphCat.converse(GraphCat.cup())

GC = GraphCat  # short alias; assigned after the class definition so the methods above can resolve it at call time

Here’s some example usage

GC = GraphCat
f = GraphCat.named_simple("f")
g = GraphCat.named_simple("g")
I = GraphCat.idd()
dump = GC.dump()
dup = GC.dup()
cup = GraphCat.cup()
cap = GraphCat.cap()
d =  cap @ (I * I) @ cup  #(I * cap) @ (I * I * I) @ (cup * I) 
d.run()
d = plus @ (GC.const(1) * GC.const(2))
d = d.run()
diagram = ((f * I) @ dup @ g @ (dump * I)  @ (I * f) @ (f * f)) * g
diagram.run()

Class based overloading is the main paradigm of overloading in python. You overload a program into different categories by making the program take the appropriate category class as a parameter.

# by passing in different category classes, we can make polymorphic functions
# They have to have a uniform interface though, which is hard to constrain in python.
def polymorphic_prog(M):
    idd = M.idd()
    dump = M.dump()
    dup = M.dup()
    return (idd * dump) @ dup
polymorphic_prog(GraphCat).run()

For example, something like this ought to work. Then you can get the graph of some matrix computation to go along with its numpy implementation.

import numpy as np

class FinVect(np.ndarray):

    def compose(f, g):
        return f @ g
    def idd(n):
        return np.eye(n)
    def par(f, g):
        return np.kron(f, g)
    def __mul__(self, rhs):
        return np.kron(self, rhs)
# and so on. 

Maybe outputting tikz is promising? https://github.com/negrinho/sane_tikz

Stupid is as Stupid Does: Floating Point in Z3Py

Floating points are nice and all. You can get pretty far pretending they are actually numbers. But they don’t obey some mathematical properties that feel pretty obvious. A classic to glance through is “What Every Computer Scientist Should Know About Floating-Point Arithmetic” https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html

We can check some properties with z3py. Here are a couple simple properties that succeed for mathematical integers and reals, but fail for floating point

from z3 import *

def numbery_proofs(sort):
    x = Const("x", sort)
    y = Const("y", sort)
    z = Const("z", sort)
    print("Commutativity")
    prove(x + y == y + x) #Commutativity
    print("Associativity")
    prove(((x + y) + z) == ((x + (y + z)))) #associativity
    print("Additive Identity")
    prove(x + 0 == x) # 0 additive identity
    print("Multiplicative Identity")
    prove(1 * x == x)
    print("Trichotomy")
    prove(Or(x > 0, x < 0, x == 0)) #trichotomy
    print("Distributivity")
    prove(x * (y + z) == x * y + x * z)
    print("Square positive")
    prove(x * x >= 0)

print("Ints -----------------")
numbery_proofs(IntSort())
print("Reals ---------------")
numbery_proofs(RealSort())
print("Float----------------")
numbery_proofs(Float16())

''' Output
Ints -----------------
Commutativity
proved
Associativity
proved
Additive Identity
proved
Multiplicative Identity
proved
Trichotomy
proved
Distributivity
proved
Square positive
proved
Reals ---------------
Commutativity
proved
Associativity
proved
Additive Identity
proved
Multiplicative Identity
proved
Trichotomy
proved
Distributivity
proved
Square positive
proved
Float----------------
Commutativity
proved
Associativity
counterexample
[z = -1.9375*(2**8),
y = 1.0009765625*(2**14),
x = 1.5224609375*(2**4)]
Additive Identity
counterexample
[x = -0.0]
Multiplicative Identity
proved
Trichotomy
counterexample
[x = NaN]
Distributivity
counterexample
[z = -1.0029296875*(2**-2),
y = 1.01171875*(2**-10),
x = -1.4833984375*(2**8)]
Square positive
counterexample
[x = NaN]
'''

I recently saw on twitter a reference to a Sylvie Boldo paper https://hal.archives-ouvertes.fr/hal-01148409/ “Stupid is as Stupid Does: Taking the Square Root of the Square of a Floating-Point Number”.

In it, she uses FlocQ and Coq to prove a somewhat surprising result that the naive formula \sqrt{x^2} = |x| actually is correct for the right rounding mode of floating point, something I wouldn’t have guessed.

Z3 confirms for Float16. I can’t get Float32 to come back after even a day on a fairly beefy computer. If I use FPSort(ebits,sbits) rather than a standard size, it just comes back unknown, so I can’t really see where the cutoff size is. This does not bode well for checking properties of floating point in z3 in general. I think a brute force for loop check of 32 bit float properties is feasible. It might even be pretty fast. To some degree, if z3 is taking forever to find a counterexample, I wonder to what degree the property is probably true.
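Here's a hedged sketch of my own of that brute-force idea, mirroring the 0.01 <= x <= 100 range used in the z3 query below: walk every float32 bit pattern in the range and check sqrt(x*x) == abs(x) directly in hardware arithmetic (numpy's default rounding is round-to-nearest-even).

import numpy as np

def check_sqrt_sqr_float32(lo=np.float32(0.01), hi=np.float32(100.0)):
    # positive floats have monotonically increasing bit patterns, so a uint32 range covers them all
    start = np.frombuffer(lo.tobytes(), dtype=np.uint32)[0]
    stop = np.frombuffer(hi.tobytes(), dtype=np.uint32)[0]
    failures = 0
    for chunk in np.array_split(np.arange(start, stop + 1, dtype=np.uint32), 200):
        x = chunk.view(np.float32)   # reinterpret consecutive bit patterns as floats
        failures += int(np.count_nonzero(np.sqrt(x * x) != np.abs(x)))
    return failures  # 0 would mean the identity held for every float32 in the range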

If anyone has suggestions, I’m all ears.

from z3 import *

x = FP("x", Float16())
rm = RNE()  # Rounding Nearest Ties to Even
x2 = fpMul(rm, x, x)
absx1 = fpSqrt(rm, x2)
absx2 = fpAbs(x)

s = Solver()
s.add(0.01 <= x, x <= 100)
s.add(absx1 != absx2)
res = s.check()
print(res)
if res == sat:
    m = s.model()
    print(m.eval(absx1))
    print(m.eval(absx2))

A Sketch of Gimped Interval Propagation with Lenses

David Sanders (who lives in Julia land https://github.com/JuliaIntervals ) explained to me last night a bit of how his interval constraint propagation library works. He described it as being very similar to backpropagation, which sets off alarm bells for me.

Backpropagation can be implemented in a point-free functional style using the lens pattern. http://www.philipzucker.com/reverse-mode-differentiation-is-kind-of-like-a-lens-ii/ Lenses are, generally speaking, a natural way to express in a functional style a forward-backward pass algorithm that shares information between the two passes.

I also note Conal Elliott explicitly mentions interval computation in his compiling to categories work http://conal.net/papers/compiling-to-categories/ https://github.com/conal/concat and he does have something working there.

Interval arithmetic itself has already been implemented in Haskell in Ed Kmett’s interval package. https://hackage.haskell.org/package/intervals-0.9.1/docs/Numeric-Interval.html so we can just use that.

The interesting thing the backward pass gives you is that everything feels a bit more relational than functional. The backward pass allows you to infer new information using constraints given down the line. For example, fuse :: Lens (a,a) a lets you enforce that two variables are actually equal. The lens pattern lets you store the forward pass intervals in a closure, so that you can intersect them with the backwards pass intervals.

I make no guarantees what I have here is right. It’s a very rough first pass. It compiles, so that is cool I guess.

module Numeric.ADLens.Interval where
import Numeric.Interval

-- interval combinators
type Lens a b = a -> (b, b -> a)

class Lattice a where
    (/\) :: a -> a -> a
    (\/) :: a -> a -> a
    top :: a
    bottom :: a

-- x /\ x = x
--

instance (Lattice a, Lattice b) => Lattice (a,b) where
    (x,y) /\ (a,b) = ( x /\ a , y /\ b )
    (x,y) \/ (a,b) = ( x \/ a , y \/ b )
    top = (top, top)
    bottom = (bottom, bottom) -- ? very fishy

instance (Fractional a, Ord a) => Lattice (Interval a) where
    (/\) = intersection
    (\/) = hull
    top = whole
    bottom = empty

instance Lattice () where
    _ /\ _ = ()
    _ \/ _ = ()
    top = () -- fishy
    bottom = () -- fishy

id :: Lattice a => Lens a a
id = \x -> (x, \x' -> x /\ x')

dup :: Lattice a => Lens a (a,a)
dup = \x -> ( (x,x), \(x1,x2) -> x1 /\ x2 )

fuse :: Lattice a => Lens (a,a) a
fuse = \(x,x') -> let x'' = x /\ x' in ( x'' , \x''' -> let x4 = x'' /\ x''' in (x4, x4))

-- cap is equality
cap :: Lattice a => Lens (a,a) ()
cap = \(x,x') -> ((), \_ -> let x'' = x /\ x' in (x'', x''))

posinf :: Fractional a => a
posinf = sup whole

neginf :: Fractional a => a
neginf = inf whole

-- insist that x <= y
leq :: (Ord a, Fractional a) => Lens (Interval a, Interval a) ()
leq = \(x,y) -> (() , \_ -> ( x /\ (neginf ... (sup y)) , y /\ ( (inf x) ... posinf) ) )
-- the repetitive-ness of the /\ is a real code smell.

swap :: (Lattice a, Lattice b) => Lens (a,b) (b,a)
swap = \(x,y) -> ( (y,x) , \(y',x') -> (x /\ x', y /\ y' ) )

-- min :: Lens (a,a) a
-- min = \(x,y) -> (min x y, \m -> (x /\ (m + pos) , y /\ (m + pos))

cup :: Lattice a => Lens () (a,a) -- not a very good combinator
cup = \_ -> ( (top, top) , \_ -> () )

-- it sure seems like we are doing a lot of unnecessary /\ operations
labsorb :: Lattice a => Lens ((), a) a
labsorb = \(_, x) -> (x, (\x' -> ((), x' /\ x)))
-- rabsorb
{-
and vice versa
-}

compose f g = \x -> let (y, f' ) = f x in
                    let (z, g' ) = g y in
                    (z , f' . g')
-- we could do the intersection here.
{-
compose f g = \x -> let (y, f' ) = f x in
                    let (z, g' ) = g y in
                    (z , \z' -> f' (y /\ (g' z')) /\ x)
-}
-- can we though? Does this still work for add?
{-
yeah no, there is something off.
You need to change what you do depending on the number of arguments to the function.
hmm.
-}

const :: a -> Lens () a
const x = \_ -> (x, \_ -> ())

{-
Functions and taylor models... ????
-}

fst :: Lattice a => Lens (a,b) a
fst = \(x,y) -> (x, \x' -> (x /\ x', y))

snd :: Lattice b => Lens (a,b) b
snd = \(x,y) -> (y, \y' -> (x, y /\ y'))

-- par :: Lens a b -> Lens c d -> Lens (a,c) (b,d)
-- par f g =

sum :: (Num a, Lattice a) => Lens (a,a) a
sum = \(x,y) -> ( x + y, \s -> ( x /\ (s - y), y /\ (s - x)))

mul :: (Fractional a, Lattice a) => Lens (a,a) a
mul = \(x,y) -> (x * y, \m -> (x /\ (m / y) , y /\ (m / x) ))

div :: (Fractional a, Lattice a) => Lens (a,a) a
div = \(x,y) -> (x / y, \d -> ( (d * y) /\ x , ((recip d) * x) /\ y ))

liftunop :: Lattice a => (a -> b) -> (b -> a) -> Lens a b
liftunop f finv = \x -> ( f x , \y -> x /\ (finv y) )

recip' :: (Fractional a, Lattice a) => Lens a a
recip' = liftunop recip recip

sin' :: (Floating a, Lattice a) => Lens a a
sin' = liftunop sin asin

cos' :: (Floating a, Lattice a) => Lens a a
cos' = liftunop cos acos

acos' :: (Floating a, Lattice a) => Lens a a
acos' = liftunop acos cos

exp' :: (Floating a, Lattice a) => Lens a a
exp' = liftunop exp log

-- depending on how the library implements sqrt?
sqr :: (Floating a, Lattice a) => Lens a a
sqr = \x -> (x ** 2, \x2 -> x /\ (sqrt x) /\ negate (sqrt x) )
-- sqrt = liftunop ()

pow :: (Floating a, Lattice a) => Lens (a,a) a
pow = \(x,n) -> ( x ** n, \xn -> (x /\ xn ** (recip n), n /\ (logBase x xn) ))

Here’s my repo in case I fix more things up and you wanna check it out https://github.com/philzook58/ad-lens/blob/master/src/Numeric/ADLens/Interval.hs

Now having said that, to my knowledge Propagators are a more appropriate technique for this domain. https://www.youtube.com/watch?v=s2dknG7KryQ https://www.youtube.com/watch?v=nY1BCv3xn24 I don’t really know propagators though. It’s on my to do list.

The lens approach has a couple of problems. It is probably doing way more work than it should, and we aren’t iterating to a fixed point.

Maybe an iterated lens would get us closer?

data Lens s t a b = Lens (a -> (b , (b -> (a, Lens s t a b))))

This is one way to go about the iterative process of updating a neural network in a functional way by evaluating it over and over and backpropagating. The updated weights will be stored in those closures. It seems kind of nice. It is clearly some relative of Iteratees and streaming libraries like pipes and conduit (which are also a compositional bidirectional programming pattern), the main difference being that it enforces a particular ordering of passes (for better or worse). Also I haven’t put in any monadic effects, which is to some degree the point of those libraries, but also extremely conceptually clouding to what is going on.

Another interesting possibility is the type

type Lens s t a b = s -> (a, b -> t)

Lens s (Interval s) a (Interval a)

This has pieces that might be helpful for talking about continuous functions in a constructive way. It has the forward definition of the function, and then the inverse image of intervals. The inverse image function depends on the original evaluation point? Does this actually make sense? The definition of continuity is that this inverse image function must make arbitrarily small image intervals as you give it smaller and smaller range intervals. Continuity is compositional and plays nice with many arithmetic and structural combinators. So maybe something like this might be a nice abstraction for proof carrying continuous functions in Coq or Agda? Pure conjecture.

A Sketch of Categorical Relation Algebra Combinators in Z3Py

I’ve discussed relation algebra before. Relations are sets of tuples. There, I implemented the relations naively using lists for sets. This is very simple, and very clean especially with list comprehension syntax. It is however horrifically inefficient, and we could only deal with finitely enumerable domains. The easiest path to fixing these problems is to cash out to an external solver, in this case z3.

There are many beautifully implemented solvers out there and equally beautiful DSL/modeling languages. Examples in mind include sympy, cvxpy, and z3. These modeling languages require you to instantiate variable objects and build expressions out of them and then hand it off to the solver. This is a reasonable interface, but there are advantages to a more categorical/point-free style DSL.

Point-free languages are ones that do not include binding forms that introduce bound/dummy variables. Examples of binding forms like this are \lambda, \sum, \max, \min, \int, \sup, \lim, \forall, \exists. One problem lies in the fact that the names of bound variables don’t matter, and that they end up accidentally smashing into each other. You may have experienced this in physics or math class as the dummy indices or dummy variable problem causing you to screw up your calculation of some cross product identity or some complicated tensor sum. These are surprisingly subtle problems, very difficult to diagnose and get right. De Bruijn indices are a technique for giving the bound variables canonical names, but they suck to implement in their own way. When you make a DSL point free, it is a joy to manipulate, optimize, and search. I think this may be the core of why category theory is a good language for mathematics and programming.

Point-free style also tends to have significant economy of size, for better or worse. Sometimes it is better to have an expression very dense in information. This is important if you are about to algebraically manipulate an expression with paper and pencil. Every manipulation requires a great deal of mind numbing copying as you proceed line by line, so it can be excruciating if your notation has a lot of unnecessary bulk.

Relations are like functions. The two pieces of the tuple can be roughly thought of as the “input” and the “output”. Relations are only loosely directional though. Part of the point of relations is that the converse (inverse) of a relation is easy to define.

When we implement relations, we have a choice. Do we want the relation to produce its variables, accept its variables, or accept one and produce the other? There are advantages to each. When relations were [(a,b)], a -> b -> Bool, and a -> [b], converting between these forms was a rather painful enumeration process. The sting of converting between them is taken out by the fact that the conversion is no longer a very computationally expensive process, since we’re working at the modeling layer.

When you’re converting a pointful DSL to pointfree DSL, you have to be careful where you instantiate fresh variables or else you’ll end up with secret relations that you didn’t intend. Every instantiation of id needs to be using fresh variables for example. You don’t want the different id talking to each other. Sometimes achieving this involves a little currying and/or thunking.

There is a pattern that I have noticed when I’m using modeling languages. When you have a function or relation on variables, there are constraints produced that you have to keep a record of. The pythonic way is to have a Model or Solver object, and then have that object mutate an internal record of the set of constraints. I don’t particularly enjoy this style though. It feels like too much boilerplate.

In Haskell, I would use something like a Writer monad to automatically record the constraints that are occurring. Monads are not really all that pleasant even in Haskell, and especially a no go in python without “do” syntax.

However, because we are going point free it is no extra cost at all to include this pipework along for the ride in the composition operation.

Here are implementations of the identity and composition for three different styles. Style 1 is fully receptive, style 2 is mixed (function feeling) and style 3 is fully productive of variables.

Fair warning, I’m being sketchy here. I haven’t really tried this stuff out.

from z3 import *

def rid1(x, y):  # a receptive relation. It receives variables
    return x == y

def compose1(f, sort, g):  # annoying but we need to supply the sort of the inner variable
    def res(x, z):
        y = FreshConst(sort)
        return Exists([y], And(f(x, y), g(y, z)))
    return res

def rid2(x):  # a functional relation. It receives a variable then produces one.
    y = FreshConst(x.sort())
    return y, x == y

def compose2(f, g):
    def res(x):
        y, cf = f(x)
        z, cg = g(y)
        return z, Exists([y], And(cf, cg))
    return res

def rid3(sort):  # a type indexed generator of relations. Annoying that we have to supply the type of the variable
    def res():  # a productive relation
        x = FreshConst(sort)
        y = FreshConst(sort)
        return x, y, x == y
    return res

def compose3(f, g):
    def res():
        x, yf, cf = f()
        yg, z, cg = g()
        return x, z, Exists([yf, yg], And(cf, cg, yf == yg))
    return res

z3 is a simply typed language. You can get away with some polymorphism at the python level (for example, == dispatches correctly according to the object), but sometimes you need to manually specify the sort of the variables. Given these types, the different styles are interconvertible.

def lift12(sorty, f):
    def res(x):
        y = FreshConst(sorty)
        c = f(x, y)
        return y, c
    return res

def lift23(sortx, f):
    def res():
        x = FreshConst(sortx)
        y, c = f(x)
        return x, y, c
    return res

def lift31(f):
    def res(x, y):
        x1, y1, c = f()
        return And(c, x1 == x, y1 == y)
    return res

We can create the general cadre of relation algebra operators. Here is a somewhat incomplete list

import numpy as np

trivial = BoolVal(True)

def top1(x, y):  # top is the full relation
    return trivial
def bottom1(x, y):
    return BoolVal(False)

def top2(sorty):
    def res(x):
        y = FreshConst(sorty)
        return y, trivial
    return res

def top3(sortx, sorty):
    def res():
        x = FreshConst(sortx)
        y = FreshConst(sorty)
        return x, y, trivial
    return res

def converse1(r):
    return lambda x, y: r(y, x)
def meet1(p, q):
    return lambda x, y: p(x, y) & q(x, y)
def join1(p, q):
    return lambda x, y: p(x, y) | q(x, y)

# product and sum types
def fst1(x, y):  # proj(0)
    return y == x.sort().accessor(0, 0)(x)
def snd1(x, y):  # proj(1)
    return y == x.sort().accessor(0, 1)(x)
def left1(x, y):
    return y == y.sort().constructor(0)(x)
def right1(x, y):
    return y == y.sort().constructor(1)(x)
def inj1(n):
    return lambda x, y: y == y.sort().constructor(n)(x)
def proj1(n):
    return lambda x, y: y == x.sort().accessor(0, n)(x)
def fan(p, q):
    def res(x, y):
        a = y.sort().accessor(0, 0)(y)
        b = y.sort().accessor(0, 1)(y)
        return And(p(x, a), q(x, b))
    return res
def dup1(x, y):  # alternatively we may not want to internalize the tuple into z3.
    y1, y2 = y
    return And(x == y1, x == y2)
def convert_tuple(xy_pair, xy):  # convert between an internal z3 tuple and a python tuple.
    x, y = xy_pair
    return xy == xy.sort().constructor(0)(x, y)
#def split():
#def rdiv # relation division is so important, and yet I'm always too mentally exhausted to try and straighten it out

def itern(n, sortx, p):  # unroll
    if n == 0:
        return rid1
    else:
        return compose1(itern(n - 1, sortx, p), sortx, p)
def starn(n, sortx, p):  # unroll and join
    if n == 0:
        return rid1
    else:
        return join1(rid1, compose1(starn(n - 1, sortx, p), sortx, p))  # 1 + x * p

# more specialized operations than general purpose structural operators
def lte1(x, y):
    return x <= y
def sum1(x, y):  # I'm being hazy about what x is here exactly
    return x[0] + x[1] == y
def npsum(x, y):
    return np.sum(x) == y  # you can make numpy arrays of z3 variables. Some vectorized functions work.
def mul1(x, y):
    return x[0] * x[1] == y
def and1(xy, z):
    x, y = xy
    return z == And(x, y)
def If1(bte, y):
    b, t, e = bte
    return If(b, t, e) == y

Questions about relation algebra expressions are often phrased in terms of relational inclusion. You can construct a relation algebra expression, use rsub in the appropriate way, and ask z3’s prove function if it is true.

# relational properties
def rsub(p, q, sortx, sorty):
    x = FreshConst(sortx)
    y = FreshConst(sorty)
    return ForAll([x, y], Implies(p(x, y), q(x, y)))
def req(p, q, sortx, sorty):
    return And(rsub(p, q, sortx, sorty), rsub(q, p, sortx, sorty))
def symmetric1(sortx, sorty, r):
    x = FreshConst(sortx)
    y = FreshConst(sorty)
    return ForAll([x, y], r(x, y) == r(y, x))
def reflexive1(sortx, r):
    x = FreshConst(sortx)
    return ForAll([x], r(x, x))
def transitive1(sortx, sorty, sortz, r):
    x = FreshConst(sortx)
    y = FreshConst(sorty)
    z = FreshConst(sortz)
    return ForAll([x, y, z], Implies(And(r(x, y), r(y, z)), r(x, z)))
def strict1(r, sortx):
    x = FreshConst(sortx)
    return ForAll([x], Not(r(x, x)))
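As a small hedged usage sketch of my own (not from the post), using lte1, meet1, and rsub from the snippets above: z3 should report that the meet of two relations is included in either one.

from z3 import IntSort, prove

p = lte1                          # the relation x <= y
q = lambda x, y: x == y + 1       # an arbitrary second relation on the integers
claim = rsub(meet1(p, q), p, IntSort(), IntSort())
prove(claim)                      # expect "proved"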

Z3 has solvers for

  • Combinatorial Relations
  • Linear Relations
  • Polyhedral Relations
  • Polynomial Relations
  • Interval Relations – A point I was confused on. I thought interval relations were not interesting. But I was interpreting the term incorrectly. I was thinking of relations on AxB that are constrained to take the form of a product of intervals. In this case, the choice of A has no effect on the possible B whatsoever, so it feels non relational. However, there is also I_A x I_B , relations over the intervals of A and B. This is much closer to what is actually being discussed in interval arithmetic.

Applications we can use this for:

  • Graph Problems. The edges can be thought of as a relation between vertices. Relation composition using the starn operator is a way to ask for paths through the graph.
  • Linear Relations – To some degree this might supplant my efforts on linear relations. http://www.philipzucker.com/linear-relation-algebra-of-circuits-with-hmatrix/ Z3 is fully capable of understanding linear relations.
  • Safety and liveness of control systems. Again, a transition relation is natural here. It is conceivable that the state space can be heterogeneous in time, which is the interesting power of the categorical style. I feel like traditional control systems usually maintain the same state space throughout.
  • Program verification
  • Games? Nash equilibria?

Other Thoughts

  • Maybe we are just building a shitty version of alloy. https://alloytools.org/
  • What about uninterpreted relations? What about higher order relations? What about reflecting into a z3 ADT for a relational language? Then we could do relational program synthesis. This is one style, just hand everything off to smt. https://github.com/nadia-polikarpova/cse291-program-synthesis/tree/master/lectures
  • I should try to comply with python conventions, in particular numpy and pandas conventions. @ should be composition for example, since relation composition has a lot of flavor of matrix composition. I should overload a lot of operators, but then I’d need to wrap in a class 🙁
  • Z3 has special support for some relations. How does that play in? https://theory.stanford.edu/~nikolaj/programmingz3.html#sec-special-relations https://z3prover.github.io/api/html/ml/Z3.Relation.html
  • As long as you only use composition, there is a chaining of existentials, which really isn’t so bad.
  • What we’ve done here is basically analogous/identical to what John Wiegley did compiling to the category of z3. Slightly different in that he only allowed for existential composition rather than relational division. http://newartisans.com/2017/04/haskell-and-z3/
  • We can reduce the burden on z3 if we know the constructive proof objects that witness our various operations. Z3 is gonna do better if we can tell it exactly which y witnesses the composition of operators, or give it clues as to which branch of an Or it should use.
  • It’s a bummer, but when you use quantifiers, you don’t see countermodels? Maybe there is some hook where you can, or in the dump of the proof object.
  • What about recursion schemes? The exact nature of z3’s ability to handle unbounded problems is fuzzy to me. It does have the support to define recursive functions. Also explicit induction predicates can go through sometimes. Maybe look at the Cata I made in the fancy relation algebra post.
  • I think most proof assistants have implementations of relation algebra available. I find you can do a surprising amount in z3.

Fiddling around with validated ODE integration, Sum of Squares, Taylor Models.

As I have gotten more into the concerns of formal methods, I’ve become unsure that ODEs actually exist. These are concerns that did not bother me much when I defined myself as being more in the physics game. How times change. Here’s a rough cut.

A difficulty with ODE error analysis is that it is very confusing how to get the error on something you are having difficulty approximating in the first place.

If I wanted to know the error of using a finite step size dt vs a size dt/10, great. Just compute both and compare. However, no amount of this seems to bootstrap you down to the continuum. And so I thought, you’re screwed in regards to using numerics in order to get true hard facts about the true solution. You have to go to paper and pencil considerations of equations and variables and epsilons and deltas and things. It is now clearer to me that this is not true. There is a field of verified/validated numerics.

A key piece of this seems to be interval arithmetic. https://en.wikipedia.org/wiki/Interval_arithmetic An interval can be concretely represented by its left and right point. If you use rational numbers, you can represent the interval precisely. Interval arithmetic over approximates operations on intervals in such a way as to keep things easily computable. One way it does this is by ignoring dependencies between different terms. Check out Moore et al’s book for more.
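
To make the idea concrete, here is a toy interval type over exact rationals (a sketch only; a real library also has to worry about outward rounding when using floats). The last line shows the dependency problem: the two copies of x are treated as independent, so x - x does not collapse to zero.

from fractions import Fraction

class Interval:
    def __init__(self, lo, hi):
        self.lo, self.hi = Fraction(lo), Fraction(hi)
    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)
    def __sub__(self, other):
        return Interval(self.lo - other.hi, self.hi - other.lo)
    def __mul__(self, other):
        products = [self.lo * other.lo, self.lo * other.hi,
                    self.hi * other.lo, self.hi * other.hi]
        return Interval(min(products), max(products))
    def __repr__(self):
        return f"[{self.lo}, {self.hi}]"

x = Interval(1, 2)
print(x + x)   # [2, 4]
print(x * x)   # [1, 4]
print(x - x)   # [-1, 1], not [0, 0]: the dependency between the two x's is ignored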

Switching over to intervals means you think of sets, rather than points, as the things you’re operating on. For ODEs (and other things), this shift of perspective means no longer considering individual functions, but instead sets of functions. And not arbitrary, extremely complicated sets, only those which are concretely manipulable and storable on a computer, like intervals. Taylor models are a particular choice of function sets. You are manipulating an interval tube around a finite polynomial. If during integration / multiplication you get higher powers, truncate the polynomial by dumping the excess into the interval term. This keeps the complexity under wraps and closes the loop of the descriptive system.
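
Here is a rough sketch of just that truncation step in sympy (my own toy construction, not a real Taylor model library): keep the polynomial part of a product up to some degree and fold a crude bound on the discarded higher-degree terms, valid on t in [-1,1], into an interval remainder. A real implementation would also propagate the existing remainder intervals through the multiplication with rigorous interval arithmetic.

import sympy as sy

t = sy.symbols("t")
D = 2  # maximum degree kept in the polynomial part

def truncate(p):
    poly = sy.Poly(sy.expand(p), t)
    keep = sum(c * t**e for (e,), c in poly.terms() if e <= D)
    tail = sy.expand(p) - keep
    # on [-1,1], |tail(t)| <= sum of the absolute values of its coefficients
    bound = sum(abs(c) for c in sy.Poly(tail, t).coeffs()) if tail != 0 else 0
    return keep, (-bound, bound)

p = 1 + t + t**2 / 2               # a degree-2 stand-in for e^t
poly_part, remainder = truncate(p * p)
print(poly_part)                   # the degree <= 2 part of p(t)^2
print(remainder)                   # interval guaranteed to contain the rest on [-1,1]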

If we have an iterative, contractive process for getting better and better solutions of a problem (like a newton method or some iterative linear algebra method), we can get definite bounds on the solution if we can demonstrate that a set maps into itself under this operation. If this is the case and we know there is a unique solution, then it must be in this set.

It is wise if at all possible to convert an ODE into integral form. \dot{x}= f(x,t) is the same as x(t) = x_0 + \int_0^t f(x(\tau),\tau) d\tau.

For ODEs, the common example of such an operation is known as Picard iteration. In physical terms, this is something like the impulse approximation / born approximation. One assumes that the ODE evolves according to a known trajectory x_0(t) as a first approximation. Then one plugs in the trajectory to the equations of motion f(x_0,t) to determine the “force” it would feel and integrate up all this force. This creates a better approximation x_1(t) (probably) which you can plug back in to create an even better approximation.
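
Here is what plain (non-interval) Picard iteration looks like symbolically for \dot{x} = x, x(0) = 1, just as a sketch: each pass integrates the previous guess, and the iterates are the partial sums of the Taylor series of e^t.

import sympy as sy

t, s = sy.symbols("t s")

def picard_step(x_prev):
    # x_{n+1}(t) = x(0) + integral_0^t f(x_n(s), s) ds, with f(x, s) = x here
    return 1 + sy.integrate(x_prev.subs(t, s), (s, 0, t))

x = sy.Integer(1)      # initial guess: the constant trajectory x_0(t) = 1
for _ in range(4):
    x = picard_step(x)
print(sy.expand(x))    # 1 + t + t**2/2 + t**3/6 + t**4/24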

If we instead do this iteration on an intervally function set / taylor model thing, and can show that the set maps into itself, we know the true solution lies in this interval. The term to search for is Taylor Models (also some links below).

I was tinkering with whether sum of squares optimization might tie in to this. I have not seen SOS used in this context, but it probably has been, or else the idea is worthless.

An aspect of sum of squares optimization that I thought was very cool is that it gives you a simple numerical certificate that confirms that at the infinitude of points for which you could evaluate a polynomial, it comes out positive. This is pretty cool. http://www.philipzucker.com/deriving-the-chebyshev-polynomials-using-sum-of-squares-optimization-with-sympy-and-cvxpy/

But that isn’t really what makes Sum of squares special. There are other methods by which to do this.

There are very related methods called DSOS and SDSOS https://arxiv.org/abs/1706.02586 which are approximations of the SOS method. They replace the SDP constraint at the core with a more restrictive constraint that can be expressed with LP and SOCP respectively. These methods lose some of the universality of the SOS method and become basis dependent on your choice of polynomials. DSOS in fact is based around the concept of a diagonally dominant matrix, which means that you should know roughly what basis your certificate should be in.

This made me realize there is an even more elementary version of DSOS that perhaps should have been obvious to me from the outset. Suppose we have a set of functions we already know are positive everywhere on a domain of interest. A useful example is the raised Chebyshev polynomials. https://en.wikipedia.org/wiki/Chebyshev_polynomials The appropriate Chebyshev polynomials oscillate between [-1,1] on the interval [-1,1], so if you add 1 to them they are positive over the whole interval [-1,1]. Then nonnegative linear sums of them are also positive. Bing bang boom. And that compiles down into a simple linear program (inequality constraints on the coefficients) with significantly fewer variables than DSOS. What we are doing is restricting ourselves to diagonal matrices with nonnegative entries, which are of course positive semidefinite. It is less flexible, but it also has more obvious knobs to throw in domain specific knowledge. You can use a significantly overcomplete basis, and finding this basis is where you can insert your prior knowledge.
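
A tiny demonstration of that idea, separate from the larger program below (a sketch I am making up here, with an arbitrarily chosen target polynomial): certify that a polynomial is nonnegative on [-1,1] by finding nonnegative coefficients that express it exactly in the raised Chebyshev basis. Matching monomial coefficients gives linear equalities, so the whole thing is an LP.

import cvxpy as cvx
import numpy as np
import sympy as sy

t = sy.symbols("t")
N = 4
basis = [sy.chebyshevt(n, t) + 1 for n in range(N + 1)]  # each is >= 0 on [-1,1]
p = 3 + 2*t + t**2                                       # the polynomial to certify

def coeff_vector(expr, deg):
    e = sy.expand(expr)
    return np.array([float(e.coeff(t, k)) for k in range(deg + 1)])

A = np.stack([coeff_vector(b, N) for b in basis], axis=1)  # columns = basis polynomials
b = coeff_vector(p, N)

c = cvx.Variable(N + 1, nonneg=True)
prob = cvx.Problem(cvx.Minimize(cvx.sum(c)), [A @ c == b])
prob.solve()
print(prob.status)  # 'optimal' means a nonnegative combination (a certificate) exists
print(c.value)      # the combination coefficients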

It is not at all clear there is any benefit over interval based methods.

Here is a sketch I wrote for x'=x which has solution e^t. I used raised chebyshev polynomials to enforce positive polynomial constraints and tossed in a little taylor model / interval arithmetic to truncate off the highest terms.

I’m using my helper functions for translating between sympy and cvxpy expressions. https://github.com/philzook58/cvxpy-helpers Sympy is great for collecting up the coefficients on terms and for polynomial multiplication, integration, differentiation, etc. I do it by basically creating sympy matrix variables corresponding to cvxpy variables, which I compile to cvxpy expressions using lambdify with an explicit variable dictionary.

import cvxpy as cvx
import numpy as np
import sos
import sympy as sy
import matplotlib.pyplot as plt
# raised chebyshev
t = sy.symbols("t")
N = 5
# seems like even N becomes infeasible.
terms = [sy.chebyshevt(n,t) + 1 for n in range(N)] # raised chebyshev functions are positive on interval [-1,1]
print(terms)
'''
for i in range(1,4):
    ts = np.linspace(-1,1,100)
    #print(ts)
    #print(sy.lambdify(t,terms[i], 'numpy')(ts))
    plt.plot( ts , sy.lambdify(t,terms[i])(ts))
plt.show()
'''
vdict = {}
l,d = sos.polyvar(terms) # lower bound on solution
vdict.update(d)
w,d = sos.polyvar(terms, nonneg=True) # width of tube. Width is always positive (nonneg)
vdict.update(d)
u = l + w # upper curve is higher than lower by width
def picard(t,f):
    return sy.integrate(f, [t,-1,t]) + np.exp(-1) # picard integration on [-1,1] interval with initial cond x(-1)=1/e
ui = picard(t,u)
li = picard(t,l)
c = []
def split(y , N): # split a polynomial into lower and upper parts.
    yp = sy.poly(y, gens=t)
    lower = sum([ c*t**p for (p,), c in yp.terms() if p < N])
    #upper = sum([ c*x**p for (p,), c in yp.terms() if p > N])
    upper = y - lower
    return lower,upper
terms = [sy.chebyshevt(n,t) + 1 for n in range(N+1)]
# ui <= u
lowerui, upperui = split(ui, N) # need to truncate highest power of u using interval method
print(lowerui)
print(upperui)
du = upperui.subs(t,1) #Is this where the even dependence of N comes from?
#c += [ du >= sos.cvxify(upperui.subs(t,1), vdict), du >= sos.cvxify(upperui.subs(t,1)] # , upperui.subs(t,-1))
print(du)
lam1,d = sos.polyvar(terms,nonneg=True) # positive polynomial
vdict.update(d)
# This makes the iterated interval inside the original interval
c += sos.poly_eq( lowerui + du + lam1 , u , vdict) # write polynomial inequalities in slack equality form
# l <= li
#
lam2, d = sos.polyvar(terms,nonneg=True)
vdict.update(d)
c += sos.poly_eq( l + lam2 , li , vdict) # makes new lower bound higher than original lower bound
obj = cvx.Minimize( sos.cvxify(w.subs(t ,0.9), vdict) ) # randomly picked reasonable objective. Try minimax?
#obj = cvx.Maximize( sos.cvxify(l.subs(t ,1), vdict) )
print(c)
prob = cvx.Problem(obj, c)
res = prob.solve(verbose=True) #solver=cvx.CBC
print(res)
lower = sy.lambdify(t, sos.poly_value(l , vdict))
upper = sy.lambdify(t, sos.poly_value(u , vdict))
#plt.plot(ts, upper(ts) - np.exp(ts) ) # plot differences
#plt.plot(ts, lower(ts) - np.exp(ts) )
ts = np.linspace(-1,1,100)
plt.plot(ts, upper(ts) , label= "upper")
plt.plot(ts, lower(ts) , label= "lower")
plt.plot(ts, np.exp(ts) , label= "exact")
#plt.plot(ts,np.exp(ts) - lower(ts) )
plt.legend()
plt.show()
'''
If I need to add in interval rounding to get closure,
is there a point to this? Is it actually simpler in any sense?
Collecting up chebyshev components and chebyshev splitting would perform
Lanczos economization. That'd be cool.
What about a BVP?
Get an iterative formulation.
And what are the requirements?
1. We need an iterative contractive operator
2. We need to confirm that all functions with forall t, l <= f <= u
   map to in between li and ui. This part might be challenging
3. Get the interval contracting and small.
x <= a
y = Lx
Lx <= La ? Yes, if positive semidefinite. Otherwise we need to split it.
No. Nice try. Not a componentwise inequality.
Secondary question: finitely confirming a differential operator is positive semidefinite
forall x, xLx >= 0 ?
Similar to the above. Make regions in space.
Value function learning is contractive.
hmm.
Piecewise Lyapunov functions
Being able to use an LP makes it WAY faster, WAY more stable, and opens up sweet MIPpurtunities.
'''

Seems to work, but I’ve been burned before.

man, LP solvers are so much better than SDP solvers


Random junk and links: Should I be more ashamed of dumps like this? I don’t expect you to read this.

https://github.com/JuliaIntervals/TaylorModels.jl

https://github.com/JuliaIntervals

Functional analysis by and large analyzes functions by analogy with more familiar properties of finite dimensional vector spaces. In ordinary 2d space, it is convenient to work with rectangular regions or polytopic regions.

Suppose I had a damped oscillator converging to some unknown point. If we can show that every point in a set maps within the set, we can show that the point it converges to lies somewhere in that set.

One model of a program is that it is some kind of kooky complicated hyper nonlinear discrete time dynamical system. And vice versa, dynamical systems are continuous time programs. The techniques for analyzing either have analogs in the other domain. Invariants of programs are essential for determining correctness properties of loops. Invariants like energy and momentum are essential for determining what physical systems can and cannot do. Lyapunov functions demonstrate that control systems are converging to the set point. Termination metrics show that loops and recursion must eventually end.

If instead you use interval arithmetic for a bound on your solution rather than your best current solution, and if you can show the interval maps inside itself, then you know that the iterative process must converge inside of the interval, hence that is where the true solution lies.

A very simple bound for an integral is \int_a^b f(x)dx <= \int_a^b max_{x \in [a,b]}f(x) dx = max_{x \in [a,b]}f(x) \int_a^b dx = max_{x \in [a,b]}f(x) (b - a).

The integral is a very nice operator. The result of the integral is a positive linear sum of the values of a function. This means it plays nice with inequalities.

Rigorously Bounding ODE solutions with Sum of Squares optimization – Intervals

Intervals – Moore book. Computational functional analysis. Tucker book. Coqintervals. Fixed point theorem. Hardware acceleration? Interval valued functions. Interval extensions.

  • Banach fixed point – contraction mapping
  • Brouwer fixed point
  • Schauder
  • Knaster Tarski

Picard iteration vs? Allowing flex on boundary conditions via an interval?

Interval book had an interesting integral form for the 2-D

sympy has cool stuff

google scholar search z3, sympy brings up interesting things

https://moorepants.github.io/eme171/resources.html

The pydy guy Moore has a lot of good shit. resonance https://www.moorepants.info/blog/introducing-resonance.html

Lyapunov functions. Piecewise affine Lyapunov functions. Are Lyapunov functions kind of like a PDE? Value functions are PDEs. If the system is piecewise affine we can define a grid on the same piecewise affine thingo. Compositional convexity. Could we use compositional convexity + Relu style piecewise affinity to get complicated Lyapunov functions? Lyapunov functions don’t have to be continuous, they just have to be decreasing. The Lie derivative wrt the flow is always negative, i.e. the gradient of the function points roughly against the direction of flow. Triangulate around the equilibrium if you want to avoid a quadratic Lyapunov function. For a guarded system, you can relax the Lyapunov constraint outside of the guard if you tighten it inside the guard. Ax >= 0 is the guard. It’s the S-procedure.

Best piecewise approximation with point choice?

http://theory.stanford.edu/~arbrad/papers/lr.ps linear ranking functions

Connection to petri nets?

https://ths.rwth-aachen.de/wp-content/uploads/sites/4/hs_lecture_notes.pdf

https://www.cs.colorado.edu/~xich8622/papers/rtss12.pdf

KoAT, LoAT, AProVE. Integer transition systems. Termination analysis. Loops?

https://lfcps.org/pub/Pegasus.pdf darboux polynomials. barrier certificates. Prelle-Singer method. first integrals.

method 1. arbitrary polynomial p(t). calculate p'(t). find coefficients that make p'(t) = 0 by linear algebra. Idea: near invariants? min max |p'(t)|

Lie Algebra method

https://www.researchgate.net/publication/233653257_Solving_Differential_Equations_by_Symmetry_Groups sympy links this paper. Sympy has some lie algebra stuff in there

https://www-users.math.umn.edu/~olver/sm.html Peter Olver tutorial

http://www-users.math.umn.edu/~olver/talk.html olver talks

https://www-sop.inria.fr/members/Evelyne.Hubert/publications/PDF/Hubert_HDR.pdf

https://www.cs.cmu.edu/~aplatzer/logic/diffinv.html andre platzer. Zach says Darboux polynomials?

https://sylph.io/blog/math.html

Books: Birkhoff and Rota, Guggenheimer, different Olver books, practical guide to invariants https://www.amazon.com/Practical-Invariant-Monographs-Computational-Mathematics/dp/0521857015

Idea: Approximate invariants? At least this ought to make a good coordinate system to work in where the dynamics are slow. Like action-angle and adiabatic transformations. Could also perhaps bound the

Picard Iteration

I have a method that I’m not sure is ultimately sound. The idea is to start with

Error analysis most often uses an appeal to Taylor’s theorem, and Taylor’s theorem is usually derived from the mean value theorem or intermediate value theorem. Maybe that’s fine. But the mean value theorem is some heavy stuff. There are computational doo dads that use these bounds + interval analysis to rigorously integrate ODEs. See https://github.com/JuliaIntervals/TaylorModels.jl

The beauty of sum of squares certificates is that they are very primitive proofs of positivity for a function on a domain of infinitely many values. If I give you a way to write an expression as a sum of square terms, it is then quite obvious that it has to be always positive. This is algebra rather than analysis.

y(t) = \lambda(t) \and \lambda(t) is SOS \Rightarrow \forall t. y(t) >= 0. Sum of squares is a kind of quantifier elimination method. The reverse direction of the above implication is the subject of the Positivstellensatz, a theorem of real algebraic geometry. At the very least, we can use the SOS constraint as a relaxation of the quantified constraint.

So, I think by using sum of squares, we can turn a differential equation into a differential inequation. If we force the highest derivative to be larger than the required differential equation, we will get an overestimate of the required function.

A function that is dominated by another in derivative will be dominated in value also. You can integrate over inequalities (I think. You have to be careful about such things.) \forall t. \frac{dx}{dt} >= \frac{dy}{dt} \Rightarrow x(t) - x(0) >= y(t) - y(0)

The derivative of a polynomial can be thought of as a completely formal operation, with no necessarily implied calculus meaning. It seems we can play a funny kind of shell game to avoid the bulk of calculus.

As an example, let’s take \frac{dy}{dt}=y, y(0) = 1, with the solution y = e^t. e is a transcendental number.

The S-procedure is a trick by which you can relax a sum of squares constraint so that it only needs to hold on a domain. If you build a polynomial that describes the domain, such that it is positive inside the domain and negative outside, you can subtract a nonnegative (SOS) multiple of it from the polynomial you want to show is positive before requiring SOS. Inside the domain you care about, you’ve only made the condition harder to satisfy, not easier, so the original inequality still follows there. But outside the domain you have made it easier, because you can have negative slack.

For the domain t \in [0,1] the polynomial (1 - t)t works as our domain polynomial.

We parametrize our solution as an explicit polynomial x(t) = a_0 + a_1 t + a_2 t^2 + .... It is important to note that what follows is always linear in the a_i.

\frac{dx}{dt} - x >= 0 can be relaxed to \frac{dx}{dt} - x(t) - \lambda(t)(1-t)t >= 0 with \lambda(t) SOS; on [0,1] the subtracted term is nonnegative, so \frac{dx}{dt} - x >= 0 still follows there.

So with that we get a reasonable formulation of finding a polynomial upper bound on the solution of the differential equation:

\min x(1)

x(0) >= 1

\frac{dx}{dt} - x(t) - \lambda_1(t)(1-t)t =  \lambda_2(t)

\lambda_{1,2}(t) is SOS.

The initial condition x(0) >= y(0) = 1 is needed so that the upper bound dominates the true solution from the start.

And here it is written out in python using my cvxpy-helpers which bridge the gap between sympy polynomials and cvxpy.
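
The cvxpy-helpers version isn’t reproduced in these notes, so here is a self-contained sketch of the same optimization written directly against cvxpy (my own reconstruction, with the degrees hard-coded): a cubic x(t), a degree-2 \lambda_1 and a degree-4 \lambda_2 represented by small Gram matrices, and the polynomial identity enforced by matching coefficients in the monomial basis.

import cvxpy as cvx

# x(t) = a0 + a1 t + a2 t^2 + a3 t^3, an upper bound for x' = x, x(0) = 1 on [0,1]
a = cvx.Variable(4)                  # polynomial coefficients of x
Q1 = cvx.Variable((2, 2), PSD=True)  # Gram matrix of lambda_1 in the basis [1, t]
Q2 = cvx.Variable((3, 3), PSD=True)  # Gram matrix of lambda_2 in the basis [1, t, t^2]

# coefficients of x'(t) - x(t) by power of t (the degree-4 slot is zero)
dx_minus_x = [a[1] - a[0], 2*a[2] - a[1], 3*a[3] - a[2], -a[3], 0]
# lambda_1(t) = Q1[0,0] + 2 Q1[0,1] t + Q1[1,1] t^2
lam1 = [Q1[0, 0], 2*Q1[0, 1], Q1[1, 1]]
# lambda_1(t) * (1 - t) t, by power of t
lam1_g = [0, lam1[0], lam1[1] - lam1[0], lam1[2] - lam1[1], -lam1[2]]
# lambda_2(t) = [1, t, t^2] Q2 [1, t, t^2]^T, by power of t
lam2 = [Q2[0, 0], 2*Q2[0, 1], 2*Q2[0, 2] + Q2[1, 1], 2*Q2[1, 2], Q2[2, 2]]

constraints = [a[0] >= 1]  # x(0) >= y(0) = 1
# x' - x - lambda_1 (1-t) t = lambda_2, matched coefficient by coefficient
constraints += [dx_minus_x[k] - lam1_g[k] == lam2[k] for k in range(5)]

prob = cvx.Problem(cvx.Minimize(cvx.sum(a)), constraints)  # x(1) = a0 + a1 + a2 + a3
prob.solve()
print("upper bound on x(1), which should sit a little above e:", prob.value)
print("coefficients of the bounding polynomial:", a.value)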

We can go backwards to figure out sufficient conditions for a bound. We want x_u(t_f) >= x(t_f). It is sufficient that \forall t. x_u(t) >= x(t). For this it is sufficient that \forall t. x_u'(t) >= x'(t) \and x_u(t_i) >= x(t_i). We follow this down in derivative until we get the lowest derivative in the differential equation. Then we can use the linear differential equation itself x^{(n)}(t) = \sum_i a_i(t) x^{(i)}(t), and require x_u^{(n)}(t) >= \sum_i \max(a_i(t) x^{(i)}_u(t), a_i(t) x^{(i)}_l(t)).

Bounding each term by a(t) x(t) <= \max(a(t) x_u(t), a(t) x_l(t)) accounts for the possibility of terms changing sign. Or you could separate the terms into regions of constant sign.

The minimization characterization of the bound is useful. For any class of functions that contains our degree-d polynomial, we can show that the minimum of the same optimization problem is less than or equal to our value.

Is the dual value useful? The lower bound on the least upper bound

Doesn’t seem like the method will work for nonlinear ODEs. Maybe it will if you relax the nonlinearity. Or you could perhaps use an MISDP to make piecewise linear approximations of the nonlinearity?

It is interesting to investigate linear programming models. It is simpler and more concrete to examine how well different step sizes approximate each other rather than worrying about the differential case.

We can explicitly compute a finite difference solution in the LP, which is a power that is difficult to achieve in general for differential equations.

We can instead replace the exact solution by a conservative bound.

While we can differentiate through an equality, we can’t differentiate through an inequality. Differentiation involves negation, which plays havoc with inequalities. We can however integrate through inequalities.

\frac{dx}{dt} >= f \and x(0) >= a \Rightarrow x(t) >= \int_0^t f(\tau)d\tau + a

As a generalization, we can integrate an inequality against a weight p(x) as long as p(x) >= 0.

In particular \forall t. \frac{dx}{dt} >= \frac{dy}{dt} \Rightarrow x(t) - x(0) >= y(t) - y(0)

We can convert a differential equation into a differential inequation. It is not entirely clear to me that there is a canonical way to do this. But it works to take the biggest.

\frac{dx}{dt} = A(t)x + f(t)

Is there a tightest

We can integrate

Here let’s calculate e

https://tel.archives-ouvertes.fr/tel-00657843v2/document Thesis on ODE bounds in Isabelle

myfunc x y = 3

not so good. very small

Stupid Z3Py Tricks Strikes Back: Verifying a Keras Neural Network

Neural networks are all the rage these days. One mundane way of looking at neural networks is that they are a particular class of parametrized functions. What makes them useful is:

  1. They can be used at insane scale due to their simplicity and excellent available implementations and tooling
  2. There are intuitive ways to input abstract structure and symmetry expected of a problem, for example translation symmetry, or a hierarchy of small scale pattern recognition combining into large scale structures. How this all works is very peculiar.
  3. Inspirational analogies can be drawn from nature.

Neural networks made out of just relus (rectified linear units, relu(x) = max(0,x)) and linear layers are particularly amenable to formal analysis. Regarding the weights as fixed (once a network has been trained), the complete neural network is a piecewise linear function. The regions on which it is defined piecewise are polyhedra (they are defined by the hyperplanes of the relu transitions as inequality constraints). Such functions are among the most amenable to automated rigorous analysis.

A relu
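
To make the piecewise linear claim concrete, here is a small numpy sketch with random stand-in weights (one hidden layer, not the trained network below): fixing the relu activation pattern at a point lets you read off the single affine map the network computes on that point’s whole polyhedral region.

import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(20, 1)), rng.normal(size=20)
W2, b2 = rng.normal(size=(1, 20)), rng.normal(size=1)

def net(x):
    return W2 @ np.maximum(W1 @ x + b1, 0) + b2

def local_affine(x):
    pattern = (W1 @ x + b1 > 0).astype(float)  # which relus are active at x
    D = np.diag(pattern)                       # active relus pass through, inactive ones zero out
    W_eff = W2 @ D @ W1
    b_eff = W2 @ D @ b1 + b2
    return W_eff, b_eff

x0 = np.array([0.3])
W_eff, b_eff = local_affine(x0)
print(net(x0), W_eff @ x0 + b_eff)  # identical: the network is affine on a neighborhood of x0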

Most machine learning tasks don’t have a mathematically precise specification. What is the mathematically precise definition of a picture of a horse? We could try to come up with something (this is sort of what good old fashioned AI tried to do), but in my opinion it would be rather suspect.

Tasks that do have a precise spec are questionable areas for machine learning techniques, because how can you know that the network meets the spec? Also, one would suspect such a problem has structure that would make a more traditional algorithmic approach a better fit.

However, there are a couple of areas where one does have reasonable formal questions one might want to ask about a neural network:

  • Robustness around training and validation data. Finding Adversarial examples or proving there are none.
  • Games like Go. Alpha Go is a marriage of more traditional algorithmic approaches and machine learning. There is a core of traditional game tree search to it.
  • Control Systems – There are many control systems which we do have a reasonable formal spec one could write, such as walking robots. These systems are so high dimensional that it is difficult to derive a good controller from the spec, and hence reinforcement learning may be a reasonable option. But we would like to confirm the controller is good and not dangerous
  • Neural networks as computational accelerators. There are problems which we know how to solve, but are too slow. Neural networks can be evaluated very quickly and easily thanks to modern frameworks. It may be useful to presolve a large number of examples offline using the slow algorithm and train a neural network to give good estimates. We may be able to replace the slow thing entirely if we can verify the neural network always is good enough.

We’re going to use a neural network to fit a chebyshev polynomial. Here we’re picking a Chebyshev polynomial as our base truth because Chebyshev polynomials have some pleasant waviness to them. Why not. I like ’em. Also polynomials are easily understood by z3 as a base spec.

This example of course is a complete toy. How often do you see 1-D input space neural networks? Not often I bet.
But it’s nice for a couple reasons:

  1. Because we can visualize it.
  2. It drives home the point about neural nets being a space of piecewise linear function approximators, and how similar training is to curve fitting.
  3. It’s simple enough that z3 can crush it. There is a big question whether z3 or other methods can scale to realistic neural nets. Modern top of the line neural nets are insanely huge. As we’ve done it here, I highly doubt it. There are special purpose SMT solvers being built for this task. Also the slightly different technology of mixed integer programming can be used and seems very promising. So this is an area of research. See links at the bottom for more.

Generally speaking, the combination of the capabilities of sympy and z3 gives us access to some very intriguing possibilities. I’m not going to investigate this in detail in this post, but I am showing how you can convert a sympy derived polynomial into a python expression using lambdify, which can then in turn be used on z3 variables.

import sympy as sy
import matplotlib.pyplot as plt
import numpy as np
x = sy.symbols('x')
cheb = sy.lambdify(x, sy.chebyshevt(4,x))
xs = np.linspace(-1,1,1000)
ys = cheb(xs)
plt.plot(xs, ys)
plt.show()
Our actual chebyshev polynomial

Here we build a very small 3-layer network using Keras. We use a least squares error and an Adam optimizer. Whatever. I actually had difficulty getting nice results out for higher order Chebyshev polynomials. Neural networks are so fiddly.

import tensorflow as tf
from tensorflow import keras
model = keras.Sequential([
    keras.layers.Dense(20, activation='relu', input_shape=[1]),
    keras.layers.Dense(25, activation='relu'),
    keras.layers.Dense(1)
])
optimizer = tf.keras.optimizers.Adam()
model.compile(loss='mse',
              optimizer=optimizer,
              metrics=['mae', 'mse'])
model.fit(xs, ys, epochs=100, verbose=0)
plt.plot(xs, model.predict(xs))
plt.show()
Our neural fit of the polynomial

And here we extract the weights and reinterpret them into z3. We could also have used z3’s floating point capabilities rather than reals if we were concerned about numerical issues. It was convenient to have my layers be different sizes, so that a size mismatch would throw a python error. That’s how I found out the weights are transposed by default. The code at the end extracts a found countermodel and evaluates it. If you want to feel fancier, you can also use the prove function rather than an explicit Solver() instance. Saying you proved the neural network matches the polynomial to a certain tolerance feels really good. If you look at the graphs, the edges at 1 and -1 actually have pretty bad absolute error, around 0.5.

from z3 import *
w1, b1, w2, b2, w3, b3 = model.get_weights() # unpack weights from model
def Relu(x):
    # elementwise symbolic relu over a numpy array of z3 expressions
    return np.vectorize(lambda y: If(y >= 0 , y, RealVal(0)))(x)
def Abs(x):
    return If(x <= 0, -x, x)
def net(x):
    # replay the network symbolically, layer by layer (keras stores the weights transposed)
    x1 = w1.T @ x + b1
    y1 = Relu(x1)
    x2 = w2.T @ y1 + b2
    y2 = Relu(x2)
    x3 = w3.T @ y2 + b3
    return x3
x = np.array([Real('x')])
y_true = cheb(x)
y_pred = net(x)
s = Solver()
s.add(-1 <= x[0], x[0] <= 1)
s.add(Abs( y_pred[0] - y_true[0] ) >= 0.5) # look for an input where the fit is off by at least 0.5
#prove(Implies( And(-1 <= x[0], x[0] <= 1), Abs( y_pred[0] - y_true[0] ) >= 0.2))
res = s.check()
print(res)
if res == sat:
    m = s.model()
    print("Bad x value:", m[x[0]])
    x_bad = m[x[0]].numerator_as_long() / m[x[0]].denominator_as_long()
    print("Error of prediction: ", abs(model.predict(np.array([x_bad])) - cheb(x_bad)))

Links

https://github.com/sisl/NeuralVerification.jl

https://arxiv.org/abs/1711.07356 – Evaluating Robustness of Neural Networks with Mixed Integer Programming

https://github.com/vtjeng/MIPVerify.jl

https://arxiv.org/abs/1702.01135 – reluplex, SMT specifically for neural networks