I used to have no idea that multivariate polynomial equations had guaranteed methods that in some sense solve those systems. It’s pretty cool.
However, when I was pouring over the two Cox Little O’shea volumes, the chapter on modules made my eyes glaze over. Who ordered that? From my perspective, modules are vector spaces where you cripple the ability to divide scalars. Fair enough, but the language is extremely confusing and off-putting. Syzygy? Free Resolution? Everything described as homomorphisms and exact sequences? Forget it. Why do this? This is too abstract.
So I’ve been on the lookout for some application to motivate them. And I think I have at least one. Capacitor Inductor circuits.
The common way to describe inductor capacitors circuits is by using phasor analysis, where the resistances become impedances which have a frequency parameter in them. I’m not entirely convinced that it isn’t acceptable to just use linear algebra over rational functions of the frequency, but I have some reason to believe more carefulness regarding division may bear fruit. I suspect that carefulness with division corresponds to always sticky issues of boundary conditions.
On a slightly different front, I was very impressed by Jan Willems Open Dynamical systems. https://homes.esat.kuleuven.be/~sistawww/smc/jwillems/Articles/JournalArticles/2007.1.pdf In it, he talks about differential equations as describing sets of possible trajectories of systems. He uses module theory as a way to manipulate those sets and conditions from module theory to describe interesting qualitative features like controllability of those systems.
He sticks to the tools of Hermite and Smith forms of matrices, as he is mostly interested in single variable polynomials as the ring in question. Here’s my issues
I’m not really familiar with these forms
I can’t find good implementations of these. Perhaps here https://desr.readthedocs.io/en/latest/index.html (Differential Equation Symmetry Reduction), which seems like an interesting project for other reasons as well. Maybe I’m a fool, but I’d like to stick to python for the moment.
I also have an inkling that modules over multivariate polynomials will come in handy for playing around with band theory (or partial different equations for that matter). Maybe something interesting to be said regarding topological materials?
It seems like Groebner basis techniques should acceptably solve these systems as well. Converting between the analog of range and nullspace representations as I did in my previous post corresponds to syzygy calculations in the terminology of modules
Sympy does supply a Groebner basis algorithm, but not much beyond that. The AGCA module that should supply calculations over modules is mostly a lie. The documentation lists many functions that are not implemented. Which is too bad.
However, you can can hack in syzygy calculation into a Groebner basis calculation. I started pouring over chapter 5 of Using Algebra again, and it really has everything you need.
When one converts a set of polynomials to a Groebner basis, one is getting an equivalent set of polynomials with excellent properties. A Groebner basis is an analog of reduced echelon form of a matrix (these rows are equivalent to the old rows), and Buchberger’s algorithm is an analog of gaussian elimination. https://mattpap.github.io/masters-thesis/html/src/groebner.html#special-case-1-gauss-algorithm . You can find a decomposition of a polynomial in your ideal by a multivariate division algorithm with respect to the Groebner basis. This is the analog of the ability of an upper triangular matrix to be easily inverted.
You can perform a number of tricks by adding in dummy variables to the Groebner basis algorithm. The first thing you can do with this sort of trick is track how to write the Groebner basis in terms of the original basis. This is the analog of working with an augmented matrix during gaussian elimination. https://en.wikipedia.org/wiki/Augmented_matrix
I found this Maple documentation helpful in this regard (although formatted horrifically)
We want to track a matrix A that writes the Groebner basis vector G to the original vector of polynomials F. . We do it by attaching the each generator f of F a fresh marker variable f + m. Then the coefficients on m in the extended Groebner basis correspond to the matrix A. Think about it.
The other direction matrix can be found via the reduction algorithm with respect to the Grobner basis . This is pretty straightforward given that sympy implemented reduction for us.
From these we determine that
G = GBA F = FAB
Finding the syzygies of a set of generators is the analog of finding a nullspace of a matrix. A syzygy is a set of coefficients to “dot” onto the generators and get zero. In linear algebra talk, they are sort of orthogonal to the generator set.
The ability to find a nullspace gives you a lot of juice. One can phrase many problems, including solving a system of equations as a nullspace finding problem.
Proposition 3.3 of Using Algebra tells us how to calculate the generators of a syzygy module for a Groebner basis. It’s a little strange. The S-polynomial of two generators from the basis is zero after reduction by the basis. The S-polynomial plus the reduction = 0 gives us a very interesting identity on the generators (a syzygy) and it turns out that actually these generate all possible syzygies. This is still not obvious to me but the book does explain it.
#grobner tracking. Maintaining the relation of the grobner basis to the original
markers= [sy.Dummy() foriinrange(n)]
Fext= [ f+afora, finzip(markers, F)]
A= [[g.coeff(m) forminmarkers ] forginGext]
G= [ sy.simplify(g-sum( [ m*aaform,aainzip(markers, a) ] )) fora,ginzip(A,Gext) ] #remove dummy parts
Proposition 3.8 of Using Algebra tells us how to get the syzygies of the original generators given the previous information. We map back the generators of G and append the columns I – AB also
I – AB columns are syzygys of F. F (I – AB) = F – FAB = F- F = 0 using the equation from above F = FAB
I’m still trying to figure out how to do calculations on modules proper. I think it can be done be using dummy variables to turn module vectors into single expressions. There is an exercise in Using Algebra that mentions this.
We’re gonna pump this well until someone MAKES me stop.
This particular example is something that I’ve been trying to figure out for a long time, and I am pleasantly surprised at how simple it all seems to be. The key difference with my previous abortive attempts is that I’m not attempting the heavy computational lifting myself.
We can take pointful DSLs and convert them into point-free category theory inspired interface. In this case a very excellent pointful dsl for convex optimization: cvxpy. Some similar and related posts converting dsls to categorical form
A convex optimization problem optimizes a convex objective function with constraints that define a convex set like polytopes or balls. They are polynomial time tractable and shockingly useful. We can make a category out of convex optimization problems. We consider some variables to be “input” and some to be “output”. This choice is somewhat arbitrary as is the case for many “relation” feeling things that aren’t really so rigidly oriented.
Convex programming problems do have a natural notion of composition. Check out the last chapter of Rockafeller, where he talks about the convex algebra of bifunctions. Instead of summing over the inner composition variable like in Vect , or existentializing like in Rel , we instead minimize over the inner composition variable . These are similar operations in that they all produce bound variables.
The identity morphism is just the simple constraint that the input variables equal to output variables with an objective function of 0. This is an affine constraint, hence convex.
In general, if we ignore the objective part entirely by just setting it to zero, we’re actually working in a very computationally useful subcategory of Rel, ConvexRel, the category of relations which are convex sets. Composition corresponds to an existential operation, which is quickly solvable by convex optimization techniques. In operations research terms, these are feasibility problems rather than optimization problems. Many of the combinators do nothing to the objective.
The monoidal product just stacks variables side by side and adds the objectives and combines the constraints. They really are still independent problems. Writing things in this way opens up a possibility for parallelism. More on that some other day.
We can code this all up in python with little combinators that return the input, output, objective, constraintlist. We need to hide these in inner lambdas for appropriate fresh generation of variables.
returnx, x, 0, 
returna , b , o1+o2, c1+c2+ [a==a, b==b] # sigh. Don't ask. Alright. FINE. cvxpyp needs hstacks registered to populate them with values
returnx , z, o1+o2, c1+c2+ [y==y1]
returnx, cvx.vstack(x,x) , 0, 
x, y, o, c=self.res()
returnsol, x, y, o
x=cvx.Variable(1) # hmm. cvxpy doesn't like 0d variables. That's too bad
Now for a somewhat more concrete example: Model Predictive control of an unstable (linearized) pendulum.
Model predictive control is where you solve an optimization problem of the finite time rollout of a control system online. In other words, you take measurement of the current state, update the constraint in an optimization problem, ask the solver to solve it, and then apply the force or controls that the solver says is the best.
This gives the advantage over the LQR controller in that you can set hard inequality bounds on total force available, or positions where you wish to allow the thing to go. You don’t want your system crashing into some wall or falling over some cliff for example. These really are useful constraints in practice. You can also include possibly time dependent aspects of the dynamics of your system, possibly helping you model nonlinear dynamics of your system.
ADMM – It’s a “lens”. I’m pretty sure I know how to do it pointfree. Let’s do it next time.
The minimization can be bubbled out to the top is we are always minimizing. If we mix in maximization, then we can’t and we’re working on a more difficult problem. This is similar to what happens in Rel when you have relational division, which is a kind of universal quantification . Mixed quantifier problems in general are very tough. These kinds of problems include games, synthesis, and robustness. More on this some other day.
The minimization operation can be related to the summation operation by the method of steepest descent in some cases. The method of steepest descent approximates a sum or integral by evaluating it at it’s most dominant position and expanding out from there, hence converts a linear algebra thing into an optimization problem. Examples include the connection between statistical mechanics and thermodynamics and classical mechanics and quantum mechanics.
As a simple extension of verifying the sorting networks from before, we can synthesize optimally small sorting networks. The “program” of the sorting network is specified by a list of tuples of the elements we wish to compare and swap in order. We just generate all possible sequences of comparison operations and ask z3 to try verifying. If z3 says it verifies, we’re done.
As is, this is astoundingly slow. Truly truly abysmally slow. The combinatorics of really naively search through this space is abysmal. I doubt you’re going to get more than a network of size 6 out of this as is.
Some possible optimizations: early pruning of bad networks with testing, avoiding ever looking at obviously bad networks. Maybe a randomized search might be faster if one doesn’t care about optimality. We could also ask z3 to produce networks.
For more on program synthesis, check outNadia Polikarpova’s sick course here.
Petri nets are a framework for modeling dynamical systems that is very intuitive to some people. The vanilla version is that there are discrete tokens at nodes on a graph representing resources of some kind and tokens can be combined according to the firing of transition rules into new tokens in some other location.
This is a natural generalization of chemical reaction kinetics, where tokens are particular kinds of atoms that need to come together. It also is a useful notion for computer systems, where tokens represent some computational resource.
To me, this becomes rather similar to a flow problem or circuit problem. Tokens feel a bit like charge transitions are a bit like current (although not necessarily conservative). In a circuit, one can have such a small current that the particulate nature of electric current in terms of electrons is important. This happens for shot noise or for coulomb blockade for example.
If the number of tokens is very large, it seems intuitively sensible to me that one can well approximate the behavior by relaxing to a continuum. Circuits have discrete electrons and yet are very well approximated by ohm’s laws and the like. Populations are made of individuals, and yet in the thousands their dynamics are well described by differential equations.
It seems to me that mixed integer programming is a natural fit for this problem. Mixed integer programming has had its implementations and theory heavily refined for over 70 years so now very general purpose and performant off the shelf solvers are available. The way mixed integer programs are solved is by considering their quickly solved continuous relaxation (allowing fractional tokens and fractional transitions more akin to continuous electrical circuit flow) and using this information to systematically inform a discrete search process. This seems to me a reasonable starting approximation. Another name for petri nets is Vector Addition Systems, which has more of the matrix-y flavor.
The big piece is the weighted_block function. It let’s you build a combinator with an internal state and input and output flow variables. You give matrices with entries for every possible transition. Whether transitions occurred between and is indicated by integer variables. There is also possible accumulation of tokens at nodes, which also requires integer variables. Perhaps we’d want to expose the token state of the nodes to the outside too?
When I talked to Zach about this, he seemed skeptical that MIP solvers wouldn’t eat die a horrible death if you threw a moderately large petri net into them. Hard to say without trying.
There is an interesting analogy to be found with quantum field theory in that if you lift up to considering distributions of tokens, it looks like an occupation number representation. See Baez. http://math.ucr.edu/home/baez/stoch_stable.pdf
If you relax the integer constraint and the positivity constraints, this really is a finite difference formulation for capacitor circuits. The internal states would then be the charge of the capacitor. Would the positivity constraint be useful for diodes?
I wonder how relevant the chunky nature of petri nets might be for considering superconducting circuits, which have quantization of quantities from quantum mechanical effects.
Cvxpy let’s you describe convex regions. We can use this to implement a the convex subcategory of Rel which you can ask reachability questions. Relational division won’t work probably. I wonder if there is something fun there with respect the the integerizing operation and galois connections.
Edit: I should have googled a bit first. https://www.sciencedirect.com/science/article/pii/S0377221705009240 mathemtical programming tecchniques for petri net reachability. So it has been tried, and it sounds like the results weren’t insanely bad.
What I did was implement some wrappers around the graphviz python package. It exposes a not very feature rich stateful interface. It is pretty nice that it prints the graphs inline in jupyter notebooks though.
The code is written in a style very similar (and hopefully overloadable with) to that of z3py relation algebra. http://www.philipzucker.com/a-sketch-of-categorical-relation-algebra-combinators-in-z3py/ . There is a fairly general cookbook method here for categorifying dsl. Since graphviz does not directly expose fresh node creation as far as I can tell, I made my own using a random number generator. The actual combinators are graphviz object processors, so we build up a giant functional chain of processors and then actually execute it with run. The inputs and outputs are represented by lists of node names. The is some design space to consider here.
I also had to use class based wrappers Based on the precedent set by the python 3 matrix multiplication operator @, I think it is a requirement that this also be used for category composition. id is a keyword or something in python unfortunately. For monoidal product, I feel like overloading power ** looks nice even if it is a nonsensical analogy, * is also not too bad. I went with * for now.
The graphviz graphs aren’t quite string diagrams. They make no promise to preserve the ordering of your operations, but they seem to tend to.
def__init__(self,res): # just hold the graphviz processor function
deffresh(n): # makes random numbers for fresh node labels
cup = GraphCat.cup()
cap = GraphCat.cap()
d = cap @ (I * I) @ cup #(I * cap) @ (I * I * I) @ (cup * I)
d = plus @ (GC.const(1) * GC.const(2))
d = d.run()
GC = GraphCat
f = GraphCat.named_simple("f")
g = GraphCat.named_simple("g")
I = GraphCat.idd()
dump = GC.dump()
dup = GC.dup()
diagram = ((f * I) @ dup @ g @ (dump * I) @ (I * f) @ (f * f)) * g
Class based overloading is the main paradigm of overloading in python. You overload a program into different categories, by making a program take in the appropriate category class as a parameter.
# by passing in different category classes, we can make polymorphic functions
# They have to have a uniform interface though, which is hard to constrain in python.
idd = M.idd()
dump = M.dump()
dup = M.dup()
return (idd * dump) @ dup
For example something like this ought to work. Then you can get the graph of some matrix computation to go along with it’s numpy implementation
return f @ g
# and so on.
Floating points are nice and all. You can get pretty far pretending they are actually numbers. But they don’t obey some mathematical properties that feel pretty obvious. A classic to glance through is “What Every Computer Scientist Should Know About Floating-Point Arithmetic” https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html
We can check some properties with z3py. Here are a couple simple properties that succeed for mathematical integers and reals, but fail for floating point
In it, she uses FlocQ and Coq to prove a somewhat surprising result that the naive formula actually is correct for the right rounding mode of floating point, something I wouldn’t have guessed.
Z3 confirms for Float16. I can’t get Float32 to come back after even a day on a fairly beefy computer. If I use FPSort(ebits,sbits) rather than a standard size, it just comes back unknown, so i can’t really see where the cutoff size is. This does not bode well for checking properties of floating point in z3 in general. I think a brute force for loop check of 32 bit float properties is feasible. I might even be pretty fast. To some degree, if z3 is taking forever to find a counterexample, I wonder to what to degree the property is probably true.
David Sanders (who lives in Julia land https://github.com/JuliaIntervals ) explained a bit of how interval constraint propagation library worked to me last night. He described it as being very similar to backpropagation, which sets off alarm bells for me.
The interesting thing the backward pass gives you is that everything feels a bit more relational rather than functional. The backward pass allows you to infer new information using constraints given down the line. For example, fuse :: Lens (a,a) a let’s you enforce that two variables we actually equal. The lens pattern lets you store the forward pass intervals in a closure, so that you can intersect it with the backwards pass intervals.
I make no guarantees what I have here is right. It’s a very rough first pass. It compiles, so that is cool I guess.
-- interval combinators
typeLensab=a-> (b, b->a)
-- x /\ x = x
instance (Latticea, Latticeb) =>Lattice (a,b) where
(x,y) /\ (a,b) = ( x /\ a , y /\ b )
(x,y) \/ (a,b) = ( x \/ a , y \/ b )
top = (top, top)
bottom = (bottom, bottom) -- ? very fishy
instance (Fractionala, Orda) =>Lattice (Intervala) where
top = whole
bottom = empty
_ /\ _ =()
_ \/ _ =()
top =()-- fishy
bottom =()-- fishy
id=\x -> (x, \x' -> x /\ x')
dup =\x -> ( (x,x), \(x1,x2) -> x1 /\ x2 )
fuse::Latticea=>Lens (a,a) a
fuse =\(x,x') ->let x'' = x /\ x' in ( x'' , \x''' ->let x4 = x'' /\ x''' in (x4, x4))
-- cap is equality
cap::Latticea=>Lens (a,a) ()
cap =\(x,x') -> ((), \_ ->let x'' = x /\ x' in (x'', x''))
Lens has a couple problems. It is probably doing way more work than it should, and we aren’t iterating to a fixed point.
Maybe an iterated lens would get us closer?
data Lens s t a b = Lens (a -> (b , (b -> (a, Lens s t a b))))
This is one way to go about the iterative process of updating a neural network in a functional way by evaluating it over and over and backpropagating. The updated weights will be stored in those closures. It seems kind of nice. It is clearly some relative of Iteratees and streaming libraries like pipes and conduit (which are also a compositional bidirectional programming pattern), the main difference being that it enforces a particular ordering of passes (for better or worse). Also I haven’t put in any monadic effects, which is to some degree the point of those libraries, but also extremely conceptually clouding to what is going on.
Another interesting possiblity is the type
type Lens s t a b = s -> (a, b -> t)
Lens s (Interval s) a (Interval a)
This has pieces that might be helpful for talking about continuous functions in a constructive way. It has the forward definition of the function, and then the inverse image of intervals. The inverse image function depends on the original evaluation point? Does this actually make sense? The definition of continuity is that this inverse image function must make arbitrarily small image intervals as you give it smaller and smaller range intervals. Continuity is compositional and plays nice with many arithmetic and structural combinators. So maybe something like this might be a nice abstraction for proof carrying continuous functions in Coq or Agda? Pure conjecture.
I’ve discussed relation algebra before. Relations are sets of tuples. There, I implemented the relations naively using lists for sets. This is very simple, and very clean especially with list comprehension syntax. It is however horrifically inefficient, and we could only deal with finitely enumerable domains. The easiest path to fixing these problems is to cash out to an external solver, in this case z3.
There are many beautifully implemented solvers out there and equally beautiful DSL/modeling languages. Examples in mind include sympy, cvxpy, and z3. These modeling languages require you to instantiate variable objects and build expressions out of them and then hand it off to the solver. This is a reasonable interface, but there are advantages to a more categorical/point-free style DSL.
Point-free languages are ones that do not include binding forms that introduce bound/dummy variables. Examples of binding forms like this are . One problem lies in the fact that the names of bound variables don’t matter, and that they end up accidentally smashing into each other. You may have experienced this in physics or math class as the dummy indices or dummy variable problem causing you to screw up your calculation of some cross product identity or some complicated tensor sum. These are surprisingly subtle problems, very difficult to diagnose and get right. de Bruijn indices is a technique for giving the bound variables canonical names, but it sucks to implement in its own way. When you make a DSL point free, it is a joy to manipulate, optimize, and search. I think this may be the core of why category theory is good language for mathematics and programming.
Point-free style also tends to have significant economy of size, for better or worse. Sometimes it is better to have an expression very dense in information. This is important if you are about the algebraically manipulate an expression with paper and pencil. Every manipulation requires a great deal of mind numbing copying as you proceed line by line, so it can be excruciating if your notation has a lot of unnecessary bulk.
Relations are like functions. The two pieces of the tuple can be roughly thought of as the “input” and the “output”. Relations are only loosely directional though. Part of the point of relations is that the converse (inverse) of a relation is easy to define.
When we are implement relations, we have a choice. Do we want the relation to produce its variables, accept its variable, or accept one and produce the other? There are advantages to each. When relations were [(a,b)], a -> b -> Bool, and a -> [b] converting between these forms was a rather painful enumeration process. The sting of converting between them is taken out by the fact that the conversion is no longer a very computationally expensive process, since we’re working at the modeling layer.
When you’re converting a pointful DSL to pointfree DSL, you have to be careful where you instantiate fresh variables or else you’ll end up with secret relations that you didn’t intend. Every instantiation of id needs to be using fresh variables for example. You don’t want the different id talking to each other. Sometimes achieving this involves a little currying and/or thunking.
There is a pattern that I have notice when I’m using modeling languages. When you have a function or relation on variables, there are constraints produced that you have to keep a record of. The pythonic way is to have a Model or Solver object, and then have that objects mutate an internal record of the set of constraints. I don’t particularly enjoy this style though. It feels like too much boiler plate.
In Haskell, I would use something like a Writer monad to automatically record the constraints that are occurring. Monads are not really all that pleasant even in Haskell, and especially a no go in python without “do” syntax.
However, because we are going point free it is no extra cost at all to include this pipework along for the ride in the composition operation.
Here are implementations of the identity and composition for three different styles. Style 1 is fully receptive, style 2 is mixed (function feeling) and style 3 is fully productive of variables.
Fair warning, I’m being sketchy here. I haven’t really tried this stuff out.
defrid1(x,y): # a receptive relations. It receives variables
defcompose1(f, sort, g): # annoying but we need to supply the sort of the inner variable
returnExists([y], And(f(x,y), g(y,z)))
defrid2(x): # a functional relation. It receives a variable then produces one.
returnz, Exists([y], And(cf,cg) )
defrid3(sort): # a type indexed generator of relations. Annoying we have to supply the type of the variable
z3 is a simply typed language. You can get away with some polymorphism at the python level (for example the == dispatches correctly accord to the object) but sometimes you need to manually specify the sort of the variables. Given these types, the different styles are interconvertible
Questions about relation algebra expressions are often phrased in term of relational inclusion. You can construct a relation algebra expression, use the rsub in the appropriate way and ask z3’s prove function if it is true.
Interval Relations – A point I was confused on. I thought interval relations were not interesting. But I was interpetting the term incorrectly. I was thinking of relations on AxB that are constrained to take the form of a product of intervals. In this case, the choice of A has no effect on the possible B whatsoever, so it feels non relational. However, there is also I_A x I_B , relations over the intervals of A and B. This is much closer to what is actually being discussed in interval arithmetic.
Applications we can use this for:
Graph Problems. The Edges can be thought of as a relation between vertices. Relation composition Using the starn operator is a way to ask for paths through the graph.
Safety and liveness of control systems. Again. a transition relation is natural here. It is conceivable that the state space can be heterogenous in time, which is the interesting power of the categorical style. I feel like traditional control systems usually maintain the same state space throughout.
I should try to comply with python conventions, in particular numpy and pandas conventions. @ should be composition for example, since relation composition has a lot of flavor of matrix composition. I should overload a lot of operators, but then I’d need to wrap in a class 🙁
As long as you only use composition, there is a chaining of existentials, which really isn’t so bad.
What we’ve done here is basically analogous/identical to what John Wiegley did compiling to the category of z3. Slightly different in that he only allowed for existential composition rather than relational division. http://newartisans.com/2017/04/haskell-and-z3/
We can reduced the burden on z3 if we know the constructive proof objects that witness our various operations. Z3 is gonna do better if we can tell it exactly which y witness the composition of operators, or clues to which branch of an Or it should use.
It’s a bummer, but when you use quantifiers, you don’t see countermodels? Maybe there is some hook where you can, or in the dump of the proof object.
What about recursion schemes? The exact nature of z3 to handle unbounded problems is fuzzy to me. It does have the support to define recursive functions. Also explicit induction predicates can go through sometimes. Maybe look at the Cata I made in fancy relaion algebra post
I think most proof assistants have implementations of relation algebra available. I find you can do a surprising amount in z3.
As I have gotten more into the concerns of formal methods, I’ve become unsure that ODEs actually exist. These are concerns that did not bother me much when I defined myself as being more in the physics game. How times change. Here’s a rough cut.
A difficulty with ODE error analysis is that it is very confusing how to get the error on something you are having difficulty approximating in the first place.
If I wanted to know the error of using a finite step size dt vs a size dt/10, great. Just compute both and compare. However, no amount of this seems to bootstrap you down to the continuum. And so I thought, you’re screwed in regards to using numerics in order to get true hard facts about the true solution. You have to go to paper and pencil considerations of equations and variables and epsilons and deltas and things. It is now clearer to me that this is not true. There is a field of verified/validated numerics.
A key piece of this seems to be interval arithmetic. https://en.wikipedia.org/wiki/Interval_arithmetic An interval can be concretely represented by its left and right point. If you use rational numbers, you can represent the interval precisely. Interval arithmetic over approximates operations on intervals in such a way as to keep things easily computable. One way it does this is by ignoring dependencies between different terms. Check out Moore et al’s book for more.
What switching over to intervals does is you think about sets as the things you’re operating on rather than points. For ODEs (and other things), this shift of perspective is to no longer consider individual functions, but instead sets of functions. And not arbitrary extremely complicated sets, only those which are concretely manipulable and storable on a computer like intervals. Taylor models are a particular choice of function sets. You are manipulating an interval tube around a finite polynomial. If during integration / multiplication you get higher powers, truncate the polynomials by dumping the excess into the interval term. This keeps the complexity under wraps and closes the loop of the descriptive system.
If we have an iterative, contractive process for getting better and better solutions of a problem (like a newton method or some iterative linear algebra method), we can get definite bounds on the solution if we can demonstrate that a set maps into itself under this operation. If this is the case and we know there is a unique solution, then it must be in this set.
It is wise if at all possible to convert an ODE into integral form. is the same as .
For ODEs, the common example of such an operation is known as Picard iteration. In physical terms, this is something like the impulse approximation / born approximation. One assumes that the ODE evolves according to a known trajectory as a first approximation. Then one plugs in the trajectory to the equations of motion to determine the “force” it would feel and integrate up all this force. This creates a better approximation (probably) which you can plug back in to create an even better approximation.
If we instead do this iteration on an intervally function set / taylor model thing, and can show that the set maps into itself, we know the true solution lies in this interval. The term to search for is Taylor Models (also some links below).
I was tinkering with whether sum of squares optimization might tie in to this. I have not seen SOS used in this context, but it probably has or is worthless.
But that isn’t really what makes Sum of squares special. There are other methods by which to do this.
There are very related methods called DSOS and SDSOS https://arxiv.org/abs/1706.02586 which are approximations of the SOS method. They replace the SDP constraint at the core with a more restrictive constraint that can be expressed with LP and socp respectively. These methods lose some of the universality of the SOS method and became basis dependent on your choice of polynomials. DSOS in fact is based around the concept of a diagonally dominant matrix, which means that you should know roughly what basis your certificate should be in.
This made me realize there is an even more elementary version of DSOS that perhaps should have been obvious to me from the outset. Suppose we have a set of functions we already know are positive everywhere on a domain of interest. A useful example is the raised chebyshev polynomials. https://en.wikipedia.org/wiki/Chebyshev_polynomials The appropriate chebyshev polynomials oscillate between [-1,1] on the interval [-1,1], so if you add 1 to them they are positive over the whole interval [-1,1]. Then nonnegative linear sums of them are also positive. Bing bang boom. And that compiles down into a simple linear program (inequality constraints on the coefficients) with significantly less variables than DSOS. What we are doing is restricting ourselves to completely positive diagonal matrices again, which are of course positive semidefinite. It is less flexible, but it also has more obvious knobs to throw in domain specific knowledge. You can use a significantly over complete basis and finding this basis is where you can insert your prior knowledge.
It is not at all clear there is any benefit over interval based methods.
Here is a sketch I wrote for which has solution . I used raised chebyshev polynomials to enforce positive polynomial constraints and tossed in a little taylor model / interval arithmetic to truncate off the highest terms.
I’m using my helper functions for translating between sympy and cvxpy expressions. https://github.com/philzook58/cvxpy-helpers Sympy is great for collecting up the coefficients on terms and polynomial multiplication integration differentiation etc. I do it by basically creating sympy matrix variables corresponding to cvxpy variables which I compile to cvxpy expressions using lambdify with an explicit variable dictionary.
# seems like even N becomes infeasible.
terms= [sy.chebyshevt(n,t) +1forninrange(N)] # raised chebyshev functions are positive on interval [-1,1]
for i in range(1,4):
ts = np.linspace(-1,1,100)
plt.plot( ts , sy.lambdify(t,terms[i])(ts))
l,d=sos.polyvar(terms) # lower bound on solution
w,d=sos.polyvar(terms, nonneg=True) # width of tube. Width is always positive (nonneg)
u=l+w# upper curve is higher than lower by width
returnsy.integrate(f, [t,-1,t]) +np.exp(-1) # picard integration on [-1,1] interval with initial cond x(-1)=1/e
defsplit(y , N): # split a polynomial into lower an upper parts.
lower=sum([ c*t**pfor (p,), cinyp.terms() ifp<N])
#upper = sum([ c*x**p for (p,), c in yp.terms() if p > N])
terms= [sy.chebyshevt(n,t) +1forninrange(N+1)]
# ui <= u
lowerui, upperui=split(ui, N) # need to truncate highest power of u using interval method
du=upperui.subs(t,1) #Is this where the even dependence of N comes from?
#c += [ du >= sos.cvxify(upperui.subs(t,1), vdict), du >= sos.cvxify(upperui.subs(t,1)] # , upperui.subs(t,-1))
Functional analysis by and large analyzes functions by analogy with more familiar properties of finite dimensional vector spaces. In ordinary 2d space, it is convenient to work with rectangular regions or polytopic regions.
Suppose I had a damped oscillator converging to some unknown point. If we can show that every point in a set maps within the set, we can show that the function
One model of a program is that it is some kind of kooky complicated hyper nonlinear discrete time dynamical system. And vice versa, dynamical systems are continuous time programs. The techniques for analyzing either have analogs in the other domain. Invariants of programs are essential for determining correctness properties of loops. Invariants like energy and momentum are essential for determining what physical systems can and cannot do. Lyapunov functions demonstrate that control systems are converging to the set point. Terminating metrics are showing that loops and recursion must eventually end.
If instead you use interval arithmetic for a bound on your solution rather than your best current solution, and if you can show the interval maps inside itself, then you know that the iterative process must converge inside of the interval, hence that is where the true solution lies.
A very simple bound for an integral is
The integral is a very nice operator. The result of the integral is a positive linear sum of the values of a function. This means it plays nice with inequalities.
Rigorously Bounding ODE solutions with Sum of Squares optimization – Intervals
Intervals – Moore book. Computational functional analaysis. Tucker book. Coqintervals. fixed point theorem. Hardware acceleration? Interval valued functions. Interval extensions.
Banach fixed point – contraction mapping
Brouwer fixed point
Picard iteration vs? Allowing flex on boundary conditions via an interval?
Interval book had an interesting integral form for the 2-D
sympy has cool stuff
google scholar search z3, sympy brings up interesting things
Lyapunov functions. Piecewise affine lyapunov funcions. Are lyapunov functions kind of like a PDE? Value functions are pdes. If the system is piecewise affine we can define a grid on the same piecewise affine thingo. Compositional convexity. Could we use compositional convexity + Relu style piecewise affinity to get complicated lyapunov functions. Lyapunov functions don’t have to be continiuous, they just have to be decreasing. The Lie derivative wrt the flow is always negative, i.e gradeint of function points roughly in direction of flow. trangulate around equilibrium if you want to avoid quadratic lyapunov. For guarded system, can relax lyapunov constrain outside of guard if you tighten inside guard. Ax>= 0 is guard. Its S-procedure.
Idea: Approximate invariants? At least this ought to make a good coordinate system to work in where the dynamics are slow. Like action-angle and adiabatic transformations. Could also perhaps bound the
I have a method that I’m not sure is ultimately sound. The idea is to start with
Error analysis most often uses an appeal to Taylor’s theorem and Taylor’s theorem is usually derived from them mean value theorem or intermediate value theorem. Maybe that’s fine. But the mean value theorem is some heavy stuff. There are computational doo dads that use these bounds + interval analysis to rigorously integrate ODEs. See https://github.com/JuliaIntervals/TaylorModels.jl
The beauty of sum of squares certificates is that they are very primitive proofs of positivity for a function on a domain of infinitely many values. If I give you a way to write an expression as a sum of square terms, it is then quite obvious that it has to be always positive. This is algebra rather than analysis.
. Sum of squares is a kind of a quantifier elimination method. The reverse direction of the above implication is the subject of the positivstullensatz, a theorem of real algebraic geometry. At the very least, we can use the SOS constraint as a relaxation of the quantified constraint.
So, I think by using sum of squares, we can turn a differential equation into a differential inequation. If we force the highest derivative to be larger than the required differential equation, we will get an overestimate of the required function.
A function that is dominated by another in derivative, will be dominated in value also. You can integrate over inequalities (I think. You have to be careful about such things. ) x(t) – x(0) >= y(t) – y(0) $
The derivative of a polynomial can be thought of as a completely formal operation, with no necessarily implied calculus meaning. It seems we can play a funny kind of shell game to avoid the bulk of calculus.
As an example, let’s take with the solution . is a transcendental
The S-procedure is trick by which you can relax a sum of squares inequality to only need to be enforced in a domain. If you build a polynomials function that describes the domain, it that it is positive inside the domain and negative outside the domain, you can add a positive multiple of that to your SOS inequalities. Inside the domain you care about, you’ve only made them harder to satisfy, not easier. But outside the domain you have made it easier because you can have negative slack.
For the domain the polynomial works as our domain polynomial.
We parametrize our solution as an explicit polynomial . It is important to note that what follows is always linear in the .
can be relaxed to with .
So with that we get a reasonable formulation of finding a polynomial upper bound solution of the differential equation
And here it is written out in python using my cvxpy-helpers which bridge the gap between sympy polynomials and cvxpy.
We can go backwards to figure out sufficient conditions for a bound. We want . It is sufficient that . For this it is sufficient that . We follow this down in derivative until we get the lowest derivative in the differential equation. Then we can use the linear differential equation itself . .
. This accounts for the possibility of terms changing signs. Or you could separate the terms into regions of constant sign.
The minimization characterization of the bound is useful. For any class of functions that contains our degree-d polynomial, we can show that the minimum of the same optimization problem is less than or equal to our value.
Is the dual value useful? The lower bound on the least upper bound
Doesn’t seem like the method will work for nonlinear odes. Maybe it will if you relax the nonlinearity. Or you could use perhaps a MIDSP to make piecewise linear approximations of the nonlinearity?
It is interesting to investigtae linear programming models. It is simpler and more concrete to examine how well different step sizes approximate each other rather than worry about the differential case.
We can explicit compute a finite difference solution in the LP, which is a power that is difficult to achieve in general for differential equations.
We can instead remove the exact solution by a convservative bound.
While we can differentiate through an equality, we can’t differentiate through an inequality. Differentiation involves negation, which plays havoc with inequalities. We can however integrate through inequalities.
x(t) >= \int^t_0 f(x) + a$
As a generalization we can integrate over inequalities as long as
In particular x(t) – x(0) >= y(t) – y(0) $
We can convert a differential equation into a differential inequation. It is not entirely clear to me that there is a canonical way to do this. But it works to take the biggest.
Neural networks are all the rage these days. One mundane way of looking at neural networks is that they are a particular class of parametrized functions . What makes them useful is:
They can be used at insane scale due to their simplicity and excellent available implementations and tooling
There are intuitive ways to input abstract structure and symmetry expected of a problem, for example translation symmetry, or a hierarchy of small scale pattern recognition combining into large scale structures. How this all works is very peculiar.
Inspirational analogies can be drawn from nature.
Neural networks made out of just relus (rectified linear units, relu(x) = max(0,x) ) and linear layers are particularly amenable to formal analysis. Regarding the weights as fixed (once a network has be trained), the complete neural network is a piecewise linear function. The regions where it is piecewise define are polyhedra (They are defined by the hyperplanes of the relu transitions as inequality constraints). Such functions are among those the most amenable to automated rigorous analysis.
Most machine learning tasks don’t have a mathematically precise specification. What is the mathematically precise definition of a picture of a horse? We could try to come up with something (this is sort of what good old fashioned AI tried to do), but in my opinion it would be rather suspect.
Tasks that do have a precise spec are questionable areas for machine learning techniques, because how can you know that the network meets the spec? Also, one would suspect such a problem would have structure that you might be better off with a more traditional algorithmic approach.
However, there a a couple areas where one does have reasonable formal questions one might want to ask about a neural network:
Robustness around training and validation data. Finding Adversarial examples or proving there are none.
Games like Go. Alpha Go is a marriage of more traditional algorithmic approaches and machine learning. There is a core of traditional game tree search to it.
Control Systems – There are many control systems which we do have a reasonable formal spec one could write, such as walking robots. These systems are so high dimensional that it is difficult to derive a good controller from the spec, and hence reinforcement learning may be a reasonable option. But we would like to confirm the controller is good and not dangerous
Neural networks as computational accelerators. There are problems which we know how to solve, but are too slow. Neural networks can be evaluated very quickly and easily thanks to modern frameworks. It may be useful to presolve a large number of examples offline using the slow algorithm and train a neural network to give good estimates. We may be able to replace the slow thing entirely if we can verify the neural network always is good enough.
We’re going to use a neural network to fit a chebyshev polynomial. Here we’re picking a Chebyshev polynomial as our base truth because Chebyshev polynomials have some pleasant waviness to them. Why not. I like ’em. Also polynomials are easily understood by z3 as a base spec.
This example of course is a complete toy. How often do you see 1-D input space neural networks? Not often I bet. But it’s nice for a couple reasons:
Because we can visualize it.
It drives home the point about neural nets being a space of piecewise linear function approximators, and how similar training is to curve fitting.
It’s simple enough that z3 can crush it. There is a big question if z3 or other methods can scale to realistic neural nets. Modern top of the line neural nets are insanely huge. As we’ve done it here, I highly doubt it. There are special purpose SMT solvers being built for this task. Also the slightly different technology of mixed integer programming can be used and seems very promising. So this is an area of research. See links at the bottom for more.
Generally speaking, the combination of the capabilities of sympy and z3 give us access to some very intriguing possibilities. I’m not going to investigate this in detail in this post, but I are showing how you can convert a sympy derived polynomial into a python expression using lambdify, which can then be in turn used on z3 variables.
Here we build a very small 3 layers network using Keras. We use a least squares error and an adam optimizer. Whatever. I actually had difficulty getting nice results out for higher order chebyshev polynomials. Neural networks are so fiddly.
And here we extract the weights and reinterpret them into z3. We could also have used z3 floating point capabilities rather than reals if you’re concerned about numerical issues. It was convenient to have my layers be different sizes, so that size mismatch would throw a python error. That’s how I found out the weights are transposed by default. The code at the end extracts a found countermodel and evaluates it. If you want to feel fancier, you can also use the prove function rather than an explicit Solver() instance. Saying you proved the neural network matches the polynomial to a certain tolerance feels really good. If you look at the graphs, the edges at 1 and -1 actually have pretty bad absolute error, around 0.5.
w1, b1, w2, b2, w3, b3=model.get_weights() # unpack weights from model
returnnp.vectorize(lambday: If(y>=0 , y, RealVal(0)))(x)