Petri nets are a framework for modeling dynamical systems that is very intuitive to some people. The vanilla version is that there are discrete tokens at nodes on a graph representing resources of some kind and tokens can be combined according to the firing of transition rules into new tokens in some other location.
This is a natural generalization of chemical reaction kinetics, where tokens are particular kinds of atoms that need to come together. It also is a useful notion for computer systems, where tokens represent some computational resource.
To me, this becomes rather similar to a flow problem or circuit problem. Tokens feel a bit like charge; transitions feel a bit like current (although not necessarily conserved). In a circuit, one can have such a small current that the particulate nature of electric current in terms of electrons matters. This happens for shot noise or Coulomb blockade, for example.
If the number of tokens is very large, it seems intuitively sensible to me that one can well approximate the behavior by relaxing to a continuum. Circuits have discrete electrons and yet are very well approximated by Ohm's law and the like. Populations are made of individuals, and yet in the thousands their dynamics are well described by differential equations.
It seems to me that mixed integer programming is a natural fit for this problem. Mixed integer programming has had its implementations and theory heavily refined for over 70 years, so very general-purpose, performant off-the-shelf solvers are now available. The way mixed integer programs are solved is by considering their quickly solved continuous relaxation (allowing fractional tokens and fractional transitions, more akin to continuous electrical circuit flow) and using this information to systematically inform a discrete search process. This seems to me a reasonable starting approximation. Another name for Petri nets is Vector Addition Systems, which has more of the matrix-y flavor.
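To make that concrete, here is a minimal sketch (the matrices and variable names are my own, and you need a MIP-capable cvxpy solver installed) of a single Petri net step as a mixed integer program in cvxpy. The incidence matrices play the role of the vector addition system: the marking update is linear, and the integrality of the firing counts is the only discrete part.

import numpy as np
import cvxpy as cp

pre  = np.array([[1, 0],    # tokens each transition consumes from each place
                 [1, 1],
                 [0, 1]])
post = np.array([[0, 1],    # tokens each transition produces at each place
                 [0, 0],
                 [2, 0]])

m0 = np.array([2, 2, 0])                  # initial marking
fire = cp.Variable(2, integer=True)       # how many times each transition fires this step
m1 = m0 + (post - pre) @ fire             # next marking

constraints = [fire >= 0, pre @ fire <= m0, m1 >= 0]
prob = cp.Problem(cp.Maximize(cp.sum(m1)), constraints)
prob.solve()                              # needs a MIP-capable solver, e.g. GLPK_MI
print(fire.value, m1.value)

Dropping integer=True gives exactly the continuous "circuit flow" relaxation mentioned above.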
I’m like 70% sure what I did here makes sense, but I’m pretty sure the general idea makes sense with some fiddling.
The big piece is the weighted_block function. It lets you build a combinator with an internal state and input and output flow variables. You give matrices with entries for every possible transition. Whether transitions fired between time steps is indicated by integer variables. There is also possible accumulation of tokens at nodes, which also requires integer variables. Perhaps we'd want to expose the token state of the nodes to the outside too?
When I talked to Zach about this, he was skeptical, suspecting that MIP solvers would die a horrible death if you threw a moderately large Petri net at them. Hard to say without trying.
There is an interesting analogy to be found with quantum field theory in that if you lift up to considering distributions of tokens, it looks like an occupation number representation. See Baez. http://math.ucr.edu/home/baez/stoch_stable.pdf
If you relax the integer constraint and the positivity constraints, this really is a finite difference formulation for capacitor circuits. The internal states would then be the charge of the capacitor. Would the positivity constraint be useful for diodes?
I wonder how relevant the chunky nature of petri nets might be for considering superconducting circuits, which have quantization of quantities from quantum mechanical effects.
Cvxpy lets you describe convex regions. We can use this to implement a convex subcategory of Rel, of which we can ask reachability questions. Relational division probably won't work. I wonder if there is something fun there with respect to the integerizing operation and Galois connections.
Edit: I should have googled a bit first. https://www.sciencedirect.com/science/article/pii/S0377221705009240 — mathematical programming techniques for Petri net reachability. So it has been tried, and it sounds like the results weren't insanely bad.
What I did was implement some wrappers around the graphviz python package. It exposes a not very feature rich stateful interface. It is pretty nice that it prints the graphs inline in jupyter notebooks though.
The code is written in a style very similar to (and hopefully overloadable with) that of z3py relation algebra. http://www.philipzucker.com/a-sketch-of-categorical-relation-algebra-combinators-in-z3py/ . There is a fairly general cookbook method here for categorifying a DSL. Since graphviz does not directly expose fresh node creation as far as I can tell, I made my own using a random number generator. The actual combinators are graphviz object processors, so we build up a giant functional chain of processors and then actually execute it with run. The inputs and outputs are represented by lists of node names. There is some design space to consider here.
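To give a flavor of the shape of the thing (a rough sketch, not the actual code from the post), a morphism can be a processor taking a Digraph plus a list of input node names and returning output node names:

import random
from graphviz import Digraph

def fresh(prefix="n"):
    # graphviz has no fresh-node facility that I know of, so gensym with a random suffix
    return f"{prefix}{random.getrandbits(32)}"

class GraphCat:
    def __init__(self, proc):
        self.proc = proc   # proc : (Digraph, [node name]) -> [node name]

    @staticmethod
    def idd():
        return GraphCat(lambda dot, ins: ins)

    @staticmethod
    def named_simple(name):
        def proc(dot, ins):
            node = fresh(name)
            dot.node(node, label=name)
            for i in ins:
                dot.edge(i, node)
            return [node]
        return GraphCat(proc)

    def __matmul__(self, other):   # composition: my outputs become other's inputs
        return GraphCat(lambda dot, ins: other.proc(dot, self.proc(dot, ins)))

    def __mul__(self, other):      # monoidal product (crudely splitting the inputs in half)
        def proc(dot, ins):
            k = len(ins) // 2
            return self.proc(dot, ins[:k]) + other.proc(dot, ins[k:])
        return GraphCat(proc)

    def run(self):
        dot = Digraph()
        self.proc(dot, [])
        return dot

f = GraphCat.named_simple("f")
g = GraphCat.named_simple("g")
print((f @ g).run().source)   # a two-node graph with an edge f -> g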
I also had to use class-based wrappers. Based on the precedent set by the Python 3 matrix multiplication operator @, I think it is a requirement that this also be used for category composition. id is a builtin in Python, unfortunately, so I used idd. For the monoidal product, I feel like overloading power ** looks nice even if it is a nonsensical analogy; * is also not too bad. I went with * for now.
The graphviz graphs aren’t quite string diagrams. They make no promise to preserve the ordering of your operations, but they seem to tend to.
Here’s some example usage
GC = GraphCat
I = GraphCat.idd()
cup = GraphCat.cup()
cap = GraphCat.cap()
d = cap @ (I * I) @ cup #(I * cap) @ (I * I * I) @ (cup * I)
d = d.run()

plus = GC.named_simple("+") # assuming plus is just a named node
d = plus @ (GC.const(1) * GC.const(2))
d = d.run()

f = GraphCat.named_simple("f")
g = GraphCat.named_simple("g")
dump = GC.dump()
dup = GC.dup()
diagram = ((f * I) @ dup @ g @ (dump * I) @ (I * f) @ (f * f)) * g
Class-based overloading is the main paradigm of overloading in Python. You overload a program into different categories by making the program take in the appropriate category class as a parameter.
# by passing in different category classes, we can make polymorphic functions
# They have to have a uniform interface though, which is hard to constrain in python.
def polymorphic_prog(M):  # function name assumed; M is the category class passed in
    idd = M.idd()
    dump = M.dump()
    dup = M.dup()
    return (idd * dump) @ dup
For example, something like this ought to work. Then you can get the graph of some matrix computation to go along with its numpy implementation.
def compose_fg(f, g):  # wrapper name assumed; f and g are built from the chosen category class
    return f @ g
    # and so on.
Floating points are nice and all. You can get pretty far pretending they are actually numbers. But they don’t obey some mathematical properties that feel pretty obvious. A classic to glance through is “What Every Computer Scientist Should Know About Floating-Point Arithmetic” https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html
We can check some properties with z3py. Here are a couple simple properties that succeed for mathematical integers and reals, but fail for floating point
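For instance (a sketch; associativity of addition is one such property), addition associates for mathematical reals but not for floats:

from z3 import *

x, y, z = Reals('x y z')
prove(x + y + z == x + (y + z))          # proved

a, b, c = FPs('a b c', Float16())
prove(fpEQ(fpAdd(RNE(), fpAdd(RNE(), a, b), c),
           fpAdd(RNE(), a, fpAdd(RNE(), b, c))))   # counterexample: rounding breaks associativity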
In it, she uses Flocq and Coq to prove a somewhat surprising result: that the naive formula actually is correct for the right rounding mode of floating point, something I wouldn't have guessed.
Z3 confirms this for Float16. I can't get Float32 to come back even after a day on a fairly beefy computer. If I use FPSort(ebits, sbits) rather than a standard size, it just comes back unknown, so I can't really see where the cutoff size is. This does not bode well for checking properties of floating point in z3 in general. I think a brute force for-loop check of 32-bit float properties is feasible. It might even be pretty fast. To some degree, if z3 is taking forever to find a counterexample, I wonder to what degree the property is probably true.
David Sanders (who lives in Julia land https://github.com/JuliaIntervals ) explained a bit of how his interval constraint propagation library works to me last night. He described it as being very similar to backpropagation, which sets off alarm bells for me.
The interesting thing the backward pass gives you is that everything feels a bit more relational rather than functional. The backward pass allows you to infer new information using constraints given down the line. For example, fuse :: Lens (a,a) a lets you enforce that two variables are actually equal. The lens pattern lets you store the forward-pass intervals in a closure, so that you can intersect them with the backward-pass intervals.
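Translated into rough Python (the post's actual code is Haskell; the names here are mine), the forward/backward dance for a single constraint x + y = z looks something like this:

def meet(a, b):
    # intersect two intervals; an empty result means the constraints are inconsistent
    return (max(a[0], b[0]), min(a[1], b[1]))

def add_fwd(x, y):
    return (x[0] + y[0], x[1] + y[1])

def add_bwd(x, y, z):
    # z = x + y  =>  x in z - y and y in z - x; intersect with the stored forward intervals
    x2 = meet(x, (z[0] - y[1], z[1] - y[0]))
    y2 = meet(y, (z[0] - x[1], z[1] - x[0]))
    return x2, y2

x, y = (0.0, 10.0), (0.0, 10.0)
z = meet(add_fwd(x, y), (3.0, 4.0))   # downstream constraint: the sum must lie in [3, 4]
print(add_bwd(x, y, z))               # both x and y narrow to (0.0, 4.0)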
I make no guarantees what I have here is right. It’s a very rough first pass. It compiles, so that is cool I guess.
This Lens approach has a couple of problems. It is probably doing way more work than it should, and we aren't iterating to a fixed point.
Maybe an iterated lens would get us closer?
data Lens s t a b = Lens (a -> (b , (b -> (a, Lens s t a b))))
This is one way to go about the iterative process of updating a neural network in a functional way, by evaluating it over and over and backpropagating. The updated weights will be stored in those closures. It seems kind of nice. It is clearly some relative of Iteratees and streaming libraries like pipes and conduit (which are also compositional bidirectional programming patterns), the main difference being that it enforces a particular ordering of passes (for better or worse). Also I haven't put in any monadic effects, which is to some degree the point of those libraries, but also extremely conceptually clouding as to what is going on.
Another interesting possibility is the type
type Lens s t a b = s -> (a, b -> t)
Lens s (Interval s) a (Interval a)
This has pieces that might be helpful for talking about continuous functions in a constructive way. It has the forward definition of the function, and then the inverse image of intervals. The inverse image function depends on the original evaluation point? Does this actually make sense? The definition of continuity is that this inverse image function must make arbitrarily small image intervals as you give it smaller and smaller range intervals. Continuity is compositional and plays nice with many arithmetic and structural combinators. So maybe something like this might be a nice abstraction for proof carrying continuous functions in Coq or Agda? Pure conjecture.
I haven’t been able to replicate this behavior in any way, nor can anyone else I’ve asked, upon cursory examination of the code of my blog I can’t find anything, everything is up to date, and scans aren’t finding anything.
Maybe it’s time to finally transfer out of wordpress. I liked that it was easy, but the internet is too scary.
Also, if you’re a hacker, please pick on someone else. I really like my blog.
I’ve discussed relation algebra before. Relations are sets of tuples. There, I implemented the relations naively using lists for sets. This is very simple, and very clean especially with list comprehension syntax. It is however horrifically inefficient, and we could only deal with finitely enumerable domains. The easiest path to fixing these problems is to cash out to an external solver, in this case z3.
There are many beautifully implemented solvers out there and equally beautiful DSL/modeling languages. Examples in mind include sympy, cvxpy, and z3. These modeling languages require you to instantiate variable objects and build expressions out of them and then hand it off to the solver. This is a reasonable interface, but there are advantages to a more categorical/point-free style DSL.
Point-free languages are ones that do not include binding forms that introduce bound/dummy variables. Examples of binding forms like this are λ, ∀, ∃, sums over dummy indices, and integrals over dummy variables. One problem lies in the fact that the names of bound variables don't matter, and that they end up accidentally smashing into each other. You may have experienced this in physics or math class as the dummy index or dummy variable problem, causing you to screw up your calculation of some cross product identity or some complicated tensor sum. These are surprisingly subtle problems, very difficult to diagnose and get right. de Bruijn indices are a technique for giving the bound variables canonical names, but they suck to implement in their own way. When you make a DSL point-free, it is a joy to manipulate, optimize, and search. I think this may be the core of why category theory is a good language for mathematics and programming.
Point-free style also tends to have significant economy of size, for better or worse. Sometimes it is better to have an expression very dense in information. This is important if you are about to algebraically manipulate an expression with paper and pencil. Every manipulation requires a great deal of mind-numbing copying as you proceed line by line, so it can be excruciating if your notation has a lot of unnecessary bulk.
Relations are like functions. The two pieces of the tuple can be roughly thought of as the “input” and the “output”. Relations are only loosely directional though. Part of the point of relations is that the converse (inverse) of a relation is easy to define.
When we implement relations, we have a choice. Do we want the relation to produce its variables, accept its variables, or accept one and produce the other? There are advantages to each. When relations were [(a,b)], a -> b -> Bool, or a -> [b], converting between these forms was a rather painful enumeration process. The sting of converting between them is taken out by the fact that the conversion is no longer a computationally expensive process, since we're working at the modeling layer.
When you're converting a pointful DSL to a pointfree DSL, you have to be careful where you instantiate fresh variables or else you'll end up with secret relations that you didn't intend. Every instantiation of id needs to use fresh variables, for example. You don't want the different ids talking to each other. Sometimes achieving this involves a little currying and/or thunking.
There is a pattern that I have noticed when I'm using modeling languages. When you have a function or relation on variables, there are constraints produced that you have to keep a record of. The pythonic way is to have a Model or Solver object, and then have that object mutate an internal record of the set of constraints. I don't particularly enjoy this style though. It feels like too much boilerplate.
In Haskell, I would use something like a Writer monad to automatically record the constraints that are occurring. Monads are not really all that pleasant even in Haskell, and especially a no go in python without “do” syntax.
However, because we are going point free it is no extra cost at all to include this pipework along for the ride in the composition operation.
Here are implementations of the identity and composition for three different styles. Style 1 is fully receptive, style 2 is mixed (function-feeling), and style 3 is fully productive of variables.
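As a hedged reconstruction (not the original code), the three styles might look something like this, with a fresh-variable helper standing in for whatever gensym scheme you prefer:

from z3 import *
import itertools

counter = itertools.count()
def fresh(sort, prefix="v"):
    return Const(f"{prefix}_{next(counter)}", sort)

# Style 1: fully receptive. A relation takes both variables and returns a constraint.
def id1(x, y):
    return x == y
def compose1(f, g, mid_sort):
    def rel(x, z):
        y = fresh(mid_sort)
        return Exists([y], And(f(x, y), g(y, z)))
    return rel

# Style 2: mixed, function-feeling. Takes the input variable, produces (output, constraint).
def id2(x):
    return x, BoolVal(True)
def compose2(f, g):
    def rel(x):
        y, c1 = f(x)
        z, c2 = g(y)
        return z, And(c1, c2)
    return rel

# Style 3: fully productive. Produces (input variable, output variable, constraint).
def id3(sort):
    x = fresh(sort)
    return x, x, BoolVal(True)
def compose3(f, g):
    x, y1, c1 = f
    y2, z, c2 = g
    return x, z, And(c1, c2, y1 == y2)

x, z = Ints('x z')
succ = lambda a, b: b == a + 1
double = lambda a, b: b == 2 * a
print(compose1(succ, double, IntSort())(x, z))   # Exists v_0: And(v_0 == x + 1, z == 2*v_0)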
Fair warning, I’m being sketchy here. I haven’t really tried this stuff out.
z3 is a simply typed language. You can get away with some polymorphism at the python level (for example the == dispatches correctly accord to the object) but sometimes you need to manually specify the sort of the variables. Given these types, the different styles are interconvertible
We can create the general cadre of relation algebra operators. Here is a somewhat incomplete list
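In the fully receptive style, a hedged sketch of a few of these operators might look like the following (rsub is the relational inclusion used just below):

from z3 import *

def top(x, y): return BoolVal(True)
def bottom(x, y): return BoolVal(False)
def converse(r): return lambda x, y: r(y, x)
def meet(r, s): return lambda x, y: And(r(x, y), s(x, y))
def join(r, s): return lambda x, y: Or(r(x, y), s(x, y))

def rsub(r, s, sortx, sorty):
    # relational inclusion r <= s as a quantified z3 formula
    x, y = Const('x', sortx), Const('y', sorty)
    return ForAll([x, y], Implies(r(x, y), s(x, y)))

# example: the meet of two relations is included in either one
lt = lambda x, y: x < y
samepar = lambda x, y: (x + y) % 2 == 0
prove(rsub(meet(lt, samepar), lt, IntSort(), IntSort()))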
Questions about relation algebra expressions are often phrased in terms of relational inclusion. You can construct a relation algebra expression, use rsub in the appropriate way, and ask z3's prove function if it is true.
Z3 has solvers for
Interval Relations – A point I was confused on. I thought interval relations were not interesting, but I was interpreting the term incorrectly. I was thinking of relations on AxB that are constrained to take the form of a product of intervals. In this case, the choice of A has no effect on the possible B whatsoever, so it feels non-relational. However, there is also I_A x I_B, relations over the intervals of A and B. This is much closer to what is actually being discussed in interval arithmetic.
Applications we can use this for:
Graph Problems. The edges can be thought of as a relation between vertices. Relation composition using the starn operator is a way to ask for paths through the graph.
Safety and liveness of control systems. Again, a transition relation is natural here. It is conceivable that the state space can be heterogeneous in time, which is the interesting power of the categorical style. I feel like traditional control systems usually maintain the same state space throughout.
I should try to comply with python conventions, in particular numpy and pandas conventions. @ should be composition for example, since relation composition has a lot of flavor of matrix composition. I should overload a lot of operators, but then I’d need to wrap in a class 🙁
As long as you only use composition, there is a chaining of existentials, which really isn’t so bad.
What we’ve done here is basically analogous/identical to what John Wiegley did compiling to the category of z3. Slightly different in that he only allowed for existential composition rather than relational division. http://newartisans.com/2017/04/haskell-and-z3/
We can reduce the burden on z3 if we know the constructive proof objects that witness our various operations. Z3 is gonna do better if we can tell it exactly which y witnesses the composition of operators, or give clues to which branch of an Or it should use.
It’s a bummer, but when you use quantifiers, you don’t see countermodels? Maybe there is some hook where you can, or in the dump of the proof object.
What about recursion schemes? The exact ability of z3 to handle unbounded problems is fuzzy to me. It does have support for defining recursive functions. Also explicit induction predicates can go through sometimes. Maybe look at the Cata I made in the fancy relation algebra post.
I think most proof assistants have implementations of relation algebra available. I find you can do a surprising amount in z3.
As I have gotten more into the concerns of formal methods, I’ve become unsure that ODEs actually exist. These are concerns that did not bother me much when I defined myself as being more in the physics game. How times change. Here’s a rough cut.
A difficulty with ODE error analysis is that it is very confusing how to get the error on something you are having difficulty approximating in the first place.
If I wanted to know the error of using a finite step size dt vs a size dt/10, great. Just compute both and compare. However, no amount of this seems to bootstrap you down to the continuum. And so I thought, you’re screwed in regards to using numerics in order to get true hard facts about the true solution. You have to go to paper and pencil considerations of equations and variables and epsilons and deltas and things. It is now clearer to me that this is not true. There is a field of verified/validated numerics.
A key piece of this seems to be interval arithmetic. https://en.wikipedia.org/wiki/Interval_arithmetic An interval can be concretely represented by its left and right point. If you use rational numbers, you can represent the interval precisely. Interval arithmetic over approximates operations on intervals in such a way as to keep things easily computable. One way it does this is by ignoring dependencies between different terms. Check out Moore et al’s book for more.
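A minimal self-contained sketch with exact rational endpoints also shows the dependency problem in action:

from fractions import Fraction

class Interval:
    def __init__(self, lo, hi):
        self.lo, self.hi = Fraction(lo), Fraction(hi)
    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)
    def __sub__(self, other):
        return Interval(self.lo - other.hi, self.hi - other.lo)
    def __mul__(self, other):
        ps = [self.lo * other.lo, self.lo * other.hi, self.hi * other.lo, self.hi * other.hi]
        return Interval(min(ps), max(ps))
    def __repr__(self):
        return f"[{self.lo}, {self.hi}]"

x = Interval(-1, 1)
print(x - x)   # [-2, 2] rather than [0, 0]: the dependency between the two x's is ignored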
What switching over to intervals does is make you think about sets as the things you're operating on rather than points. For ODEs (and other things), this shift of perspective means no longer considering individual functions, but instead sets of functions. And not arbitrary, extremely complicated sets, only those which are concretely manipulable and storable on a computer, like intervals. Taylor models are a particular choice of function sets. You are manipulating an interval tube around a finite polynomial. If during integration / multiplication you get higher powers, truncate the polynomial by dumping the excess into the interval term. This keeps the complexity under wraps and closes the loop of the descriptive system.
If we have an iterative, contractive process for getting better and better solutions of a problem (like a newton method or some iterative linear algebra method), we can get definite bounds on the solution if we can demonstrate that a set maps into itself under this operation. If this is the case and we know there is a unique solution, then it must be in this set.
It is wise if at all possible to convert an ODE into integral form. $\dot{x} = f(x)$ is the same as $x(t) = x(0) + \int_0^t f(x(s))\,ds$.
For ODEs, the common example of such an operation is known as Picard iteration. In physical terms, this is something like the impulse approximation / born approximation. One assumes that the ODE evolves according to a known trajectory as a first approximation. Then one plugs in the trajectory to the equations of motion to determine the “force” it would feel and integrate up all this force. This creates a better approximation (probably) which you can plug back in to create an even better approximation.
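A quick sympy sketch of Picard iteration for the toy problem x' = x, x(0) = 1 (true solution e^t) shows the successive approximations building up the Taylor series:

import sympy as sp

t = sp.symbols('t')
x = sp.Integer(1)                          # initial guess: the constant trajectory x(t) = 1
for _ in range(5):
    x = 1 + sp.integrate(x, (t, 0, t))     # x_{n+1}(t) = x(0) + integral_0^t f(x_n(s)) ds, with f(x) = x
print(sp.expand(x))                        # 1 + t + t**2/2 + ... , the truncated series for exp(t)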
If we instead do this iteration on an interval function set / Taylor model thing, and can show that the set maps into itself, we know the true solution lies in this interval. The term to search for is Taylor Models (also some links below).
I was tinkering with whether sum of squares optimization might tie into this. I have not seen SOS used in this context, but it probably has been, or it is worthless.
But that isn’t really what makes Sum of squares special. There are other methods by which to do this.
There are very related methods called DSOS and SDSOS https://arxiv.org/abs/1706.02586 which are approximations of the SOS method. They replace the SDP constraint at the core with a more restrictive constraint that can be expressed with LP and SOCP respectively. These methods lose some of the universality of the SOS method and become basis dependent on your choice of polynomials. DSOS in fact is based around the concept of a diagonally dominant matrix, which means that you should know roughly what basis your certificate should be in.
This made me realize there is an even more elementary version of DSOS that perhaps should have been obvious to me from the outset. Suppose we have a set of functions we already know are positive everywhere on a domain of interest. A useful example is the raised Chebyshev polynomials. https://en.wikipedia.org/wiki/Chebyshev_polynomials The appropriate Chebyshev polynomials oscillate between [-1,1] on the interval [-1,1], so if you add 1 to them they are nonnegative over the whole interval [-1,1]. Then nonnegative linear sums of them are also nonnegative. Bing bang boom. And that compiles down into a simple linear program (inequality constraints on the coefficients) with significantly fewer variables than DSOS. What we are doing is restricting ourselves to completely positive diagonal matrices again, which are of course positive semidefinite. It is less flexible, but it also has more obvious knobs with which to throw in domain-specific knowledge. You can use a significantly overcomplete basis, and finding this basis is where you can insert your prior knowledge.
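A tiny numerical sketch of the construction (the particular basis size and weights are arbitrary):

import numpy as np

N = 6
xs = np.linspace(-1, 1, 1001)
# rows are the raised Chebyshev polynomials T_k(x) + 1 sampled on [-1, 1]
raised = np.array([np.polynomial.chebyshev.Chebyshev.basis(k)(xs) + 1 for k in range(N)])

w = np.random.rand(N)        # any nonnegative weights
p = w @ raised               # the resulting polynomial sampled on [-1, 1]
print(p.min())               # nonnegative (up to floating point noise) by construction

Enforcing w >= 0 inside an optimization problem is just a set of linear inequality constraints on the coefficients, hence an LP.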
It is not at all clear there is any benefit over interval based methods.
Here is a sketch I wrote for which has solution . I used raised Chebyshev polynomials to enforce positive polynomial constraints and tossed in a little Taylor model / interval arithmetic to truncate off the highest terms.
I'm using my helper functions for translating between sympy and cvxpy expressions. https://github.com/philzook58/cvxpy-helpers Sympy is great for collecting up the coefficients on terms and for polynomial multiplication, integration, differentiation, etc. I do it by basically creating sympy matrix variables corresponding to cvxpy variables, which I compile to cvxpy expressions using lambdify with an explicit variable dictionary.
Seems to work, but I’ve been burned before.
man, LP solvers are so much better than SDP solvers
Random junk and links: Should I be more ashamed of dumps like this? I don’t expect you to read this.
Functional analysis by and large analyzes functions by analogy with more familiar properties of finite dimensional vector spaces. In ordinary 2d space, it is convenient to work with rectangular regions or polytopic regions.
Suppose I had a damped oscillator converging to some unknown point. If we can show that every point in a set maps within the set, we can show that the function
One model of a program is that it is some kind of kooky, complicated, hyper-nonlinear, discrete time dynamical system. And vice versa, dynamical systems are continuous time programs. The techniques for analyzing either have analogs in the other domain. Invariants of programs are essential for determining correctness properties of loops. Invariants like energy and momentum are essential for determining what physical systems can and cannot do. Lyapunov functions demonstrate that control systems are converging to the set point. Termination metrics show that loops and recursion must eventually end.
If instead you use interval arithmetic for a bound on your solution rather than your best current solution, and if you can show the interval maps inside itself, then you know that the iterative process must converge inside of the interval, hence that is where the true solution lies.
A very simple bound for an integral is
The integral is a very nice operator. The result of the integral is a positive linear sum of the values of a function. This means it plays nice with inequalities.
Rigorously Bounding ODE solutions with Sum of Squares optimization – Intervals
Intervals – Moore book. Computational functional analysis. Tucker book. CoqInterval. Fixed point theorem. Hardware acceleration? Interval valued functions. Interval extensions.
Banach fixed point – contraction mapping
Brouwer fixed point
Picard iteration vs? Allowing flex on boundary conditions via an interval?
Interval book had an interesting integral form for the 2-D
sympy has cool stuff
google scholar search z3, sympy brings up interesting things
Lyapunov functions. Piecewise affine Lyapunov functions. Are Lyapunov functions kind of like a PDE? Value functions are PDEs. If the system is piecewise affine we can define a grid on the same piecewise affine thingo. Compositional convexity. Could we use compositional convexity + ReLU-style piecewise affinity to get complicated Lyapunov functions? Lyapunov functions don't have to be continuous, they just have to be decreasing. The Lie derivative wrt the flow is always negative, i.e. the gradient of the function points roughly against the direction of flow. Triangulate around the equilibrium if you want to avoid quadratic Lyapunov functions. For a guarded system, you can relax the Lyapunov constraint outside of the guard if you tighten it inside the guard. Ax >= 0 is the guard. It's the S-procedure.
Idea: Approximate invariants? At least this ought to make a good coordinate system to work in where the dynamics are slow. Like action-angle and adiabatic transformations. Could also perhaps bound the
I have a method that I’m not sure is ultimately sound. The idea is to start with
Error analysis most often uses an appeal to Taylor's theorem, and Taylor's theorem is usually derived from the mean value theorem or intermediate value theorem. Maybe that's fine. But the mean value theorem is some heavy stuff. There are computational doodads that use these bounds + interval analysis to rigorously integrate ODEs. See https://github.com/JuliaIntervals/TaylorModels.jl
The beauty of sum of squares certificates is that they are very primitive proofs of positivity for a function on a domain of infinitely many values. If I give you a way to write an expression as a sum of square terms, it is then quite obvious that it has to be always positive. This is algebra rather than analysis.
Sum of squares is a kind of quantifier elimination method. The reverse direction of the above implication is the subject of the Positivstellensatz, a theorem of real algebraic geometry. At the very least, we can use the SOS constraint as a relaxation of the quantified constraint.
So, I think by using sum of squares, we can turn a differential equation into a differential inequation. If we force the highest derivative to be larger than the required differential equation, we will get an overestimate of the required function.
A function that is dominated by another in derivative will be dominated in value also. You can integrate over inequalities (I think; you have to be careful about such things): if $\dot{x}(t) \ge \dot{y}(t)$ then $x(t) - x(0) \ge y(t) - y(0)$.
The derivative of a polynomial can be thought of as a completely formal operation, with no necessarily implied calculus meaning. It seems we can play a funny kind of shell game to avoid the bulk of calculus.
As an example, let’s take with the solution . is a transcendental
The S-procedure is a trick by which you can relax a sum of squares inequality so that it only needs to be enforced on a domain. If you build a polynomial that describes the domain, such that it is positive inside the domain and negative outside the domain, you can add a positive multiple of it to your SOS inequalities. Inside the domain you care about, you've only made them harder to satisfy, not easier. But outside the domain you have made them easier, because you can have negative slack.
For the domain the polynomial works as our domain polynomial.
We parametrize our solution as an explicit polynomial with unknown coefficients. It is important to note that what follows is always linear in those coefficients.
can be relaxed to with .
So with that we get a reasonable formulation of finding a polynomial upper bound solution of the differential equation
And here it is written out in python using my cvxpy-helpers which bridge the gap between sympy polynomials and cvxpy.
We can go backwards to figure out sufficient conditions for a bound. We want . It is sufficient that . For this it is sufficient that . We follow this down in derivative until we get the lowest derivative in the differential equation. Then we can use the linear differential equation itself . .
This accounts for the possibility of terms changing signs. Or you could separate the terms into regions of constant sign.
The minimization characterization of the bound is useful. For any class of functions that contains our degree-d polynomial, we can show that the minimum of the same optimization problem is less than or equal to our value.
Is the dual value useful? The lower bound on the least upper bound
Doesn't seem like the method will work for nonlinear ODEs. Maybe it will if you relax the nonlinearity. Or you could perhaps use a MIDSP to make piecewise linear approximations of the nonlinearity?
It is interesting to investigate linear programming models. It is simpler and more concrete to examine how well different step sizes approximate each other rather than worry about the differential case.
We can explicitly compute a finite difference solution in the LP, which is a power that is difficult to achieve in general for differential equations.
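For example (a toy sketch), the equality constraints of an LP can literally be a forward Euler scheme for x' = -x, so the finite difference solution is pinned down exactly inside the program:

import numpy as np
import cvxpy as cp

N, dt = 50, 0.1
x = cp.Variable(N + 1)
cons = [x[0] == 1] + [x[k + 1] == x[k] - dt * x[k] for k in range(N)]
cp.Problem(cp.Minimize(0), cons).solve()
print(x.value[:5])   # matches (1 - dt)**k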
We can instead replace the exact solution by a conservative bound.
While we can differentiate through an equality, we can’t differentiate through an inequality. Differentiation involves negation, which plays havoc with inequalities. We can however integrate through inequalities.
$x(t) \ge \int_0^t f(x(s))\,ds + a$
As a generalization we can integrate over inequalities as long as
In particular $x(t) - x(0) \ge y(t) - y(0)$.
We can convert a differential equation into a differential inequation. It is not entirely clear to me that there is a canonical way to do this. But it works to take the biggest.
Neural networks are all the rage these days. One mundane way of looking at neural networks is that they are a particular class of parametrized functions. What makes them useful is:
They can be used at insane scale due to their simplicity and excellent available implementations and tooling
There are intuitive ways to input abstract structure and symmetry expected of a problem, for example translation symmetry, or a hierarchy of small scale pattern recognition combining into large scale structures. How this all works is very peculiar.
Inspirational analogies can be drawn from nature.
Neural networks made out of just relus (rectified linear units, relu(x) = max(0,x)) and linear layers are particularly amenable to formal analysis. Regarding the weights as fixed (once a network has been trained), the complete neural network is a piecewise linear function. The regions where it is piecewise defined are polyhedra (they are defined by the hyperplanes of the relu transitions as inequality constraints). Such functions are among the most amenable to automated rigorous analysis.
Most machine learning tasks don’t have a mathematically precise specification. What is the mathematically precise definition of a picture of a horse? We could try to come up with something (this is sort of what good old fashioned AI tried to do), but in my opinion it would be rather suspect.
Tasks that do have a precise spec are questionable areas for machine learning techniques, because how can you know that the network meets the spec? Also, one would suspect such a problem would have structure that you might be better off with a more traditional algorithmic approach.
However, there are a couple of areas where one does have reasonable formal questions one might want to ask about a neural network:
Robustness around training and validation data. Finding Adversarial examples or proving there are none.
Games like Go. Alpha Go is a marriage of more traditional algorithmic approaches and machine learning. There is a core of traditional game tree search to it.
Control Systems – There are many control systems for which we do have a reasonable formal spec one could write, such as walking robots. These systems are so high dimensional that it is difficult to derive a good controller from the spec, and hence reinforcement learning may be a reasonable option. But we would like to confirm the controller is good and not dangerous.
Neural networks as computational accelerators. There are problems which we know how to solve, but the known methods are too slow. Neural networks can be evaluated very quickly and easily thanks to modern frameworks. It may be useful to presolve a large number of examples offline using the slow algorithm and train a neural network to give good estimates. We may be able to replace the slow thing entirely if we can verify the neural network is always good enough.
We’re going to use a neural network to fit a chebyshev polynomial. Here we’re picking a Chebyshev polynomial as our base truth because Chebyshev polynomials have some pleasant waviness to them. Why not. I like ’em. Also polynomials are easily understood by z3 as a base spec.
This example of course is a complete toy. How often do you see 1-D input space neural networks? Not often I bet. But it’s nice for a couple reasons:
Because we can visualize it.
It drives home the point about neural nets being a space of piecewise linear function approximators, and how similar training is to curve fitting.
It's simple enough that z3 can crush it. There is a big question whether z3 or other methods can scale to realistic neural nets. Modern top-of-the-line neural nets are insanely huge. As we've done it here, I highly doubt it. There are special purpose SMT solvers being built for this task. Also the slightly different technology of mixed integer programming can be used and seems very promising. So this is an area of research. See links at the bottom for more.
Generally speaking, the combination of the capabilities of sympy and z3 gives us access to some very intriguing possibilities. I'm not going to investigate this in detail in this post, but I am showing how you can convert a sympy-derived polynomial into a python expression using lambdify, which can then in turn be used on z3 variables.
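Here's the kind of thing I mean (a small sketch; the particular polynomial and property are just my picks for illustration):

import sympy as sp
from z3 import Real, prove

t = sp.symbols('t')
cheb = sp.chebyshevt(4, t)       # 8*t**4 - 8*t**2 + 1
f = sp.lambdify(t, cheb)         # an ordinary python function built from the sympy expression

x = Real('x')
expr = f(x)                      # applying it to a z3 Real gives a z3 expression via overloading
prove(expr >= -1)                # T_4(x) >= -1 for every real x (its global minimum is -1)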
Here we build a very small 3 layers network using Keras. We use a least squares error and an adam optimizer. Whatever. I actually had difficulty getting nice results out for higher order chebyshev polynomials. Neural networks are so fiddly.
And here we extract the weights and reinterpret them into z3. We could also have used z3's floating point capabilities rather than reals if you're concerned about numerical issues. It was convenient to have my layers be different sizes, so that a size mismatch would throw a python error. That's how I found out the weights are transposed by default. The code at the end extracts a found countermodel and evaluates it. If you want to feel fancier, you can also use the prove function rather than an explicit Solver() instance. Saying you proved the neural network matches the polynomial to a certain tolerance feels really good. If you look at the graphs, the edges at 1 and -1 actually have pretty bad absolute error, around 0.5.
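The encoding itself is conceptually simple. A hedged sketch (tiny hand-picked weights standing in for model.get_weights(); the real thing pulls them from Keras) of running a relu network forward on z3 variables:

import numpy as np
from z3 import *

def z3_relu(v):
    return [If(a > 0, a, 0) for a in v]

def z3_net(Ws, bs, x):
    # Ws[i] has shape (out, in); Keras stores the transpose, so transpose when extracting
    v = [x]
    for i, (W, b) in enumerate(zip(Ws, bs)):
        v = [sum(float(W[j, k]) * v[k] for k in range(len(v))) + float(b[j])
             for j in range(W.shape[0])]
        if i < len(Ws) - 1:      # relu on hidden layers only
            v = z3_relu(v)
    return v

Ws = [np.array([[1.0], [-1.0]]), np.array([[1.0, 1.0]])]   # hypothetical weights computing |x|
bs = [np.array([0.0, 0.0]), np.array([0.0])]

x = Real('x')
y = z3_net(Ws, bs, x)[0]
prove(y >= 0)    # proved: this little network's output is always nonnegative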
Z3 can be used for proofs. The input language isn’t anywhere near as powerful as interactive theorem provers like Coq, Isabelle, or Agda, but you can ask Z3 to prove pretty interesting things. Although the theorems that follow aren’t hard in interactive theorem provers, they would take beyond complete novice level skills to state or prove.
I like to think of the z3 proving process as "failing to find a counterexample". Z3py supplies a function prove, which is implemented roughly like this.
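Roughly (paraphrased from memory rather than copied from the z3py source), it looks like:

from z3 import *

def prove(claim, **keywords):
    s = Solver()
    s.set(**keywords)
    s.add(Not(claim))
    r = s.check()
    if r == unsat:
        print("proved")
    elif r == unknown:
        print("failed to prove")
    else:
        print("counterexample")
        print(s.model())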
Basically, it negates the thing you want to prove. It then tries to find a way to instantiate the variables in the expression to make the statement false. If it comes back unsat, then there is no variable assignment that does it. Another way to think about this is rewriting $\forall x.\, P(x)$ as $\neg \exists x.\, \neg P(x)$. The $\forall$ lives at sort of a meta level, where we consider unsat as a success, but the inner $\neg P(x)$ is the one appearing in s.add(Not(claim)).
We can prove some simple facts. This is still quite cool; let's not get too jaded. Manually proving these things in Coq does suck (although it is easy if you use the ring, psatz, and lra tactics https://coq.inria.fr/refman/addendum/micromega.html, which you DEFINITELY should. It is a great irony of learning Coq that you cut your teeth on theorems that you shouldn't do by hand).
Ok, here's our first sort of interesting example. Some properties of even and odd numbers. Even and Odd are natural predicates. What are possible choices to represent predicates in z3? We can either choose python functions IntSort -> BoolSort() as predicates, or we can make internal z3 functions Function(IntSort(), BoolSort()).
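A small sketch of both choices (the definitions via mod and the particular facts are just my picks):

from z3 import *

# python-level predicates: IntSort -> BoolSort
even = lambda n: n % 2 == 0
odd  = lambda n: n % 2 == 1

x = Int('x')
prove(Implies(even(x), odd(x + 1)))   # proved
prove(even(x + x))                    # proved

# internal z3 function, pinned down by an axiom
Even = Function('Even', IntSort(), BoolSort())
y = Int('y')
even_ax = ForAll([y], Even(y) == (y % 2 == 0))
prove(Implies(even_ax, Even(4)))      # proved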
All well and good, but try to prove facts about the multiplicative properties of even and odd. Doesn’t go through. 🙁
Here's a simple inductive proof. Z3 can do induction, but you sort of have to do it manually, or with a combinator. Given a predicate f, inductionNat returns the corresponding induction principle over the naturals.
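A hedged guess at the shape of that combinator (the exact formulation in the original code may differ):

from z3 import *

def inductionNat(f):
    n = Int('n')
    return Implies(And(f(IntVal(0)),
                       ForAll([n], Implies(And(n >= 0, f(n)), f(n + 1)))),
                   ForAll([n], Implies(n >= 0, f(n))))

# you then add inductionNat(yourPredicate) as a hypothesis before asking z3 for the goal
p = Function('p', IntSort(), BoolSort())
print(inductionNat(lambda k: p(k)))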
Here's another cute and stupid trick. Z3 doesn't have a built-in sine or cosine. Perhaps you would want to look into dreal if you think you might be heavily looking into such things. However, sine and cosine are actually defined implicitly via a couple of their formulas, so we can instantiate them as uninterpreted functions constrained by those formulas. A slightly counterintuitive thing is that we can't use this to directly compute sine and cosine values. That would require returning a model, which would include a model of sine and cosine, which z3 cannot express. However, we can assert false facts about sine and cosine and z3 can prove they are in fact unsatisfiable. In this way we can narrow down values by bisection guessing. This is very silly.
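A sketch of the setup (my own choice of axioms, just enough to refute something false):

from z3 import *

sin = Function('sin', RealSort(), RealSort())
cos = Function('cos', RealSort(), RealSort())
x, y = Reals('x y')
axioms = [
    ForAll([x], sin(x)**2 + cos(x)**2 == 1),
    ForAll([x, y], sin(x + y) == sin(x)*cos(y) + cos(x)*sin(y)),
    ForAll([x, y], cos(x + y) == cos(x)*cos(y) - sin(x)*sin(y)),
    sin(0) == 0, cos(0) == 1,
]

c = Real('c')
s = Solver()
s.add(axioms)
s.add(sin(c)**2 + cos(c)**2 == 2)   # a false fact contradicting the Pythagorean identity
print(s.check())                    # unsat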
A trick that I like to use sometimes is embedding objects in numpy arrays. Numpy slicing is the best thing since sliced bread. A lot, but not all, of numpy operations come for free, like matrix multiply, dot, sum, indexing, slicing, reshaping. Only some are implemented in terms of overloadable operations. Here we can prove the Cauchy-Schwarz inequality for a particular vector and some axioms of vector spaces.
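For example (keeping one vector concrete so z3's job stays easy):

import numpy as np
from z3 import *

x = np.array(Reals('x0 x1 x2'))               # z3 Reals living inside a numpy object array
y = np.array([1, 2, 3], dtype=object)

lhs = np.dot(x, y) ** 2
rhs = np.dot(x, x) * np.dot(y, y)
prove(lhs <= rhs)                              # Cauchy-Schwarz for this particular y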
Defining and proving simple properties of Min and Max functions
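For instance:

from z3 import *

x, y = Reals('x y')
max2 = lambda a, b: If(a >= b, a, b)
min2 = lambda a, b: If(a >= b, b, a)

prove(max2(x, y) + min2(x, y) == x + y)
prove(max2(x, y) >= x)
prove(min2(x, y) <= max2(x, y))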
Proving the Babylonian method for calculating square roots is getting close to the right answer. I like to think of the Babylonian method very roughly this way: if your current guess is low for the square root, x/guess is high. If your guess is high, x/guess is low. So if you take the average of the two, it seems plausible you're closer to the real answer. We can also see that if you are precisely at the square root, (x/res + res)/2 stays the same. Part of the trick here is that z3 can understand square roots directly as a specification. Also note that because of python overloading, babylonian will work on regular numbers and symbolic z3 numbers. We can also prove that babylon_iter is contractive, which is interesting in its own right.
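The fixed point observation, for example, goes through directly (a sketch; the step function works on floats and z3 Reals alike):

from z3 import *

def babylonian_step(x, res):
    return (x / res + res) / 2

print(babylonian_step(2.0, 1.5))   # 1.41666..., creeping toward sqrt(2)

x, res = Reals('x res')
prove(Implies(And(res > 0, res * res == x),
              babylonian_step(x, res) == res))   # the true square root is a fixed point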
A funny thing we can do is define interval arithmetic using z3 variables. Interval arithmetic is very cool. Check out Moore's book, it's good. This might be a nice way of proving facts related to real analysis. Not sure. This is funny because z3 internally uses interval arithmetic. So what we're doing is either very idiotically circular or pleasantly self-similar. We could use a similar arrangement to get complex numbers, which z3 does not natively support.
Sorting networks are a circuit flavored take on sorting. Although you can build circuits for any size input, any particular circuit works for a fixed sized input. They are like an unrolling of the loops or recursion of more familiar sorting algorithms. They come up also in the context of parallel and GPU sorting.
Here’s an interesting thing. We can go to Wikipedia and get a little python snippet for the comparison order of a Batcher odd-even mergesort. Kind of a confusing algorithm. Why does it even work? Is it even right? It’s written in some kind of funky, indexful generator style.
Well we can confirm this relatively straightforwardly using z3 by replacing the implementation of compare_and_swap with its z3 equivalent. We then ask z3 whether there is any input for which the result fails to come out sorted.
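A sketch of the check (my own reconstruction; the 19-comparator schedule below is one standard odd-even mergesort network for 8 inputs, standing in for the schedule the Wikipedia snippet generates):

from z3 import *

def compare_and_swap_z3(a, b):
    return If(a <= b, a, b), If(a <= b, b, a)

pairs_to_compare = [(0,1),(2,3),(4,5),(6,7),
                    (0,2),(1,3),(4,6),(5,7),
                    (1,2),(5,6),
                    (0,4),(1,5),(2,6),(3,7),
                    (2,4),(3,5),
                    (1,2),(3,4),(5,6)]

n = 8
xs = [Real(f'a{i}') for i in range(n)]
vals = list(xs)
for i, j in pairs_to_compare:
    vals[i], vals[j] = compare_and_swap_z3(vals[i], vals[j])

is_sorted = And([vals[k] <= vals[k + 1] for k in range(n - 1)])
s = Solver()
s.add(Not(is_sorted))
print(s.check())   # unsat: no assignment of the inputs comes out unsorted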
This comes back unsat, hence there are no inputs or executions that do not come back sorted. If I delete some elements from pair_to_compare, it comes back sat, showing that it doesn’t always sort.
The trick here is that the circuit is fixed size, so we have no need for induction, one of the main things z3 is rather finicky at.
It's somewhat interesting to note that the output of odd_even_merge is a sequence of instructions; we can think of this as interpreting a very small one-instruction programming language.
We can also confirm similarly a simple odd-even bubblesort and other similar algorithms.
Q: What about using uninterpreted sorts rather than integers? Integers is pretty convincing to me.
same_elems is slightly weaker than a permutation predicate. It wasn't super obvious to me what the best way to do a permutation predicate in z3 is. Would I want to internalize the array?
Edit: Upon further thought, actually the sort IS a nice predicate for permutation. How do we compute if two things are permutations of each other? By sorting them and forcing a zipped equality. Alternatively, count the number of each element (a piece of bucket sort). Since this sort is done by composing swaps, it is somewhat intrinsically a permutation.
As a bummer though, I think randomized testing on arrays would be equally or perhaps more convincing of the correctness of the algorithm. Oh well.