Stupid Z3Py Tricks Strikes Back: Verifying a Keras Neural Network

Neural networks are all the rage these days. One mundane way of looking at neural networks is that they are a particular class of parametrized functions . What makes them useful is:

  1. They can be used at insane scale due to their simplicity and excellent available implementations and tooling
  2. There are intuitive ways to input abstract structure and symmetry expected of a problem, for example translation symmetry, or a hierarchy of small scale pattern recognition combining into large scale structures. How this all works is very peculiar.
  3. Inspirational analogies can be drawn from nature.

Neural networks made out of just relus (rectified linear units, relu(x) = max(0,x) ) and linear layers are particularly amenable to formal analysis. Regarding the weights as fixed (once a network has be trained), the complete neural network is a piecewise linear function. The regions where it is piecewise define are polyhedra (They are defined by the hyperplanes of the relu transitions as inequality constraints). Such functions are among those the most amenable to automated rigorous analysis.

A relu

Most machine learning tasks don’t have a mathematically precise specification. What is the mathematically precise definition of a picture of a horse? We could try to come up with something (this is sort of what good old fashioned AI tried to do), but in my opinion it would be rather suspect.

Tasks that do have a precise spec are questionable areas for machine learning techniques, because how can you know that the network meets the spec? Also, one would suspect such a problem would have structure that you might be better off with a more traditional algorithmic approach.

However, there a a couple areas where one does have reasonable formal questions one might want to ask about a neural network:

  • Robustness around training and validation data. Finding Adversarial examples or proving there are none.
  • Games like Go. Alpha Go is a marriage of more traditional algorithmic approaches and machine learning. There is a core of traditional game tree search to it.
  • Control Systems – There are many control systems which we do have a reasonable formal spec one could write, such as walking robots. These systems are so high dimensional that it is difficult to derive a good controller from the spec, and hence reinforcement learning may be a reasonable option. But we would like to confirm the controller is good and not dangerous
  • Neural networks as computational accelerators. There are problems which we know how to solve, but are too slow. Neural networks can be evaluated very quickly and easily thanks to modern frameworks. It may be useful to presolve a large number of examples offline using the slow algorithm and train a neural network to give good estimates. We may be able to replace the slow thing entirely if we can verify the neural network always is good enough.

We’re going to use a neural network to fit a chebyshev polynomial. Here we’re picking a Chebyshev polynomial as our base truth because Chebyshev polynomials have some pleasant waviness to them. Why not. I like ’em. Also polynomials are easily understood by z3 as a base spec.

This example of course is a complete toy. How often do you see 1-D input space neural networks? Not often I bet.
But it’s nice for a couple reasons:

  1. Because we can visualize it.
  2. It drives home the point about neural nets being a space of piecewise linear function approximators, and how similar training is to curve fitting.
  3. It’s simple enough that z3 can crush it. There is a big question if z3 or other methods can scale to realistic neural nets. Modern top of the line neural nets are insanely huge. As we’ve done it here, I highly doubt it. There are special purpose SMT solvers being built for this task. Also the slightly different technology of mixed integer programming can be used and seems very promising. So this is an area of research. See links at the bottom for more.

Generally speaking, the combination of the capabilities of sympy and z3 give us access to some very intriguing possibilities. I’m not going to investigate this in detail in this post, but I are showing how you can convert a sympy derived polynomial into a python expression using lambdify, which can then be in turn used on z3 variables.

Our actual chebyshev polynomial

Here we build a very small 3 layers network using Keras. We use a least squares error and an adam optimizer. Whatever. I actually had difficulty getting nice results out for higher order chebyshev polynomials. Neural networks are so fiddly.

Our neural fit of the polynomial

And here we extract the weights and reinterpret them into z3. We could also have used z3 floating point capabilities rather than reals if you’re concerned about numerical issues. It was convenient to have my layers be different sizes, so that size mismatch would throw a python error. That’s how I found out the weights are transposed by default. The code at the end extracts a found countermodel and evaluates it. If you want to feel fancier, you can also use the prove function rather than an explicit Solver() instance. Saying you proved the neural network matches the polynomial to a certain tolerance feels really good. If you look at the graphs, the edges at 1 and -1 actually have pretty bad absolute error, around 0.5.

Links

https://github.com/sisl/NeuralVerification.jl

https://arxiv.org/abs/1711.07356 – Evaluating Robustness of Neural Networks with Mixed Integer Programming

https://github.com/vtjeng/MIPVerify.jl

https://arxiv.org/abs/1702.01135 – reluplex, SMT specifically for neural networks

More Stupid Z3Py Tricks: Simple Proofs

Z3 can be used for proofs. The input language isn’t anywhere near as powerful as interactive theorem provers like Coq, Isabelle, or Agda, but you can ask Z3 to prove pretty interesting things. Although the theorems that follow aren’t hard in interactive theorem provers, they would take beyond complete novice level skills to state or prove.

I like to think of the z3 proving process as “failing to find a counterexample”. Z3py has supplies a function prove which is implemented like this.

Basically, it negates the thing you want to prove. It then tries to find a way to instantiate the variables in the expression to make the statement false. If it comes back unsat, then there is no variable assignment that does it. Another way to think about this is rewriting the \forall y. p(y) as \neg \exists y \neg p (y). The first \neg lives at sort of a meta level, where we consider unsat as a success, but the inner \neg is the one appearing in s.add(Not(claim)).

We can prove some simple facts. This is still quite cool, let’s not get too jaded. Manually proving these things in Coq does suck (although is easy if you use the ring, psatz, and lra tactics https://coq.inria.fr/refman/addendum/micromega.html, which you DEFINITELY should. It is a great irony of learning coq that you cut your teeth on theorems that you shouldn’t do by hand).

Ok, here’s our first sort of interesting example. Some properties of even and odd numbers. Even and Odd are natural predicates. What are possible choices to represent predictaes in z3?
We can either choose python functions IntSort -> BoolSort() as predicates or we can make internal z3 functions Function(IntSort(), BoolSort())

All well and good, but try to prove facts about the multiplicative properties of even and odd. Doesn’t go through. 🙁

Here’s a simple inductive proof. Z3 can do induction, but you sort of have to do it manually, or with a combinator. Given a predicate f, inductionNat returns

Here’s another cute and stupid trick. Z3 doesn’t have a built in sine or cosine. Perhaps you would want to look into dreal if you think you might be heavily looking into such things. However, sine and cosine are actually defined implicitly via a couple of their formula. So we can instantiate
A slightly counterintuitive thing is that we can’t use this to directly compute sine and cosine values. That would require returning a model, which would include a model of sine and cosine, which z3 cannot express.
However, we can try to assert false facts about sine and cosine and z3 can prove they are in fact unsatisfiable. In this way we can narrow down values by bisection guessing. This is very silly.

A trick that I like to use sometimes is embedding objects in numpy arrays. Numpy slicing is the best thing since sliced bread. A lot, but not all, of numpy operations come for free, like matrix multiply, dot, sum, indexing, slicing, reshaping. Only some are implemented in terms of overloadable operations. here we can prove the Cauchy Schwartz inequality for a particular vector and some axioms of vector spaces.

Defining and proving simple properties of Min and Max functions

Proving the Babylonian method for calculating square roots is getting close to the right answer. I like the to think of the Babylonian method very roughly this way: If your current guess is low for the square root x/guess is high. If your guess is high, x/guess is low. So if you take the average of the two, it seems plausible you’re closer to the real answer. We can also see that if you are precisely at the square root, (x/res + x)/2 stays the same. Part of the the trick here is that z3 can understand square roots directly as a specification. Also note because of python overloading, babylonian with work on regular numbers and symbolic z3 numbers. We can also prove that babylon_iter is a contractive, which is interesting in it’s own right.

A funny thing we can do is define interval arithmetic using z3 variables. Interval arithmetic is very cool. Checkout Moore’s book, it’s good. This might be a nice way of proving facts related to real analysis. Not sure.
This is funny because z3 internally uses interval arithmetic. So what we’re doing is either very idiotically circular or pleasantly self-similar.
We could use a similar arrangement to get complex numbers, which z3 does not natively support

Stupid Z3Py Tricks: Verifying Sorting Networks off of Wikipedia

Sorting networks are a circuit flavored take on sorting. Although you can build circuits for any size input, any particular circuit works for a fixed sized input. They are like an unrolling of the loops or recursion of more familiar sorting algorithms. They come up also in the context of parallel and gpu sorting

Here’s an interesting thing. We can go to Wikipedia and get a little python snippet for the comparison order of a Batcher odd-even mergesort. Kind of a confusing algorithm. Why does it even work? Is it even right? It’s written in some kind of funky, indexful generator style.

Source: Wikipedia

Well we can confirm this relatively straightforwardly using z3 by replacing the implementation of compare_and_swap with its z3 equivalent. We then ask z3 .

This comes back unsat, hence there are no inputs or executions that do not come back sorted. If I delete some elements from pair_to_compare, it comes back sat, showing that it doesn’t always sort.

The trick here is that the circuit is fixed size, so we have no need for induction, one of the main things z3 is rather finicky at.

It’s somewhat interesting to note that the output of odd_even_merge is a sequence of instructions, we can think of this as interpreting a very small 1 instruction programming language.

We can also confirm similarly a simple odd-even bubblesort and other similar algorithms.

Q: What about using uninterpreted sorts rather than integers? Integers is pretty convincing to me.

same_elems is slightly weaker than a permutation predicate. Wasn’t super obvious to me the best way to do a permutation predicate in z3. Would I want to internalize the array?

Edit: Upon further thought, actually the sort IS a nice predicate for permutation. How do we compute if two things are permutations of each other? By sorting them and forcing a zipped equality. Alternatively count the number of each element (a piece of bucket sort). Since this sort is done by composing swaps, it is somewhat intrinsically a permutation

As a bummer though, I think randomized testing on arrays would be equally or perhaps more convincing of the correctness of the algorithm. Oh well.

Programming and Interactive Proving With Z3Py

I’ve been fiddling with z3py, figuring out some functionality and realizing some interesting things you could do with it. I think I’m at a point where it is nice to checkpoint myself with a blog post.

I’m a little surprised z3py doesn’t overload the & and | operators and some kind of implies operator for BoolRef. You can insert them later using this.

Functional Programming

Python is not the best functional programming environment imo. And by functional programming I implicitly mean roughly ML-like FP a la Haskell or OCaml. I don’t venture much into lisp land.

The lack of good algebraic datatypes (the class syntax is so ungainly) and a type system hurts. The lack of pattern matching hurts. The lambda keyword is so long it makes me sad.

But you have full access to z3 from the python bindings. Z3 does have algebraic data types, and a type system. It has built in substitution mechanisms and evaluation. And it has insane search procedures and the ability to prove things. Pretty damn cool!

Unfortunately the type system is rather simplistic, being basically simply typed rather than polymorphic or something else. But using python a a schema/macro system for z3 seems a plausible way forward.

To build templated types, you can have constructor functions in python for the appropriate types.

You can access the constructors from the returned types. Check this out. You get detector functions is_Nothing and is_Just , the extractor function fromJust and constructor functions Nothing and Just. I do a lot of dir exploration with z3py. It’s hard to know what’s available sometimes

It’s possible to build a general purpose match combinator on these types since you can introspect the number of constructors of the ADT using num_constructors, constructor, recognizer, and accessor. There might be a match inside z3py somewhere? I think it’s part of the SMTLIB standard now.

Example usage:

Z3 has a substitution mechanism built in. This is useful for instantiating ForAll and for evaluating Lambda. The substitute_vars function is what you want like so substitute_vars(f.body(), x, y, z)

It is possible to reflect the syntax in a fairly straightforward way back into python via a lambdify function, mimicking the equivalent very useful function from sympy. Lambdify is basically an interp function. Here is a start for such a function. I by no means have implemented interpretation of the entirety of z3. Also I feel like this implementation is very clunky. Some kind of CPS?

There is the ability to define recursive functions in z3. It is also plausible to define them via. In this way you can get a reversible functional programming language, maybe some subset of mercury / curry’s power.

Interactive Theorem Proving

Z3 is awesome at thoerem proving. But somethings it just doesn’t handle right and needs human guidance.

Through searching, there are a couple interesting python interactive theorem prover projects. Cody pointed me to a project he worked on a while back, Boole https://github.com/avigad/boole . It has a dependently typed lambda calculus in it with the purpose of gluing together many systems, I think. He implemented a lot of stuff from scratch. I think I want to try to get less and do less. There is also holpy https://arxiv.org/abs/1905.05970 which appears to be being actively developed. It’s roughly a translation of hol to python I think. It’s available from a strange chinese github on the author’s website if you go looking for it.

This suggests an interesting approach. Most interactive theorem provers start unautomated and add it later. Instead we can iteratively build an interface to de-automate z3.

Altogether, this approach is more HOL flavored than Coq/Agda flavored. z3 terms are our logic and python is our manipulation metal language. Ideally, one would want to verify that every.

Python is so unprincipled that I can’t imagine that you could ever build a system up to the trustworthiness of the other theorem provers. But this is freeing in many ways. Since that is off the table, we can just do the best we can.

Using the z3 syntax tree and the z3 proof automation and z3 substitution mechanisms gives us a HUGE step up from implementing them from scratch. Ideally, we’d want to write as little python as possible, and especially as little python as possible that has to be trusted to be implemented correctly.

One big concern is accidental mutation of the proof under our feet by python. Perhaps using hashes and checking them might be a way to at least detect this. I need to have a good think about how to factor out a trusted core from all possible tactics.

I think it helps a little that z3 often will be able to verify the equivalence of small steps in proofs even if it can’t do the entire proof itself.

I think induction principles will need to be injected by hand. Z3 doesn’t really have that built in. There are definitely situations that after you introduce the induction, z3 can slam all the cases no problem. For example, check this out.

Another thing that might be nice is integration/translation to sympy. Sympy has a ton of useful functionality, at the very least differentiation.

Translation and integration with cvxpy for sum of squares proofs would also be quite neat. I already did something with this using sympy. I’m not super sure how you extract exact proofs from the floating point solutions SCS returns? I think there is a thing. I’ve heard the LLL algorithm can be used for this somehow (finding likely algebraic number matches to floating point numbers)?

So here are some sketched out ideas for tactics.

Another question is how to implement an apply tactic gracefully. Fully deconstructing syntax trees and unifying ourselves is not utilizing z3 well. If you have a good idea how to get unification out of z3, I’d be interested to hear from you here. https://stackoverflow.com/questions/59398955/getting-z3-instantiations-of-quantified-variables/59400838#59400838

Here’s an idea though. In the cold light of day, I am still not sure this reasoning makes much sense. Suppose we’re trying to apply forall x. a(x) -> b(x) to a c(y). If forall x. b(x) -> c(y) we’re good and by assumption that is obvious for some reason, like the syntactic instantiation of b gives c. We can ask z3 to prove that and it will hopefully easy. If we can prove forall x. a(x) in the current context, that would be sufficient, but not true typically. It is an overly difficult request. We really only need to prove a(x) for values pertinent to the proof of c(y). Here’s a suspicious strategem. Any a -> b can be weakened to (q -> a) -> (q -> b). In particular we can choose to weaken forall x. a(x) -> b(x) to forall x. ((c(y) -> b(x)) -> a(x)) -> ((c(y) -> b(x)) -> b(x)). Then we can replace the goal with forall x. ((c(y) -> b(x)) -> a(x)) after we prove that (forall x. (c(y) -> b(x)) -> b(x)) -> c(y). Maybe c(y) -> b(x) is sufficient to restrict the values of x? Not sure.

Another rough sketch of induction on Nat. Not right yet.

We could also make a simple induction for ADTs based on the similar introspection we used for match above. It’s ugly but I think it works.

I haven’t really though much about tacticals yet.

Solving the Laplace Equations with Linear Relations

The Laplace equation is ubiquitous in physics and engineering.

$latex \nabla^2 \phi = 0 = \partial_x^2 \phi + \partial_y^2 \phi = 0

It and slight variants of it describes electrostatics, magnetostatics, steady state heat flow, elastic flex, pressure, velocity potentials.

There are a couple reasons for that.

  • It results from minimizing the squared gradient of a field |\nabla \phi |^2 which can make sense from an energy minimization perspective.
  • Similarly it results from the combination of a flow conservation law and a linear constitutive relation connecting flow and field (such as Ohm’s law, Fick’s law, or Hooke’s law).
  • It also gets used even if not particularly appropriate because we know how to mathematically deal with it, for example in image processing.

There are a couple of questions we may want to ask about a Laplace equation system

  • Given the field on the boundary, determine the field in the interior (Dirchlet problem)
  • Given the normal derivative of the field on the boundary determine the field in the interior (Neumann problem)
  • Given sources in the interior and 0 boundary condition, determine the field. The Laplace equation is called the Poisson equation when you allow a source term on the right hand side. \nabla^2 \phi = \rho.
  • Given the field at the boundary, determine the derivative at the boundary. Dirichlet-to-Neumann map or Poincare-Steklov operator.

Given the Dirichlet to Neumann map, you do not have to consider the interior of a region to use it. The Dirichlet to Neumann map is sort of the same thing as an effective resistance or scattering matrix. It gives you a black box representation of a region based solely on the variables at its boundary.

This linear relation algebra is useful for many things that I’d have considered a use case for the Schur complement. The Schur complement arises when you do Gaussian elimination on a blocked matrix. It is good that this pattern has a name, because once you know about it, you’ll see it in many places. Domain decomposition, marginalized gaussian distributions, low-rank update, Scattering matrices.

By composing the linear relations corresponding to the Dirchlet-Neumann relations of regions, we can build the Dirichlet-Neumann relations of larger regions.

To make this more concrete, let us take the example of electrical circuits like before. A grid of resistors is a finite difference approximation to the continuous problem

-\nabla \phi = E Electric field is gradient of potential

E = \rho j continuum ohm’s law

\nabla\cdot j = 0 current conservation

In this post, I mentioned how you can make reasonable 2 dimensional diagrams out of a monoidal category, by sort of arbitrarily flipping one wire up and one wire down as in the diagram below. This defines a horizontal and vertical composition which have to do the required book-keeping (associations) to keep an arrow in canonical form. I had considered this for the management method of weights in neural networks, but it is way more natural as the actual geometrical layout of a finite difference grid of a laplace equation.

So we can reuse our categorical circuit combinators to build a finite difference Laplace equation.

Just showing how you can bend a 4-wire monoidal box into a 2-d diagram. Ignore the labels.

This can be implemented in Haskell doing the following. Neato.

Bits and Bobbles

  • Not the tightest post, but I needed to get it out there. I have a short attention span.
  • Homology and defining simplices as categories. One way of describing Homology is about linear operators that are analogues of finite difference operators (or better yet, discrete differential geometry operators / exterior derivatives). To some degree, it is analyzing the required boundary conditions to fully define differential equations on weirdo topological surfaces, which correspond to geometrical loops/holes. You can figure this out by looking at subspaces and quotients of the difference operators. Here we have here a very category theory way of looking at partial differential equations. How does it all connect?
  • Continuous circuit models – https://en.wikipedia.org/wiki/Distributed-element_model Telegrapher’s equation is classic example.
  • Cody mentioned that I could actually build circuits and measure categorical identities in a sense. That’s kind of cool. Or I could draw conductive ink on carbon paper and actually make my string diagrams into circuits? That is also brain tickling
  • Network circuits
  • I really want to get coefficients that aren’t just doubles. allowing rational functions of a frequency \omega would allow analysis of capacitor/inductor circuits, but also tight binding model systems for fun things like topological insulators and the Haldane model http://www.philipzucker.com/topologically-non-trivial-circuit-making-haldane-model-gyrator/ . I may need to leave Haskell. I’m not seeing quite the functionality I need. Use Sympy? https://arxiv.org/abs/1605.02532 HLinear. Flint bindings for haskell? Looks unmaintained. Could also use a grobner basis package as dynamite for a mouse.
  • This is relevant for the boundary element method. Some really cool other stuff relevant here. http://people.maths.ox.ac.uk/martinsson/2014_CBMS/

Blah Blah blah: The subtlest aspect of differential equations is that of boundary conditions. It is more correct usually to consider the system of the interior differential equation and the boundary conditions as equally essential parts of the statement of the problem you are considering.

Sum of Squares optimization for Minimax Optimal Differential Eq Residuals

Huh. This doesn’t embed very well. Maybe you’re better off just clicking into the thing. It’s nice not to let things rot too long though. *shrug*

Other ideas: Can I not come up with some scheme to use Sum of Squares for rigorous upper and lower bound regions like in https://github.com/JuliaIntervals/TaylorModels.jl ? Maybe a next post.