Thoughts on Differentiation in Z3

I have a physics background, and a lot of physics is in terms of differentiation and integrals. While they show up, these are not an emphasis in computer science or in mathematical logic.

I’d like to model these problems using the Z3 smt solver, which doesn’t have intrinsic support for them. Maybe one could then use z3 for push button or nearly push button verification of hybrid control systems, or for the accuracy of numerical calculations using finite precision numbers (floats, rationals, fixedpoint. Z3 does have floats built in), or using Z3 to confirm the results produced by Sympy/mathematic. These would all be sick as hell.

This blog post doesn’t solve the problem, hence the title “thoughts”. But still, I think they are interesting little nuggets.

Syntax of Differentiation

The syntactic rules of differentiation are very simple and straightforward. A reasonable seeming syntax for differentiating an expression is something like diff(sin(cos(x)), x), which is what you’ll find in things like sympy or mathemetica. I agree that this syntax does contain enough information to know what I mean. There is a question however of whether Z3’s manipulation of this syntax will comply with what I intended. From the perspective of first order logic, I think what we’ve written there is very nuanced and perhaps ill formed. Does it make sense to model a lambda term as the first order logical term Lambda(x, x * x)? Almost certainly not. It gets the point across, but upon almost any manipulation you’ll find yourself somewhere you didn’t mean to go.

When you toss purely syntactic constructions into Z3, I think you’re asking for trouble. As a safety net, you should have some idea of what the semantics of each piece of syntax is intended to be. What is x in this expression? Is it a standing for a concrete a real number like 7? How do you differentiate with respect to 7? That doesn’t even make sense.

What is cos(x)? Is that a real number? No not really, because that expression is really meant to express in some manner a function $ \mathbb{R} \rightarrow \mathbb{R} $. Or maybe just cos is meant to be the function and cos(x) is that said function applied to a concrete argument?

What is diff? I am inclined to think of it in this single variable calculus case as representing a function (R -> R) -> (R -> R). Even this is very fishy. We well know there are many functions that are not differentiable.

So perhaps the expression diff(cos) is sensible with respect to our intended semantics, leaving x implicit.

So I suggest that I should be working with combinators to describe functions if i want to embed differentiation into first order logic, and avoid explicit mentioned the variable x in the syntax.

We can also talk about function composition. diff(comp(sin,cos)) seems to reasonably refer to the same concept as that above, without these uncomfortable.

An interesting trick we can use to make these composition expressions look more natural is define some helpers like so

comp = Function("comp", FUN, FUN, FUN)
id_ = Const("id", FUN)
cosy = lambda x : comp(cos,x)
siny = lambda x : comp(sin,x)
x = id_
id_axioms = [
    ForAll([F], comp(id_,F) == F, patterns=[comp(id_,F)]), #identity absorption axioms
    ForAll([F], comp(F,id_) == F, patterns=[comp(F,id_)])
    # add composition associativty axioms also.
]

This trick let’s us write the point free combinator expressions with a syntax that looks like the more intuitive pointful syntax. This trick is some relative of the Hughes List / Cayley Representation / Yoneda Lemma. It is useful to look at the partial application of associative binary operators and see what you get.

Arrays as Real->Real Functions

There is a slightly unintuitive correspondence we can use. In Z3, it is perfectly reasonable to define Array(RealSort(), RealSort()) The sort Array doesn’t really imply what you might naively think. We often tend to think of Arrays as having a domain of integers or bitvectors, but this is not at all required. Arrays are first orderized function symbols. Select could equally be called Apply, a function that takes a function and an argument and applies the function to the argument.

FUN = Array(RealSort(), RealSort())
cos, sin = Consts("cos sin")

We could perhaps not use the built in Array and define our own custom sort with axioms that are the same as arrays. I am unclear to what degree Z3 has built in smarts for dealing with the array theory or if it gets treated the same as if the user had defined it.

Loose specification of Transcendental functions

It is very difficult to get the bread and butter transcendental functions like exp sin and cos encoded into z3. The idea is to use uninterpreted versions of them with loose constraints. Simple facts like the following.

e, pi = Reals("e pi")
simple_axioms = [
    cos[0] == 1,
    sin[0] == 0,
    sin[pi] == 0,
    #etc other special values.
    exp[0] == e,
    2.718 <= e,
    e <= 2.719,
    3.14 <= pi,
    pi <= 3.15,
    ForAll([x], exp[x + y] == exp[x]*exp[y]),
    ForAll([x], exp[x] => 1 + x),
    ForAll([x], sin[x]*sin[x] + cos[x]*cos[x] == 1)
    # other properties,, trig identities

]

Lifting addition to function addition

Operation on the real numbers can be lifted to work over function point wise. This is a cute trick that almost goes without mention if you’re not being careful, since the notion of adding functions is so similar to the notion of adding codomain values.

mul = Function("mul", FUN, FUN, FUN)
add = Function("add", FUN, FUN, FUN)
axioms = [
    ForAll([F, G, x], mul(F,G)[x] == F[x] * G[x], patterns=[mul(F,G)[x]]), #function multiplication lifted
    ForAll([F, G, x], add(F,G)[x] == F[x] + G[x], patterns=[add(F,G)[x]]),
]

It makes sense to use a python wrapper so that we get can operator overloading.

class Fun():
    def __init__(self, f):
        self.f = f
    def __add__(self,rhs):
        if type(rhs) == int or type(rhs) == float:
            return self + Fun.const(rhs)
        return Fun(add(self.f, rhs.f))
    def __mul__(self,rhs):
        if type(rhs) == int or type(rhs) == float:
            return self * Fun.const(rhs)
        return Fun(mul(self.f, rhs.f))
    def __call__(self,a):
        return Fun(comp(self.f, a.f))
    def __eq__(self,rhs):
        if type(rhs) == int or type(rhs) == float:
            return self == Fun.const(rhs)
        return self.f == rhs.f
    def __getitem__(self, ind):
        return self.f[ind]
    def const(x):
        return Fun(const(RealVal(x)))
    def diff(self):
        return Fun(diff(self.f))
    def __repr__(self):
        return repr(self.f)
    
def fun1(name):
    def res(f):
        return Fun(Function(name,FUN, FUN)(f))
    return res

def FunConsts(names):
    return [Fun(f) for f in Consts(names, FUN)]

cos, sin, exp, sqrt, ln, id_ = FunConsts("cos sin exp sqrt ln id") 
x = id_
sin(cos(x) + 7).diff() == cos(x)

Differentiation Axioms and Induction Schema

We can encode the standard rules of differentiation

diff_axioms = [
    diff(id1) == const(RealVal(1)),
    diff(cos) == mul(const(RealVal(-1)), sin),
    diff(sin) == cos,
    diff(exp) == exp,
    ForAll([c], diff(const(c)) == const(RealVal(0)), patterns=[diff(const(c))]),
    ForAll([F,G], diff(comp(F,G))  == mul(comp(diff(F),G), diff(G)), patterns=[diff(comp(F,G))]),
    ForAll([F,G], diff(mul(F,G)) == add(mul(diff(F),G), mul(F,diff(G))), patterns=[diff(mul(F,G))]),
    ForAll([F,G], diff(add(F,G))  == add(diff(F), diff(G)), patterns=[diff(add(F,G))])
]

There is a couple choices of axiom schema that are close relatives of the axiom schema of induction in peano arithemtic. If we know an initial condition and something about the derivative everywhere, we can know something about a function everywhere. Interestingly, this is kind of a formulation of the fundamental theorem of calculus.

One form of this might be

def diff_schema(f):
    return Implies(And( f[0] == 0, diff(f) == const(RealVal(0))),
           f == const(RealVal(0))
            )

Other forms might work on restricted domains like t >= 0, or use <= rather than ==.

This actually doesn’t need to be a schema (a python metarepresentation of an infinite class of axioms), since we can directly quantify over f inside of Z3, since it is an Array, not a function symbol.

Simple example

s = Solver()
y = Const("y", FUN)
s.add(diff(y) == const(RealVal(1)))
s.add(y[0] == e)
s.add(simple_axioms)
s.add(quant_axioms)
s.add(diff_axioms)

f = add(y, mul(const(RealVal(-1)),id_))
s.add(diff_schema(f))
s.add(Not(f == const(RealVal(0)))) # the theorem
s.check()

Bits and Bobbles

One way to treat differentiation is a validated numerics approach. You can rigorously approximate integrals using interval arithmetic, $\f_{min} T \le le \int f dt \le g_{max} T $ and differential equations can often be brought into an integral form that can be iterated to convergence. I’ve done things related to this in previous blog posts. CEGARing Exponentials into Z3 with Intervals and Python Coroutines. See also dreal and JuliaIntervals.

Andre Platzer’s systems are a very interesting approach to modelling differential systems in a logical way. Here’s his course

$\partial_x$ is perhaps best understood as a binding form, a brother of $\forall x$, or $\sum_x$. There is some discussion in structure and interpetation of classical mechanics to this effect.

Binding forms require some trickery to encode into first order logic. We could perhaps postulate a different kind of logic that contains this binding form in addition to $\forall$ and $\exists$ as a primitive notion. I don’t know what its rules of inference would be though.

diff : (R -> R) -> (R -> R) can also be understood as kind of a functional program. You could quasi implement diff as finite difference for example. diff = lambda f: lambda x: (f(x+0.01) - f(x-0.01)) / 0.02

The questions in this post are very much related to how to properly model differentiation and other binding forms in egraph systems, like egg and metatheory.jl. An uncareful approach will not mean what you think it means. Having a careful grasp of semantics is important I think.

Defunctionalization is kind of like the axiom schema of comprehension. Arrays are a way of giving first order names to higher order functions.

Will ematching patterns save us? Unclear.

Higher dimensional domains: Use Z3 datasorts like R2, and define arrays with that domain.

R2 = Datatype("R2")
R2.declare("R2", ('x', RealSort()), ('y', RealSort()))
R2Sort = R2.create()

Compare and contrast the above with the coq/lean equivalent. It would look like a highly axiomatized version. I guess I’m fighting the temptation to dignify differentiation back to epsilon-deltas and Dirichlet cuts and just starting from the rules I actually want. This highly mathematical definition is not at all how engineers and physicists work with calculus. There’s something sick here. In principle, these starting point can be filled in in a way that connects to deeper definitions.

Coq sections

Something like this. A sketch.

Section realish.

Variable R : Type.
Variable zero : R.
Variable sin : R -> R.
Context sin_zero : sin zero = zero. (* What equality? *)
Variable diff : (R -> R) -> (R -> R).

Theorem sin_sin_zero : sin (sin zero) = zero. yada yada. Qed.

Z3 supports lambda notation for arrays. It was not working at all like I expected though. There be dragons here?