This is further explorations on the theme of using SMT as a core facility for python interactive proving.

https://www.philipzucker.com/sqrt2/ In this post I tried proving the irrationality of sqrt(2) using SMT. There was a nice proof I followed, but these two facts I couldn’t get to prove automatically.

mul_divides = ForAll([x,y,z], Implies(mul(x,y) == z, divides(z,x)))
two_prime = ForAll([x,y], Implies(divides(mul(x,y),2), Or(divides(x,2), divides(y,2))))


It appears to me that SMTs support of Int and Real is a trap for deeper theorems. This is not surprising. SMT solvers are a barrel of decision procedures strapped together via an egraph and SAT solver. There aren’t smart decision procedures for general mathematical questions. Properties of primes are notoriously tough https://en.wikipedia.org/wiki/Goldbach%27s_conjecture . Some properties of course are quite trivial so hope springs eternal.

My assumption is that if you want to tackle these properties, you need to retreat into more general syntactic approaches, using algebraic datatypes or undefined sorts. This is partially because the ematcher does not work well on Real or Int expressions. It normalizes them into something that can’t be matched on like uninterpreted sorts can be matched.

Because I want to use the high automation of z3 for simple questions though, I can defined reflection principles that just say that there is a homomorphism between the Nats and the built in Ints greater than 0.

## Kernel

Here is our proof kernel. I use the built in prove method which basically checks if the negation of the statement is satisfiable. Lemmas can be offered via the by keyword.

Temporarily admitting lemmas can be useful to just keep moving.

Some other features that might be nice in lemma

• Rip off the quantifiers for better countermodels when it fails
• Shorter timeout
• Check unsat cores and recommend which by were unnecessary. Or worse yet, if the lemma in question is not in the sat core, somehow you have unsatisfiable assumptions, which is indicative of an inconsistent axiomatization.

The distinction between theorems and formula is razor thin, which is nice for a fast blog post but not nice for a library or larger develpoment. It is only syntactic discipline. I should only be using named lemmas in the by justifications and the stdout should never say countermodel. It should be a sequence of proved

from z3 import *
B = BoolSort()
Z = IntSort()

prove(Implies(And(by), thm))
else:
return thm



## Nat Definitions

The basic defini9tions of Nat. I do use z3’s built in ADT capabilities. The distinction between this and undefined sorts with injectivity is interesting. ADTs disallow loopy expressions. This corresponds to an occurs check somewhere inside z3.

A (conservative extension) definition mechanism is that you are allowed to defined new symbols/functions if it has a defining equation that is terminating. I am assuming and not checking that these definitions are well founded. In other words, these definitions are axioms.

The importance of conservative extensions for the foundations of mathematics


Nat = Datatype("Nat")
Nat.declare('zero')
Nat.declare('succ', ('pred', Nat))
Nat = Nat.create()

n,m,k  = Consts('n m k', Nat)
reify = Function('reify', Nat, Z)
reify_def = ForAll([n], reify(n) == If(Nat.is_zero(n), 0, reify(Nat.pred(n)) + 1))

x,y = Ints("x y")
reflect = Function('reflect', Z, Nat)
reflect_def = ForAll([x], reflect(x) == If(x <= 0, Nat.zero, Nat.succ(reflect(x-1))))



More interesting perhaps is the induction axiom schema. https://en.wikipedia.org/wiki/Axiom_schema An axiom schema is a infinite family of axiomns.

We could represent this in different possible ways.

• a stream or generator enumerating the axioms. This is too british museum
• a trusted function that takes in some data to pick the axiom out of the schema you’d like
• Fully internalizing this parametrization into z3 with perhaps some kind of axiom schema datatype parametrizing the axiom schema.
• Using some higher order logic ArraySort(Nat,B) to encode the schema as ForAll([P],Implies(And(P(Nat.zero), ForAll([n], Implies(P(n), P(Nat.succ(n))))), ForAll([n], P(n)))) which can be directly asserted.

Anyway, I take the second option. It’s not so bad.

#def recurse(name,base,step):
#    f = Function(name, Nat, IntSort())
#    n = FreshInt()
#    return f, ForAll([n], If(Nat.is_zero(n), base, step(n, f(Nat.pred(n),base,step))))

def induct(P):
return Implies(And(P(Nat.zero), ForAll([n], Implies(P(n), P(Nat.succ(n))))),
#-----------------------------------------------------------
ForAll([n], P(n)))



We can use this axiom schema to get some crucial properties, like that reflect . reify ~> id and that the definition of a homomoprhism with respect to zero and succ.

reflect_reify = lemma(ForAll([n], reflect(reify(n)) == n), [reflect_def, reify_def, induct(lambda n: reflect(reify(n)) == n)])

reify_ge_0 = lemma(ForAll([n], reify(n) >= 0), [reify_def, induct(lambda n: reify(n) >= 0)])

zero_homo = lemma(reify(Nat.zero) == 0, [reify_def])
succ_homo = lemma(ForAll([n], reify(Nat.succ(n)) == 1 + reify(n)), [reify_def, induct(lambda n: reify(Nat.succ(n)) == 1 + reify(n))])


proved
proved
proved
proved


We can also define plus via reification and reflection. Immediately we can derive some computation rules not referring to Int. Maybe we should have taken these as primitive and then made plus_def the theorem?


plus = Function('plus', Nat, Nat, Nat)
plus_def = ForAll([n, m], plus(n, m) == reflect(reify(n) + reify(m)))

plus_0_x = lemma(ForAll([n], plus(Nat.zero, n) == n), [reflect_reify, zero_homo, plus_def])
plus_succ_x = lemma(ForAll([n,m], plus(Nat.succ(n), m) == Nat.succ(plus(n,m))), [reflect_reify, reflect_def, reify_def, succ_homo, plus_def])

plus_homo_case_succ = lemma(ForAll([n,m], Implies(reify(plus(n,m)) == reify(n) + reify(m) ,
reify(plus(Nat.succ(n),m)) == reify(Nat.succ(n)) + reify(m) )), [reify_def, plus_succ_x])
#plus_homo = lemma(ForAll([n,m], reify(reflect(reify(n) + reify(m))) == reify(n) + reify(m)), [reflect_reify, reflect_def, reify_def, induct(lambda n: ForAll([m], reify(reflect(reify(n) + reify(m))) == reify(n) + reify(m)))])
plus_homo = lemma(ForAll([n,m], reify(plus(n,m)) == reify(n) + reify(m)),
[plus_def, reflect_reify, reflect_def, reify_def, plus_homo_case_succ, induct(lambda n: ForAll([m], reify(plus(n,m)) == reify(n) + reify(m)))], admit=False)
#plus_l_expand = lemma(ForAll([n, m, k], plus(plus(n, m), k) == reflect(reify(reflect(reify(n) + reify(m))) + reify(k))), [plus_def])


proved
proved
proved
proved


plus_comm and plus_assoc are immediate from properties of Int. This is not how I started. I started by trying to prove plus_assoc and then realizing that really the concept of plus_homo was an important intermediary. This is often how these things go.

#_2 = lemma(ForAll([n,m,k], reflect(reify(reflect(reify(n) + reify(m))) + reify(k)) == reflect(reify(n) + reify(m) + reify(k))), [reflect_reify, plus_homo])
plus_comm = lemma(ForAll([n, m], plus(n, m) == plus(m, n)), [plus_def])
plus_assoc = lemma(ForAll([n, m, k], plus(plus(n, m), k) == plus(n, plus(m, k))), [plus_def, plus_homo])

one = Nat.succ(Nat.zero)
two = Nat.succ(Nat.succ(Nat.zero))

proved
proved


Second verse same as the first for multiplication.

mul = Function("mul", Nat,Nat, Nat)
mul_def = ForAll([n,m], mul(n,m) == reflect(reify(n) * reify(m)))

mul_0_x = lemma(ForAll([m], mul(Nat.zero,m) == Nat.zero), [reflect_def, reify_def, mul_def])

#mul_succ_x = lemma(ForAll([n,m], mul(Nat.succ(n), m) == reflect(reify(Nat.succ(n))*reify(m))), [plus_homo, mul_def, plus_def, succ_homo, reflect_reify, reflect_def, reify_def])
mul_succ_x_lemma = lemma(ForAll([n,m], reflect(reify(Nat.succ(n)) * reify(m)) == reflect((1 + reify(n)) * reify(m))), [succ_homo])
mul_succ_x_lemma2 = lemma(ForAll([n,m], reflect((1 + reify(n)) * reify(m)) == reflect(reify(m) + reify(n) * reify(m))))

# Hmmm. Need to think about how to dignify this principal.
reify_reflect = lemma(ForAll([x], Implies(x >= 0, reify(reflect(x)) == x)), [reify_def, reflect_def], admit=True)

mul_succ_x_lemma3 = lemma(ForAll([n,m], reflect(reify(m) + reify(n) * reify(m)) == reflect(reify(m) + reify(reflect(reify(n) * reify(m))))),
[reflect_reify, reify_reflect, reflect_def, reify_def])

mul_succ_x_lemma4 = lemma(ForAll([n,m], reflect(reify(m) + reify(reflect(reify(n) * reify(m)))) == plus(m, mul(n,m))),
[plus_def, mul_def])
#mul_succ_x = lemma(ForAll([n,m], mul(Nat.succ(n), m) == plus(m, mul(n,m))), [mul_def, reify_reflect, mul_succ_x_lemma, mul_succ_x_lemma2, mul_succ_x_lemma3, mul_succ_x_lemma2, plus_homo, reflect_reify, plus_def])
mul_succ_x = lemma(ForAll([n,m], mul(Nat.succ(n), m) == plus(m, mul(n,m))), [mul_def, plus_def, mul_succ_x_lemma, mul_succ_x_lemma2, mul_succ_x_lemma3, mul_succ_x_lemma4])

mul_comm = lemma(ForAll([n,m], mul(n,m) == mul(m,n)), [mul_def])


proved
proved
proved
Admitted!! ForAll(x, Implies(x >= 0, reify(reflect(x)) == x))
proved
proved
proved
proved


Could cvc5 show some of the things that z3 couldn’t in my post? Can I get the even odd style proof to go through

https://coq.inria.fr/doc/v8.10/stdlib/Coq.Init.Nat.html

even and odd kind of induct by 2. Maybe I need to derive this principle first?

lt = Function("lt", Nat, Nat, B)
lt_def = ForAll([n,m], lt(n,m) == reify(n) < reify(m))

max_ = Function("max", Nat, Nat, Nat)
max_def = ForAll([n,m], max_(n,m) == If(lt(n,m), m, n))

even = Function("even", Nat, B)
even_def = ForAll([n], even(n) == If(Nat.is_zero(n), True,
If(Nat.is_zero(Nat.pred(n)), False,
even(Nat.pred(Nat.pred(n))))))

odd = Function("odd", Nat, B)
odd_def = ForAll([n], odd(n) == If(Nat.is_zero(n), False,
If(Nat.is_zero(Nat.pred(n)), True,
odd(Nat.pred(Nat.pred(n))))))

# Sanity checking
even_zero = lemma(even(Nat.zero), [even_def])
not_even_one = lemma(Not(even(one)), [even_def])
odd_even = lemma(odd(one), [odd_def])
two_even = lemma(even(two), [even_def])

even_two = lemma(ForAll([n], even(Nat.succ(Nat.succ(n))) == even(n)), [even_def, induct(lambda n: even(Nat.succ(Nat.succ(n))) == even(n))])
odd_two = lemma(ForAll([n], odd(Nat.succ(Nat.succ(n))) == odd(n)), [odd_def, induct(lambda n: odd(Nat.succ(Nat.succ(n))) == odd(n))])

even_one = lemma(ForAll([n], even(n) == Not(even(Nat.succ(n)))), [even_def, even_two, induct(lambda n: even(n) == Not(even(Nat.succ(n))))])
odd_one = lemma(ForAll([n], odd(n) == Not(odd(Nat.succ(n)))), [odd_def, odd_two, induct(lambda n: odd(n) == Not(odd(Nat.succ(n))))])

# I am not sure why I needed these lemmas.
even_or_odd = lemma(ForAll([n], even(n) != odd(n)), [even_def, odd_def, even_two, even_one, odd_one, odd_two, induct(lambda n: even(n) != odd(n))])


proved
proved
proved
proved
proved
proved
proved
proved
proved

plus_x_x_even = lemma(ForAll([n], even(plus(n,n))), [plus_succ_x, even_def, even_two, even_one, induct(lambda n: even(plus(n,n)))], admit=True)
#mul_2_even = lemma(ForAll([n], even(mul(two, n))), by=[mul_succ_x])

failed to prove

---------------------------------------------------------------------------

Z3Exception                               Traceback (most recent call last)

File ~/.local/lib/python3.10/site-packages/z3/z3.py:7130, in Solver.model(self)
7129 try:
-> 7130     return ModelRef(Z3_solver_get_model(self.ctx.ref(), self.solver), self.ctx)
7131 except Z3Exception:

File ~/.local/lib/python3.10/site-packages/z3/z3core.py:4065, in Z3_solver_get_model(a0, a1, _elems)
4064 r = _elems.f(a0, a1)
-> 4065 _elems.Check(a0)
4066 return r

File ~/.local/lib/python3.10/site-packages/z3/z3core.py:1475, in Elementaries.Check(self, ctx)
1474 if err != self.OK:
-> 1475     raise self.Exception(self.get_error_message(ctx, err))

Z3Exception: b'there is no current model'

During handling of the above exception, another exception occurred:

Z3Exception                               Traceback (most recent call last)

/tmp/ipykernel_390381/4126509453.py in ?()
----> 1 plus_x_x_even = lemma(ForAll([n], even(plus(n,n))), [plus_succ_x, even_def, even_two, even_one, induct(lambda n: even(plus(n,n)))])
2 #mul_2_even = lemma(ForAll([n], even(mul(two, n))), by=[mul_succ_x])

----> 7         prove(Implies(And(by), thm))
8     else:
10     return thm

~/.local/lib/python3.10/site-packages/z3/z3.py in ?(claim, show, **keywords)
9087     if r == unsat:
9088         print("proved")
9089     elif r == unknown:
9090         print("failed to prove")
-> 9091         print(s.model())
9092     else:
9093         print("counterexample")
9094         print(s.model())

~/.local/lib/python3.10/site-packages/z3/z3.py in ?(self)
7128         """
7129         try:
7130             return ModelRef(Z3_solver_get_model(self.ctx.ref(), self.solver), self.ctx)
7131         except Z3Exception:
-> 7132             raise Z3Exception("model is not available")

Z3Exception: model is not available

div = Function("div", Nat, Nat, B)
div_def = ForAll([m,n], div(m,n) == Exists([k], mul(k,n) == m))

mul_1_x_1 = lemma(ForAll([n], mul(one, n) == plus(n,Nat.zero)), [mul_0_x, mul_succ_x, plus_0_x, plus_succ_x])
mul_1_x = lemma(ForAll([n], mul(one, n) == n), [mul_1_x_1, plus_0_x, plus_comm])

mul_x_1 = lemma(ForAll([n], mul(n, one) == n), [mul_1_x, mul_comm])

div_x_1 = lemma(ForAll([n], div(n, one)), [div_def, mul_x_1])

mul_two_x = lemma(ForAll([n], mul(two, n) == plus(n,n)), [mul_succ_x, mul_1_x, plus_0_x, plus_succ_x, plus_comm])

even_plus_0_0 = lemma(even(plus(Nat.zero, Nat.zero)), [even_def, plus_0_x])
even_succ_succ = lemma(ForAll([n], Implies(even(plus(n,n)),  even(plus(Nat.succ(n), Nat.succ(n))))), [even_def, even_two, plus_succ_x, plus_comm])
even_plus_x_x = lemma(ForAll([n], even(plus(n,n))), [even_plus_0_0, even_succ_succ, induct(lambda n: even(plus(n,n)))])

mul_two_even = lemma(ForAll([n], even(mul(two, n))), [mul_two_x, even_plus_x_x])


proved
proved
proved
proved
proved
proved
proved
proved
proved

_1 = lemma(ForAll([n], even(mul(two,mul(n,n)))), by=[mul_two_even])

_2 = lemma(ForAll([m,n], Implies(mul(two,mul(n,n)) == mul(m,m), even(mul(m,m)))), by = [_1])

mul_odd_odd   = lemma(ForAll([n], odd(n) == odd(mul(n,n))), [], admit=True)
mul_even_even = lemma(ForAll([n], even(n) == even(mul(n,n))), [], admit=True)

_3 = lemma(ForAll([m,n], Implies(mul(two,mul(n,n)) == mul(m,m), even(m))), by = [_2, mul_even_even])

# Ugh. It still feels far away. Still, it seems like I can chip away at it. Nothing is stopping me.


proved
Admitted!! ForAll(n, odd(n) == odd(mul(n, n)))
Admitted!! ForAll(n, even(n) == even(mul(n, n)))
proved
proved


# Bits and Bobbles

It really does have the flavor of using an interactive assistant. I stated many things which didn’t go through or hung until I went through more carefully and first proved the relevant lemmas.

Z3 handles logical equivalences quite well. This is kind of a whole pile of stuff that you do manually at first in coq etc. We immeditately jump into meatier things.

It has poor induction support. Vampire has some? I could internalize induction as a higher order statement about ArraySort(Nat,B) predicates. It might not do much better even with that.

You can export smtlib formula and consume them in vampire or cvc5. A sledgehammer/why3 spread spectrum approach is possible.

Being so shallow makes it had to have a theorem database to try and search over.

No good simp. Z3 simplify I think can only use it’s built in rules. Could build an external one, or interop with egglog

My inductions we mostly just restating the theorem. I should really deduplicate that somehow. Maybe enforce induction schema to use z3 Lambda instead of python. Eh.

For these equational proofs, if we tossed them into twee, they give quite pretty proofs

I’ve been at this for quite some time on and off.

Is what I’m doing now any better than then?

For equational proofs it;s nice to write t1 == t2 == t3 == t4. The calc combinator can do this. Lean has a calc and so does dafny. Also agda had a style like this.

def calc(vars, *args, by=[]):
lemmas = []
for lhs,rhs in zip(args[:-1],args[1:]) :
lemmas.append(lemma(ForAll(vars, lhs == rhs), by))
return lemma(ForAll(vars, args[0] == args[-1]), by=lemmas)

#calc([n,m,k], one + (a + b), m, k)


counterexample
[]
counterexample
[]
proved


∀n, m, k : n = k

# maps into IntSort() put into linarith
linarith = []

# I_homo as a structure/typeclass.
# put ring axioms into ring? Meh.
# ring = []



Adding pattern matching to z3 post hoc. Very useful if we want to start defining introspective tactics in python. simp, unify, apply for example.

type(n).__mro__

AstRef.childs = property(lambda self: self.children())
match plus(n,n):
print(foo)
print(x)
print(mul)
mul

[n, n]
<__main__.Foo object at 0x7594f3f77eb0>
plus


plus




Point = Datatype("Point")
Point.declare("mk-point", ("x", Z), ("y", Z))
Point = Point.create()

DSegment = Datatype("DSegment")
DSegment.declare("mk-dseg", ("a", Point), ("b", Point))
DSegment = DSegment.create()

# line is equivalence class of segments.
# we can't really give canonical coordinates to lines.
# ax + by = c is scale invariant
# y = mx + b can't represent vertical lines.
# y = lam (a - b) + b  could use other points



Maybe more intriguing is to do projective geomtry first though

Here we can use the same kind of reify/reflect into coordinates, which are a pretty dominant method for geometric proving as I understand it (Wu’s method, grobner, etc).

Point = DeclareSort("Point")
R = RealSort()
x,y,z,p,q,w = Consts("x y z p q w", R)
mkpoint = Function("mk-point", R, R, R, Point)
point_equal = ForAll([x,y,z,p,q,w], And(x*q - y*p == 0, y*w - z*q == 0, z*p - x*w == 0) == (mkpoint(x,y,z) == mkpoint(p,q,w)))
X = Function("X", Point, R)
Y = Function("Y", Point, R)
Z = Function("Z", Point, R)
mkpoint_XYZ = ForAll([p], mkpoint(X(p), Y(p), Z(p)) == p)

# right to left is trivial.
# left to right
coord_unique = ForAll([p],And(X(p) == X(q), Y(p) == Y(q), Z(p) == Z(p))  == (p == q))

# also could do exists lam, v = lam * w as equality.

class Foo():
pass

Bar = Foo
x = Bar()
match x:
case Bar():
print("bar")
case Foo():
print("foo")
case _:
print(type(x))

Fn = z3.ExprRef

match plus(n,m):
case Fn(plus, [y,z]):
print(n,m)

def match_(t, p, vars):
subst = {}
todo = [(t,p)]
while todo:
t,p = todo.pop()
match t,p:
case x, Fn(v, []) if p in vars:
if v in subst:
if subst[v].eq(x):
pass
else:
return None
else:
subst[v] = x
case Fn(f, args1), Fn(f2, args2):
if f1 == f2 and len(args1) == len(args2):
for a1,a2 in zip(args1, args2):
match_(a1, a2, vars)
else:
return None
case _:
print("no match")
return subst

def kbo():
pass
def resolve():
pass
def auto():
pass


bar
n m

#type(ForAll([n,m], plus(n,m) == plus(m,n)))

#PForAll = QuantifierRef(x) if x.is_forall()
# https://www.hillelwayne.com/post/python-abc/
from abc import ABC
class PForAll(ABC):
@classmethod
def __subclasshook__(cls, C):
return isinstance(C, QuantifierRef) and C.is_forall()

z3.z3.QuantifierRef