This is further explorations on the theme of using SMT as a core facility for python interactive proving. In this post I tried proving the irrationality of sqrt(2) using SMT. There was a nice proof I followed, but these two facts I couldn’t get to prove automatically.

mul_divides = ForAll([x,y,z], Implies(mul(x,y) == z, divides(z,x)))
two_prime = ForAll([x,y], Implies(divides(mul(x,y),2), Or(divides(x,2), divides(y,2))))

It appears to me that SMTs support of Int and Real is a trap for deeper theorems. This is not surprising. SMT solvers are a barrel of decision procedures strapped together via an egraph and SAT solver. There aren’t smart decision procedures for general mathematical questions. Properties of primes are notoriously tough . Some properties of course are quite trivial so hope springs eternal.

My assumption is that if you want to tackle these properties, you need to retreat into more general syntactic approaches, using algebraic datatypes or undefined sorts. This is partially because the ematcher does not work well on Real or Int expressions. It normalizes them into something that can’t be matched on like uninterpreted sorts can be matched.

Because I want to use the high automation of z3 for simple questions though, I can defined reflection principles that just say that there is a homomorphism between the Nats and the built in Ints greater than 0.


Here is our proof kernel. I use the built in prove method which basically checks if the negation of the statement is satisfiable. Lemmas can be offered via the by keyword.

Temporarily admitting lemmas can be useful to just keep moving.

Some other features that might be nice in lemma

  • Rip off the quantifiers for better countermodels when it fails
  • Shorter timeout
  • Check unsat cores and recommend which by were unnecessary. Or worse yet, if the lemma in question is not in the sat core, somehow you have unsatisfiable assumptions, which is indicative of an inconsistent axiomatization.

The distinction between theorems and formula is razor thin, which is nice for a fast blog post but not nice for a library or larger develpoment. It is only syntactic discipline. I should only be using named lemmas in the by justifications and the stdout should never say countermodel. It should be a sequence of proved

from z3 import *
B = BoolSort()
Z = IntSort()

def lemma(thm, by=[], admit=False):
    if not admit:
        prove(Implies(And(by), thm))
        print("Admitted!!", thm)
    return thm

Nat Definitions

The basic defini9tions of Nat. I do use z3’s built in ADT capabilities. The distinction between this and undefined sorts with injectivity is interesting. ADTs disallow loopy expressions. This corresponds to an occurs check somewhere inside z3.

A (conservative extension) definition mechanism is that you are allowed to defined new symbols/functions if it has a defining equation that is terminating. I am assuming and not checking that these definitions are well founded. In other words, these definitions are axioms.

The importance of conservative extensions for the foundations of mathematics

Nat = Datatype("Nat")
Nat.declare('succ', ('pred', Nat))
Nat = Nat.create()

n,m,k  = Consts('n m k', Nat)
reify = Function('reify', Nat, Z)
reify_def = ForAll([n], reify(n) == If(Nat.is_zero(n), 0, reify(Nat.pred(n)) + 1))

x,y = Ints("x y")
reflect = Function('reflect', Z, Nat)
reflect_def = ForAll([x], reflect(x) == If(x <= 0,, Nat.succ(reflect(x-1))))

More interesting perhaps is the induction axiom schema. An axiom schema is a infinite family of axiomns.

We could represent this in different possible ways.

  • a stream or generator enumerating the axioms. This is too british museum
  • a trusted function that takes in some data to pick the axiom out of the schema you’d like
  • Fully internalizing this parametrization into z3 with perhaps some kind of axiom schema datatype parametrizing the axiom schema.
  • Using some higher order logic ArraySort(Nat,B) to encode the schema as ForAll([P],Implies(And(P(, ForAll([n], Implies(P(n), P(Nat.succ(n))))), ForAll([n], P(n)))) which can be directly asserted.

Anyway, I take the second option. It’s not so bad.

#def recurse(name,base,step):
#    f = Function(name, Nat, IntSort())
#    n = FreshInt()
#    return f, ForAll([n], If(Nat.is_zero(n), base, step(n, f(Nat.pred(n),base,step))))

def induct(P):
    return Implies(And(P(, ForAll([n], Implies(P(n), P(Nat.succ(n))))),
                   ForAll([n], P(n)))

We can use this axiom schema to get some crucial properties, like that reflect . reify ~> id and that the definition of a homomoprhism with respect to zero and succ.

reflect_reify = lemma(ForAll([n], reflect(reify(n)) == n), [reflect_def, reify_def, induct(lambda n: reflect(reify(n)) == n)])

reify_ge_0 = lemma(ForAll([n], reify(n) >= 0), [reify_def, induct(lambda n: reify(n) >= 0)])

zero_homo = lemma(reify( == 0, [reify_def])
succ_homo = lemma(ForAll([n], reify(Nat.succ(n)) == 1 + reify(n)), [reify_def, induct(lambda n: reify(Nat.succ(n)) == 1 + reify(n))])


We can also define plus via reification and reflection. Immediately we can derive some computation rules not referring to Int. Maybe we should have taken these as primitive and then made plus_def the theorem?

plus = Function('plus', Nat, Nat, Nat)
plus_def = ForAll([n, m], plus(n, m) == reflect(reify(n) + reify(m)))

plus_0_x = lemma(ForAll([n], plus(, n) == n), [reflect_reify, zero_homo, plus_def]) 
plus_succ_x = lemma(ForAll([n,m], plus(Nat.succ(n), m) == Nat.succ(plus(n,m))), [reflect_reify, reflect_def, reify_def, succ_homo, plus_def])

plus_homo_case_succ = lemma(ForAll([n,m], Implies(reify(plus(n,m)) == reify(n) + reify(m) ,
                                                  reify(plus(Nat.succ(n),m)) == reify(Nat.succ(n)) + reify(m) )), [reify_def, plus_succ_x])
#plus_homo = lemma(ForAll([n,m], reify(reflect(reify(n) + reify(m))) == reify(n) + reify(m)), [reflect_reify, reflect_def, reify_def, induct(lambda n: ForAll([m], reify(reflect(reify(n) + reify(m))) == reify(n) + reify(m)))])
plus_homo = lemma(ForAll([n,m], reify(plus(n,m)) == reify(n) + reify(m)), 
                  [plus_def, reflect_reify, reflect_def, reify_def, plus_homo_case_succ, induct(lambda n: ForAll([m], reify(plus(n,m)) == reify(n) + reify(m)))], admit=False)
#plus_l_expand = lemma(ForAll([n, m, k], plus(plus(n, m), k) == reflect(reify(reflect(reify(n) + reify(m))) + reify(k))), [plus_def])


plus_comm and plus_assoc are immediate from properties of Int. This is not how I started. I started by trying to prove plus_assoc and then realizing that really the concept of plus_homo was an important intermediary. This is often how these things go.

#_2 = lemma(ForAll([n,m,k], reflect(reify(reflect(reify(n) + reify(m))) + reify(k)) == reflect(reify(n) + reify(m) + reify(k))), [reflect_reify, plus_homo])
plus_comm = lemma(ForAll([n, m], plus(n, m) == plus(m, n)), [plus_def])
plus_assoc = lemma(ForAll([n, m, k], plus(plus(n, m), k) == plus(n, plus(m, k))), [plus_def, plus_homo])

one = Nat.succ(
two = Nat.succ(Nat.succ(

Second verse same as the first for multiplication.

mul = Function("mul", Nat,Nat, Nat)
mul_def = ForAll([n,m], mul(n,m) == reflect(reify(n) * reify(m)))
mul_0_x = lemma(ForAll([m], mul(,m) ==, [reflect_def, reify_def, mul_def])

#mul_succ_x = lemma(ForAll([n,m], mul(Nat.succ(n), m) == reflect(reify(Nat.succ(n))*reify(m))), [plus_homo, mul_def, plus_def, succ_homo, reflect_reify, reflect_def, reify_def])
mul_succ_x_lemma = lemma(ForAll([n,m], reflect(reify(Nat.succ(n)) * reify(m)) == reflect((1 + reify(n)) * reify(m))), [succ_homo])
mul_succ_x_lemma2 = lemma(ForAll([n,m], reflect((1 + reify(n)) * reify(m)) == reflect(reify(m) + reify(n) * reify(m))))

# Hmmm. Need to think about how to dignify this principal.
reify_reflect = lemma(ForAll([x], Implies(x >= 0, reify(reflect(x)) == x)), [reify_def, reflect_def], admit=True)

mul_succ_x_lemma3 = lemma(ForAll([n,m], reflect(reify(m) + reify(n) * reify(m)) == reflect(reify(m) + reify(reflect(reify(n) * reify(m))))), 
                          [reflect_reify, reify_reflect, reflect_def, reify_def])

mul_succ_x_lemma4 = lemma(ForAll([n,m], reflect(reify(m) + reify(reflect(reify(n) * reify(m)))) == plus(m, mul(n,m))), 
                          [plus_def, mul_def])
#mul_succ_x = lemma(ForAll([n,m], mul(Nat.succ(n), m) == plus(m, mul(n,m))), [mul_def, reify_reflect, mul_succ_x_lemma, mul_succ_x_lemma2, mul_succ_x_lemma3, mul_succ_x_lemma2, plus_homo, reflect_reify, plus_def])
mul_succ_x = lemma(ForAll([n,m], mul(Nat.succ(n), m) == plus(m, mul(n,m))), [mul_def, plus_def, mul_succ_x_lemma, mul_succ_x_lemma2, mul_succ_x_lemma3, mul_succ_x_lemma4])

mul_comm = lemma(ForAll([n,m], mul(n,m) == mul(m,n)), [mul_def])

Admitted!! ForAll(x, Implies(x >= 0, reify(reflect(x)) == x))

Could cvc5 show some of the things that z3 couldn’t in my post? Can I get the even odd style proof to go through

even and odd kind of induct by 2. Maybe I need to derive this principle first?

lt = Function("lt", Nat, Nat, B)
lt_def = ForAll([n,m], lt(n,m) == reify(n) < reify(m))

max_ = Function("max", Nat, Nat, Nat)
max_def = ForAll([n,m], max_(n,m) == If(lt(n,m), m, n))

even = Function("even", Nat, B)
even_def = ForAll([n], even(n) == If(Nat.is_zero(n), True, 
                                  If(Nat.is_zero(Nat.pred(n)), False, 

odd = Function("odd", Nat, B)
odd_def = ForAll([n], odd(n) == If(Nat.is_zero(n), False, 
                                  If(Nat.is_zero(Nat.pred(n)), True, 

# Sanity checking
even_zero = lemma(even(, [even_def])
not_even_one = lemma(Not(even(one)), [even_def])
odd_even = lemma(odd(one), [odd_def])
two_even = lemma(even(two), [even_def])

even_two = lemma(ForAll([n], even(Nat.succ(Nat.succ(n))) == even(n)), [even_def, induct(lambda n: even(Nat.succ(Nat.succ(n))) == even(n))])
odd_two = lemma(ForAll([n], odd(Nat.succ(Nat.succ(n))) == odd(n)), [odd_def, induct(lambda n: odd(Nat.succ(Nat.succ(n))) == odd(n))])

even_one = lemma(ForAll([n], even(n) == Not(even(Nat.succ(n)))), [even_def, even_two, induct(lambda n: even(n) == Not(even(Nat.succ(n))))])
odd_one = lemma(ForAll([n], odd(n) == Not(odd(Nat.succ(n)))), [odd_def, odd_two, induct(lambda n: odd(n) == Not(odd(Nat.succ(n))))])

# I am not sure why I needed these lemmas.
even_or_odd = lemma(ForAll([n], even(n) != odd(n)), [even_def, odd_def, even_two, even_one, odd_one, odd_two, induct(lambda n: even(n) != odd(n))])

plus_x_x_even = lemma(ForAll([n], even(plus(n,n))), [plus_succ_x, even_def, even_two, even_one, induct(lambda n: even(plus(n,n)))], admit=True)
#mul_2_even = lemma(ForAll([n], even(mul(two, n))), by=[mul_succ_x])
failed to prove


Z3Exception                               Traceback (most recent call last)

File ~/.local/lib/python3.10/site-packages/z3/, in Solver.model(self)
   7129 try:
-> 7130     return ModelRef(Z3_solver_get_model(self.ctx.ref(), self.solver), self.ctx)
   7131 except Z3Exception:

File ~/.local/lib/python3.10/site-packages/z3/, in Z3_solver_get_model(a0, a1, _elems)
   4064 r = _elems.f(a0, a1)
-> 4065 _elems.Check(a0)
   4066 return r

File ~/.local/lib/python3.10/site-packages/z3/, in Elementaries.Check(self, ctx)
   1474 if err != self.OK:
-> 1475     raise self.Exception(self.get_error_message(ctx, err))

Z3Exception: b'there is no current model'

During handling of the above exception, another exception occurred:

Z3Exception                               Traceback (most recent call last)

/tmp/ipykernel_390381/ in ?()
----> 1 plus_x_x_even = lemma(ForAll([n], even(plus(n,n))), [plus_succ_x, even_def, even_two, even_one, induct(lambda n: even(plus(n,n)))])
      2 #mul_2_even = lemma(ForAll([n], even(mul(two, n))), by=[mul_succ_x])

/tmp/ipykernel_390381/ in ?(thm, by, admit)
      5 def lemma(thm, by=[], admit=False):
      6     if not admit:
----> 7         prove(Implies(And(by), thm))
      8     else:
      9         print("Admitted!!", thm)
     10     return thm

~/.local/lib/python3.10/site-packages/z3/ in ?(claim, show, **keywords)
   9087     if r == unsat:
   9088         print("proved")
   9089     elif r == unknown:
   9090         print("failed to prove")
-> 9091         print(s.model())
   9092     else:
   9093         print("counterexample")
   9094         print(s.model())

~/.local/lib/python3.10/site-packages/z3/ in ?(self)
   7128         """
   7129         try:
   7130             return ModelRef(Z3_solver_get_model(self.ctx.ref(), self.solver), self.ctx)
   7131         except Z3Exception:
-> 7132             raise Z3Exception("model is not available")

Z3Exception: model is not available
div = Function("div", Nat, Nat, B)
div_def = ForAll([m,n], div(m,n) == Exists([k], mul(k,n) == m))

mul_1_x_1 = lemma(ForAll([n], mul(one, n) == plus(n,, [mul_0_x, mul_succ_x, plus_0_x, plus_succ_x])
mul_1_x = lemma(ForAll([n], mul(one, n) == n), [mul_1_x_1, plus_0_x, plus_comm])

mul_x_1 = lemma(ForAll([n], mul(n, one) == n), [mul_1_x, mul_comm])

div_x_1 = lemma(ForAll([n], div(n, one)), [div_def, mul_x_1])

mul_two_x = lemma(ForAll([n], mul(two, n) == plus(n,n)), [mul_succ_x, mul_1_x, plus_0_x, plus_succ_x, plus_comm])

even_plus_0_0 = lemma(even(plus(,, [even_def, plus_0_x])
even_succ_succ = lemma(ForAll([n], Implies(even(plus(n,n)),  even(plus(Nat.succ(n), Nat.succ(n))))), [even_def, even_two, plus_succ_x, plus_comm])
even_plus_x_x = lemma(ForAll([n], even(plus(n,n))), [even_plus_0_0, even_succ_succ, induct(lambda n: even(plus(n,n)))])

mul_two_even = lemma(ForAll([n], even(mul(two, n))), [mul_two_x, even_plus_x_x])

_1 = lemma(ForAll([n], even(mul(two,mul(n,n)))), by=[mul_two_even])
_2 = lemma(ForAll([m,n], Implies(mul(two,mul(n,n)) == mul(m,m), even(mul(m,m)))), by = [_1])

mul_odd_odd   = lemma(ForAll([n], odd(n) == odd(mul(n,n))), [], admit=True)
mul_even_even = lemma(ForAll([n], even(n) == even(mul(n,n))), [], admit=True)

_3 = lemma(ForAll([m,n], Implies(mul(two,mul(n,n)) == mul(m,m), even(m))), by = [_2, mul_even_even])

# Ugh. It still feels far away. Still, it seems like I can chip away at it. Nothing is stopping me.

Admitted!! ForAll(n, odd(n) == odd(mul(n, n)))
Admitted!! ForAll(n, even(n) == even(mul(n, n)))

Bits and Bobbles

It really does have the flavor of using an interactive assistant. I stated many things which didn’t go through or hung until I went through more carefully and first proved the relevant lemmas.

Z3 handles logical equivalences quite well. This is kind of a whole pile of stuff that you do manually at first in coq etc. We immeditately jump into meatier things.

It has poor induction support. Vampire has some? I could internalize induction as a higher order statement about ArraySort(Nat,B) predicates. It might not do much better even with that.

You can export smtlib formula and consume them in vampire or cvc5. A sledgehammer/why3 spread spectrum approach is possible.

Being so shallow makes it had to have a theorem database to try and search over.

No good simp. Z3 simplify I think can only use it’s built in rules. Could build an external one, or interop with egglog

My inductions we mostly just restating the theorem. I should really deduplicate that somehow. Maybe enforce induction schema to use z3 Lambda instead of python. Eh.

For these equational proofs, if we tossed them into twee, they give quite pretty proofs

I’ve been at this for quite some time on and off.

Is what I’m doing now any better than then?

For equational proofs it;s nice to write t1 == t2 == t3 == t4. The calc combinator can do this. Lean has a calc and so does dafny. Also agda had a style like this.

def calc(vars, *args, by=[]):
    lemmas = []
    for lhs,rhs in zip(args[:-1],args[1:]) :
        lemmas.append(lemma(ForAll(vars, lhs == rhs), by))
    return lemma(ForAll(vars, args[0] == args[-1]), by=lemmas)

#calc([n,m,k], one + (a + b), m, k)


∀n, m, k : n = k

# maps into IntSort() put into linarith
linarith = []

# I_homo as a structure/typeclass.
# put ring axioms into ring? Meh.
# ring = []

Adding pattern matching to z3 post hoc. Very useful if we want to start defining introspective tactics in python. simp, unify, apply for example.


AstRef.childs = property(lambda self: self.children())
AstRef.head = property(lambda self: self.decl())
AstRef.__match_args__ = ("head", "childs")
match plus(n,n):
    case AstRef(head=mul, childs=foo):
[n, n]
<__main__.Foo object at 0x7594f3f77eb0>


What about euclidean geometry?

Point = Datatype("Point")
Point.declare("mk-point", ("x", Z), ("y", Z))
Point = Point.create()

DSegment = Datatype("DSegment")
DSegment.declare("mk-dseg", ("a", Point), ("b", Point))
DSegment = DSegment.create()

# line is equivalence class of segments.
# we can't really give canonical coordinates to lines.
# ax + by = c is scale invariant
# y = mx + b can't represent vertical lines.
# y = lam (a - b) + b  could use other points


Maybe more intriguing is to do projective geomtry first though

Here we can use the same kind of reify/reflect into coordinates, which are a pretty dominant method for geometric proving as I understand it (Wu’s method, grobner, etc).

Point = DeclareSort("Point")
R = RealSort()
x,y,z,p,q,w = Consts("x y z p q w", R)
mkpoint = Function("mk-point", R, R, R, Point)
point_equal = ForAll([x,y,z,p,q,w], And(x*q - y*p == 0, y*w - z*q == 0, z*p - x*w == 0) == (mkpoint(x,y,z) == mkpoint(p,q,w)))
X = Function("X", Point, R)
Y = Function("Y", Point, R)
Z = Function("Z", Point, R)
mkpoint_XYZ = ForAll([p], mkpoint(X(p), Y(p), Z(p)) == p)

# right to left is trivial.
# left to right
coord_unique = ForAll([p],And(X(p) == X(q), Y(p) == Y(q), Z(p) == Z(p))  == (p == q))

# also could do exists lam, v = lam * w as equality.
class Foo():

Bar = Foo
x = Bar()
match x:
    case Bar():
    case Foo():
    case _:
Fn = z3.ExprRef

match plus(n,m):
    case Fn(plus, [y,z]):

def match_(t, p, vars):
    subst = {}
    todo = [(t,p)]
    while todo:
        t,p = todo.pop()
        match t,p:
            case x, Fn(v, []) if p in vars:
                if v in subst:
                    if subst[v].eq(x):
                        return None
                    subst[v] = x
            case Fn(f, args1), Fn(f2, args2):
                if f1 == f2 and len(args1) == len(args2):
                    for a1,a2 in zip(args1, args2):
                        match_(a1, a2, vars)
                    return None
            case _:
                print("no match")
    return subst

def kbo():
def resolve():
def auto():

n m
#type(ForAll([n,m], plus(n,m) == plus(m,n)))

#PForAll = QuantifierRef(x) if x.is_forall()
from abc import ABC
class PForAll(ABC):
    def __subclasshook__(cls, C):
        return isinstance(C, QuantifierRef) and C.is_forall()