See also notes on:

  • SMT
  • prolog
  • datalog


TPTP Thousands of Problems for Theorem Provers is an incredible resource Quick Guide

A couple different input formats. As a human, tff is probably the best. Typed first order formula. cnf is painful, fof is untyped and you’re likely to screw it up imo. tff offers built in ints, which is cool.

TPTP in latex Axiom List Highlights:

  • PUZ - puzzles
  • GEO - geometry
  • GRP - group theory
  • CAT - category theory
  • SET - set theory
  • NUM - number stuff robinson arithmetic
  • TOP
  • DAT - datatypes

Automated reasoning book

  • puzzles
  • circuit design (other sythesis problems?)
  • program verification

Using for higher order or typed systems



Groups Lattices kleene algebra, kat

Categories Set theory Geometry



  • Vampire
  • E Prover
  • Zipperposition
  • on an in memory database
  • Twee: An Equational Theorem Prover


First-Order Theorem Proving and VAMPIRE vampire tutorial

%----All (hu)men are created equal. John is a human. John got an F grade. 
%----There is someone (a human) who got an A grade. An A grade is not 
%----equal to an F grade. Grades are not human. Therefore there is a 
%----human other than John. 
    ! [H1,H2] : 
      ( ( human(H1) 
         & human(H2) ) 
     => created_equal(H1,H2) ) )). 

    human(john) )). 

    grade(john) = f )). 

    ? [H] : 
      ( human(H) 
      & grade(H) = a ) )). 

    a != f )). 

    ! [G] : ~ human(grade(G)) )). 

    ? [H] : 
      ( human(H) 
      & H != john ) )). 
% a silly axiomatization of add and theorem.
% using typed formula

    num: $tType ).
    add: ( num * num ) > num ).
    zero: num ).
    ! [X: num] : add(X, zero) = X ).
    ! [X: num, Y : num] : add(X, Y) = add(Y,X) ).
tff(mytheorem, conjecture, ? [X : num] : add(zero, X) = X).

    list: $tType ).

    nil: list ).

    mycons: ( $int * list ) > list ).

    sorted: list > $o ).

    sorted(nil) ).

    ! [X: $int] : sorted(mycons(X,nil)) ).

    ! [X: $int,Y: $int,R: list] :
      ( ( $less(X,Y)
        & sorted(mycons(Y,R)) )
     => sorted(mycons(X,mycons(Y,R))) ) ).

    sorted(mycons(1,mycons(2,mycons(4,mycons(7,mycons(100,nil)))))) ).

Array axioms. Also axioms for int collections, int map, heaps. They all boil to pretty similar stuff though

    array: $tType ).

    read: ( array * $int ) > $int ).

    write: ( array * $int * $int ) > array ).

    ! [U: array,V: $int,W: $int] : ( read(write(U,V,W),V) = W ) ).

    ! [X: array,Y: $int,Z: $int,X1: $int] :
      ( ( Y = Z )
      | ( read(write(X,Y,X1),Z) = read(X,Z) ) ) ).
    array: $tType ).

    read: ( array * $int ) > $int ).

    write: ( array * $int * $int ) > array ).

    ! [U: array,V: $int,W: $int] : ( read(write(U,V,W),V) = W ) ).

    ! [X: array,Y: $int,Z: $int,X1: $int] :
      ( ( Y = Z )
      | ( read(write(X,Y,X1),Z) = read(X,Z) ) ) ).
fof(ground, axiom,
    edge(1,2) & edge(2,3)
%fof(path1, axiom,
%    ![X,Y]: (edge(X,Y) => path(X,Y))
%fof(path2, axiom,
%    ![X,Y,Z]:  ((edge(X,Y) & path(Y,Z)) => path(X,Z))
fof(clark, axiom,
    ![X,Z]: (((?[Y]: (edge(X,Y) & path(Y,Z))) | edge(X,Z)) <=> path(X,Z))

fof(myquery, conjecture,
    ?[X,Y]: path(X,Y)

vampire -av off –question_answering answer_literal

fof(a,conjecture,?[X]: (prover(X) & tutorial(X))).


inference restrictions --unit_resulting_resolution

echo "
fof(ground, axiom,
    edge(a,b) & edge(b,c)
fof(path1, axiom,
    ![X,Y]: (edge(X,Y) => path(X,Y))
fof(path2, axiom,
    ![X,Y,Z]:  ((edge(X,Y) & path(Y,Z)) => path(X,Z))
" |  vampire  --unit_resulting_resolution on

Hmm. Vampire with the option or not. Not really sure what to take away from that. It might be recogzninig a datalog fragment If I wanted to use this like an egraph, I can’t look inside the equality. It’s entirely unobseravle. Maybe I could use the reflection trick.

E prover

enormalizer is an interesting sounding program. Give it a pile of unit equalities and it will normalize with respect to thm EPR grounder to DIMACS The e calculus is a bit puzzling. I haven’t seen the analog for vampire

2.6 manual It’s also in the github repo if you make doc

I like how –answer mode works a little better for e.

Database printing feature -S. Doesn’t print stuff I would expect though? Kind of prints everything by default right? Early stopping conditions clause size

eprover --help | less

--watchlist - “constructive” proofs that aren’t seeded by refuation training - you can train it if you have a database of indicative theorems. This might be useful if you have a sequence of increasingly hard theorems, or if you are making a tool that spits out formula. -S print saturated clause set -W literaeelection stretagory NoGeneration will inhibit all generating instances “Each of the strategies that do actually select negative literals has a corresponding counterpart starting with P that additionally allows paramodulation into maximal positive literals”

echo "
fof(ground, axiom,
    edge(a,b) & edge(b,c)
fof(path1, axiom,
    ![X,Y]: (edge(X,Y) => path(X,Y))
fof(path2, axiom,
    ![X,Y,Z]:  ((edge(X,Y) & path(Y,Z)) => path(X,Z))
" | eprover  --literal-selection-strategy=SelectNegativeLiterals

--generated-limit=100 Ok this basically did what i wanted. I’m not sure what it is though?

“The most natural clause representation for E is probably a literal disjunction: a=$true;b!=$true;c!=$true.”


Prover9 and Mace4 Prover9 is intriguing. It seems to have interaction modalities that other provers don’t Mace proverx a web gui variant

Some examples I didn’t write:

echo "% Prove that a group in which all elements have order 2 is commutative.

assign(max_seconds, 60).


  % group axioms
  (x * y) * z = x * (y * z).   % associativity
  1 * x = x.                   % left identity
  x * 1 = x.                   % right identity
  x' * x = 1.                  % lef inverse
  x * x' = 1.                  % right inverse
  % special assumption
  x * x = 1.                  % all elements have order 2


  x * y = y * x.
" | prover9
    assign(order, kbo).
    assign(max_weight, 25).


% lattice theory

x v y = y v x.
(x v y) v z = x v (y v z).
x ^ y = y ^ x.
(x ^ y) ^ z = x^ (y ^ z).
x ^ (x v y) = x.
x v (x ^ y) = x.


(x ^ y) v (x ^ z) = x ^ ((y ^ (x v z)) v (z ^ (x v y))) # label(H82).

x ^ (y v (x ^ z)) = x ^ (y v (z ^ ((x ^ (y v z)) v (y ^ z)))) # label(H2).

assign(order, kbo).  % The default (lpo) al works, but it takes longer.


% Prover9 should produce a proof in a few seconds.

% combinators B and W

  a(a(a(B,x),y),z) = a(x,a(y,z))   # label(B).
  a(a(W,x),y) = a(a(x,y),y)        # label(W).



  % existence of a fixed point combinator

  exists Q all x (a(Q,x) = a(x,a(Q,x))) # label(fixed_point_combinator).


% Here is a different way of specifying the goal, which tells us
% what the fixed point combinator is.  We use an answer attribute.
% Unfortunately, answer attributes are not allowed on non-clausal
% (e.g., goal) formulas, so we deny the goal and state it as an assumption.
% Note that this introduces the Skolem function f.
% formulas(assumptions).
% a(x,f(x)) != a(f(x),a(x,f(x))) # answer(x).
% end_of_list.

Otter otter lambda - dumping lambda terms in otter? Otter seemed to offer more control over the resolution process Prover9 is the succssor to otter

otter manual “Although OTTER has an autonomous mode, most work with OTTER involves interaction with the user”

LeanTAP and ilk

See prolog some Build Your Own First-Order Prover


# atomic formula first. Ground is basically the same.

# sets are unhashable, so are lists
#print({[], []})

# One canonical form is sorted, deduped pos,neg lists for clauses
def clause(pos,neg):
    return (tuple(sorted(set(pos))), tuple(sorted(set(neg))))

print(clause(["a","a"], ["c"]))
print(clause(["foo(a)","bar(a)"], ["c"]))

# we could also put a wrapper around set that does have a hash defined

def resolve(C1,C2):

    if len(C1[0].intersect(C2[1])) > 0:

processed = set()
unprocessed = []
while len(unprocessed) > 0:
    C = unprocessed.pop()
    for C1 in processed:

# terms are sexp style lists
# ["f", ["a"], ["g", "b"]]

A sqlized given-clause loop

unproc = []
proc = sqlite3.connect(":memory:") # store processed clauses in DB.
while len(unproc) > 0:
    sql = query(C)
    unproc.append(filter(cur.fetchall(), lambda: do_some_filtering))
    cur.execute("INSERT INTO", C)

Ground Subsumption queyr on clauses. Find all sets that have C as subset

CREATE Table lit (clause INTEGER, lit STRING); -- UNIQUE (clause, lit) gives redundant lit deletion
-- find all clauses c such it contains literal a,b,c.
SELECT DISTINCT clause from lit a, lit b, lit c where a.clause = b.clause and a.clause = c.clause and a.lit = ? and b.lit = ? and c.lit = ?

Find all sets that have C as superset

SELECT DISTINCT clause from lit a 
    group by a.clause 
    HAVING COUNT(CASE WHEN lit = ? THEN 1 END) +  COUNT(CASE WHEN lit = ? THEN 1 END) +  COUNT(CASE WHEN lit = ? THEN 1 END) = count(*);
    count(case when lit != ? or lit != ? or lit != ? then 1 end) = 0
    count(*) <= 3 -- would this help or hurt? redundant info 
    min(case when lit = ? then 1 when lit = ? when else 0 ) = 1

t1 = "fah0"
t2 = "f01"
i1 = 0
i2 = 0
while i1 < len(t1) and i2 < len(t2):
    c1 = t1[i1] 
    c2 = t2[i2]
    if t2 

grammar = '''
E : F | G | Var | A
F : "f" E E
A : "a"

grammar = '''
Term : CNAME | CNAME "("  ("," Term)* ")" | Var

class Unify(Transformer):
    env = {}
    def Var(self, v):
        if v in env:
    def term

Big question is, what is the representation of formulas, literals, clauses.

  • Strings
  • Json
  • Relational Terms / hashconsed
  • Feature vectors.


logtk. Huh. LPO as a constraint solving problem

demo resolution prover

#use "topfind";;
#require "logtk";;
print_endline "hello world";;
open Logtk
(* *)
module T = Term
let f_ = ID.make "f"
let g_ = ID.make "g"
let ty = Type.term
let ty_t = Term.of_ty ty 
let prop = Type.prop
let f_fun = (T.const ~ty:Type.([ty;ty] ==> ty) f_)
let f x y = f_fun [x; y]

let prec = Precedence.default [f_;g_]
(*let ord = Logtk.Ordering.lambda_kbo prec *)
let names = Ordering.names ()

let () = List.iter (fun name -> print_endline name) names

Datalog vs ATP

What makes an ATP that different from a datalog? Both are saturating. If I could limit of prune clauses of size > N, that might emulate a datalog. Or perhaps protect rule-rule resolution from happening.

The given clause algorithm has some flavor of seminaive. Datalog focuss on unit clauses Not at all clear how to do extraction on a paramodulated database.

Given that ATP subsumes in some sense both prolog and datalog, what does it say about Magic Set?

Extraction from ATP. The equality is invisible. The copying trick?

Prolog vs ATP

Set of support strategy with horn clauses in passive set, goal in active set, hyperresolution as only inference rule behaves akin to prolog.

What is the difference between E and a PROLOG system?

Prolog = is unification. ATP = is rewrite/semantical.


Relationship to focusing

-Set of Support - Only infer on clauses that trace provenance to user specified set of support. The Predicate Ordering Syntactic Refinement - order predicates. Only resolve on lowest Locking Refinement - mark each literal in clause and resolve only on lowest one atom ordering

  • Resonance Strategy
  • Weighting
  • Methodology

    Given clause Discount loop


  • see note on unification Two way matching. Is there a substitution for variables that solves an equation. Yes

Anti-unification is an interesting topic. Generalization.

from unification import var, unify
x = var("x")
y = var("y")
y1 = var("y") # same y.
print(unify((x,y), (y1,1))) # returns subst dict

Or can use Maude’s unification. Or prolog’s. Or just write it.


Put clauses in cnf. Each clause is an \/ of positive and negative atoms. Take two clauses that have matching (unifiable) positive and negative pieces

Hyperresolution You are seeking clauses of only positive literals. Only use a “nuecleii” if it covers all of the negative literals.

UR-resolution - unit-resulting resolution. Only perform resultion if it results in nw unit literal (all other chunks line up). UR is essentially datalog stye resolution. It does however allow universals in the unit facts.

SLD resolution is ATP view of prolog.



Cleaning out the database.

given C(X) , C(a) is redundant (subsumed). You can remove it from clause database. A question is how to do this quickly. See Term indexing.

encoding subusmption to SAT


see term rewriting


paramodulation is unification based equational substitution

Paramodulation is dealing with equality pieces in clauses.


rewrite rules basically? destructively simplify terms using them Positive literals can be marked or inferred as demodulators (what we ordinary think of as rewrite rules) By having both paramodulation and demodulation exist, the system has both destructive and conservative rules.


ntroduction to Superposition calculus - Cruanes slides

New results on rewrite-based satisfiability procedures

Term orderings - see term rewriting

Implementation of Saturating Theorem Provers

Superposition 25

backmair and ganzinger 1994

superposition can be used instead of resolution by encoding predicates as functions to true

If I run superposition on unit positive clauses, is it performing completion? Completion of guarded rewrite systems?

E a brainiac theorem prover

The discussion of the encoding of t = s is a bit goofy. Yes, = is commutative. In that sense in is reasoonable to use sets. Could the opposite convention be used? This is kind of like the set-theoretic convention of (a,b) === { {a}, {a,b} __ } (stupid liquid formatting error) which is pretty bizarre too

t = s === { __ {t}, {s} __ } t != s === { __ {t,s} __ } Multiset orderings

Literal Selection

Ordered resolution - order on literals, only resolve on biggest one (you want to get rid of big stuff)

Term Indexing

Triemaps that match SPJ and Graf

type expr = Var of string | App of expr * expr
type 'a exprmap = {var : 'a String.Map.t; app : 'a exprmap exprmap } (* I thing if we play that nested data structure stuff we're gonna have a bad time  in ocaml*)

See Automated reasoning handbook chapter 18 towards efficient subsumption - Tammet Give reasoning a trie Indexes are pretty good for serialized structures. You can serialize a term/tree by . Perfect indexing vs approximate indexing. You don’t necessarily need to index only the terms you need. YOu can iterate over the returned bits and just toss out the stuff you didn’t want. Depends how expensive perfect vs imperfect is

Indexing has different kinds of queries. Is the database ground or not?

  • Variants
  • Unifiers
  • Forward Subsumption
  • Backwards subsumption You may want multi-term indexing

Path Indexing

most suitable for backward subsumption) Consider paths from root to leaf of tree. Can put this in a trie. Look at all paths in a query term of interest and take set intersection from path index trie.

Generalization? Consider just paths through a graph? f(a,b,c), g(a,d,c), … . Up down and around. Kind of reminds me of code2vec

Discrimination Trees

most suitable for unit forward subsumption

A depth first traversal of the whole term

Substitution trees

Feature Vectors

Simple and Efficient Clause Subsumption with Feature Vector Indexing

Fingerprint Indexing for Paramodulation and Rewriting

You can make summary data on terms that is monotonic with respect to instantation.

count and max depth appearance of symbols. A generalization of seperating terms out by their head symbol Imperfect indexing is useful. It does mean you may have to implement algorithms more than once.


Inductiion Superposition with Structural Induction Datatypes Superposition with Datatypes and Codatatypes Arrays Arith Beagle – A Hierarchic Superposition Theorem Prover

Higher Order

Applicative encoding. Turn f(x,y) into app(app(f,x),y). Wasteful and inefficint but it can work. Built in appreciation of higher order structure (currying basically?) is better Higher order lambda free. HORPO - term orderings that appreciate the applicative structure superposition for lambda free higher order Mechanical Mathematicians A new generation of automatic theorem provers eliminate bugs in software and mathematics

Differentiation? antiderivative search

Inside higher order systems Meson hammers blast First-Order Proof Tactics in Higher-Order Logic Theorem Provers - Hurd Metis


Uwe Waldemann - automated reasoning courses

Ordering-Based Strategies for Hor n Clauses* - Dershowitz Equational horn clause. positive is larger than negative in literal selection

33 Basic Test Problems: A Practical Evaluation of Some Paramodulation Strategies

Set of Support, Demodulation, Paramodulation: A Historical Perspective pfenning book

loveland book fitting book theorem proving in otter

wos summary autmated reasoning course krysia

Set of Support prio

Resolution is bottom up, tableau is top down?

Automated Deduction in Equational Logic and Cubic Curves

  • Idea: Convert resolution proof to bus proofs tree. Is this even possible or desirable?

Pelletier Problems

Schulz teaching automated theorem proving polymorphic vampire

An Impossible Asylum - ATP checking of Smullyan puzzle found hypotheses inconsistent

Phillips & Stanovsky loop theory and non associative lagerbas

Searching for a Fixed Point Combinators by Using Automated Theorem Proving: A Preliminary Report

Challenge Problems Focusing on Equality and Combinatory Logic: Evaluating Automated Theorem-Proving Programs

tptp bibiography. It’s insane how much work is here

geometric database horn clauses.

Waldmeister and Twee - UEQ


vampire and the fool

Dedam: A Kernel of Data Structures and Algorithms for Automated Deduction with Equality Clauses

efficient full higher order unification zipperposition

Thesis on Implementation of Higher-Order Superposition

auto2 interesting system that combines egraphs and sturation proving he alos mentions gowers as influence Coming to Terms with Quantified Reasoning algerbaic datatypes require an infinite number of acyclicity axioms. FOL + datatypes is more powerful than just FOL.

from pysmt.shortcuts import get_env
from pysmt.logics import LOGICS
from pysmt.shortcuts import Symbol, get_env, Solver
env = get_env().factory.add_generic_solver("vampire", ["/home/philip/.local/bin/vampire", "--input_syntax", "smtlib2"], LOGICS)
with Solver(name="vampire", logic="QF_UFLRA") as s:
  print(s.is_sat(Symbol("x"))) # True
from z3 import *
s = Solver()
x = Bool("x")
s.add(And(x, Not(x)))
import tempfile
import subprocess
with tempfile.NamedTemporaryFile(suffix=".smt2") as fp:
    fp.write(s.to_smt2().encode() + b"(get-model)")
    res =["/home/philip/.local/bin/vampire",],  capture_output=True)