Category Theory in the E Automated Theorem Prover

At least in the circles I travel in, interactive theorem provers like Agda, Coq, Lean, Isabelle have more mindspace than automatic theorem provers. I haven’t seen much effort to explore category theory in the automatic provers so I thought I’d try it.

Interactive Systems

There are a number of projects formalizing category theory in these systems

All of these systems are using some variant of higher order logic where you can quantify over propositions. This is very expressive, but also are more difficult to automate (They do have significant automation in them though, but this tends to be for filling in the relatively obvious intermediate details of a proof, not complete automation). Perhaps this has some relation to Godel’s incompleteness theorem

Automated Theorem Provers

There are other classes of theorem proving systems: automatic theorem provers and SMT solvers. Can we do category theory in them? Doesn’t Automatic sound real nice in principle?

ATP and SMT are similar in some respects but are architected differently and have slightly different use cases and strengths:

  • SMT solvers are based around SAT solvers and tractable sub problems (theories). They have subsystems that deeply understand linear equations, linear inequalities, systems of polynomials, theory of uninterpreted functions, bit-blasting, others. They combine these facilities via the Nelson-Oppen procedure. They are fairly weak at quantifier reasoning. They are good at problems that require lots of domain specific understanding. Examples of SMT solvers include Z3, CVC4, Alt Ergo, Boolector. You can find more and comparisons at the SMT competition
  • While the term Automatic Theorem Prover (ATP) could mean anything, it has a tendency to denote a class of first order logic solvers based around resolution. Examples of such provers include Vampire, E, and Prover9. You can find more at the CADE competition. They are more oriented to abstract first order logic structures and quantifier reasoning.

A big downside of automatic methods is that once they start to fail, you’re more hosed than the interactive provers. Until then, it’s great though.

Categories in ATP

Category theory proofs have a feeling of being close to trivial (at least the ones I’ve seen, but I’ve mostly seen the trivial ones so…? ), amounting to laboriously expanding definitions and rewrite equations corresponding to commutation conditions. An automatic system to verify these seems useful.

What are the kinds of questions one wants to ask these provers?

  • Confirmation that a concrete mathematical structure (integers, reals, bools, abstract/concrete preorder, group, lattice) obeys the required categorical axioms/interface. The axioms of category structures are the conjectures here
  • Confirmation that abstract categorical constructions do what they’re supposed to. One presupposes categorical axioms and structures and asks conjecture conclusions. For example, given this square, does this other diagram commute? Is this “diagram chasing“?

These are yes/no questions. Although in the case of no, often a counterexample is emitted which can help show where you went awry. A third task that is more exciting to me, but harder and not an obvious stock capability of these provers is

  • Calculate/construct something categorical. For example, we might want to construct a condensed or efficient version of some program given by a categorical spec, or emit a categorical construction that has certain properties. There are clear analogies with program verification vs. program synthesis.

Now it appears that to some degree this is possible. I have noted that in the proof output, one can sometimes find terms that may correspond to the thing you desire, especially if you ask an existential conjecture, “does there exists a morphism with such and such a property”.

Formulating Category Theory in First Order Logic

TPTP is both a problem library and specification language for first order problems for the purposes of computer provers. There is a nice overview video here. There is a nice web interface to explore different provers here. The TPTP library contains four different axiomatizations for categories, and a number of problems

There is a good set of videos explaining how to formalize category axioms in a first order setting. https://www.youtube.com/watch?v=NjDZMWdDJKM&list=PL4FD0wu2mjWOtmhJsiVrCpzOAk42uhdz8&index=6&t=0s he has a couple different formulation actually. It’s interesting. Here’s a stack overflow https://math.stackexchange.com/questions/2383503/category-theory-from-the-first-order-logic-point-of-view along with a small discussion of why it’s wrong headed to even do such a thing. I’m not sure I agree with the second part.

Here is my encoding. I am not 100% confident anything I’ve done here is right. Note that composition is expressed as a ternary relation. This is one way of handling the fact that without stronger typing discipline, composition is a partial binary function. In order to compose, morphisms need to meet on an intermediate object. Categorical “typing” is expressed via logical constraint on the relation.

A trick one can use is to identify the identity arrows and the objects at which they are based. Since every object is required to have an identity arrow and every identity arrow points from and to a single object, they are in isomorphism. There is some conceptual disclarity that occurs from this trick though. I’m not totally sold.

TPTP syntax is mostly straightforward but note that ! [X] is forall X, ? [X] is exists X, capital names are variables, lowercase names are constants. Quantifiers bind tighter than I personally expected, hence my parenthesis explosion.

% axioms of a category
% we would resupply this for every category involved?
% ! [X] is forall X, ? [X] is exists X. Capital names are variables
% lowercase names are constants.
fof( dom_cod, axiom, ![X] : dom(cod(X)) = cod(X)).
fof( cod_dom, axiom, ![X] : cod(dom(X)) = dom(X)).
fof( comp_is_unique, axiom, ![F, G, FG1, FG2] : ((comp(F,G,FG1) & comp(F,G,FG2)) => FG1 = FG2)   ).
fof( comp_objects_middle, axiom, ![F, G] : ((? [FG] : comp(F,G,FG)) <=> dom(F) = cod(G))).
fof( comp_dom, axiom, ![F, G, FG] : (comp(F,G,FG) => dom(G) = dom(FG))).
fof( comp_cod, axiom, ![F, G, FG] : (comp(F,G,FG) => cod(F) = cod(FG))).
fof( left_id, axiom, ![F] : comp(cod(F),F,F) ).
fof( right_id, axiom, ![F] : comp(F,dom(F),F) ).
% I've heard that composition axioms cause churn?
fof( comp_assoc, axiom, ![F, G, H, FG, GH, FGH1, FGH2] : ((comp(F,G,FG) & comp(FG,H,FGH1) & comp(F,GH,FGH2) & comp(G,H,GH)) => FGH1 = FGH2  )).

Here are some definitions. One could also just inline these definitions with a macro system. Uniqueness quantification occurs in universal properties. It’s sort of a subtle idea to encode into ordinary first order logic without uniqueness quantification. Are some encoding better than others? Another place where macros to generate TPTP files would be useful. Uniqueness quantification is naturally expressible as a higher order predicate.

fof(monic_def, axiom,  
    ![M] : (monic(M) <=> (! [F,G] : (( ? [H] : (comp(M, F, H) & comp(M,G,H))) => F = G)))).

fof(commute_square_def, axiom, 
    ![F,G,H,K] : (commute_square(F,G,H,K) <=> (? [M] : (comp(F,G,M) & comp(H,K,M))))).

fof(pullback_def, axiom,
   ![F,G,P1,P2] : (pullback(F,G,P1,P2) <=>
        (commute_square(F,P1,G,P2) &
        (![Q1,Q2] : (commute_square(F,Q1,G,Q2) => 
            (?[U] : (! [U1] : ((comp(P1,U1,Q1) & comp(P2,U1,Q2)) <=> (U1 = U))))
        ))))).

Here are some pretty simple problems:

% should be a trivial statement, but isn't literally an axiom.
fof( codcod, conjecture, ![F] : cod(cod(F)) = cod(F) ).


% paste two commuting squares together gives another commuting square
fof( pasting_square,conjecture,   ![A,B,C,D, I,J,K, BI, CJ] : ((commute_square(B,A,D,C) & commute_square(I,D,K,J) & comp(I,B, IB) & comp( J, C,JC)) 
                                                        =>  commute_square( IB,A, K,JC ) )).

One theorem that is not quite so trivial is the pullback of a monic is monic https://math.stackexchange.com/questions/2957202/proving-the-pullback-of-monics-is-monic. It’s ultimately not that complicated and yet difficult enough that it takes me a lot of head scratching. It crucially uses the uniqueness property of the pullback. Here’s an encoding of the conjecture.

include('cat.tptp').
include('constructions.tptp').

% warmup?
%fof(pullback_monic, conjecture, ![M, P1,P2] : ((monic(M) & pullback(cod(M),M,P1,P2)) => %monic(P2))).

% pullback of monic is monic
fof(pullback_monic, conjecture, ![M, F, P1,P2] : ((monic(M) & pullback(F,M,P1,P2)) => monic(P1))).

Invoking the prover.

eprover --auto-schedule --cpu-limit=60 --proof-object monic_pullback.tptp

Vampire appears to do it faster. Again the easiest way to try it yourself or compare other solvers is the web interface System on TPTP http://www.tptp.org/cgi-bin/SystemOnTPTP. Caveat: given how many times I’ve screwed up writing this post, I’d give a 40% chance that that final theorem is actually expressing what I intended it to.

Bits and Bobbles

Encoding our questions into first order logic, which is powerful but not fully expressive, requires a lot of “macro” like repetitiveness when possible at all. I have found through experience that the extra macro capabilities given by python for emitting Z3 problems to be extremely powerful. For this reason, we should use a real programming language to emit these problems. I think the logical candidate is Julia and the Catlab library. One unknown question, will this repetitiveness choke the theorem prover?

Categorical Constructions that one might want to encode:

  • Monic
  • Epic
  • Commuting Squares
  • Pullbacks
  • Pushouts
  • products
  • coproducts
  • exponential objects
  • Subobject classifiers
  • Finite categories
  • Functors
  • Natural transformations
  • Adjunctions
  • Kan Extensions
  • PreSheaves

Theorems that seem possible. (Surely there are many more.

Some observations on actually using these provers: Just because it says proved is not very convincing. It is very easy to have your axioms and/or conjecture stated incorrectly. Forall ! and Exists ? bind tighter than I naively expect them to in the syntax. I ended up putting parenthesis nearly everywhere. I had a lot of very difficult to debug problems due to bad binding assumptions. Typos are also a disaster. These things are hard to debug. It is helpful to alternatively ask for satisfiability (disproving the conjecture). One should also at least look at which axioms it’s using. If it is using less axioms than makes sense, something is up. These are all good reasons that it might be better to automatically generate these problem files. Ultimately I feel like that is the way to go, because encoding what you’re interested in into first order logic can require some repetitiveness.

Sanity checking my files with http://www.tptp.org/cgi-bin/SystemB4TPTP proved to be helpful. Also looking at the parenthesis structure in the output.

I think using the typed tff format could also help sanity significantly. It really sucks that a typo on one of your predicates or variables can fail silently.

Even ignoring syntax screwups, the scoping of quantifiers is tough to think about.

I suspect that these systems will be very good for proofs that amount to unrolling definitions.

What is the best formulation for category theory properties?

An interesting property is that these provers seem to want a time limit given to them. And they schedule themselves in such a way to use the full time limit, even if they shouldn’t need it.

Categorical “type checking” as an external predicate. In order to compose, morphisms need to meet on an intermediate object.

The proof output is rather difficult to read. It is the equivalent of trying to read assembly code. The high level structure has been transformed into something more amenable to the machine and many names have been mangled. This proof is short enough that I think a person could stare at it for a while an eventually kind of understand it.

Perhaps we want to directly input our constructions in cnf form with skolemization manually applied. This might make the output more readable?

In terms of automated category theory proving, I’m not aware of that much work, but there must be more that I don’t know how to find.

Example proof term: Is this even right? Hard to know.

# Proof found!
# SZS status Theorem
# SZS output start CNFRefutation
fof(pullback_monic, conjecture, ![X11, X13, X14]:((monic(X11)&pullback(cod(X11),X11,X13,X14))=>monic(X14)), file('properties.tptp', pullback_monic)).
fof(pullback_def, axiom, ![X2, X3, X13, X14]:(pullback(X2,X3,X13,X14)<=>(commute_square(X2,X13,X3,X14)&![X15, X16]:(commute_square(X2,X15,X3,X16)=>?[X17]:![X18]:((comp(X13,X18,X15)&comp(X14,X18,X16))<=>X18=X17)))), file('properties.tptp', pullback_def)).
fof(commute_square_def, axiom, ![X2, X3, X7, X12]:(commute_square(X2,X3,X7,X12)<=>?[X11]:(comp(X2,X3,X11)&comp(X7,X12,X11))), file('properties.tptp', commute_square_def)).
fof(comp_objects_middle, axiom, ![X2, X3]:(?[X6]:comp(X2,X3,X6)<=>dom(X2)=cod(X3)), file('cat.tptp', comp_objects_middle)).
fof(dom_cod, axiom, ![X1]:dom(cod(X1))=cod(X1), file('cat.tptp', dom_cod)).
fof(left_id, axiom, ![X2]:comp(cod(X2),X2,X2), file('cat.tptp', left_id)).
fof(comp_is_unique, axiom, ![X2, X3, X4, X5]:((comp(X2,X3,X4)&comp(X2,X3,X5))=>X4=X5), file('cat.tptp', comp_is_unique)).
fof(monic_def, axiom, ![X11]:(monic(X11)<=>![X2, X3]:(?[X7]:(comp(X11,X2,X7)&comp(X11,X3,X7))=>X2=X3)), file('properties.tptp', monic_def)).
fof(comp_cod, axiom, ![X2, X3, X6]:(comp(X2,X3,X6)=>cod(X2)=cod(X6)), file('cat.tptp', comp_cod)).
fof(comp_dom, axiom, ![X2, X3, X6]:(comp(X2,X3,X6)=>dom(X3)=dom(X6)), file('cat.tptp', comp_dom)).
fof(comp_assoc, axiom, ![X2, X3, X7, X6, X8, X9, X10]:((((comp(X2,X3,X6)&comp(X6,X7,X9))&comp(X2,X8,X10))&comp(X3,X7,X8))=>X9=X10), file('cat.tptp', comp_assoc)).
fof(c_0_11, negated_conjecture, ~(![X11, X13, X14]:((monic(X11)&pullback(cod(X11),X11,X13,X14))=>monic(X14))), inference(assume_negation,[status(cth)],[pullback_monic])).
fof(c_0_12, plain, ![X68, X69, X70, X71, X72, X73, X75, X76, X77, X78, X79, X80, X83]:(((commute_square(X68,X70,X69,X71)|~pullback(X68,X69,X70,X71))&((~comp(X70,X75,X72)|~comp(X71,X75,X73)|X75=esk7_6(X68,X69,X70,X71,X72,X73)|~commute_square(X68,X72,X69,X73)|~pullback(X68,X69,X70,X71))&((comp(X70,X76,X72)|X76!=esk7_6(X68,X69,X70,X71,X72,X73)|~commute_square(X68,X72,X69,X73)|~pullback(X68,X69,X70,X71))&(comp(X71,X76,X73)|X76!=esk7_6(X68,X69,X70,X71,X72,X73)|~commute_square(X68,X72,X69,X73)|~pullback(X68,X69,X70,X71)))))&((commute_square(X77,esk8_4(X77,X78,X79,X80),X78,esk9_4(X77,X78,X79,X80))|~commute_square(X77,X79,X78,X80)|pullback(X77,X78,X79,X80))&((~comp(X79,esk10_5(X77,X78,X79,X80,X83),esk8_4(X77,X78,X79,X80))|~comp(X80,esk10_5(X77,X78,X79,X80,X83),esk9_4(X77,X78,X79,X80))|esk10_5(X77,X78,X79,X80,X83)!=X83|~commute_square(X77,X79,X78,X80)|pullback(X77,X78,X79,X80))&((comp(X79,esk10_5(X77,X78,X79,X80,X83),esk8_4(X77,X78,X79,X80))|esk10_5(X77,X78,X79,X80,X83)=X83|~commute_square(X77,X79,X78,X80)|pullback(X77,X78,X79,X80))&(comp(X80,esk10_5(X77,X78,X79,X80,X83),esk9_4(X77,X78,X79,X80))|esk10_5(X77,X78,X79,X80,X83)=X83|~commute_square(X77,X79,X78,X80)|pullback(X77,X78,X79,X80)))))), inference(distribute,[status(thm)],[inference(shift_quantors,[status(thm)],[inference(skolemize,[status(esa)],[inference(variable_rename,[status(thm)],[inference(shift_quantors,[status(thm)],[inference(fof_nnf,[status(thm)],[pullback_def])])])])])])).
fof(c_0_13, negated_conjecture, ((monic(esk11_0)&pullback(cod(esk11_0),esk11_0,esk12_0,esk13_0))&~monic(esk13_0)), inference(skolemize,[status(esa)],[inference(variable_rename,[status(thm)],[inference(fof_nnf,[status(thm)],[c_0_11])])])).
fof(c_0_14, plain, ![X58, X59, X60, X61, X63, X64, X65, X66, X67]:(((comp(X58,X59,esk6_4(X58,X59,X60,X61))|~commute_square(X58,X59,X60,X61))&(comp(X60,X61,esk6_4(X58,X59,X60,X61))|~commute_square(X58,X59,X60,X61)))&(~comp(X63,X64,X67)|~comp(X65,X66,X67)|commute_square(X63,X64,X65,X66))), inference(distribute,[status(thm)],[inference(shift_quantors,[status(thm)],[inference(skolemize,[status(esa)],[inference(variable_rename,[status(thm)],[inference(shift_quantors,[status(thm)],[inference(fof_nnf,[status(thm)],[commute_square_def])])])])])])).
cnf(c_0_15, plain, (commute_square(X1,X2,X3,X4)|~pullback(X1,X3,X2,X4)), inference(split_conjunct,[status(thm)],[c_0_12])).
cnf(c_0_16, negated_conjecture, (pullback(cod(esk11_0),esk11_0,esk12_0,esk13_0)), inference(split_conjunct,[status(thm)],[c_0_13])).
fof(c_0_17, plain, ![X25, X26, X27, X28, X29]:((~comp(X25,X26,X27)|dom(X25)=cod(X26))&(dom(X28)!=cod(X29)|comp(X28,X29,esk1_2(X28,X29)))), inference(shift_quantors,[status(thm)],[inference(skolemize,[status(esa)],[inference(variable_rename,[status(thm)],[inference(shift_quantors,[status(thm)],[inference(fof_nnf,[status(thm)],[comp_objects_middle])])])])])).
cnf(c_0_18, plain, (comp(X1,X2,esk6_4(X1,X2,X3,X4))|~commute_square(X1,X2,X3,X4)), inference(split_conjunct,[status(thm)],[c_0_14])).
cnf(c_0_19, negated_conjecture, (commute_square(cod(esk11_0),esk12_0,esk11_0,esk13_0)), inference(spm,[status(thm)],[c_0_15, c_0_16])).
fof(c_0_20, plain, ![X19]:dom(cod(X19))=cod(X19), inference(variable_rename,[status(thm)],[dom_cod])).
fof(c_0_21, plain, ![X37]:comp(cod(X37),X37,X37), inference(variable_rename,[status(thm)],[left_id])).
cnf(c_0_22, plain, (dom(X1)=cod(X2)|~comp(X1,X2,X3)), inference(split_conjunct,[status(thm)],[c_0_17])).
cnf(c_0_23, negated_conjecture, (comp(cod(esk11_0),esk12_0,esk6_4(cod(esk11_0),esk12_0,esk11_0,esk13_0))), inference(spm,[status(thm)],[c_0_18, c_0_19])).
cnf(c_0_24, plain, (dom(cod(X1))=cod(X1)), inference(split_conjunct,[status(thm)],[c_0_20])).
fof(c_0_25, plain, ![X21, X22, X23, X24]:(~comp(X21,X22,X23)|~comp(X21,X22,X24)|X23=X24), inference(variable_rename,[status(thm)],[inference(fof_nnf,[status(thm)],[comp_is_unique])])).
cnf(c_0_26, plain, (comp(cod(X1),X1,X1)), inference(split_conjunct,[status(thm)],[c_0_21])).
cnf(c_0_27, negated_conjecture, (cod(esk12_0)=cod(esk11_0)), inference(rw,[status(thm)],[inference(spm,[status(thm)],[c_0_22, c_0_23]), c_0_24])).
fof(c_0_28, plain, ![X46, X47, X48, X49, X50]:((~monic(X46)|(~comp(X46,X47,X49)|~comp(X46,X48,X49)|X47=X48))&(((comp(X50,esk2_1(X50),esk4_1(X50))|monic(X50))&(comp(X50,esk3_1(X50),esk4_1(X50))|monic(X50)))&(esk2_1(X50)!=esk3_1(X50)|monic(X50)))), inference(distribute,[status(thm)],[inference(shift_quantors,[status(thm)],[inference(skolemize,[status(esa)],[inference(variable_rename,[status(thm)],[inference(shift_quantors,[status(thm)],[inference(fof_nnf,[status(thm)],[monic_def])])])])])])).
cnf(c_0_29, plain, (X3=X4|~comp(X1,X2,X3)|~comp(X1,X2,X4)), inference(split_conjunct,[status(thm)],[c_0_25])).
cnf(c_0_30, negated_conjecture, (comp(cod(esk11_0),esk12_0,esk12_0)), inference(spm,[status(thm)],[c_0_26, c_0_27])).
fof(c_0_31, plain, ![X34, X35, X36]:(~comp(X34,X35,X36)|cod(X34)=cod(X36)), inference(variable_rename,[status(thm)],[inference(fof_nnf,[status(thm)],[comp_cod])])).
cnf(c_0_32, negated_conjecture, (~monic(esk13_0)), inference(split_conjunct,[status(thm)],[c_0_13])).
cnf(c_0_33, plain, (comp(X1,esk2_1(X1),esk4_1(X1))|monic(X1)), inference(split_conjunct,[status(thm)],[c_0_28])).
cnf(c_0_34, plain, (comp(X1,X2,esk6_4(X3,X4,X1,X2))|~commute_square(X3,X4,X1,X2)), inference(split_conjunct,[status(thm)],[c_0_14])).
cnf(c_0_35, plain, (comp(X1,esk3_1(X1),esk4_1(X1))|monic(X1)), inference(split_conjunct,[status(thm)],[c_0_28])).
cnf(c_0_36, negated_conjecture, (X1=esk12_0|~comp(cod(esk11_0),esk12_0,X1)), inference(spm,[status(thm)],[c_0_29, c_0_30])).
cnf(c_0_37, plain, (cod(X1)=cod(X3)|~comp(X1,X2,X3)), inference(split_conjunct,[status(thm)],[c_0_31])).
cnf(c_0_38, negated_conjecture, (comp(esk13_0,esk2_1(esk13_0),esk4_1(esk13_0))), inference(spm,[status(thm)],[c_0_32, c_0_33])).
cnf(c_0_39, negated_conjecture, (comp(esk11_0,esk13_0,esk6_4(cod(esk11_0),esk12_0,esk11_0,esk13_0))), inference(spm,[status(thm)],[c_0_34, c_0_19])).
cnf(c_0_40, negated_conjecture, (comp(esk13_0,esk3_1(esk13_0),esk4_1(esk13_0))), inference(spm,[status(thm)],[c_0_32, c_0_35])).
fof(c_0_41, plain, ![X31, X32, X33]:(~comp(X31,X32,X33)|dom(X32)=dom(X33)), inference(variable_rename,[status(thm)],[inference(fof_nnf,[status(thm)],[comp_dom])])).
cnf(c_0_42, negated_conjecture, (esk6_4(cod(esk11_0),esk12_0,esk11_0,esk13_0)=esk12_0), inference(spm,[status(thm)],[c_0_36, c_0_23])).
cnf(c_0_43, negated_conjecture, (cod(esk4_1(esk13_0))=cod(esk13_0)), inference(spm,[status(thm)],[c_0_37, c_0_38])).
cnf(c_0_44, negated_conjecture, (cod(esk13_0)=dom(esk11_0)), inference(spm,[status(thm)],[c_0_22, c_0_39])).
cnf(c_0_45, plain, (comp(X1,X2,esk1_2(X1,X2))|dom(X1)!=cod(X2)), inference(split_conjunct,[status(thm)],[c_0_17])).
cnf(c_0_46, negated_conjecture, (cod(esk3_1(esk13_0))=dom(esk13_0)), inference(spm,[status(thm)],[c_0_22, c_0_40])).
cnf(c_0_47, plain, (dom(X2)=dom(X3)|~comp(X1,X2,X3)), inference(split_conjunct,[status(thm)],[c_0_41])).
cnf(c_0_48, negated_conjecture, (comp(esk11_0,esk13_0,esk12_0)), inference(rw,[status(thm)],[c_0_39, c_0_42])).
cnf(c_0_49, negated_conjecture, (cod(esk4_1(esk13_0))=dom(esk11_0)), inference(rw,[status(thm)],[c_0_43, c_0_44])).
fof(c_0_50, plain, ![X39, X40, X41, X42, X43, X44, X45]:(~comp(X39,X40,X42)|~comp(X42,X41,X44)|~comp(X39,X43,X45)|~comp(X40,X41,X43)|X44=X45), inference(variable_rename,[status(thm)],[inference(fof_nnf,[status(thm)],[comp_assoc])])).
cnf(c_0_51, negated_conjecture, (comp(X1,esk3_1(esk13_0),esk1_2(X1,esk3_1(esk13_0)))|dom(X1)!=dom(esk13_0)), inference(spm,[status(thm)],[c_0_45, c_0_46])).
cnf(c_0_52, negated_conjecture, (dom(esk12_0)=dom(esk13_0)), inference(spm,[status(thm)],[c_0_47, c_0_48])).
cnf(c_0_53, negated_conjecture, (comp(X1,esk4_1(esk13_0),esk1_2(X1,esk4_1(esk13_0)))|dom(X1)!=dom(esk11_0)), inference(spm,[status(thm)],[c_0_45, c_0_49])).
cnf(c_0_54, plain, (X5=X7|~comp(X1,X2,X3)|~comp(X3,X4,X5)|~comp(X1,X6,X7)|~comp(X2,X4,X6)), inference(split_conjunct,[status(thm)],[c_0_50])).
cnf(c_0_55, negated_conjecture, (comp(esk12_0,esk3_1(esk13_0),esk1_2(esk12_0,esk3_1(esk13_0)))), inference(spm,[status(thm)],[c_0_51, c_0_52])).
cnf(c_0_56, negated_conjecture, (cod(esk2_1(esk13_0))=dom(esk13_0)), inference(spm,[status(thm)],[c_0_22, c_0_38])).
cnf(c_0_57, plain, (commute_square(X1,X2,X4,X5)|~comp(X1,X2,X3)|~comp(X4,X5,X3)), inference(split_conjunct,[status(thm)],[c_0_14])).
cnf(c_0_58, negated_conjecture, (comp(esk11_0,esk4_1(esk13_0),esk1_2(esk11_0,esk4_1(esk13_0)))), inference(er,[status(thm)],[c_0_53])).
cnf(c_0_59, negated_conjecture, (esk1_2(esk12_0,esk3_1(esk13_0))=X1|~comp(X2,esk3_1(esk13_0),X3)|~comp(X4,X2,esk12_0)|~comp(X4,X3,X1)), inference(spm,[status(thm)],[c_0_54, c_0_55])).
cnf(c_0_60, negated_conjecture, (comp(X1,esk2_1(esk13_0),esk1_2(X1,esk2_1(esk13_0)))|dom(X1)!=dom(esk13_0)), inference(spm,[status(thm)],[c_0_45, c_0_56])).
cnf(c_0_61, plain, (X2=esk7_6(X6,X7,X1,X4,X3,X5)|~comp(X1,X2,X3)|~comp(X4,X2,X5)|~commute_square(X6,X3,X7,X5)|~pullback(X6,X7,X1,X4)), inference(split_conjunct,[status(thm)],[c_0_12])).
cnf(c_0_62, negated_conjecture, (commute_square(X1,X2,esk11_0,esk4_1(esk13_0))|~comp(X1,X2,esk1_2(esk11_0,esk4_1(esk13_0)))), inference(spm,[status(thm)],[c_0_57, c_0_58])).
cnf(c_0_63, negated_conjecture, (cod(esk1_2(esk11_0,esk4_1(esk13_0)))=cod(esk11_0)), inference(spm,[status(thm)],[c_0_37, c_0_58])).
cnf(c_0_64, negated_conjecture, (esk1_2(esk12_0,esk3_1(esk13_0))=esk1_2(esk11_0,esk4_1(esk13_0))|~comp(X1,esk3_1(esk13_0),esk4_1(esk13_0))|~comp(esk11_0,X1,esk12_0)), inference(spm,[status(thm)],[c_0_59, c_0_58])).
cnf(c_0_65, negated_conjecture, (comp(esk12_0,esk2_1(esk13_0),esk1_2(esk12_0,esk2_1(esk13_0)))), inference(spm,[status(thm)],[c_0_60, c_0_52])).
cnf(c_0_66, negated_conjecture, (X1=esk7_6(cod(esk11_0),esk11_0,esk12_0,esk13_0,X2,X3)|~commute_square(cod(esk11_0),X2,esk11_0,X3)|~comp(esk13_0,X1,X3)|~comp(esk12_0,X1,X2)), inference(spm,[status(thm)],[c_0_61, c_0_16])).
cnf(c_0_67, negated_conjecture, (commute_square(cod(esk11_0),esk1_2(esk11_0,esk4_1(esk13_0)),esk11_0,esk4_1(esk13_0))), inference(rw,[status(thm)],[inference(spm,[status(thm)],[c_0_62, c_0_26]), c_0_63])).
cnf(c_0_68, negated_conjecture, (esk1_2(esk12_0,esk3_1(esk13_0))=esk1_2(esk11_0,esk4_1(esk13_0))), inference(cn,[status(thm)],[inference(rw,[status(thm)],[inference(spm,[status(thm)],[c_0_64, c_0_48]), c_0_40])])).
cnf(c_0_69, negated_conjecture, (esk1_2(esk12_0,esk2_1(esk13_0))=X1|~comp(X2,esk2_1(esk13_0),X3)|~comp(X4,X2,esk12_0)|~comp(X4,X3,X1)), inference(spm,[status(thm)],[c_0_54, c_0_65])).
cnf(c_0_70, negated_conjecture, (X1=esk7_6(cod(esk11_0),esk11_0,esk12_0,esk13_0,esk1_2(esk11_0,esk4_1(esk13_0)),esk4_1(esk13_0))|~comp(esk12_0,X1,esk1_2(esk11_0,esk4_1(esk13_0)))|~comp(esk13_0,X1,esk4_1(esk13_0))), inference(spm,[status(thm)],[c_0_66, c_0_67])).
cnf(c_0_71, negated_conjecture, (comp(esk12_0,esk3_1(esk13_0),esk1_2(esk11_0,esk4_1(esk13_0)))), inference(rw,[status(thm)],[c_0_55, c_0_68])).
cnf(c_0_72, negated_conjecture, (esk1_2(esk12_0,esk2_1(esk13_0))=esk1_2(esk11_0,esk4_1(esk13_0))|~comp(X1,esk2_1(esk13_0),esk4_1(esk13_0))|~comp(esk11_0,X1,esk12_0)), inference(spm,[status(thm)],[c_0_69, c_0_58])).
cnf(c_0_73, negated_conjecture, (esk7_6(cod(esk11_0),esk11_0,esk12_0,esk13_0,esk1_2(esk11_0,esk4_1(esk13_0)),esk4_1(esk13_0))=esk3_1(esk13_0)), inference(cn,[status(thm)],[inference(rw,[status(thm)],[inference(spm,[status(thm)],[c_0_70, c_0_40]), c_0_71])])).
cnf(c_0_74, negated_conjecture, (esk1_2(esk12_0,esk2_1(esk13_0))=esk1_2(esk11_0,esk4_1(esk13_0))), inference(cn,[status(thm)],[inference(rw,[status(thm)],[inference(spm,[status(thm)],[c_0_72, c_0_48]), c_0_38])])).
cnf(c_0_75, negated_conjecture, (X1=esk3_1(esk13_0)|~comp(esk12_0,X1,esk1_2(esk11_0,esk4_1(esk13_0)))|~comp(esk13_0,X1,esk4_1(esk13_0))), inference(rw,[status(thm)],[c_0_70, c_0_73])).
cnf(c_0_76, negated_conjecture, (comp(esk12_0,esk2_1(esk13_0),esk1_2(esk11_0,esk4_1(esk13_0)))), inference(rw,[status(thm)],[c_0_65, c_0_74])).
cnf(c_0_77, plain, (monic(X1)|esk2_1(X1)!=esk3_1(X1)), inference(split_conjunct,[status(thm)],[c_0_28])).
cnf(c_0_78, negated_conjecture, (esk3_1(esk13_0)=esk2_1(esk13_0)), inference(cn,[status(thm)],[inference(rw,[status(thm)],[inference(spm,[status(thm)],[c_0_75, c_0_38]), c_0_76])])).
cnf(c_0_79, negated_conjecture, ($false), inference(sr,[status(thm)],[inference(spm,[status(thm)],[c_0_77, c_0_78]), c_0_32]), ['proof']).
# SZS output end CNFRefutation
# Proof object total steps             : 80
# Proof object clause steps            : 57
# Proof object formula steps           : 23
# Proof object conjectures             : 44
# Proof object clause conjectures      : 41
# Proof object formula conjectures     : 3
# Proof object initial clauses used    : 18
# Proof object initial formulas used   : 11
# Proof object generating inferences   : 34
# Proof object simplifying inferences  : 16
# Training examples: 0 positive, 0 negative
# Parsed axioms                        : 14
# Removed by relevancy pruning/SinE    : 0
# Initial clauses                      : 31
# Removed in clause preprocessing      : 0
# Initial clauses in saturation        : 31
# Processed clauses                    : 4871
# ...of these trivial                  : 248
# ...subsumed                          : 2833
# ...remaining for further processing  : 1790
# Other redundant clauses eliminated   : 2
# Clauses deleted for lack of memory   : 0
# Backward-subsumed                    : 68
# Backward-rewritten                   : 917
# Generated clauses                    : 12894
# ...of the previous two non-trivial   : 12112
# Contextual simplify-reflections      : 0
# Paramodulations                      : 12870
# Factorizations                       : 0
# Equation resolutions                 : 24
# Propositional unsat checks           : 0
#    Propositional check models        : 0
#    Propositional check unsatisfiable : 0
#    Propositional clauses             : 0
#    Propositional clauses after purity: 0
#    Propositional unsat core size     : 0
#    Propositional preprocessing time  : 0.000
#    Propositional encoding time       : 0.000
#    Propositional solver time         : 0.000
#    Success case prop preproc time    : 0.000
#    Success case prop encoding time   : 0.000
#    Success case prop solver time     : 0.000
# Current number of processed clauses  : 772
#    Positive orientable unit clauses  : 155
#    Positive unorientable unit clauses: 0
#    Negative unit clauses             : 1
#    Non-unit-clauses                  : 616
# Current number of unprocessed clauses: 5468
# ...number of literals in the above   : 16715
# Current number of archived formulas  : 0
# Current number of archived clauses   : 1016
# Clause-clause subsumption calls (NU) : 303883
# Rec. Clause-clause subsumption calls : 253866
# Non-unit clause-clause subsumptions  : 2890
# Unit Clause-clause subsumption calls : 1530
# Rewrite failures with RHS unbound    : 0
# BW rewrite match attempts            : 1110
# BW rewrite match successes           : 80
# Condensation attempts                : 0
# Condensation successes               : 0
# Termbank termtop insertions          : 267453

Proving some Inductive Facts about Lists using Z3 python

Z3 is an SMT solver which has a good rep. Here are some excellent tutorials.

https://rise4fun.com/z3/tutorial

https://theory.stanford.edu/~nikolaj/programmingz3.html

http://homepage.cs.uiowa.edu/~ajreynol/VTSA2017/

SMT stands for satisfiability modulo theories. The exact nature of power of these kinds of solvers has been and is still hazy to me. I have known for a long time that they can slam sudoku or picross or other puzzles, but what about more infinite or logic looking things? I think I may always be hazy, as one can learn and discover more and more encoding tricks to get problems and proofs that you previously thought weren’t solvable into the system. It’s very similar to learning how to encode to linear programming solvers in that respect.

SMT solvers combine a general smart search procedure with a ton of specialized solvers for particular domains, like linear algebra, polynomials, linear inequalities and more.

The search procedure goes by the name DPLL(T). It is an adjustment of the procedure of SAT solvers, which are very powerful and fast. SAT solvers find an assignment of boolean variables in order to make a complicated boolean expression true, or to eventually find that it can never be made true. SAT solvers work on the principle of guessing and deducing. If a OR b needs to be true and we already know a is false, we can deduce b must be true. When the deduction process stalls, we just guess and then backtrack if it doesn’t work out. This is the same process you use manually in solving Sudoku.

The modern era of SAT solvers came when a couple new tricks vastly extended their power. In particular Conflict Driven Clause Learning (CDCL), where when the solver finds itself in a dead end, it figures out the choices it made that put it in the dead end and adds a clause to its formula so that it will never make those choices again.

https://sahandsaba.com/understanding-sat-by-implementing-a-simple-sat-solver-in-python.html

SMT works by now having the boolean variables of the SAT solver contain inner structure, like the boolean p actually represents the fact x + y < 5. During the search process it can take the pile of booleans that have been set to true and ask a solver (maybe a linear programming solver in this case) whether those facts can all be made true in the underlying theory. This is an extra spice on top of the SAT search.

Something that wasn’t apparent to me at first is how important the theory of uninterpreted formulas is to SMT solvers. It really is their bread and butter. This theory is basically solved by unification, which is the fairly intuitive process of finding assignments to variables to make a set of equations true. If I ask how to make fred(x,4) = fred(7,y), obviously the answer is y=4, x=7. That is unification. Unification is a completely syntax driven way to deduce facts. It starts to give you something quite similar to first order logic.

https://eli.thegreenplace.net/2018/unification/

https://cs.mtsu.edu/~rbutler/courses/sdd/automated_deduction/unification.pdf

I was also under the impression that quantifiers \forall, \exists were available but heavily frowned upon. I don’t think this is quite true. I think they are sort of a part of the entire point of the SMT solver now, although yes, they are rather flaky. There are a number of methods to deal with the quantifier, but one common one is to look for a pattern or parts of the subformula, and instantiate a new set of free variables for all of the quantified ones and add the theorem every time the patterns match. This is called E-matching.

Here are a couple tutorials on proving inductive facts in SMT solvers. You kind of have to hold their hand a bit.

http://lara.epfl.ch/~reynolds/vmcai15.pdf

http://homepage.divms.uiowa.edu/~ajreynol/pres-ind15.pdf

SMT solvers queries usually have the flavor of finding something, in this case a counterexample. The idea is that you try to ask for the first counterexample where induction failed. Assuming that proposition P was true for (n-1), find n such that P is not true. If you can’t find it, then the induction has gone through.

And here is some code where I believe I’m showing that some simple list functions like reverse, append, and length have some simple properties like \forall t. rev (rev(t)) == t .

from z3 import *

# Very useful reference
# https://theory.stanford.edu/~nikolaj/programmingz3.html

f = Function('f', IntSort(), IntSort())
s = Solver()
#s.add(f(True) == False, f(False) == True)
x = Int('x')
s.add(ForAll([x], f(x) >= x)) #> and < do not seem to be returning
print(s.sexpr())
s.check()
print(s.model())

# Rolling my own list data type
# Z3 has a built in which will probably be better?
s = Solver()

L = Datatype("MyList")
L.declare("Nil")
L.declare("Cons", ("hd", IntSort()), ("tail", L))
L = L.create()

t = Const('t', L)
u = Const('u', L)

y = Int('y')


rev = Function('reverse', L, L)
app = Function('append', L, L, L)
leng = Function('length', L, IntSort())

#defining my functions. Micro Haskell, BABY
#length
s.add( leng(L.Nil) == 0 )
s.add(  ForAll([u,y],  leng(L.Cons(y,u)) == 1 + leng(u))) #  patterns = [leng(L.Cons(y,u))] 

#append
s.add(  ForAll([u], app(L.Nil, u) == u)) 
s.add(  ForAll([t, u, y] , app(L.Cons(y,t), u)  ==  L.Cons(y, app(t, u ))))

#reverse
s.add( rev(L.Nil) == L.Nil)
s.add(  ForAll([y,t],  rev(L.Cons(y,t)) == app(rev(t), L.Cons(y, L.Nil))))


s.push()
print("proving leng(t) >= 0")
#s.add( Or(And(t == L.Cons(y,u),  leng(u) >= 0 ), t == L.Nil))
s.add(  Not(leng(t) >= 0 ))
s.add( Implies(t == L.Cons(L.hd(t), L.tail(t)),  leng(L.tail(t)) >= 0 ))
print(s.check())

s.pop()
s.push()
#s.add( leng(app(L.Nil, u)) == leng(u) )

print("prove length is preserved under app.")
s.add( leng(app(t,u)) != leng(t) + leng(u))
s.add( Implies(t == L.Cons(L.hd(t), L.tail(t)),   leng(app(L.tail(t),u)) == leng(L.tail(t)) + leng(u)  ))
print(s.check())

s.pop()
s.push()

print("reverse preserves length")
#Lemma Could place in clause with the above proof of this theorem
s.add( ForAll([t,u], leng(app(t,u)) == leng(t) + leng(u)   )) #subgoal needed
s.add( leng(rev(t)) != leng(t))
s.add( Implies(t == L.Cons(L.hd(t), L.tail(t)),   leng(rev(L.tail(t))) == leng(L.tail(t)) ))


s.pop()
s.push()

print("reverse reverse = id")
s.add( ForAll( [t,u], rev(app(t,u)) ==  app(rev(u), rev(t)) ) ) #subgoal needed
s.add( rev(rev(t)) != t )
s.add( Implies(t == L.Cons(L.hd(t), L.tail(t)),   rev(rev(L.tail(t))) == L.tail(t) ))
print(s.check())


#junk

#s.add(t != L.Nil ) 
#s.add( ForAll([t], rev(L.Cons(y,t)) ==   ) , rev(L.Nil) == L.Nil)
#s.add( leng(L.Cons()) == 0 )
#s.add(  ForAll(y,  leng(L.Cons(y,L.Nil)) == 1 + leng(L.Nil)))
#s.add(  Not( leng(app(t,u))  == leng(t) + leng(u) )) 

# prove length of app + nil is same
#s.add( leng(app(t,L.Nil)) != leng(t))
#s.add( Implies(t == L.Cons(L.hd(t), L.tail(t)),   leng(app(L.tail(t),L.Nil)) == leng(L.tail(t))))

#s.add( app(L.Nil,L.Nil) == L.Nil)
#s.add( app(t,L.Nil) != t)
#s.add( Implies(t == L.Cons(L.hd(t), L.tail(t)),   app(L.tail(t),L.Nil) == L.tail(t)))


#s.add(Or(t == L.Nil , And( t == L.Cons(y, u),   app(u,L.Nil) == u)  ))
#s.add( Implies(t == L.Nil, leng(app(L.Nil, u)) == leng(u) ))
# all of these don't work
#s.add( Implies(u == L.tail(t),  leng(u) >= 0 ))
#s.add( ForAll(u, Implies(t == L.Cons(y,u),  leng(u) >= 0 )))
#print(s.sexpr())
#print(s.check())
#print(s.model())

#print(tactics())
'''
def induction(freevar, f, construtors?):
    x = Int('x')
    return And(Not(f(x)), 
'''

'''
# controller synthesis
pos = Array(Real, 10) 
for t in range(10):
    s.add(pos[t] == pos[t] +  v * dt)
    s.add(v[t] == v[t] + f(pos[t]) * dt)
    s.add(f(pos[t] <= 1)) # constraints on force
s.add(ForAll([init], Implies(And(init <= 1, pos[0] == init), pos[9] <= 0.1) ))
 
s.add(Forall[x], f(x) == a * x) # parametrizing. 
'''
#s.set("produce-proofs", True
#s.set("mbqi", False)
#s.set(mbqi=False)
#print(s.proof())