We have a project at work called CBAT, the Comparative Binary Analysis Tool. It uses Bap, the binary analysis platform, to lift assembly code and run a weakest precondition analysis, translating it into an smtlib query. The slight twist of this project is that it is supposed to be comparative, meaning you can compare a patched version of a program against its original functionality.

Being totally delusional, I have wanted to try replicating the functionality of CBAT in a different way. CBAT operates over BIR, which is a CFG-reconstructed form of the assembly. Reconstructing a CFG (which is the job of Bap, not CBAT) is a task fraught with peril; indirect jumps at minimum make this very difficult. As a secondary issue, some of the complexity (not horrifically complex, but subtle) of the core of CBAT is due to traversing this CFG, annotating invariants, and performing bounded loop unrolling.

Constrained Horn Clauses (CHC) are a type of logical formula that naturally express certain reachability problems that occur in software verification. I tend to think of them as a prolog-style state machine encoding of an imperative program. Z3 supports these formulas with a couple of solvers, the main ones of interest being spacer and bmc. You can find these sorts of things by digging through the output of z3 -p.
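To make this concrete, here is a toy, hand-written example (not produced by anything described below): a counter starts at 0, increments while it is below 10, and we want to show it never exceeds 10. The unknown relation inv plays the role of a loop invariant the solver has to discover.

(set-logic HORN)
(declare-fun inv ((_ BitVec 32)) Bool)
; initial states: the counter starts at 0
(assert (forall ((x (_ BitVec 32)))
 (=> (= x #x00000000) (inv x))))
; transition: while x < 10, increment x
(assert (forall ((x (_ BitVec 32)))
 (=> (and (inv x) (bvult x #x0000000a)) (inv (bvadd x #x00000001)))))
; safety: the counter never exceeds 10
(assert (forall ((x (_ BitVec 32)))
 (=> (inv x) (bvule x #x0000000a))))
(check-sat) ; spacer should answer sat and can report an interpretation of inv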

In principle, I thought that some of this complexity in CBAT could be avoided by relying on the constrained horn clause solvers to do the graph traversal and loop unrolling for us. Isn’t that the appeal of solvers? I have implemented a CHC encoding here.

There is a school of thought that human readability is not that important for what you dump into an smt solver. I have found that not to be the case, because of debuggability, in particular performance debugging. When trying out new ideas it is significantly easier to hand-edit smtlib files than to build all the generation infrastructure and analysis just to find out it doesn’t help. The programmatic APIs to Z3 hurt this goal significantly: when you ask Z3 for an smtlib string representation of a problem after building it with the programmatic bindings, it gives you rather random-looking output that is quite hard to interpret.

Something very appealing about CHC is that every clause can correspond closely to a simple notion in the original program, in my case every assembly instruction. My favorite trick of weakest precondition encoding is the assign -> let trick. Ordinary presentations of assignment in WP show it as a predicate transformer that performs substitution: WP(x := E, Q) = Q[E/x]. However, smtlib (but not the z3 programmatic bindings!) supports let clauses. In this case you get WP(x := E, Q) = (let ((x E)) Q). What this leaves behind is a nice sequence of let statements that exactly mirrors the assignments, making the resulting SMT query significantly more readable.
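For example, a straight-line snippet that assigns R0 and then R3 compiles to nested lets wrapped around whatever postcondition Q follows (a hand-written sketch, not tool output):

; WP( R0 := R3 + 1 ; R3 := R0 * 2 | Q )
(let ((R0 (bvadd R3 #x00000001)))
 (let ((R3 (bvmul R0 #x00000002)))
  Q))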

The basic methodology of my CHC encoding is based upon Horn Clause Solvers for Program Verification. We define a predicate reach-state that describes the reachable states of the program. This predicate has many fields, one for each state variable. Crucially, the state includes the program counter because that makes it easy to encode indirect jumps.
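Concretely, reach-state is an uninterpreted relation the solver must find an interpretation for. On a toy machine with only two 32-bit registers R0 and R1, its declaration would look something like this (in the real output there is one argument per state variable, including flags and temporaries):

; program counter first, then one argument per state variable
(declare-fun reach-state ((_ BitVec 32) (_ BitVec 32) (_ BitVec 32)) Bool)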

First we encode a fact about the initial states, say at the entry of a function. This states that at program counter 0, every state is allowed.

(set-logic HORN)
(set-option :produce-proofs true)
(set-option :fp.engine spacer)

(assert
 (forall ((R0 (_ BitVec 32)) (R1 (_ BitVec 32))...)
  (reach-state #x00000000 R0 R1...)))

We may place constraints on the initial state like so

(assert
 (forall ((R0 (_ BitVec 32)) (R1 (_ BitVec 32))...)
  (=> (MYCONSTRAINTS) (reach-state #x00000000 R0 R1...))))

Then for every line of assembly we compile a horn clause

(assert
 (forall
   ((R0 (_ BitVec 32)) (R1 (_ BitVec 32))...)
     (=> (reach-state CURRENTPC R0 R1...)
          WP( stmts | (reach-state NEXTPC R0 R1...)))))

Here’s a more concrete example

;0x1c
;{
;  R0 := 1
;}
(assert
 (forall
  ((%P12582905 (_ BitVec 32)) (%P12582910 (_ BitVec 32)) (CF (_ BitVec 1))
   (LR (_ BitVec 32)) (NF (_ BitVec 1)) (R0 (_ BitVec 32)) (R3 (_ BitVec 32))
   (VF (_ BitVec 1)) (ZF (_ BitVec 1)))
  (=> (reach-state #x0000001c %P12582905 %P12582910 CF LR NF R0 R3 VF ZF)
   (let ((R0 #x00000001))
    (reach-state #x00000020 %P12582905 %P12582910 CF LR NF R0 R3 VF ZF)))))

;0x18
;{
;  jmp LR
;}
(assert
 (forall
  ((%P12582905 (_ BitVec 32)) (%P12582910 (_ BitVec 32)) (CF (_ BitVec 1))
   (LR (_ BitVec 32)) (NF (_ BitVec 1)) (R0 (_ BitVec 32)) (R3 (_ BitVec 32))
   (VF (_ BitVec 1)) (ZF (_ BitVec 1)))
  (=> (reach-state #x00000018 %P12582905 %P12582910 CF LR NF R0 R3 VF ZF)
   (reach-state LR %P12582905 %P12582910 CF LR NF R0 R3 VF ZF))))

;0x14
;{
;  if (~ZF) {
;    jmp 0xC
;  }
;}
(assert
 (forall
  ((%P12582905 (_ BitVec 32)) (%P12582910 (_ BitVec 32)) (CF (_ BitVec 1))
   (LR (_ BitVec 32)) (NF (_ BitVec 1)) (R0 (_ BitVec 32)) (R3 (_ BitVec 32))
   (VF (_ BitVec 1)) (ZF (_ BitVec 1)))
  (=> (reach-state #x00000014 %P12582905 %P12582910 CF LR NF R0 R3 VF ZF)
   (ite
    (= (bvand (ite (bvule ZF #b0) #b1 #b0) (ite (bvule #b0 ZF) #b1 #b0)) #b1)
    (reach-state #x0000000c %P12582905 %P12582910 CF LR NF R0 R3 VF ZF)
    (reach-state #x00000018 %P12582905 %P12582910 CF LR NF R0 R3 VF ZF)))))

Finally, you need to encode the bad states, which are the negations of the assertions or postconditions you are trying to verify.

(assert
 (forall
   ((R0 (_ BitVec 32)) (R1 (_ BitVec 32))...)
     (=> (reach-state ASSERTPC R0 R1...)
          (not BADSTATE))))

This encoding can be thought of as backchaining from bad states.

If (check-sat) returns SAT, that means there is a predicate reach-state that overapproximates the reachable states and nevertheless implies the bad states are never reached. You can get this predicate with (get-model).

If (check-sat) returns UNSAT, that means there is a refutation proof that shows how to reach a bad state. This proof can be seen with (get-proof). It contains a counterexample trace for the verification property, but it can be hard to read off.
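So the generated file simply ends with the query, and the follow-up command depends on the answer (a sketch, not verbatim tool output):

(check-sat)
; if sat:   (get-model) prints a definition of reach-state, i.e. an inductive invariant
; if unsat: (get-proof) prints the refutation, which encodes a counterexample trace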

Implementation

Another break from CBAT is that I have tried to build this as a Bap plugin rather than as a command, and to use the new “bap core theory” representation rather than BIR, which is slowly being phased out. Core Theory is a finally tagless intermediate representation, the actual module signature of which is documented here. It is not quite the same tagless final you’ll find on Kiselyov’s site; the twist is that it is intermixed with use of Bap’s Knowledge Base. The knowledge base supplies an infrastructure for defining extensible records and a lazy state monad to store and compute them in. This intermixing is described in the OCaml workshop talk.

There are a lot of concepts floating around here. I was greatly aided by looking at the implementation of the “herbrand” theory, which reflects the finally tagless version into an initial datatype of S-expressions. If you know how to do that, you can do anything (well, a lot anyways).

One difficulty of the CHC encoding is that I need to know all the state variables at play. I did this using a Core Theory pass that collects all variables and stores them in a slot.

Then a knowledge base promise to fill a WP semantics slot is run, which calculates a Sexp.t -> Var.t list -> Sexp.t for every Program.t. This function is the weakest precondition predicate transformer, where predicates are represented by smtlib Sexp.t. The Var.t list is the list of state variables, which is a global quantity.

The top level printer collects all the state variables, collects all the WP transformers, applies each to a Fallthrough predicate, wraps the result in a horn clause, and prints it. All of this is done here.

You can install with make install and then invoke the chc printer with bap myexample.o -dchc --no-cache.

What next?

I dunno. I just wanted to see this work. It helped me understand Core Theory better.

My implementation is still incomplete, and the translation could be made cleaner by not relying on Theory.Basic.Make. I’m also a little shaky on the semantics of Core Theory and on the correctness of my encoding.

This is a fairly fine-grained encoding in terms of CHCs. It would be desirable to use more analysis to make the encoding simpler or produce fewer clauses. I hope there might be some way to verify the equivalence of such a simplification against the most basic encoding. That’d be nice.

I was a bit confused why I had to interpret Core Theory bools as 1-bit bitvectors. I feel like there was a way to make that work, and it would clean the encoding up.

Currently I have no method to specify postconditions, but these can be manually written without too much difficulty.
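For example, a hand-written postcondition clause for the snippet above might look like the following. This is purely illustrative (I am not claiming this property holds of that program): it demands that R0 is 1 whenever the return at #x00000018 is reached.

(assert
 (forall
  ((%P12582905 (_ BitVec 32)) (%P12582910 (_ BitVec 32)) (CF (_ BitVec 1))
   (LR (_ BitVec 32)) (NF (_ BitVec 1)) (R0 (_ BitVec 32)) (R3 (_ BitVec 32))
   (VF (_ BitVec 1)) (ZF (_ BitVec 1)))
  (=> (reach-state #x00000018 %P12582905 %P12582910 CF LR NF R0 R3 VF ZF)
   (= R0 #x00000001))))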

It would be desirable to encode only a given portion of the assembly. This is of course doable; I just haven’t done it. For my own ease I worked on example files that contain only one function.

I am actually pretty skeptical that this scales at all. It can be pretty easy to make Z3 say unknown, and I don’t know where the boundaries are. I’d hoped the bmc mode would be fairly robust. The UNSAT counterexamples are also really hard to interpret. These two points weigh heavily in favor of the approach taken by CBAT.

A larger scale dream of mine would be to put smtlib predicates into an ELF section and then verify them before loading a binary. This is my down-to-earth version of what “proof carrying code” (really spec-carrying code in this case) might mean. There is work in progress out there on proof formats for smt solvers. Spacer does output an answer for reach-state, so this model is very close to being a proof of the property, since given the model it is comparatively easy to verify the properties.

We could alternatively target btor2, a format related to smtlib CHC; btor2 can be model checked by Boolector. https://github.com/chc-comp/chc-tools Is this saying chc2btor exists, or that it is desired to exist?

I had a cute trick, which I abandoned, of passing around a rho Z3 array from variable-name strings to bitvec64 (and other types). This was a literal “store” variable as you might see in an operational specification of a language. It avoided the need for a global pass to collect the state variables, but the encoding looked nasty, and I suspected it would choke the solver.
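For the record, that abandoned encoding looked roughly like this sketch (hand-written from memory, one array per value sort; not the actual generated output): the whole machine state is a single store rho from variable names to values, and each assignment becomes a store.

; the machine state as one store from variable names to 64-bit values
(declare-const rho (Array String (_ BitVec 64)))
; R0 := R3 + 1 becomes a store producing the next state rho1
(define-fun rho1 () (Array String (_ BitVec 64))
 (store rho "R0" (bvadd (select rho "R3") #x0000000000000001)))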