I gave a talk at EGRAPHS 2025 - Omelets Need Onions: E-graphs Modulo Theories via Bottom Up E-Matching link slides youtube

The motivation is how to bake in theory specific smarts like SMT solvers do. Discussions about the easiest form of e-matching (bottom up) when it comes to theories and also the idea of slotting in theory specific generalized unions finds, analagous vaguely to Shostak style SMT solvers.

preprint

Went pretty well I think! I had some technical difficulties at the beginning. I had tried my laptop earlier and it wasn’t working. Somehow I assumed that the projector must be locked or something for the break and didn’t question it? Jetlag is a hell of a drug. In hindsight there were a lot of options. Luckily Max let me use his laptop. I feel like gmail was also being wonky.

Some papers this year I would like to revisit in more depth

Had a blast! Having a core of people I know is so clutch. Also giving a talk is a nice ice breaker. I’m getting much better at just chatting with strangers as life goes on. In hindsight, I had career crippling social anxiety in my 20s. Maybe everyone does.

Notes

Started my journey! Was tinkering on my slides Tuesday and they locked into a workable form. Plane was not nearly as horrible as I worried. Stood up every once in a while Announced my coming on the various social media. Good technique. I have a hard time butting into conversations but like when people start one with me.

Algebraic Subtyping and biunification. Very long and full of protens

12

Big day of walking! Started up to the seoul tower. Very uphill.

I feel like I barely ate all day by accident. It was hard to pick a place. I’ve also been a bit uncertain on whether card will be accepted?

I got ginseng chicken soup at this place. I keep getting drawn to lines.

13

Started day a little late. Practicing slides. Exhausted from yesterday. I keep crashing at 7pm and waking up at 2pm. I guess that works.

Bukchon Hanok village https://en.wikipedia.org/wiki/Bukchon_Hanok_Village

Went to https://en.wikipedia.org/wiki/Changdeokgung palace. Very nice walk in the woods, impressive architecture. I was doing this for hours, hard to appreciate at some point. Very very hot.

Polcari Sweat, good little gatorade like drink.

https://en.wikipedia.org/wiki/Cheonggyecheon walk along stream. Ate mung bean pnacakes from https://en.wikipedia.org/wiki/Gwangjang_Market . My belly is confusing me. The frying oil and hot smell really turned me off in that moment. But after a little bite my hunger came back.

Conference day 1

<www.tinyurl.com/pldi25-egglog>

https://stackoverflow.com/questions/18468786/understand-how-to-use-apply-and-unapply unapply

Watched the egglog tutorial. Had chats with Max W unapply LE-matching Talked to Chris and Yihong

Paper on levels should check out Yihong suggested

Weakness of bottom up

Conf Day 2 EGRAPHS

FHE using egraphs

Hybrid approach. Use ordinary rewriting tecnnique but record everything in egraph. More flexible optimal objectvtie side car egraph. Interesting. No ematching. ILP extraction

Maybe I should swap in a python hash cons, use pypy?

Jijimodelin and constraint detection

specific types can be constraints to be exlppite. one hot SOS 1 2 solvers provide dedicated apis https://arxiv.org/abs/2506.06495

https://github.com/orgs/Jij-Inc/repositories?type=all

Molecualr dynamics simulations

notions of equivalence. axiomatic. distirbtuitions equivalence functional equivalance

Interesting. Allowed error equivalence I’ve htought about.

Stochastic search

Nonstanard arith egraph. Bake in something… a =O(n^2) b implies higher powers. Refining towers of equivalences. Lines, oreitned lines. Can be encoded.

eqsat - mlir

lean mlir lean egg benchmarks? herbie becnhmoarls fpcore

mlir dialect to have eclasses

%c = eqsat.eclass use reigions to express cycles rebuilding is a compiler transformation

CSE is used for eclass merging

pdl dialect fro pattern rewriting

Theory instantiation yardbird

bounded model checking

rust bounded model checking

external quantifier instnations

abstract out to z3. use egraph rules to instntiate them.

Automated high level synthesis

destructive rewrites

can delete. garage collect stuff that has no references must make sure egraphs stays groundable, well founded

pruning vs destructive rewrites

Pavel was saying what about anti-extraction that throws away worst terms

chris calls subsume operator.

incremental equality saturation

no sharing

batched eqsat. Pavel

current term.

ematching top level must be only latest version

top k goodness for previous enodes. “ruaway eclass problem”

m wll expllored. do not explore well explored eclasses. m=2

chris says I should pull out “i reserve the right to change anything”

Oatlog

for (x,y,z) in Mul: for (x,y) in Add(x):

overload iter and call. Interesting. by key?

eager commutation applying invariant permutation pruning

what about wasm generation

PLDI day 1

Keynote

Spectec for wasm https://github.com/es-meta/esmeta

Could I print an smt interpreter out of these things?

weval

I wonder if I could use this for pcode2c? I’m not really sure what “this” is though.

UB

great stuff. Actually seeing wha ub matters for perf. Focussing only on perf is smart.

overflow of int shift by too much

The egg example is probably not true / overflows at the edges.

(a + 1) - 1 = a isn’t strcitly speaking an equality?

Staged Minikanren

lambda join

mlir

hall track

Raved at Thomas Gilray about contextual and lambda datalog. The relationship of superposition and reoslution theorem provers to datalog. When he got going, it quite impressive the depth of work being done on gpu joins for datalog.

Chatted with Rudi over at te closest palace. It was funny, was totally oblivious to our surroundings. Very difficult to avoid getting smushed by cars. trying to clarify what slotted egraphs are. Indeed probably what I wrote in my slides was a wrong guess for notion of structured eid. One interesting tidbit is that maybe slotted is more closlely modelling a concept of “lambda” that is multiarity and passes argument by keyword. {x = i, y = j} * e_42 A slotted union find. Rudi has a distinction between renaming and permutation. Permutations being amongst the same small sets of names (or permuting the slots?) whereas renaming is permutation in the inifnite set of names.

from dataclasses import dataclass
@dataclass
class SlotID():
    rename : dict[str, str]
    id_ : int

class SlotUF():
    def __init__(self):
        self.uf = {}
    def fresh(self, nslots):

        eid = SlotId({"x" + i : "x" + i for i in range(nslots)}, len(self.uf))
        self.uf[]
    def find(self, sid):
        eid = s.id_
        rename = s.rename
        (rename1, id1) = self.uf[eid]
            if rename

If the group action was actually an invariance. Negation group 0 = -0 neg(1) = -1

The self edge in the union find should accumulate the symmettrices the things respects with respect to the group. Consider the identity group action. All elements do nothing. const(7, x). It would be the rules job to discver that. If we can prediscover, we should use a subgroup or something

Interestingly for the +n group, an inconsistency needs to be resolved by solving the equation? No.. well… e1 + 7 = e8 + 2 e1 + 9 = e8 + 4

Combo of negation and addition is group. e8 + 4 = e8 + 3 —> 0 = 1 -e8 + 4 = +e8 + 3 —> 2e8 = 1 . Not solvable over integers. But could be ok

Accidentally ended up being sat next to Leo de Moura at a dinner I was late to. That was a trip. I mostly clammed up (the right move?) but i did briefly poke him to see if he was aghast at the idea of knuckeldragger or egraphs modulo theories / bottom up ematching. He was not aghast but quite rightfully wanted to know what was the application. When the man says you should be using lean to your face, it’s very compelling. Lean is an insane steam train. It might change the world.

Met

Other days

I did a bad job of keeping up these notes

Sat next to Alexander Lew at dinner. Interesting conversation about probalistic types, Dario Stein. Information flow and information security vs parametrcitiy vs hygiene vs multistage programming. Information can only flow one way. “The trick” is indriect information leakage.

The probability of “truth” of a proposition vs probablisty of a proof being found / valid. Probablistic dependent typing.

Leo brought the fire in his Keynote. Why are we all not using lean? KLR https://github.com/leanprover/KLR

Pavel called Knuckledragger the KMart of proof asistants. An interesting branding idea.

Long discussion with Rudi on day 2 trying to understand what slotted egraphs are. They might start from the named perspective. i + j is not equal to x + y but they are related via renaming. A distinction between renaming, permutations. Maybe 3 kinds of identifiers, canonical identifiers 0,1,2,3 for canonized shapes, x,y,z for user names, and v,w,u for generated eclass slot names. Renamings go between these things. Maybe a groupoid is a nice model of this. Maybe multia-rity keyword only lambdas. That’s an interesting twist. I think maybe we agreed the slotted hash-cons made sense. I think in the group union find i had never considered how two things oculd be related by more than one group element. This also occurred in the labelled union find work. The root can store the invariances. i + j = j + i is actually a permutation symmettry.

Fancy fancy dinner.

As i was describing co-egraphs https://www.philipzucker.com/coegraph/ to people I was getting rexcited about it. It feels like a good idea! Zach says that a functional pearl is possible if getting into and out of the egraphs could be made easier for PEGs. Also streaming languages like streamit

Floating point verification in knuckeldragger. Alex Aiken work on math.h and FPAN maybe as good places to look

Had a very expensive coffee Ganesh bought me. https://users.cs.utah.edu/~ganesh/ It was a fun chat. He has a really cool seeming textbook on automata in addition to the other stuff he’s go going on. https://drive.google.com/file/d/17ezRWes7OkszARizFBPOqwUUqrGj8h6Q/view?usp=sharing

Spoke with Mark Moller a bit. I wonder if there could be some kind of useful partnership there if I could get co-egraphs to a more ready to chug point. KAT modulo theories came up. I looked into it again, this seems promising. Yeah, if I bake in sequences or semirings, why not bake in a kleene algebra or KAT? The normal form of kleene algerba expressiom is a bit unclear. Minimized automata are a normal form. Probably they don’t play well with general equalities. But KAT modulo theories is kind of about this?

Need a benchmark suite for egraphs. KAT could be one source. Coset enumeration.

Max pointed out that bottom up e-matching can be reconciled with relational ematching if you just brute the database whenever you give up on pruning an intermediate variable. It is ineed the idea of “universes”, just very baked into the system. Universes are a nice backup when there is nothing you can do.

Phone notes

Sympy subst

Telescope mappings. A telescope witha teelscope key

Ast and expr protocol

Dancing links in verilho

Vector telescopes.

Rebidn g the metalanguage name as late bunding. All previous proofs are invalidated

sac. “Mathematics of arrays” nieman? Rank polymorphism for tiling

With loop tensor comprehension

Perlis infinite dimensional arrays

Ordinals as timestamps

Agda generating sac. Huh

Swiestra calculating

Soft polymorphism?

Bi matching biunification.

Biunification prolog

“Rewrite rule” <= vs >=

Function symbols change the variance maybe? Marked co and contravaroant / invariant. Do mlsub im “egraph” like how yihong wanted hinldey miler

What if the “theory” of the egraph sort was these lattice polynomials. Intriguing

Stress linked to bad sleep. Huh.

Shader node editors. Huh

Nikolaj wishlist if hes here

Faster cffi

Parametricity

Suggestions about Recfunction

Inductive relations

Subsorts

Nat

Multimaps vs multisets

Maps and sets in prolog stink

Px4, nuttx

Tosokchon somgyetong. Confusing chicken soup

System F is proving theorems about second order logic?

Knuckledragger reflect comprehension syntax.

Type inference over python ast

A Datalog modulo theories. DMT

Function inversion is ematching

6 = 4 + 2. Bottom up cant find pattern ?x + ?y if decomposition doesn’t exist.

Matching to generate terms. Hmmm. Could. This is side to side I guess. Ephemeral or permanent?

Ok to go a little long

Early stoppage of undecidable. Egraph heals incomplete rebuilding

Universe enumeration

Mbqe. Egraph is a model. Hmm. Bottom up really is mbqe, but the domain is not int. Its Int[x]. Dolan said this is right for some reason

Cost functions. Use cost to prune patterns. Yeah, sure

Partial functions as a model of dtt? Then universes can be working better.

Universe should be an open type. Frozenset closes. Iterator is kind of open.

Le-matching

Homomorphism

Dependent egraphs. Telescopes are tries

refinement egraph.

Unapply and join

Max willsey promatch

Pure subtype systems

Max ground at intermediate points

Dario stein

Nu calculus for name generation

Fresh names, randomly pick. There shouldn’t be information flow from the bame. That is hygiene

Alwxander lew

Probablistic proofs

Trith modality is hard to get a hold of.

Try to model belief that proofs will be found instead? Model social processes.

Model whether an atp will find proof?

Maybe a decay factor in proofs so the thing converges?

Proablsitic dependent typing?

Lazy bdds

Egraphs modulo egraphs

Lean knuckeldragger. Soundish

Esmeta

Spectec

Code combinators for wasm. Hmm.

Compile python to javascript

Integer overflow as ub

Big shifts are ub

Unapply on z3. Register. Hmmmm

My let var trick. Monadic let? Also bind the thing you are in the context of?

What if you didnt put vats in slots

a + 3 a+ 4. Two enodes. Can turn into one enode in two slotted eids? E(3) and e(4). I guess there are many

Lambda ematching. Single level pattern.

Misaal uses egraphs

Parse egglog.

Max says cartsiean product is maximally dumb coninctovr query. What I did is projecting that a little smarter. Like datalog, there is a maximal substitution set so it becomes possible

Relational model of hoas. In different worlds, different names are used.

e1 + 7 = e6 + 14

Code get

ee6

Slotted eids have self symmettries

e1(x=i, y=j)

{x=i,y=j}e1(x,y)

M*e1

M* e1_x_y

{x=i}e1 = {x=j}e1

variabel eclass

Permuations have same codoaimin

Renamings change names

Lean on colab

Datalog on lean

Copy Harrison into lean

Subtyping stuff into lean

What is the eclass? What is the eid

Translate oleg staged to lean

Housefly lean

Klr lean tensor

Make a datalog

Multipl group elements on every union find edge

Explicit named rep. And explicit renaming combinators. Try to bake in explicit renaming

Dumbass semantics. Global enivonrment that binds actual names. Dynamic scope?

Swap operator actually is mutation that swaps environment

Bake those rules in

Z3 modelling string -> env -> v

Detecting hygiene with the double egraph?

Rename: str str -> (env -> a) -> (env -> a)

Env -> a. Baking in the semantics that everything can depend on env

Need to store list environment to convert. Fresh names

Euf in sympy

Math.h verification aiken

Recent abstract interpretation

Finch and firedrake. Pde refinement to code

Fortran translation

Isiosyncracies of the html parser

Import goto into python. Can i do continuations?

Undefined behavior as a hoare rule where you get to delete a precondition

Mlf

Boxy types

Static

Freezeml

Pantograph

Names per eclass

Groupid of names per eclass

Names on exlasses so when they merge it can make sense

var

Slotted knuth bendix simple vs compound.

Self loops are selff symmettries

Difference algebra

Actually expose lean via ffi to python

Easy export and dynlink in python

Dynamic linking lean

Knuckle lean

Only expose hol

Only grind

Co egraohs extraction. Pegs getting out and s hardest part

Compile to a streaming language - zach

Streamit

Abstract congruence closure bachmair

Twosum

Fpal aiken. Fpan network

Benji

Postage stamp problem or coin problem

Df sat. Learning

Pcp

Post correpsodnance problem

Chevyshev points

Pulse

Do some experiments on the chebyshev number system. If chebyshev is good sampler on domain, and we want to plug together functions, kind of makes sense right?

Lattice values on each eid in group union fidn huh.

Get_relation(z,x)

Proof relevant is_eq

Only need to store one relational per eclass. Nice. Maybe rudi’s multiple group elements in a lattice values?

Finding all actual equalities. To proagate to other domains

Use persistent array to compute union find diffs

Put parametric term in random context / testbench. Like noninterference testbench

Abstract interpretation in knickeldragger. Or lean. Hmm. That sounds fun/useful

Bdd to store inequality info. If var order is close to a topological order, good right?

Knuckle hoare

Interval is proof of shape And(a <= x, x <= b). With sequent precond? Prob.

I say “what if logic was inteinsically stateful” but that kind of is hoare logic. An issue is that the pre post are themselves regilar logic. What if they were hoare again?

{ {P} C {q} } C {q}

Its kind of the insight of cbmc. We don’t need another logic

Bdd as an automata that accepts sat models. Huh

Automata equality vs language (minimal aitomata) equality. Capture some computational context

Chebyshev numbers. Work at higher precision then lower. That’s the lanczos thing

Somehow derive a condition that says function composition is as accurate as possible.

Try sample compitations at low precision

Pick truncation not pointwise but to minimize minimax error

Maybe if you’re doing mathematical programming, the bernstein polys are relavant?

For a datasey of problems, one could solve for minimal error diescretization. Maybe there’s an answer for lipshitz bound functions or something.

Iteratively add transition relation and eliminate to fixpoint. Set of reachable states

Build a bdddbbd

Extraction from a bdd?

Interpolants from a bdd?

K induction?

Dfa for lists. Observations vs recursive components.

Maybe if nominal sets works over a groupoid that cleans freshness up

Kind of like set + set of names model. Carrying free vars. Hmm. Ctx, set(stuff)

No term fumctions like not. Then [x,y] - x : A is sensible. Type families can still kind of check on terms. And really, bolting in true / false is like having them in context

Nominal as a groupoid

Generic join is quantifier elim

Bdd is quantifier elim for bools.

Hmm existential elimination in query body

“ symbolic” database

Do bdddbdbds with linear constraints. Do it with polynomials?

Fourier mtzkin into grobner bases.

A dsl of datatype declarations using type = .

Cross stage persistence.

All and any should become

i do have enough infrastructure to make match work. Keyword arguments. So maybe () is right

Cosnider enodes are linear. Consider enodes are tome invariant functionals? Some are some arent.

I don’t want to propagate just simple equalities. I want to propagate the least common refinable equivalence realtion describable. Which is often simple equality. This is a generalization of refinement towers

Group union find could take in proof to canon (?). Canon(self, x, prf) because cant fenerate itself?

A group union find for cincrete group. But group union find for finitely generated group is in undecidable.

Interpetation of types as extensible sets. Fun env -> set. bool takes in other variables at play in context. Egraph extended to maintain explicit sets for set vars

Setids take place of Int etc. setids also need to be rebuilt.

I can indeed use z3 exprs as setids. Why not.

We do have a notion of a first class set now? Is that why we’re beyond egglog?

Getting a proof object could be getting the (possibly many) group element connecting.

Definitional equality is on the nose equal union find. Prop is modulo group action or other theory notion.

Long doc strings in kncukeldragger to start towards arip

Is_type, has_type, is_eq

Must_type . Query and assertion forms of all judgements

Union is defeq. Canon and equal is defeq

I guess its kind of a setoid model.

If I use. Lambdax True to represent Int, then lambda egraph has subset relationships.

Brite force egraph is theory. Rebuild propagates. Could collect learned

Using cvc5 as z3 user propagator.

Z3 to pysmt via serialize to smt

Setoid model would map each dependent type to a binary equaivalence relation type = lambda xy. R(x,y). The materialized form of this ought to be involve a first class union find yeah? Partial setoid

Refinement types are the multisorted to dependent types single sorted. Huh.

Frex

Partial eval of z3.

Term banks and smart constructors in the context of staged metaproframming