PLDI 2025 and E-Graphs Modulo Theories Talk
I gave a talk at EGRAPHS 2025 - Omelets Need Onions: E-graphs Modulo Theories via Bottom Up E-Matching link slides youtube
The motivation is how to bake in theory specific smarts like SMT solvers do. Discussions about the easiest form of e-matching (bottom up) when it comes to theories and also the idea of slotting in theory specific generalized unions finds, analagous vaguely to Shostak style SMT solvers.
Went pretty well I think! I had some technical difficulties at the beginning. I had tried my laptop earlier and it wasn’t working. Somehow I assumed that the projector must be locked or something for the break and didn’t question it? Jetlag is a hell of a drug. In hindsight there were a lot of options. Luckily Max let me use his laptop. I feel like gmail was also being wonky.
Some papers this year I would like to revisit in more depth
- Staged Mininkanren https://doi.org/10.1145/3729314
- Exploiting undefined behavior. Limitting use of undef behavior didn’t hurt perf much https://doi.org/10.1145/3729260 once you enhance the analysis
- Slotted egraphs https://doi.org/10.1145/3729326
- Lambda join https://doi.org/10.1145/3729299 functional streaming. Evocative of datafun kind of and stuff max is into
- Misaal https://doi.org/10.1145/3729301 used egraphs? Syntehsis of compiler
- https://doi.org/10.1145/3729309 First-Class Verification Dialects for MLIR
- https://doi.org/10.1145/3729308 Principal Type Inference under a Prefix: A Fresh Look at Static Overloading
- https://doi.org/10.1145/3729338 Practical Type Inference with Levels. Are levels a levelled union find? Using vector arenas allocated when you go insde a let
- https://doi.org/10.1145/3729298 Relational Abstractions Based on Labeled Union-Find
Had a blast! Having a core of people I know is so clutch. Also giving a talk is a nice ice breaker. I’m getting much better at just chatting with strangers as life goes on. In hindsight, I had career crippling social anxiety in my 20s. Maybe everyone does.
Notes
Started my journey!
Was tinkering on my slides Tuesday and they locked into a workable form.
Plane was not nearly as horrible as I worried. Stood up every once in a while
Announced my coming on the various social media. Good technique. I have a hard time butting into conversations but like when people start one with me.
Algebraic Subtyping and biunification. Very long and full of protens
12
Big day of walking! Started up to the seoul tower. Very uphill.
I feel like I barely ate all day by accident. It was hard to pick a place. I’ve also been a bit uncertain on whether card will be accepted?
I got ginseng chicken soup at this place. I keep getting drawn to lines.
13
Started day a little late. Practicing slides. Exhausted from yesterday. I keep crashing at 7pm and waking up at 2pm. I guess that works.
Bukchon Hanok village https://en.wikipedia.org/wiki/Bukchon_Hanok_Village
Went to https://en.wikipedia.org/wiki/Changdeokgung palace. Very nice walk in the woods, impressive architecture. I was doing this for hours, hard to appreciate at some point. Very very hot.
Polcari Sweat, good little gatorade like drink.
https://en.wikipedia.org/wiki/Cheonggyecheon walk along stream. Ate mung bean pnacakes from https://en.wikipedia.org/wiki/Gwangjang_Market . My belly is confusing me. The frying oil and hot smell really turned me off in that moment. But after a little bite my hunger came back.
Conference day 1
<www.tinyurl.com/pldi25-egglog>
https://stackoverflow.com/questions/18468786/understand-how-to-use-apply-and-unapply unapply
Watched the egglog tutorial. Had chats with Max W unapply LE-matching Talked to Chris and Yihong
Paper on levels should check out Yihong suggested
Weakness of bottom up
Conf Day 2 EGRAPHS
FHE using egraphs
Hybrid approach. Use ordinary rewriting tecnnique but record everything in egraph. More flexible optimal objectvtie side car egraph. Interesting. No ematching. ILP extraction
Maybe I should swap in a python hash cons, use pypy?
Jijimodelin and constraint detection
specific types can be constraints to be exlppite. one hot SOS 1 2 solvers provide dedicated apis https://arxiv.org/abs/2506.06495
https://github.com/orgs/Jij-Inc/repositories?type=all
Molecualr dynamics simulations
notions of equivalence. axiomatic. distirbtuitions equivalence functional equivalance
Interesting. Allowed error equivalence I’ve htought about.
Stochastic search
Nonstanard arith egraph. Bake in something… a =O(n^2) b implies higher powers. Refining towers of equivalences. Lines, oreitned lines. Can be encoded.
eqsat - mlir
lean mlir lean egg benchmarks? herbie becnhmoarls fpcore
mlir dialect to have eclasses
%c = eqsat.eclass use reigions to express cycles rebuilding is a compiler transformation
CSE is used for eclass merging
pdl dialect fro pattern rewriting
Theory instantiation yardbird
bounded model checking
rust bounded model checking
external quantifier instnations
abstract out to z3. use egraph rules to instntiate them.
Automated high level synthesis
destructive rewrites
can delete. garage collect stuff that has no references must make sure egraphs stays groundable, well founded
pruning vs destructive rewrites
Pavel was saying what about anti-extraction that throws away worst terms
chris calls subsume operator.
incremental equality saturation
no sharing
batched eqsat. Pavel
current term.
ematching top level must be only latest version
top k goodness for previous enodes. “ruaway eclass problem”
m wll expllored. do not explore well explored eclasses. m=2
chris says I should pull out “i reserve the right to change anything”
Oatlog
for (x,y,z) in Mul: for (x,y) in Add(x):
overload iter and call. Interesting. by key?
eager commutation applying invariant permutation pruning
what about wasm generation
PLDI day 1
Keynote
Spectec for wasm https://github.com/es-meta/esmeta
Could I print an smt interpreter out of these things?
weval
I wonder if I could use this for pcode2c? I’m not really sure what “this” is though.
UB
great stuff. Actually seeing wha ub matters for perf. Focussing only on perf is smart.
overflow of int shift by too much
The egg example is probably not true / overflows at the edges.
(a + 1) - 1 = a
isn’t strcitly speaking an equality?
Staged Minikanren
lambda join
mlir
hall track
Raved at Thomas Gilray about contextual and lambda datalog. The relationship of superposition and reoslution theorem provers to datalog. When he got going, it quite impressive the depth of work being done on gpu joins for datalog.
Chatted with Rudi over at te closest palace. It was funny, was totally oblivious to our surroundings. Very difficult to avoid getting smushed by cars. trying to clarify what slotted egraphs are. Indeed probably what I wrote in my slides was a wrong guess for notion of structured eid.
One interesting tidbit is that maybe slotted is more closlely modelling a concept of “lambda” that is multiarity and passes argument by keyword.
{x = i, y = j} * e_42
A slotted union find.
Rudi has a distinction between renaming and permutation. Permutations being amongst the same small sets of names (or permuting the slots?) whereas renaming is permutation in the inifnite set of names.
from dataclasses import dataclass
@dataclass
class SlotID():
rename : dict[str, str]
id_ : int
class SlotUF():
def __init__(self):
self.uf = {}
def fresh(self, nslots):
eid = SlotId({"x" + i : "x" + i for i in range(nslots)}, len(self.uf))
self.uf[]
def find(self, sid):
eid = s.id_
rename = s.rename
(rename1, id1) = self.uf[eid]
if rename
If the group action was actually an invariance. Negation group 0 = -0 neg(1) = -1
The self edge in the union find should accumulate the symmettrices the things respects with respect to the group. Consider the identity group action. All elements do nothing. const(7, x). It would be the rules job to discver that. If we can prediscover, we should use a subgroup or something
Interestingly for the +n group, an inconsistency needs to be resolved by solving the equation? No.. well… e1 + 7 = e8 + 2 e1 + 9 = e8 + 4
Combo of negation and addition is group. e8 + 4 = e8 + 3 —> 0 = 1 -e8 + 4 = +e8 + 3 —> 2e8 = 1 . Not solvable over integers. But could be ok
Accidentally ended up being sat next to Leo de Moura at a dinner I was late to. That was a trip. I mostly clammed up (the right move?) but i did briefly poke him to see if he was aghast at the idea of knuckeldragger or egraphs modulo theories / bottom up ematching. He was not aghast but quite rightfully wanted to know what was the application. When the man says you should be using lean to your face, it’s very compelling. Lean is an insane steam train. It might change the world.
Met
Other days
I did a bad job of keeping up these notes
Sat next to Alexander Lew at dinner. Interesting conversation about probalistic types, Dario Stein. Information flow and information security vs parametrcitiy vs hygiene vs multistage programming. Information can only flow one way. “The trick” is indriect information leakage.
The probability of “truth” of a proposition vs probablisty of a proof being found / valid. Probablistic dependent typing.
Leo brought the fire in his Keynote. Why are we all not using lean? KLR https://github.com/leanprover/KLR
Pavel called Knuckledragger the KMart of proof asistants. An interesting branding idea.
Long discussion with Rudi on day 2 trying to understand what slotted egraphs are. They might start from the named perspective. i + j
is not equal to x + y
but they are related via renaming. A distinction between renaming, permutations. Maybe 3 kinds of identifiers, canonical identifiers 0,1,2,3 for canonized shapes, x,y,z for user names, and v,w,u for generated eclass slot names. Renamings go between these things. Maybe a groupoid is a nice model of this. Maybe multia-rity keyword only lambdas. That’s an interesting twist. I think maybe we agreed the slotted hash-cons made sense. I think in the group union find i had never considered how two things oculd be related by more than one group element. This also occurred in the labelled union find work. The root can store the invariances. i + j = j + i
is actually a permutation symmettry.
Fancy fancy dinner.
As i was describing co-egraphs https://www.philipzucker.com/coegraph/ to people I was getting rexcited about it. It feels like a good idea! Zach says that a functional pearl is possible if getting into and out of the egraphs could be made easier for PEGs. Also streaming languages like streamit
Floating point verification in knuckeldragger. Alex Aiken work on math.h and FPAN maybe as good places to look
Had a very expensive coffee Ganesh bought me. https://users.cs.utah.edu/~ganesh/ It was a fun chat. He has a really cool seeming textbook on automata in addition to the other stuff he’s go going on. https://drive.google.com/file/d/17ezRWes7OkszARizFBPOqwUUqrGj8h6Q/view?usp=sharing
Spoke with Mark Moller a bit. I wonder if there could be some kind of useful partnership there if I could get co-egraphs to a more ready to chug point. KAT modulo theories came up. I looked into it again, this seems promising. Yeah, if I bake in sequences or semirings, why not bake in a kleene algebra or KAT? The normal form of kleene algerba expressiom is a bit unclear. Minimized automata are a normal form. Probably they don’t play well with general equalities. But KAT modulo theories is kind of about this?
Need a benchmark suite for egraphs. KAT could be one source. Coset enumeration.
Max pointed out that bottom up e-matching can be reconciled with relational ematching if you just brute the database whenever you give up on pruning an intermediate variable. It is ineed the idea of “universes”, just very baked into the system. Universes are a nice backup when there is nothing you can do.
Phone notes
Sympy subst
Telescope mappings. A telescope witha teelscope key
Ast and expr protocol
Dancing links in verilho
Vector telescopes.
Rebidn g the metalanguage name as late bunding. All previous proofs are invalidated
sac. “Mathematics of arrays” nieman? Rank polymorphism for tiling
With loop tensor comprehension
Perlis infinite dimensional arrays
Ordinals as timestamps
Agda generating sac. Huh
Swiestra calculating
Soft polymorphism?
Bi matching biunification.
Biunification prolog
“Rewrite rule” <= vs >=
Function symbols change the variance maybe? Marked co and contravaroant / invariant. Do mlsub im “egraph” like how yihong wanted hinldey miler
What if the “theory” of the egraph sort was these lattice polynomials. Intriguing
Stress linked to bad sleep. Huh.
Shader node editors. Huh
Nikolaj wishlist if hes here
Faster cffi
Parametricity
Suggestions about Recfunction
Inductive relations
Subsorts
Nat
Multimaps vs multisets
Maps and sets in prolog stink
Px4, nuttx
Tosokchon somgyetong. Confusing chicken soup
System F is proving theorems about second order logic?
Knuckledragger reflect comprehension syntax.
Type inference over python ast
A Datalog modulo theories. DMT
Function inversion is ematching
6 = 4 + 2. Bottom up cant find pattern ?x + ?y if decomposition doesn’t exist.
Matching to generate terms. Hmmm. Could. This is side to side I guess. Ephemeral or permanent?
Ok to go a little long
Early stoppage of undecidable. Egraph heals incomplete rebuilding
Universe enumeration
Mbqe. Egraph is a model. Hmm. Bottom up really is mbqe, but the domain is not int. Its Int[x]. Dolan said this is right for some reason
Cost functions. Use cost to prune patterns. Yeah, sure
Partial functions as a model of dtt? Then universes can be working better.
Universe should be an open type. Frozenset closes. Iterator is kind of open.
Le-matching
Homomorphism
Dependent egraphs. Telescopes are tries
refinement egraph.
Unapply and join
Max willsey promatch
Pure subtype systems
Max ground at intermediate points
Dario stein
Nu calculus for name generation
Fresh names, randomly pick. There shouldn’t be information flow from the bame. That is hygiene
Alwxander lew
Probablistic proofs
Trith modality is hard to get a hold of.
Try to model belief that proofs will be found instead? Model social processes.
Model whether an atp will find proof?
Maybe a decay factor in proofs so the thing converges?
Proablsitic dependent typing?
Lazy bdds
Egraphs modulo egraphs
Lean knuckeldragger. Soundish
Esmeta
Spectec
Code combinators for wasm. Hmm.
Compile python to javascript
Integer overflow as ub
Big shifts are ub
Unapply on z3. Register. Hmmmm
My let var trick. Monadic let? Also bind the thing you are in the context of?
What if you didnt put vats in slots
a + 3 a+ 4. Two enodes. Can turn into one enode in two slotted eids? E(3) and e(4). I guess there are many
Lambda ematching. Single level pattern.
Misaal uses egraphs
Parse egglog.
Max says cartsiean product is maximally dumb coninctovr query. What I did is projecting that a little smarter. Like datalog, there is a maximal substitution set so it becomes possible
Relational model of hoas. In different worlds, different names are used.
e1 + 7 = e6 + 14
Code get
ee6
Slotted eids have self symmettries
e1(x=i, y=j)
{x=i,y=j}e1(x,y)
M*e1
M* e1_x_y
{x=i}e1 = {x=j}e1
variabel eclass
Permuations have same codoaimin
Renamings change names
Lean on colab
Datalog on lean
Copy Harrison into lean
Subtyping stuff into lean
What is the eclass? What is the eid
Translate oleg staged to lean
Housefly lean
Klr lean tensor
Make a datalog
Multipl group elements on every union find edge
Explicit named rep. And explicit renaming combinators. Try to bake in explicit renaming
Dumbass semantics. Global enivonrment that binds actual names. Dynamic scope?
Swap operator actually is mutation that swaps environment
Bake those rules in
Z3 modelling string -> env -> v
Detecting hygiene with the double egraph?
Rename: str str -> (env -> a) -> (env -> a)
Env -> a. Baking in the semantics that everything can depend on env
Need to store list environment to convert. Fresh names
Euf in sympy
Math.h verification aiken
Recent abstract interpretation
Finch and firedrake. Pde refinement to code
Fortran translation
Isiosyncracies of the html parser
Import goto into python. Can i do continuations?
Undefined behavior as a hoare rule where you get to delete a precondition
Mlf
Boxy types
Static
Freezeml
Pantograph
Names per eclass
Groupid of names per eclass
Names on exlasses so when they merge it can make sense
var
Slotted knuth bendix simple vs compound.
Self loops are selff symmettries
Difference algebra
Actually expose lean via ffi to python
Easy export and dynlink in python
Dynamic linking lean
Knuckle lean
Only expose hol
Only grind
Co egraohs extraction. Pegs getting out and s hardest part
Compile to a streaming language - zach
Streamit
Abstract congruence closure bachmair
Twosum
Fpal aiken. Fpan network
Benji
Postage stamp problem or coin problem
Df sat. Learning
Pcp
Post correpsodnance problem
Chevyshev points
Pulse
Do some experiments on the chebyshev number system. If chebyshev is good sampler on domain, and we want to plug together functions, kind of makes sense right?
Lattice values on each eid in group union fidn huh.
Get_relation(z,x)
Proof relevant is_eq
Only need to store one relational per eclass. Nice. Maybe rudi’s multiple group elements in a lattice values?
Finding all actual equalities. To proagate to other domains
Use persistent array to compute union find diffs
Put parametric term in random context / testbench. Like noninterference testbench
Abstract interpretation in knickeldragger. Or lean. Hmm. That sounds fun/useful
Bdd to store inequality info. If var order is close to a topological order, good right?
Knuckle hoare
Interval is proof of shape And(a <= x, x <= b). With sequent precond? Prob.
I say “what if logic was inteinsically stateful” but that kind of is hoare logic. An issue is that the pre post are themselves regilar logic. What if they were hoare again?
{ {P} C {q} } C {q}
Its kind of the insight of cbmc. We don’t need another logic
Bdd as an automata that accepts sat models. Huh
Automata equality vs language (minimal aitomata) equality. Capture some computational context
Chebyshev numbers. Work at higher precision then lower. That’s the lanczos thing
Somehow derive a condition that says function composition is as accurate as possible.
Try sample compitations at low precision
Pick truncation not pointwise but to minimize minimax error
Maybe if you’re doing mathematical programming, the bernstein polys are relavant?
For a datasey of problems, one could solve for minimal error diescretization. Maybe there’s an answer for lipshitz bound functions or something.
Iteratively add transition relation and eliminate to fixpoint. Set of reachable states
Build a bdddbbd
Extraction from a bdd?
Interpolants from a bdd?
K induction?
Dfa for lists. Observations vs recursive components.
Maybe if nominal sets works over a groupoid that cleans freshness up
Kind of like set + set of names model. Carrying free vars. Hmm. Ctx, set(stuff)
No term fumctions like not. Then [x,y] | - x : A is sensible. Type families can still kind of check on terms. And really, bolting in true / false is like having them in context |
Nominal as a groupoid
Generic join is quantifier elim
Bdd is quantifier elim for bools.
Hmm existential elimination in query body
“ symbolic” database
Do bdddbdbds with linear constraints. Do it with polynomials?
Fourier mtzkin into grobner bases.
A dsl of datatype declarations using type = .
Cross stage persistence.
All and any should become
i do have enough infrastructure to make match work. Keyword arguments. So maybe () is right
Cosnider enodes are linear. Consider enodes are tome invariant functionals? Some are some arent.
I don’t want to propagate just simple equalities. I want to propagate the least common refinable equivalence realtion describable. Which is often simple equality. This is a generalization of refinement towers
Group union find could take in proof to canon (?). Canon(self, x, prf) because cant fenerate itself?
A group union find for cincrete group. But group union find for finitely generated group is in undecidable.
Interpetation of types as extensible sets. Fun env -> set. bool takes in other variables at play in context. Egraph extended to maintain explicit sets for set vars
Setids take place of Int etc. setids also need to be rebuilt.
I can indeed use z3 exprs as setids. Why not.
We do have a notion of a first class set now? Is that why we’re beyond egglog?
Getting a proof object could be getting the (possibly many) group element connecting.
Definitional equality is on the nose equal union find. Prop is modulo group action or other theory notion.
Long doc strings in kncukeldragger to start towards arip
Is_type, has_type, is_eq
Must_type . Query and assertion forms of all judgements
Union is defeq. Canon and equal is defeq
I guess its kind of a setoid model.
If I use. Lambdax True to represent Int, then lambda egraph has subset relationships.
Brite force egraph is theory. Rebuild propagates. Could collect learned
Using cvc5 as z3 user propagator.
Z3 to pysmt via serialize to smt
Setoid model would map each dependent type to a binary equaivalence relation type = lambda xy. R(x,y). The materialized form of this ought to be involve a first class union find yeah? Partial setoid
Refinement types are the multisorted to dependent types single sorted. Huh.
Frex
Partial eval of z3.
Term banks and smart constructors in the context of staged metaproframming