E-graphs
- E-Graph
- My E-Graph Blog Posts
- Union Finds
- Hash Cons
- E-matching
- Equality Saturation
- Proof Production
- E Graph Tactics
- Applications
- PEG Program Expression Graphs
- Egglog
- Lambda Encoding
- Contextual EGraphs
- Misc
E-Graph
What is an e-graph?
What is a summer’s day? A rainbow?
E-graphs are a data structure that efficiently holds terms and equalities between terms.
It is useful for algebraic simplification, program transformations and proving equivalences in equational reasoning
Destructive term rewriting can be used in
My E-Graph Blog Posts
- E-graphs in Julia (Part I)
- E-Graph Pattern Matching (Part II)
- Progress on Automated Reasoning for Catlab with Metatheory.jl Egraphs
- Rewriting Monoidal Categories in the Browser with Egg
- Union Find Dicts: Dictionaries Keyed on Equivalence Classes
- A Simplified E-graph Implementation
- Partial Evaluation of a Pattern Matcher for E-graphs
- Encoding E-graphs to Souffle Datalog
- Egglog: a Prolog Syntax for Egg, Checkpoint I
- JuliaCon 2021 Talk on Metatheory.jl and Snippets From the Cutting Room Floor
- Egglog 2: Automatically Proving the Pullback of a Monic is Monic
- Egglog Examples: Pullbacks, SKI, Lists, and Arithmetic
- Naive E-graph Rewriting in Souffle Datalog
- A Questionable Idea: Hacking findParent into Souffle with User Defined Functors
- Embedding E-graph Rewriting and Egglog in Constraint Handling Rules
Union Finds
Union finds are also called disjoint set datastructures.
You can take a set an group it into equivalence classes. One place this is useful is finding the connected components of a graph.
There are different styles of union finds and both make one thing in different and interesting ways.
At a level of fairly high abstraction, a union find is a contracting map. You iterate the map to a fixed point. Each equivalence class is the basin of a fixed point.
See coq post https://www.philipzucker.com/simple-coq-union-find/
How essential is path compression? It is the thing that gets you that inverse ackermann complexity. It requires mutation so far as I know.
Reference union finds
A chain of pointers that lead up to a root. Often the root is a pointer to itself.
# no path compression
# `is` is python reference equality
class BadUF():
def __init__(self):
self.parent = self
def find(self):
x = self
while x.parent is not x:
x = x.parent
return x
def union(self,rhs):
p = self.find()
p2 = rhs.find()
p.parent = p2
return p2
x = BadUF()
y = BadUF()
z = BadUF()
x.union(y)
print(x.find() is y.find())
print(x.find() is z.find())
Union find arrays and ints
What is a pointer but an index? What is an index but a pointer? In some sense your ram is just a giant array that you hand out. And vice versa you can build the analog of pointer based algorithms by using an array as ram.
In some languages you don’t have access to raw pointers easily. The array style also makes it easier/more natural to copy union finds. This style also makes it more clear how to talk about the union find in a purely functional way.
class BadUF():
# no path compression. no depth stroage.
def __init__(self):
self.parent = []
def find(self,i):
p = self.parent[i]
while p != self.parent[p]: # walrus?
p = self.parent[p]
return p
def makeset(self):
x = len(self.parent) # right?
self.parent.append(x)
return x
def set_size(self,n):
while len(self.parent) < n:
self.makeset()
return
def union(self,i,j):
i = self.find(i)
j = self.find(j)
self.parent[i] = j
return j
uf = BadUF()
x = uf.makeset()
y = uf.makeset()
z = uf.makeset()
uf.union(x,y)
assert(uf.find(x) == uf.find(y))
assert(uf.find(x) != uf.find(z))
Variations
Union Find Dicts: Dictionaries Keyed on Equivalence Classes. You can make a dict from union find to lattices. It needs to know how to merge stuff in the range. Relatedly, you could also have union find keyed on union find.
Union find with group elements on edges. kmett talk. Yihong points out http://poj.org/problem?id=2492 as a competitive programming exercise that can use this.
Scoped Union find
Colored Union find
Hash Cons
E-matching
Searching over the egraph. It is both simple and complex. It is a brother of pattern matching. The branching factor of each eclass means that it is more nondeterministic than tree pattern matching. It is similar to database search. Relational E-matching.
Consider
- pattern matching at the root of a single tree
- pattern matching at ever node in a tree
-
pattern matching over a database of trees (forest)
- pattern matching over a DAG
- pattern matching over a graph (hard in principle (subgraph isomorphism), in practice maybe not that bad)
Simplify de Moura and Bjorner Their VM is quire similar to WAM.
Equality Saturation
Proof Production
Proof producing congruence closure
A union find is a data structure useful for finding connected components in a graph. The “proof” that two vertices are connected is the path between them. We need to maintain an incremental spanning tree of some kind.
We also need to record “reasons” why each edge got added. Reasons may be rewrite rules, or congruence closure.
A proof might be:
- a term rewriting sequence
- a grounded confluent terminating term rewriting system
- The ground equalities to assert to an egraph that are enough to push the proof through with no egraph rewrite rules
See also Z3’s proof production
E Graph Tactics
- Samuel Gruetter and Dustin Jamner - Coq.
congruence
tactic is an egraph. Too slow? - Lean? lean tatic
Applications
- Denali https://courses.cs.washington.edu/courses/cse501/15sp/papers/joshi.pdf
- Herbie - improve accuracy of floating point expressions
- Szalinksi shrink 3d cad programs
- Vectorization for Digital Signal Processors via Equality Saturation
- High-performance symbolic-numerics via multiple dispatch
-
PEG Program Expression Graphs
https://ztatlock.net/pubs/2009-popl-equality-saturation-optimizations-egraphs.pdf https://rosstate.org/publications/eqsat/
Control flow graph (CFG) is just like, a thing. Its denotational semantics are a bit confused.
Egraphs like talking about functions. How to encode something cfg like into functions?
Dataflow graphs are basically just a DAG functional representation. So that intra block dataflow is not really a problem.
Being in SSA make one close to purely functional. YOu never assign to the sam name twice (although let binding shadowing could also model this purely functionally).
SSA Phi nodes are slightly bizarre. Gated Phi nodes make explicit the condition on which the merging happens. This makes them more like a purely functional if-then-else.
One can consider using streams as a functional representation of loop behavior. There are operations like tail, head, other on the stream.
PEGs have some clues on how to treat summation symbols $\Sigma$ in an egraph. Surely you can write a sum function asa CFG. Partial Sums / indefinite integrals rather than closed sum, definite integrals.
Is there a relationship between pegs and timely dataflow?
You only Grep once by koppel - uses PEG-like representation for semantic code search.
RVSDG https://github.com/jameysharp/optir/
Loops in egraphs and Landin’s knot.
Equality Saturation: a New Approach to Optimization
Quiche quiche python egraph for manipulating python AST
Tree Automata
https://github.com/ondrik/libvata
E-Graphs, VSAs, and Tree Automata: a Rosetta Stone slides merge only rules
https://en.wikipedia.org/wiki/Tree_automaton
Egglog
# I should try souffle using subsumption UF. Maybe it'll work now?
class BadUF():
# no path compression. no depth stroage.
def __init__(self):
self.parent = []
def find(self,i):
p = self.parent[i]
while p != self.parent[p]: # walrus?
p = self.parent[p]
return p
def makeset(self):
x = len(self.parent) # right?
self.parent.append(x)
return x
def set_size(self,n):
while len(self.parent) < n:
self.makeset()
return
def union(self,i,j):
i = self.find(i)
j = self.find(j)
self.parent[i] = j
return j
uf = BadUF()
uf.set_size(5)
'''
add = {(0,1,2), (1,0,3), (0,1,4)} # x + y, y + x
while True:
newadd = {(y,x,xy) for (x,y,xy) in add if (y,x,xy) not in add}
if newadd.issubset(add):
break
add.update(newadd)
# rebuild
while True:
# normalize
add = {(uf.find(x), uf.find(y), uf.find(xy)) for (x,y,xy) in add}
# congruence close
fresh = { (x,y,uf.union(xy,xy1)) for (x,y,xy) in add for (x1,y1,xy1) in add if x == x1 and y == y1 and xy != xy1}
if len(fresh) == 0:
break
print(add)
'''
add = {(0,1,2), (1,0,3), (0,1,4)} # x + y, y + x
while True:
newadd = {(y,x,xy) for (x,y,xy) in add if (y,x,xy) not in add}
if newadd.issubset(add):
break
add.update(newadd)
# rebuild
while True:
# normalize
add = {(uf.find(x), uf.find(y), uf.find(xy)) for (x,y,xy) in add}
addind = {(x,y) : xy for (x,y,xy) in add}
# congruence close
fresh = [uf.union(xy,addind[(x,y)]) for (x,y,xy) in add if addind[(x,y)] != xy]
if len(fresh) == 0:
break
print(add)
'''
add = {(0,1) : 2, (1,0) : 3 , (0,1) : 4} # x + y, y + x
while True:
newadd = {(y,x,xy) for (x,y),xy in add.items() if (y,x) not in add or add[(y,x)] != xy}
if newadd.issubset(add):
break
add.update(newadd)
# rebuild
while True:
# normalize
add = {(uf.find(x), uf.find(y), uf.find(xy)) for (x,y,xy) in add}
# congruence close
fresh = { (x,y,uf.union(xy,xy1)) for (x,y,xy) in add for (x1,y1,xy1) in add if x == x1 and y == y1 and xy != xy1}
if len(fresh) == 0:
break
'''
print(add)
Lambda Encoding
Might work:
- Combinators. SKI or categorical. Many projects are doing something like this. Relational Algebra or matrix algebra are the same thing.
- Succ lifting. pavel oleg. Do de bruijn but let Succ float out of Var. Let’s you do de bruijn index lifting as local rewrite rule. Succ has interpretation as a higher order function.
Arguable:
- Co de-bruijn. Hash Cons modulo alpha. Mcbride’s Do Be doo be doo. Doesn’t seem to work. Correlated rewrites required
- Graph compilation. Optimal graph reduction
Contextual EGraphs
No one know what this means. Everyone wants it.
Perhaps related to backtracking
Misc
What would be a mvp egraph in C specialized for the comm/assoc problem look like. Use reference based union find with tag bits?
Is ematching a wam? I suppose so. Ematching over a fixed egraph is easily expressible in prolog.
We could implement egraph as assert
We can’t use unification vars?
Does tabling help?
f(1,2,3).
g(1,2,3).
-? plus(A,B,AB), plus(AB,C,ABC)
Sketch-Guided Equality Saturation Scaling Equality Saturation to Complex Optimizations in Languages with Bindings de buijn indexes with extraction. Rise compiler
EGRAPH 2022 workshop youtube feed
Yihong trick to make ematching faster
wasm mutate - fuzzing wasm with egraphs
abstract interp for egraphs improving interval boounds
sketch guided equality satruation Give sketches (~ patterns?) which one the ergaph reaches, transition to a new set of rewriting rules / clear the egraphs.
Caviar: an e-graph based TRS for automatic code optimization halide
Automatic Datapath Optimization using E-Graphs Samuel Coward, George A. Constantinides, Theo Drane
https://en.wikipedia.org/wiki/E-graph