Primitive, Leveled, and Quantifier Union Finds
There is an interesting and simple union find variation that allows primitives into the union find.
I think this could make for a nicer experience possibly in egg or egglog, where you don’t need functions to transfer between primitives and uninterpreted sorts. SMT does not require such a distinction and it is a small inelegance IMO to require it. eclass-like entities should behave more like SMT constants (like smt.Int("x")) and primitive entities should be like SMT values (like smt.IntVal(1)). SMT allows these things to be equated.
There is also a concept called levels I’ve seen in Hindley Milner type checking which I think can be reified as a union find variation.
If you combine these two things, you get a union find that works for atomic equational problems under a quantifier prefix.
Basic Union Find
A basic union find can be built in an arena style. We use integers as identifiers, wrapped in a tuple to distinguish them from primitive integers to come next.
find looks up in the parent table iteratively to find a root. I ignore path compression but it’s nice to do.
union finds the roots and then sets one root equal to the other. There is a lot of flexibility in the ordering which can be used for various purposes (as we’ll see some of).
A regular union find is a useful thing for finding minimal (least equations) /maximal (most elements) models for atomic equations a = b /\ b = c /\ d = e but also similarly find atomic equational proofs.
The parents relationship can be written in RefCell, Dictionary, or Arena style which are basically equivalent but can make the mind go in different directions. It is tougher to bundle up the entire union find in RefCell form.
from dataclasses import dataclass, field
from typing import NamedTuple
class EId(NamedTuple):
id : int
@dataclass
class UFArena():
parents : list[EId] = field(default_factory=list)
def makeset(self):
eid = EId(len(self.parents))
self.parents.append(eid)
return eid
def find(self, x : EId):
while self.parents[x.id] != x:
x = self.parents[x.id]
return x
def union(self, x, y):
x,y = self.find(x), self.find(y)
if x != y:
if x < y:
x,y = y,x
self.parents[x.id] = y
return y
Some usage
uf = UFArena()
x,y,z,w = [uf.makeset() for i in range(4) ]
uf.union(x, y)
uf.union(y, z)
assert uf.find(z) == uf.find(x)
uf
UFArena(parents=[EId(id=0), EId(id=0), EId(id=0), EId(id=3)])
Primitive Union Finds
We can extend the previous union find by allowing more than just eids into the parents table.
Now we have to tie break when we union an eid to a primitive such that the primitive becomes the root. Primitives “win”.
If we union two distinct primitives, we need some method to resolve them or throw an error. If we resolve tree-like primitives structurally, we get something similar to a syntactic unification routine.
In a more low level implementation, primitives may be required to either fit into a machine word or be held in a separate interning table.
Other names that I like are “Atoms” or “Rigids” rather than “primitive”. Primitive implies that it is something like i64 but there is utility to other examples that are abstract uninterpreted entities and yet still rigid, whereas eids are flexible.
@dataclass
class PrimUF():
parents : list[EId | object] = field(default_factory=list)
def makeset(self) -> EId:
eid = EId(len(self.parents))
self.parents.append(eid)
return eid
def find(self, x : EId | object):
while isinstance(x, EId) and self.parents[x.id] != x:
x = self.parents[x.id]
return x
def resolve(self, x, y):
# You can add in something that replicates unification without occurs check
if isinstance(x, tuple) and isinstance(y, tuple) and len(x) == len(y):
return tuple(self.union(a,b) for a,b in zip(x,y))
else:
raise Exception("Conflict", x, y)
def union(self, x : EId | object, y : EId | object):
x,y = self.find(x), self.find(y)
if x == y:
return y
else:
if not isinstance(x, EId):
if isinstance(y, EId):
x,y = y,x
else:
return self.resolve(x,y)
self.parents[x.id] = y
return y
uf = PrimUF()
x,y,z,w = [uf.makeset() for i in range(4)]
uf.union(x, y)
uf.union("fred", y)
uf.find(x)
'fred'
A little bit of unification. We do not implement an occurs check, which is overplayed anyway IMO.
uf.union((w,1), (2,1))
uf.find(w)
2
uf
PrimUF(parents=[EId(id=1), 'fred', EId(id=2), 2])
Leveled Union Find
In classic Hindley Milner type inference, when you exit let nodes, you generalize the type variables introduced inside that let. You don’t generalize however if the type variable has been unified to something introduced at a higher scope.
Levels are a somewhat mysterious technique for doing this.
See:
- https://dl.acm.org/doi/10.1145/3729338 Practical Type Inference with Levels
- Like many things, this let-generalization is a choice not a god given thing
https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/tldi10-vytiniotis.pdf Let Should not be Generalised
- https://okmij.org/ftp/ML/generalization.html How OCaml type checker works – or what polymorphism and garbage collection have in common
I think the following union find captures some of the things here. Oleg’s article mentions that levels are evocative of region based memory management. Here the levels of our union finds are indeed maintained in separate vector arenas. If we want to, we can scan the level as we pop it to do something like generalize. We also ensure when we pop a level, nothing from a higher level ever appears in the lower level tables. Still, the unions of the higher levels may transitively effect the lower level tables. It’s an interesting variation of context.
Like many structured eids, it may be desirable to store the extra data not as part of the eid, but in a separate table mapping eids to their levels levels: list[level_num]. This is the relationship between structured eids and egg style “analyses”. The difference is something like a “structure of arrays” transformation. Similar to that, I think conceptually the structured eid concept is one notch simpler and the analyses style separate table is probably more efficient.
TODO: Actually implement Hindley Milner using this thing.
class EId(NamedTuple):
level : int
id : int
@dataclass
class LevelUF():
levels : list[list[EId]] # a list of parent lists
def __init__(self):
self.levels = [[]]
def makeset(self, level=None) -> EId:
if level is not None:
while len(self.levels) <= level:
self.make_level()
else:
level = len(self.levels) - 1
uf = self.levels[level]
eid = EId(level, len(uf))
uf.append(eid)
return eid
def make_level(self) -> int:
self.levels.append([])
return len(self.levels) - 1
def pop_level(self):
return self.levels.pop()
def find(self, x : EId): # level cutoff?
while True:
y = self.levels[x.level][x.id]
if x == y:
return x
else:
x = y
def union(self, x : EId, y : EId):
x,y = self.find(x), self.find(y)
if x != y:
if x.level <= y.level:
x,y = y,x
self.levels[x.level][x.id] = y
return y
uf = LevelUF()
x,y,z = uf.makeset(0), uf.makeset(0), uf.makeset(1)
#uf.union(x,y)
uf.union(x,z)
uf.union(y,z)
assert uf.find(x) == uf.find(z)
uf
LevelUF(levels=[[EId(level=0, id=1), EId(level=0, id=1)], [EId(level=0, id=0)]])
Alternating Quantifier Union Find
We can combine these two union find variations by also giving constants levels they appear at. Now a lower eid may not be unioned to a higher rigid primitive.
Basically, universal variables should not unify/equate to things at higher scope. There is something goofy about exists x, forall y, x = y. It collapses the model to a single entity and is not provable from base equality axioms since any other model is a countermodels. For some purposes it should be disallowed as a scope extrusion.
Lean has a related mechanism in its unification https://leanprover-community.github.io/lean4-metaprogramming-book/main/04_metam.html#metavariable-depth , and so does lambda prolog and other systems that deal with quantifiers and unification.
By comparison forall x, exists y, y = x is not problematic in the same way and has many models for which we may be interested in a minimal one. x should be rigid in some sense (herbrandization / “intro” tactic).
I have wondered if this sort of scope extrusion might be a logically dignified way of modelling “disallowed” constants in e-graph extraction. Extraction probably happens at scope 0. Constants we do not want to appear in the result could be put at a larger level and then a logically correct extraction mechanism would note this.
class EPrim(NamedTuple):
level : int
id : int
@dataclass
class QuantUF():
levels : list[list[EId]] # a list of parent lists
freshcounter : int = 0
def __init__(self):
self.levels = [[]]
def makeset(self, level=None) -> EId:
if level is not None:
while len(self.levels) <= level:
self.make_level()
else:
level = len(self.levels) - 1
uf = self.levels[level]
eid = EId(level, len(uf))
uf.append(eid)
return eid
def make_level(self) -> int:
self.levels.append([])
return len(self.levels) - 1
def make_eprim(self, level):
self.freshcounter += 1
return EPrim(level, self.freshcounter)
def pop_level(self):
return self.levels.pop()
def find(self, x : EId | object):
while True:
if not isinstance(x, EId):
return x
y = self.levels[x.level][x.id]
if x == y:
return y
else:
x = y
def resolve(self, x, y):
assert not (isinstance(x, EId) and isinstance(y, EId))
raise Exception("Conflict")
def union(self, x : EId, y : EId):
x,y = self.find(x), self.find(y)
if x == y:
return y
else:
if isinstance(x, EId) and isinstance(y, EId):
if x.level <= y.level:
x,y = y,x
self.levels[x.level][x.id] = y
return y
if isinstance(y, EId):
x,y = y,x
if isinstance(x, EId) and isinstance(y, EPrim):
if x.level > y.level:
raise Exception("Scope Extrusion", x, y)
else:
self.levels[x.level][x.id] = y
return y
else:
self.resolve(x,y)
uf = QuantUF()
x,y = [uf.makeset(0) for i in range(2)]
z,w = [uf.makeset(1) for i in range(2)]
a,b = [uf.make_eprim(1) for i in range(2)]
uf.union(z,a)
uf
QuantUF(levels=[[EId(level=0, id=0), EId(level=0, id=1)], [EId(level=1, id=0), EId(level=1, id=0)]], freshcounter=2)
Bits and Bobbles
This post is part of a series of “generalized union finds”
- Poset / inequality union finds for refinement https://www.philipzucker.com/le_find
- Knuth Bendix / grobner / gaussian elimination as a union find https://www.philipzucker.com/linear_grobner_egraph https://www.philipzucker.com/multiset_rw It is desirable to look for stuff that is cheaper and less complicated that knuth bendix stuff. The gaussian ones are an edge case
- group / groupoid union finds <www.philipzucker.com/union-find-groupoid>, lemerre paper, Kmett talks
- semigroup / lattice / uf dict
- proof producing union finds
- backtrackable union finds
- Other examples in my EMT arxiv preprint https://arxiv.org/abs/2504.14340
- I’m still not 100% sure if a slotted union find makes any sense. It probably does. Yes. It’s semantically strange though.
I started this blog post intending to write about “contextual union finds”. Maybe I’ll still do that at a later date. What I thought worked turned out to not work. Quantifier UFs are a different notion of “context” / scope.
I thought you could store a “delta union find” that holds some new unions in a sparse structure but inherits all the other unions from a parent union find. When you do find, you first find in the delta uf, then you find in the parent uf. This turns out to not be guaranteed to be canonical, at least without freezing the parent. It requires a search procedure to find the canonical root. or accepting non canonicalness until a rebuild phase (which is ok I suppose).
Another thing to write about is using union finds as keys in a dictionaries. One can do so by fully compressing them and using min as a tie breaker. This gives a canonical tuple representing the partition. This can be useful perhaps for hypothetical union finds that have unions that only occur under the assumption of other unions.
A Quantifier Union find + a backtracking union find is the kind of thing you’d want for unification in a lambda prolog.
A quantifier Uf shouldn’t really just have integer levels becasue one could have quntifier scopes that are parallel and not nested.
exists x, forall y, yada /\ forall z, yada)
So there is a tree of scopes.
You could put the occurs check into resolve. Or a automata minimization.
@dataclass
class BackTrackUF():
trail : list[list[int]] = field(default_factory=lambda: [[]])# things that used to be roots
parents : list[int] = field(default_factory=list)
def makeset(self):
x = len(self.parents)
self.parents.append(x)
return x
def pop(self):
ps = self.trail.pop()
for x in ps:
self.parents[x] = x
def push(self):
self.trail.append([])
def find(self, x):
while self.parents[x] != x:
x = self.parents[x]
return x
def union(self, x, y):
x,y = self.find(x), self.find(y)
if x != y:
self.trail[-1].append(x)
self.parents[x] = y
return y
uf = BackTrackUF()
x,y,z = [uf.makeset() for _ in range(3)]
uf.push()
uf.union(x,y)
uf.push()
uf.union(y,z)
uf
BackTrackUF(trail=[[], [0], [1]], parents=[1, 2, 2])
uf.pop()
uf
BackTrackUF(trail=[[], [0]], parents=[1, 1, 2])
Using the QuantUF to build a model for an alternating quantififer:
class Q(QuantUF):
def forall(self, lam):
lam(self.makeprim())
def exists(self, lam):
l = self.makelevel()
eid = self.makeset()
lam(eid)
q = Q()
q.forall(lambda x: q.exists(lambda y: q.union(x, y)))
A regular union find is a useful thing for atomic equational “problems” like a = b /\ b = c /\ d = e. By “problems” I am being vague. One such problems is producing a best model (minimal / maximal) for the equations.
Constants can usually be seen as implicitly bound by an existential quantifier exists a b c d e, a = b /\ b = c /\ d = e.
In what sense is it a Universal? None?
forall x y z, exists a b c d e, a = b /\ b = c /\ d = e /\ x = a. can be solved by the primitive union find, where the x y z variables are considered primitives.
This can be generalized to a mixed prefix. Very often one does this by skolemizing, pushing existentials outside of the universal and adding the universal as parameters. We don’t have to do this though. Implicitly everything of the lower level is a parameter.
title: “Context Union Finds: Keyed, Leveled, “
I’ve made it a point to accumulate union find variations.
If you forget about term structure, a fancy e-graph becomes a fancy union find. If you add term structure back in, a fancy union find becomes a fancy egraph
Basic
A union find is a very simple thing, obscured in the article by the path compression and the tie breaking mechanisms https://en.wikipedia.org/wiki/Disjoint-set_data_structure
class UF(Protocol):
def find(self, x:object) -> object: ...
def union(self, x:object, y:object) -> object: ...
def rebuild(self) -> None: ...
#makeset() -> Key?
#type Key
def rebuild(self):
for i in range(len(self.parents)):
self.parents[i] = self.find(i)
A little usage
uf = UFArena()
for i in range(10):
uf.makeset()
uf.union(0, 1)
uf.union(1, 2)
uf.rebuild()
uf
UFArena(parents=[0, 0, 0, 3, 4, 5, 6, 7, 8, 9])
Another form of the union find is instead of using integers as keys, we can allow for arbitrary objects as keys.
We can have a convention of not being in the dict as being a root or have a convention of mapping to None as being the root. Not being in the dict is nice in that we can be talking about some implicit infinite universe of stuff with just a small finite amount of equality added on.
from dataclasses import dataclass, field
@dataclass
class UFDict():
uf : dict[object,object] = field(default_factory=dict)
def find(self, x):
while x in self.uf:
x = self.uf[x]
return x
def union(self, x, y):
x,y = self.find(x), self.find(y)
if x != y:
self.uf[x] = y
return y
def rebuild(self):
for k in self.uf.keys():
self.uf[k] = self.find(k)
A different style of making a union find is using ref cells. This style makes a global view of the union find more difficult. One could do this by storing the cells in an array, but then we’re back to the previous style. A garbage collector also can recollect up all the Cells.
That this is a common picture of what a union find is makes things more confusing. It is abusing too many features of the host language.
from typing import Optional
class Cell():
parent : Optional["Cell"]
def __init__(self):
self.parent = None
def find(self):
x = self
while x.parent is not None:
x = x.parent
return x
def union(self, other:"Cell"):
x,y = self.find(), other.find()
if x != y:
x.parent = y
return y
def __eq__(self, other):
return self.find() is other.find()
A little usage
x,y,z = Cell(), Cell(), Cell()
x.union(y)
assert x == y
assert x != z
y.union(z)
assert x == z
Context by Key
We need something like convexity to ensure that bouncing between two maps gives us a unique solution. submodularity Scan only in between x and find(x)? Seem goofy. Iterate only over equivalence class in higher uf. Build entire eclass on the fly. Hmm.
We can maintain a delta union find of a layer of unions on top of some base union find. The union find is pretty forgiving
When we union, we union into a particular union find, All child union finds will immediately inherit the unions of it’s parents.
We anticipate in use that the Delta’s will often be sparse, so using the dictionary form makes sense (or some other sparse data structure)
I’m not convinced finding a fixed point is sufficient maybe with full scan on union.
@dataclass
class DeltaUF():
parent_uf : UF
duf : dict[object,object] = field(default_factory=dict)
def find(self, x):
#while x in self.duf:
# x = self.duf[x]
#return self.parent_uf.find(x)
while True:
cur_x = x
while x in self.duf:
x = self.duf[x]
x = self.parent_uf.find(x)
if x == cur_x:
return x
def union(self, x, y):
x,y = self.find(x), self.find(y)
if x != y:
if x < y:
x,y = y,x
self.duf[x] = y
return y
def rebuild(self):
for k in self.duf.keys(): # There's some monkey business here about accidental self loops
self.duf[k] = self.find(k)
0 1 2
|
| \ root uf
|
| \
| \ |
| \ delta |
taking delta first makes us miss that root can take 2->0 so we odn’t get a minimum for the eclass. Rerooting rebuilding We could miss stuff and fix it up to find later.
Or maintain an iterator over the eclass and search it on find
Or freeze the branches and we’re only able to edit leaves
# Don't support lazy deltas.
class DeltaUF():
parent_uf : UF
duf : dict[object,object] = field(default_factory=dict)
def find(self, x):
v1 = self.duf[x]
return min([self.parent.find(v) for k,v in self.duf.items() if v == v1])
def union(self, x, y):
x,y = self.find(x), self.find(y)
if x != y:
self.duf[x] = y
return y
def rebuild(self):
for k in self.duf.keys():
self.duf[k] = self.find(k)
uf = UFArena()
x,y,z,w = [uf.makeset() for _ in range(4)]
uf.union(x,y)
assert uf.find(x) == uf.find(y)
uf1 = DeltaUF(uf)
uf1_1 = DeltaUF(uf1)
uf1_2 = DeltaUF(uf1)
assert uf1.find(x) == uf1.find(y)
assert uf1_1.find(x) == uf1_1.find(y)
assert uf1_2.find(x) == uf1_2.find(y)
# we inherit from the parent
uf.union(z,w)
assert uf.find(z) == uf.find(w)
assert uf1.find(z) == uf1.find(w)
assert uf1_1.find(z) == uf1_1.find(w)
uf1_1.union(y,z)
#print(uf1_1)
#print(uf1_1.find(x),uf1_1.find(w))
assert uf1_1.find(x) == uf1_1.find(w)
assert uf1.find(x) != uf1.find(w)
uf1_1
DeltaUF(parent_uf=DeltaUF(parent_uf=UFArena(parents=[0, 0, 2, 2]), duf={}), duf={2: 0})
DeltaUF as written is in a funny kind of way a combo of the Cell pointer methodology and the arena style.
Leaning into arenas everywhere, the parent_uf should be an index into a ufs array.
class UFUniverse():
ufs : list[UF] = field(default_factory=list)
def make_uf(self, parents):
if len(parents) == 0:
uf = UFArena()
else:
class UFUniverse():
root : UFArena
dufs : DeltaUF
def find(self, dufind, x):
x = root.find(x)
while True:
dufind
duf = dufs[dufind]
x = duf.find(x)
if duf.parent_uf is not None:
dufind = duf.parent_uf
else:
return x
def rebuild(self):
for duf in self.dufs:
duf.rebuild()
self.root.rebuild()
Canonizing Union Finds
A union find, similar to many data structures like dicts and sets, does not represent it’s semantic meaninig in a structurally cnaonical way.
For example, one could consider a simple list as a set data structure
[1,3,1,3] == [1,3]
False
set([1,3,1,3]) == set([1,3])
True
def canon_set(l):
return tuple(sorted(set(l)))
canon_set([1,3,1,3]) == canon_set([1,3])
True
Likewise for association lists as representations of mappings
[(1,2), (1,2), (2,3)] == [(1,2), (2,3)]
False
dict([(1,2), (1,2), (2,3)]) == dict([(1,2), (2,3)])
True
def canon_dict(d):
if isinstance(d,dict):
d1 = d
else:
d1 = dict(d)
assert all(v == d1[k] for k,v in d), "Not a valid mapping"
return tuple(sorted((k,v) for k,v in d1.items()))
canon_dict([(1,2), (1,2), (2,3)]) == canon_dict([(1,2), (2,3)])
True
A union find is mostly a canonized dict but we also need to make sure that we fully rebuild / compress the uf and have a deterministic root selection method. Tie breaking by using the minimum or maximum element works
def find(uf,x):
uf1 = dict(uf)
while x in uf1:
x = uf1[x]
return x
def canon_uf(uf):
uf1 = dict(uf)
https://peps.python.org/pep-0814/ frozendict pep
It’s still non canonical in the sense that the eids are given out in a non det way. That’s maybe why levelled union find is interested
@dataclass(frozen=True)
class CanonUF():
parents : tuple[int]
from frozendict import frozendict
from dataclasses import dataclass
@dataclass(frozen=True)
class CanonUF():
parents : frozendict
def find(self, x):
while x in self.parents:
x = self.parents[x]
return x
def union(self, x, y):
x,y = self.find(x), self.find(y)
if x == y:
return self
else:
if x < y:
x,y = y,x
res = []
for k,v in self.parents:
if
def find(z):
z1 = self.find(z)
if z1 == x:
return y
else:
return z
CanonUF(frozendict(for k,v in ))
from frozendict import frozendict
@dataclass(frozen=True)
class CanonUF():
parents : frozendict
def __init__(self, uf):
min_elem = {}
for k in uf:
min_elem[uf.find(k)] = min(min_d.get(uf.find(k), k), k)
parents = {}
for k in uf:
parents[k] = min_elem[uf.find(k)]
self.parents = frozendict(parents)
def find(self, x):
while x in self.parents:
x = self.parents[x]
return x
class CanonUF():
parents : tuple[tuple[object,object], ...]
def __init__(self, parents):
@staticmethod
def empty():
return CanonUF(parents=())
def union(self, x, y):
return CanonUF(parents=self.parents + ((x,y),))
def find(self, x):
uf1 = dict(self.parents) # could cache this. Or cache dict, and hash, but don't store.
while x in uf1:
x = uf1[x]
return x
def __eq__(self, other):
return self.parents == other.parents
def __hash__(self):
return hash(self.parents)
Even if we fully expand without keeping base layer out separate, we still need the union of the key uf and the
Could do it by search? yuck. Huh. So this whole thing falls apart. You need to scan to union. Huh. And scan both?
We could make determinsitic roots (min method) and then hash combine the roots
([roots], frozenset(ids)) no, clashes a lot.
Context by Levels
Using min ordering on int ids, we could consider every single eid to have a level equal to it’s eid. But i we consider object style union find instead of int arena style, it is not clear that we can always minimum compare different objects. The level has semantic meaning.
Comparing lexicographically level then eid evokes well orderings / total orderings. Perhaps levels could be extended to to multilevels or even eids that are nested multisets using multiset ordering. Then there is always space to place stuff. We could also use fractions as eids
If I write let identity x = x a type inference system should infer this as something like identity : forall 'a, 'a -> 'a
A general strategy of type inference is to traverse the term and generate constraints to be solved about the types of the subterms.
In the classic Hindley-Milner system, there is something called the let-generalization rule that when you enter and exit, it is a good time to actually do a little mini solve
Remy
https://dl.acm.org/doi/10.1145/3729338 Practical Type Inference with Levels
Like many things, this let-generalization is a choice not a god given thing https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/tldi10-vytiniotis.pdf Let Should not be Generalised
https://okmij.org/ftp/ML/generalization.html How OCaml type checker works – or what polymorphism and garbage collection have in common
a structured eid is (level, int) pairs telling us which union find to go into.
UF(uf=[2, 2, 2, 3, 4, 5, 6, 7, 8, 9])
Does the level uf work but DiffUF doesn’t? eid belonging to levels fixes the issue before where we need to loop find?
Hmm. what if genericity lived at the Tree level instead of in TypeVar…
Hmm. Level UF doesn’t get non interference at all really. I guess it inherits from upper levels, and we can delete lower levels someday and only retain their implicit influence.
Is a level union find easier to backtrack? No not really.
type EId = tuple[int,int] # level, index
@dataclass
class LevelUF():
levels : list[list[EId]] # use dict union find because probably sparse
def __init__(self):
self.levels = [[]]
def makeset(self, level=None) -> EId:
if level is not None:
while len(self.levels) <= level:
self.make_level()
else:
level = len(self.levels) - 1
uf = self.levels[level]
eid = (level, len(uf))
uf.append(eid)
return eid
def make_level(self) -> int:
self.levels.append([])
return len(self.levels) - 1
def find(self, x : EId): # level cutoff?
level, key = x
while True:
level, key = x
y = self.levels[level][key]
if x == y:
return x
else:
x = y
def union(self, x : EId, y : EId):
x,y = self.find(x), self.find(y)
if x != y:
if x[0] <= y[0]:
x,y = y,x
self.levels[x[0]][x[1]] = y
return y
uf = LevelUF()
x,y,z = uf.makeset(0), uf.makeset(0), uf.makeset(1)
#uf.union(x,y)
uf.union(x,z)
uf.union(y,z)
uf
LevelUF(levels=[[(0, 1), (0, 1)], [(0, 0)]])
Interspersing leveled atom intern tables. atoms : list[list[str]] # a / 1 fresh consts. in scope
forall atoms0, exists eid0, forall atoms1, exists eid2, /\ /\ a = e /\ e1 = e2 /\ as a question. If we skolemize it becomes unification-ish? levels counting up like de bruijn?
uf = LevelUF()
x,w = [uf.makeset() for _ in range(2)]
y,z = [uf.makeset(1) for _ in range(2)]
uf.union(x,y)
assert uf.find(x) == uf.find(y)
uf.union(z,w)
assert uf.find(z) == uf.find(w)
uf.union(y,z)
assert uf.find(x) == uf.find(w)
uf
LevelUF(levels=[[(0, 0), (0, 0)], [(0, 0), (0, 1)]])
class LevelUF():
base : list[int]
levels : list[dict[int, int]] # use dict union find because probably sparse
def makeset(self):
eid = len(self.base)
self.base.append(eid)
return eid
def find(self, level, x):
while level >= 0:
uf = self.levels[level]
while x in uf:
x = uf[x]
level -= 1
while self.base[x] != x:
x = self.base[x]
return x
def union(self, levelx, x, levely, y): # two levels?
if levelx < levely: # ensure levelx is the higher level
levelx, x, levely, y = levely, y, levelx, x
x,y = self.find(levelx, x), self.find(levely, y)
if x != y:
if levelx == -1:
self.base[x] = y
else:
self.levels[levelx][x] = y
class QuantAltUF():
fresh : int
foralls : list[set]
exists : list[dict]
"""
Things happening in prefixes forall x, exists y, forall z, ...
exists y, forall x, exists z, z = y should point z to y, not vice versa.
"""
def makeset(self, level):
eid = self.fresh
self.fresh += 1
self.exists[level][eid] = eid
return eid
def gensym(self, level):
eid = self.fresh
self.fresh += 1
self.foralls[level].add(eid)
return eid
This is kind of prefix normal form coded. What about telescope?
https://leanprover-community.github.io/lean4-metaprogramming-book/main/04_metam.html#metavariable-depth The idea that you can only union lower vars to higher vars
If you took the “primitive union find” That has rigid atoms p,q,r and unionable atoms a,b,c The rigid have to win. If you stratify this into layers of interspersing p,q,r and a,b,c, then you get
Hmm. But then when I call makeset, the variable needs to belong to that level… so i guess this is a different concept
let generalization and levels in hindley milner inference.
A persistent union find retains all old versions of the union find
a colored / linked / refiniing / hiearchical union find has multiple refining union finds hanging around. Unions higher in the herarchy are inherited by ufs lower in the hierarchy.
A contextual union find is like this with the added feature that the ufs are identified with contexts. When contexts become equivalent, the appropriate unuion finds should become equivalent. When a context is subsumed,it should inherit the equality infromation of the subsumee.
Subsumption is a better mechanism than lattices for contexts. While there is a lattice, it’s a weird one. Subsumption is more of a direct translation
Equating two nodes in a colored union find makes a new node? Multiparent. Maybe you have to ping pong until convergence. I think the contracting nature of the maps means this’ll work. You only have to ping pong up to common ancestor. Equating two color nodes goes into a meta-union find or color nodes? You need to ping pong through the entire tree, or normalize everything in the root node. a meta meta union find? Really you can assert node1 <= node2.
class DiffUF(): ufs: list[UF] self.duf
class DiffUF(): parent | self.uf, self.duf # either points to equivalent parent UF or actually has the data and pointer to inehritance uf.
class DiffUF():
def __init__(self, uf : UF):
self.uf = uf
self.duf = {}
def makeset(self):
return self.uf.makeset()
def find(self, x):
x = self.uf.find(x)
while x in self.duf:
x = self.duf[x]
return x
def union(self, x, y):
x,y = self.find(x), self.find(y)
if x != y:
self.duf[x] = y
return y
def __hash__(self):
return hash(tuple(sorted(self.duf.items())))
def __le__(self, other):
if len(self.duf) > len(other.duf): # fast check
return False
return all(other.find(k) == other.find(v) for k,v in self.duf.items())
def __eq__(self, other):
return len(self.duf) == len(other.duf) and self <= other # ? Right?
def __or__(self, other): # over same base
"""
Union find join. Finest partition greater than both.
"""
assert id(self.uf) == id(other.uf)
new = DiffUF(self.uf)
new.duf = self.duf.copy() # or pick biggest one
for k,v in other.duf.items():
new.union(k,v)
return new
def __and__(self, other): # over same base
"""union find meet. Coarsest partition less than both."""
assert id(self.uf) == id(other.uf)
new = DiffUF(self.uf)
for k,v in self.duf.items():
if other.find(k) == other.find(v):
new.union(k,v)
return new
def rebuild(self): # canon
old_duf = {}
self.duf = {}
self.uf.rebuild()
for k,v in old_duf.items():
self.union(k,v)
https://usr.lmf.cnrs.fr/~jcf/publis/puf-wml07.pdf A Persistent Union-Find Data Structure
Proof producing
- Tie braking
- Path compress or no
- edge storage, attributed storage
- Pointers, array, or dict
- Lazy vs eager
A purely function union find using purely functional distionaries
Union finds solve connectivity in graphs. Proof producing UF stores a spanning tree also so that it can return a path when you ask for it.
Layered union finds are like layered theories
M1 |= M2 |= M3 |= M4 ...
A canonizer uf would want to update all the children to be updated too.
@dataclass
class EagerUF():
uf : list[int] = field(default_factory=list)
def makeset(self):
eid = len(self.uf)
self.uf.append(eid)
return eid
def find(self, x):
return self.uf[x]
def union(self, x, y):
x,y = self.find(x), self.find(y)
if x != y:
self.uf[x] = y
self.rebuild()
return y
def rebuild(self):
for i in range(len(self.uf)):
self.uf[i] = self.find(i)
class FrozenDict()
You don’t have to structurally canonicalize to implement a correct hash. But you kind of might as well
@dataclass(frozen=True)
class CanonUF():
uf : tuple[int,...]
@classmethod
def create(cls):
return CanonUF(())
def makeset(self):
eid = len(self.uf)
self.uf.append(eid)
return CanonUF(self.uf + (eid,))
def find(self, x):
return self.uf[x]
def union(self, x, y):
x,y = self.find(x), self.find(y)
if x != y:
self.uf[x] = y
self.rebuild() # rebuild immeidately
return y
def __hash__(self):
return hash(self.uf)
def __eq__(self, other):
def rebuild(self):
for i in range(len(self.uf)):
self.uf[i] = self.find(i)
class ContextUF():
base : UF
ctxs : dict[CanonUF, DiffUF] # ctx |- res
def __init__(self):
self.base = UF()
self.ctxs = {}
def rebuild(self):
newctx = {}
for ctx, uf in self.ctxs.items():
ctx1 = ctx.rebuild()
for ctx1, uf1 in self.ctxs.items():
if ctx <= ctx1:
uf1.merge(uf)
newctx[ctx.rebuild()] = uf.rebuild()
self.base.rebuild()
@dataclass
def UFUnion():
ufs : list[UF] = field(default_factory=list)
def find(self, x:object) -> object:
while True:
cur_x = x
for uf in self.ufs:
x = uf.find(x)
if x == cur_x:
return x
Persistent
https://github.com/MagicStack/immutables https://github.com/tobgu/pyrsistent https://discuss.python.org/t/pep-603-adding-a-frozenmap-type-to-collections/2318/219 frozendict https://pypi.org/project/frozendict/
! python3 -m pip install immutables
Collecting immutables
Downloading immutables-0.21-cp312-cp312-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (4.5 kB)
Downloading immutables-0.21-cp312-cp312-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (104 kB)
Installing collected packages: immutables
Successfully installed immutables-0.21
from immutables import Map
d = Map()
d1 = d.set(1,2)
for x in d1.items():
print(x)
(1, 2)
! python3 -m pip install pyrsistent
Collecting pyrsistent
Downloading pyrsistent-0.20.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (27 kB)
Downloading pyrsistent-0.20.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (122 kB)
Installing collected packages: pyrsistent
Successfully installed pyrsistent-0.20.0
from pyrsistent import pmap
d = pmap()
d1 = d.set(1,2)
for x in d1.items():
print(x)
(1, 2)
d = {}
d.set(1,2)
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[16], line 2
1 d = {}
----> 2 d.set(1,2)
AttributeError: 'dict' object has no attribute 'set'
Bits and Bobbles
I called it base. Are fibers leaking into my thinking?
I have meet and join of uf. A heyting algerba if I can define an A => B? Probably the finest partition that when merged with A gives B or is less than B? C /\ A <= B
def diff(self, other):
partition refinement https://en.wikipedia.org/wiki/Partition_of_a_set
Primitive UF using Cells
from typing import Optional
class Cell():
parent : Optional["object | Cell"]
def __init__(self):
self.parent = None
def find(self):
x = self
while x.parent is not None and isinstance(x.parent, Cell):
x = x.parent
return x
def union(self, other:"Cell"):
x,y = self.find(), other.find()
if x == y:
return y
if x != y and isinstance(x, Cell):
x.parent = y
elif x != y and isinstance(y, Cell):
y.parent = x
elif x != y:
raise Exception("Cannot union two different non-cells")
def __eq__(self, other):
if isinstance(other, Cell):
y = other.find()
return self.find() is other.find()
def ground(self, value):
x = self.find()
x,y,z = Cell(), Cell(), Cell()
x.union(y)
assert x == y
assert x != z
y.union(z)
assert x == z
z.ground(42)
assert
old
Simple union find
compressing
colored union finds - use min to ping pong. label the union find. DAG hierarchy is fine.
union finds as converging functions
proof pdoucting union find
using dictionary vs using ids vs using pointers
The z3 egraph and doubly linked lists. If we want to retain down pointers it is abboying because there mightb e multiple children to one parent. But you can insert yourself into a doubly linked list via the dasncing link technique. Hmm. Maybe this is why z3 does it this way. For fast backtracking https://z3prover.github.io/papers/z3internals.html#sec-equality-and-uninterpreted-functions
from dataclasses import dataclass
class UF():
uf : list[int]
def find(self, x):
while x != self.uf[x]:
x = self.uf[x]
return self.uf[x]
def makeset(self):
self.uf.append(len(self.uf))
return len(self.uf) - 1
def union(self, x, y):
self.uf[self.find(x)] = self.find(y)
return self.find(y)
uf = UF()
a, b, c = uf.makeset()
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[1], line 16
13 return self.find(y)
15 uf = UF()
---> 16 uf.ma
AttributeError: 'UF' object has no attribute 'ma'
from dataclasses import dataclass
@dataclass
class Cell():
name: str
id: int
parent: 'Cell'
def __init__(self, name="_"):
self.parent = self
self.id = id(self)
self.name = name
def find(self):
x = self
while x.parent is not x:
x = x.parent
return x
def union(self,y):
self.find().parent = y.find()
return y.find()
x = Cell("x")
y = Cell("y")
z = Cell("z")
print(x.parent)
x.union(y)
y.union(z)
print(x.find() == z.find())
print(x)
print(z)
Cell(name='x', id=136449433498864, parent=...)
True
Cell(name='x', id=136449433498864, parent=Cell(name='y', id=136449433498432, parent=Cell(name='z', id=136449433492432, parent=...)))
Cell(name='z', id=136449433492432, parent=...)
reasons = []
trace = []
def union_reason(x, y, reason):
reasons[find(x)] = (x, y, reason)
trace.append((x, y, reason)) # this is sufficient. we don't need to store find(x) and find(y)
uf[find(x)] = find(y)
return find(y)
def explain(x,y):
Visualizing as a graph. The union find is part of kruskal’s algorithm https://en.wikipedia.org/wiki/Kruskal%27s_algorithm
So for example if you had a bunch of equalities and you know how painful each one was to get, you could devise a minimum spanning tree for that.
Term rewriting as a graph.
Secret congruence edges
import networkx as nx
# random graph with multiple components
G = nx.Graph()
#G.add_nodes_from(range(10))
# color the edges in the union find
# vectorized normalization.
# For egraph purposes, not being fully normalized isn't really a problem.
import numpy as np
uf = np.arange(10)
uf[8] = 0
uf[0] = 4
def normstep(uf):
return uf[uf]
normstep(uf)
def step2(uf):
return uf[uf[uf]]
step2(uf)
array([4, 1, 2, 3, 4, 5, 6, 7, 4, 9])