Something I’ve been seeking for a while is a notion of a “refinement e-graph”.

What precisely I mean by the term is unclear, but I think there is definitely something here for multiple reasons (the pre-existing topics/techniques of birewriting and subtyping). This is an instance of what CF has called "terminology driven development", where one comes up with a phrase for a thing that should exist and meditates upon what it means and how to build it.

The biggest motivator for such a thing is dealing with the phase ordering of rewrites that are not bidirectional (they aren't actually oriented equations; they are for-serious rewrites where it is okay to replace the left hand side with the right anywhere, but not vice versa). I was at a table at PLDI 2022 where Zach was pitching John Regehr on using e-graphs in Alive or LLVM, and he said the thing SMT solvers aren't supplying is refinement from less defined expressions to more defined ones (also the need for arbitrary bitwidth solvers). The idea being that there is a perspective under which the equality x * y / y = x is not correct. The issue is that y = 0 has a possible divide by zero. Nevertheless, the definitions of undefinedness and correct compilation in compilers are negotiated such that the rule is allowed in one direction, the right hand side being well behaved in strictly more situations than the left hand side. So this is a unidirectional rewrite. Likewise for bit shifting by constants larger than the bitwidth, and integer overflow. There are a couple of examples like this that really have nothing to do with state/memory, are basically pure dataflow, but have some funkiness with regards to undefinedness.

Also Zach asked me in the Q&A section of my talk last year https://www.philipzucker.com/egraph2024_talk_done/ whether the ground Knuth-Bendix view of the egraph (which has directional edges between the enodes in the eclass) has any bearing on this question.

The year before at Oliver and Yihong’s talk about Tree automata views of the e-graph, the same question came up.

As a baby step to support this concept, we can discuss a union find that supports inequality assertions and queries in addition to the equational ones. I think slotting in union find variations is a decent recipe for building egraph variations.

The inequality union find is interesting in its own right for program analyses and, I suspect, for solving subtyping constraints.

Inequality Union Find

The basic idea of an inequality union find is brutally simple once you ask the question, and I'm not sure there is anything super fancy/clever you'll ever be able to do to improve on the basic version of it. You're going to pay a superlinear cost for dealing with the inequalities. I don't think there is a free lunch here. The question is more about how to avoid materialization and how to make an elegant low level data structure (which I am only partially addressing since I'm sticking to Python).

I'll note that it is really important to realize when there is no free lunch so you can just buckle down and do the obvious, hacky, but not so satisfying thing. That's why it matters to point out that something is undecidable or NP-complete or superlinear. It's freeing. That was also the insight here https://www.philipzucker.com/canon_search/ : once you realize a canonization problem is equivalent to graph canonization, just give up on being search-free.

A good inequality union find should degrade gracefully in performance to a regular union find: if you don't use inequalities much, the thing should behave basically the same as a regular union find.

Basically, you want to maintain, in addition to the parents of the union find, upper and lower sets of bounds. There is an interesting opportunity for information propagation when you assert both x <= y and y <= x, at which point it may make sense to union anything squished between the two (in the connected component containing those nodes).

With the upper and lower sets, you have at least two possible design choices. Either make them grow as much as possible to cache/materialize the transitive closure of the <= relation, or make them as thin as possible to avoid duplicating information (in other words, if a <= b and b <= c, do not bother directly storing a <= c and instead compute it on demand by DFS).

A different axis of design is whether to require that you never miss a <= once you've learned it, whether it is ok to require an amortized rebuild before you can be sure all <= are discovered, or even whether sometimes not registering the <= is ok if it's too expensive to do so. One way of doing this last one would be to restrict an id's upper set to some fixed size like 3, or to stop the transitivity DFS once it takes more than 10ns or something. These are pretty ad hoc restrictions, but so be it. By the general saturation paradigm / non-acceptance of negation as failure, failing to confirm <= is not the same as concluding > or not <=.
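
As a toy illustration of that last option, here is a hedged sketch of a capped assertion, assuming the upper/lower set representation used below (MAX_UPPER is an arbitrary made-up knob):

MAX_UPPER = 3 # ad hoc cap on how many direct upper bounds an id may record

def assert_le_capped(uf, x, y):
    # Best-effort assert_le: silently drop the fact if the upper set is full.
    # Sound under the saturation reading: failing to record x <= y never
    # concludes anything false, we just know less.
    x, y = uf.find(x), uf.find(y)
    if x == y or len(uf.upper[x]) >= MAX_UPPER:
        return
    uf.upper[x].add(y)
    uf.lower[y].add(x)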

As I was looking through my notes, I saw that a “poset union find” from the Go compiler had come across my feed 6 months ago. So the concept is not unknown, although where is a real reference on it?

https://x.com/abhi9u/status/1826527029858103501 Posets in Go. https://mastodon.social/@cfbolz/113017389946004834 https://github.com/golang/go/blob/master/src/cmd/compile/internal/ssa/poset.go#L100 . CF suggests this may have something to do with the Pentagon domain

Here is a version of this.

from dataclasses import dataclass
from collections import defaultdict
@dataclass
class LEFind():
    parents : dict[int, int]
    upper : defaultdict[int, set[int]]
    lower : defaultdict[int, set[int]]
    def __init__(self):
        self.parents = {}
        self.upper = defaultdict(set)
        self.lower = defaultdict(set)
    def assert_le(self, x, y): # assert to LEFind that x <= y
        x,y = self.find(x), self.find(y)
        if x == y:
            return
        self.upper[x].add(y)
        self.lower[y].add(x)
        if self.is_le(y, x): # propagate antisymmetry: x <= y and y <= x implies x == y
            self.union(x, y)
            for z in self.le_set(x) & self.ge_set(y): # anything between the two is squeezed
                self.union(z, y)
            for z in self.le_set(y) & self.ge_set(x): # anything between the two is squeezed. Is this redundant?
                self.union(z, x)
    def assert_ge(self, x, y): # assert to LEFind that x >= y
        self.assert_le(y, x)
    def union(self, x, y): # assert that x == y
        x, y = self.find(x), self.find(y)
        if x != y:
            self.parents[x] = y # regular union find
            self.upper[y].update(self.upper[x]) # merge upper sets
            self.lower[y].update(self.lower[x]) # merge lower sets
    def find(self, x : int) -> int:
        while x in self.parents:
            x = self.parents[x]
        return x
    # The next 3 functions are very similar. is_le can early stop when it hits y.
    def is_le(self, x, y) -> bool:
        # DFS search for y in upper set of x
        x,y = self.find(x), self.find(y)
        if x == y:
            return True
        todo = [x]
        seen = set(todo)
        while todo:
            x = todo.pop() # invariant is that x is already representative
            for z in self.upper[x]:
                # Is there a way to use lower set for pruning?
                z = self.find(z)   # compression could be updating z in place in upper[x]
                if z == y:
                    return True
                elif z not in seen:
                    seen.add(z)
                    todo.append(z)
        return False
    def le_set(self, x) -> set[int]: # all solutions to x <= ?
        x = self.find(x)
        todo = [x]
        seen = set(todo)
        while todo:
            x = todo.pop()
            for z in self.upper[x]:
                z = self.find(z)
                if z not in seen:
                    seen.add(z)
                    todo.append(z)
        return seen
    def ge_set(self, x) -> set[int]: # all solutions to x >= ?
        x = self.find(x)
        todo = [x]
        seen = set(todo)
        while todo:
            x = todo.pop()
            for z in self.lower[x]:
                z = self.find(z)
                if z not in seen:
                    seen.add(z)
                    todo.append(z)
        return seen
le = LEFind()
le.assert_le(1, 2)
le.assert_le(2, 3)
le.assert_le(3, 4)
assert le.is_le(1,3)
assert le.le_set(1) == {1,2,3,4}
assert le.ge_set(2) == {1,2}
le.assert_le(3, 1)
le
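
Continuing the example: the cyclic assertion 3 <= 1 collapses 1, 2, and 3 into one class by antisymmetry, while 4 stays a strict upper bound.

assert le.find(1) == le.find(2) == le.find(3) # 1 <= 2 <= 3 <= 1 squeezes all three
assert le.find(4) != le.find(1)               # 4 was only an upper bound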

The Go version, I believe, is using something akin to the union nodes in the aegraph to store the upper and lower sets as DAGs in an arena (like the unodes here https://www.philipzucker.com/smart_constructor_aegraph/ ). This is probably a better low level implementation, but using Python sets makes what is going on clearer.

An objection to using an inequality union find in the context of e-graphs / egglog is that datalog is already the perfect transitive closure engine isn’t it?

Maybe, but there are a couple of arguments against this perspective:

  • This only supports the fully materialized version of transitive closure.
  • Egglog is fully expressive (Turing complete). You can encode literally anything into it. For really deep encodings, it's obvious that the encoding sucks and will immediately choke even if correct. It is the analog of saying that you can encode all of math into ZFC or Peano arithmetic via many coding tricks, but from an automation perspective, this is nearly useless. Transitive closure encoded to rules is a mild / pretty shallow encoding (sketched just after this list), so you are only mildly screwed by using it.
  • This is part of a general paradigm of treating concepts as macro rewrites / encodings on egglog programs. I feel this approach is hitting a wall, which is why I'm exploring ways of doing e-graphs modulo theories. I find having a runtime mechanism more satisfying in many respects than macro-encoding something away.
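
For concreteness, the shallow encoding looks roughly like this in egglog-ish syntax (a sketch; the sort and relation names are made up):

(relation le (Math Math))
(rule ((le x y) (le y z)) ((le x z)))    ; transitivity, fully materialized as a table
(rule ((le x y) (le y x)) ((union x y))) ; antisymmetry feeds back into equality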

An idea that I think was pretty interesting in this direction, though, is adding language level syntactic support to datalog for "mediation" by user defined relations. The pattern Z = foo(bar(X)) considered as a datalog query flattens to Z = W, foo(Y,W), bar(X,Y). A nice normal form of datalog doesn't use repeated variables, but instead makes them all unique, related by =: Z = W, foo(Y1,W), bar(X,Y), Y = Y1. A user defined mediation would allow you to replace = with a different relation like eq(Z,W), foo(Y1,W), bar(X,Y), eq(Y,Y1), or in the refinement matching case a user defined inequality relation le(Z,W), foo(Y1,W), bar(X,Y), le(Y,Y1). This mediation idea was tossed around during the development of egglog, especially when we were still attempting encodings into Souffle and other systems. Basically, the explicit eq form was how such an encoding works.
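
Spelled out as clauses (the same relations as in the prose; a sketch only, with a made-up head q):

% pattern Z = foo(bar(X)), flattened with all variables unique
q(X, Z) :- foo(Y1, W), bar(X, Y), Z = W, Y = Y1.
% mediated by a user-defined equality relation
q(X, Z) :- foo(Y1, W), bar(X, Y), eq(Z, W), eq(Y, Y1).
% refinement matching: mediated by a user-defined inequality relation
q(X, Z) :- foo(Y1, W), bar(X, Y), le(Z, W), le(Y, Y1).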

By the usual dictionary of concepts, the inequality union find should correspond to a ground atomic theory of equality and inequality statements like a = b /\ c <= d /\ a = c /\ p = q. "Solving" this system corresponds to running a generalized Knuth-Bendix: saturation of ground Knuth-Bendix + ground ordered chaining. This is overwrought in terms of conceptual complication, though.

Comments on Refinement E-Graphs and Connections to Algebraic Subtyping

A new angle that has only recently come on my radar is that of subtyping, which I became interested in for completely different reasons related to doing semantical refinement typing in Knuckledragger. In algebraic subtyping https://www.cs.tufts.edu/~nr/cs257/archive/stephen-dolan/thesis.pdf , Stephen Dolan speaks of the concepts of bisubstitutions and biunification. A significant distinction is made between terms and variables appearing in positive and negative / covariant and contravariant positions. This distinction is not made or apparent in the equational counterpart. A good challenge problem for a refinement egraph is to implement Hindley-Milner with subtyping, akin to Yihong's implementation of Hindley-Milner. https://egraphs-good.github.io/egglog-demo/?example=typeinfer

On a seemingly parallel but disjoint track is work in automated reasoning on ordered chaining https://dl.acm.org/doi/pdf/10.1145/293347.293352 (Bachmair & Ganzinger) and birewriting (Levy & Agusti) https://www.sciencedirect.com/science/article/pii/S074771719690053X in the 90s. See also Part 2 in this tutorial https://people.mpi-inf.mpg.de/alumni/ag2/2011/hg/index/index7.html (Ganzinger was goated). As usual with term rewriting, engaging with the literature here is pretty difficult. So far as I can tell, Stephen Dolan is not influenced by or referencing this work, although the two are clearly related to my eye. "Birewriting" is perfect counterpart terminology to "bisubstitution" and "biunification".

I think stating which arguments of functions are covariant/contravariant / monotonic / anti-monotonic is a key bit of extra data needed for refinement egraph rewriting.

Refinement egraph rewrite rules should look something like this: foo(X+) <= bar(X+). One seeks a bisubstitution of the pattern variables and a term t in the egraph such that t <= bisubst[foo(X+)], from which one can assert the inequality t <= bisubst[bar(X+)] into the refinement e-graph. This is a more complicated operation at every step than equational matching, and it seems that even bottom-up e-matching becomes nondeterministic. The reason is that at every point while building your term, you may transition into your upper set in order to build t. In the equational case of bottom-up e-matching, t was a function of the pattern and substitution, but this is no longer the case. The ability of the LEFind to iterate over all ids larger than a given one is something you need to do this.

We could call this matching problem either “bi-matching” or “le-matching” (in reference to “e-matching”). It is in the general arena of e-matching, requiring search through all “larger” or “smaller” terms depending on whether you are in a contravariant or covariant position in the pattern.
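
Here is a highly speculative sketch of one covariant step of bottom-up le-matching. It assumes a hashcons enodes mapping (fn, args) tuples to ids and an LEFind le; all of these names are hypothetical:

def match_up_covariant(enodes, le, fn, arg_id):
    # One nondeterministic step of bottom-up le-matching for a unary fn
    # marked covariant: arg_id <= a implies fn(arg_id) <= fn(a), so any
    # existing enode fn(a) with a in the upper set of arg_id is a term
    # known to sit above fn(arg_id).
    for a in le.le_set(arg_id):       # everything known to be >= arg_id
        t = enodes.get((fn, (a,)))
        if t is not None:
            yield t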

Contextual Union Finds vs Inequality Union Finds

A different thing to talk about is a contextual e-graph or a contextual/colored union find. I believe this is a distinct thing from the refinement e-graph / inequality union find, but they are similar in some respects.

A contextual egraph https://www.philipzucker.com/egraph-mt2/ lets you make equality assertions under hypothetical situations like y != 0 |- x * y / y = x. The more constrained hypotheses inherit the equalities from the less constrained contexts, and there is a partial ordering on contexts: x > 42 ==> x > 0, or x != 0 /\ y != 0 ==> y != 0.

The refinement egraph, in contrast, doesn't make you say when or why something is <=. It just gives you the ability to make unidirectional rewrite assertions.

They do seem similar in that they are both reasonable ways to model the more subtly constrained rewrites that may be desirable in a compiler.

Disequality

If you add "forbid" lists, you can also support disequality != https://dl.acm.org/doi/10.1145/3704913 , which feels like a natural rounding out of capabilities. It is interesting to note that there are almost always counterparts among generalized union finds, hash conses, and egraphs. Take your favorite weird egraph, only let terms be atomic, and you've got a union find. Take your favorite union find twist, toss on a hash cons, and you've got an interesting egraph. Take an egraph, don't ever call union, and you've got a hash cons.

The paper goes into quite some detail comparing baking in disequality to the union find vs an encoding approach. They seem to think it’s worth baking disequality in.

Disequality is pretty much just an example of the "union find dict" https://www.philipzucker.com/union-find-dict/ , or what people call abelian semigroup annotated union finds, which merge values according to some binary operation when their ids are unioned. Here, the forbid sets are unioned. This is also related to lattice analyses from egg1.

from dataclasses import dataclass
from collections import defaultdict
@dataclass
class NEFind():
    parents : dict[int, int]
    forbid : defaultdict[int, set[int]]  # forbids are unioned
    def __init__(self):
        self.parents = {}
        self.forbid = defaultdict(set)
    def union(self, x, y): # assert that x == y
        x, y = self.find(x), self.find(y)
        if x != y:
            if self.is_ne(x, y): # merging two forbidden classes is a contradiction
                raise ValueError(f"Cannot union {x} and {y}, they are asserted unequal")
            self.parents[x] = y # regular union find
            self.forbid[y].update(self.forbid[x]) # merge forbid sets
    def assert_ne(self, x, y): # assert to NEFind that x != y
        x,y = self.find(x), self.find(y)
        if x == y:
            raise ValueError(f"Cannot assert {x} != {y}, they are already equal")
        self.forbid[x].add(y)
        self.forbid[y].add(x)
    def find(self, x : int) -> int:
        while x in self.parents:
            x = self.parents[x]
        return x
    def is_ne(self, x, y) -> bool:
        x, y = self.find(x), self.find(y)
        if x == y:
            return False
        # forbid sets may hold stale ids, so canonize on the way out
        return any(self.find(z) == y for z in self.forbid[x])
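
A quick usage check:

ne = NEFind()
ne.assert_ne(1, 2)
ne.union(2, 3)
assert ne.is_ne(1, 3) # 1 != 2 and 2 == 3, so 1 != 3
try:
    ne.union(1, 3)    # contradiction: 1 and 3 are asserted unequal
    assert False
except ValueError:
    pass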

I specifically chose to model nonstrict inequality <= in the LEFind because I feel like that meshes best with the general monotonic flavor of learning positive information in e-graphs. Learning x <= y does not offer an opinion on whether x = y might eventually be true. It is impossible to reach an inconsistent state using only <= and = assertions. If you add disequality != or strict inequality <, it does become possible to reach an inconsistent state.

Bits and Bobbles

The brute force SMT e-graph can deal with inequalities by using Z3 special relations for transitive closure.
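
For example, a sketch using z3py's TransitiveClosure special relation over an uninterpreted sort (the names are illustrative):

import z3
Term = z3.DeclareSort("Term")
refine0 = z3.Function("refine0", Term, Term, z3.BoolSort())
refines = z3.TransitiveClosure(refine0)
s = z3.Solver()
a, b, c = z3.Consts("a b c", Term)
s.add(refine0(a, b), refine0(b, c))
s.push()
s.add(z3.Not(refines(a, c))) # a refines c iff the negation is unsat
assert s.check() == z3.unsat
s.pop()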

Set constraints might have another view on this topic? https://en.wikipedia.org/wiki/Set_constraint

Finitely presented lattices. https://www.ams.org/journals/proc/1979-077-02/S0002-9939-1979-0542080-8/S0002-9939-1979-0542080-8.pdf The word problem is solvable. A classic trick of lattices is that x <= y can be encoded as the equational statement y = join(x,y) (equivalently, x = meet(x,y)). This would fit the general paradigm of e-graphs modulo theories via generalized union finds I presented https://arxiv.org/abs/2504.14340 . Finitely presented lattices are evocative of Dolan's discussions in the first couple chapters of his thesis.

Type inference is an interesting alternative universe to logic programming. There are similarities between the two fields and historically they are both rooted in the 70s and the advent of unification. Retconning, Prolog is about proof search and indeed typing judgements are things that are proven. Typing systems / type inference implementations have some interesting extra little features like levels and the stuff that makes up algebraic subtyping. More things discovered in type inference should be ported to logic programming. Could we model computation as type inference?

Other Union find variations

  • Group union find
  • semigroup union find / lattice / uf dict. Egglog as a data structure

Retaining enumeration of eclasses. “Dancing links” trick. Union nodes. https://mastodon.gamedev.place/@harold/114599334531897851 “I’ve probably mentioned this before but a neat thing to add to union-find is another array of integers that is also initialized to 0,1,2… but you update it differently than the array of parents: for the union, when the roots aren’t the same, swap the corresponding entries of the array. That gives you a bunch of cyclic linked lists that let you enumerate any given set in the UF without scanning the whole UF, and you still have a cheap union (unlike maintaining explicit lists)”
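
A minimal sketch of Harold's trick, in a fixed-size arena style:

class EnumUF:
    def __init__(self, n):
        self.parents = list(range(n))
        self.next_ = list(range(n))  # cyclic linked list threading each class
    def find(self, x):
        while x != self.parents[x]:
            x = self.parents[x]
        return x
    def union(self, x, y):
        x, y = self.find(x), self.find(y)
        if x != y:
            self.parents[x] = y
            # swapping the two next pointers splices the two cycles into one
            self.next_[x], self.next_[y] = self.next_[y], self.next_[x]
    def members(self, x):
        # enumerate x's class without scanning the whole union find
        start = x
        yield x
        while self.next_[x] != start:
            x = self.next_[x]
            yield x

uf = EnumUF(5)
uf.union(0, 2)
uf.union(2, 4)
assert sorted(uf.members(0)) == [0, 2, 4]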

https://bernsteinbear.com/union-find/

disequality union finds

  • I think the different styles of implementing union finds via refcells vs arenas vs dictionaries are interesting. I feel like many things I see focus on the refcell version, whereas I almost exclusively use the arena / dictionary version.
  • The proof producing union find is interesting. If a union find is for quickly checking connected components in a graph, the proof producing union find retains enough to give a spanning tree of those components.
  • We were discussing earlier that you can have information keyed on equivalence classes that has a merge (add, min, max, others) function to call when two classes merge. Now that CF mentions it, this is evocative of attributed variables https://www.metalevel.at/prolog/attributedvariables (which is what you were referring to?)
  • I believe it's possible to have a hierarchy of union finds where ufs lower in the hierarchy inherit unions from higher in the hierarchy. I think this is what this paper is getting at https://arxiv.org/pdf/2305.19203 but I'm interpolating a lot there. I have some speculation this might be a way to look at levels as they appear in HM inference. When you enter a let binding, allocate a new UF that inherits from above but doesn't affect the UFs above. When you leave the let, scan and destroy this new UF to know what to generalize. Again, highly speculative. You can also annotate a union find with things that aren't equal https://dl.acm.org/doi/10.1145/3704913 in "forbid lists". Merged classes accumulate forbid lists, so it's similar to the annotated information idea.
  • Most variations on e-graph stuff, if you strip out the term-y bits, are about some twist on union finds https://z3prover.github.io/papers/z3internals.html#sec-equality-and-uninterpreted-functions This also describes a method of splicing in a linked list to enumerate equivalence classes. I like Harold's description better. Chris' aegraph has union nodes for a related purpose. I think of those as storing "right side up" the tree that the union find is the upside down version of. I sometimes muse about tie breaking unions not on rank or whatever but by other criteria. Again, in the aegraph I believe it used a deterministic method (oldest or newest?). It is also interesting to force union(x,y) to make y the parent and x the child always. This gives programmer control, which is kind of interesting. Maybe firing on false beauty, but there's something intriguing in that https://en.wikipedia.org/wiki/Partition_refinement is dual to union finds in some sense. Union finds start all disjoint by assumption and merge. Partition refinement starts all merged by assumption and splits. This has never ended up being a useful observation to me.
  • Here's another one. I'm contrary, so I often just don't path compress, or instead I do full eager compression of union finds in a "rebuild" pass, in both cases completely ruining the Ackermann whatever. If you combine eager compression with deterministic parent tie breaking (min for example), you can put a union find into a canonical form and then hash cons it for first class union finds (sketched just after this list).
  • Groupoid union finds (Lesbre & Lemerre). Frühwirth.
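
A small sketch of that canonization idea, in the dictionary style used throughout this post:

def canonize_uf(parents: dict) -> frozenset:
    # Eagerly compress with deterministic min-element tie breaking, then
    # freeze so the union find itself becomes hashable / hash-consable.
    def find(x):
        while x in parents:
            x = parents[x]
        return x
    classes = {}
    for x in set(parents) | set(parents.values()):
        classes.setdefault(find(x), set()).add(x)
    canon = {}
    for root, members in classes.items():
        members.add(root)
        rep = min(members)
        for m in members:
            if m != rep:
                canon[m] = rep
    return frozenset(canon.items())

# two different union orders, same canonical form
assert canonize_uf({1: 2, 3: 2}) == canonize_uf({2: 1, 3: 1})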

Rem's algorithm

Junk

See also subtyping

It is mentioned that the "point" of birewriting or ordered chaining was set containment. Which does jive with similar stuff appearing in subtyping.

Chaining: bake transitivity into an inference rule rather than an axiom.

t S u    v S w
--------------------
sig[t] S sig[w]

where sig = the unifier of u = v

Ordered chaining https://people.mpi-inf.mpg.de/alumni/ag2/2011/hg/index/index7.html birewriting

https://www.sciencedirect.com/science/article/pii/S157106610400043X Bi-rewriting Rewriting Logic W.Marco Schorlemmer

rule1. rule2.

:- rule1, subclause, subclause

What if we used ASP style guard atoms on rules and an ordering on these, corresponding to the clause ordering in a Prolog file, to make sure we get the first Prolog counterexample?

Ordered resolution picks a model preference ordering. Ordered resolution turns undirected clauses into directed logic programming rules, akin to how Knuth-Bendix turns undirected equations into directed rewrite rules.

Chaining Slagle 72

https://dl.acm.org/doi/pdf/10.1145/293347.293352 Ordered Chaining Calculi for First-Order Theories of Transitive Relations

Hmm. I’m kind of triggering on dependency pairs…?

Part 2: "solving inequations via a narrowing-like procedure". E-biunification? = negative chaining.

In the sense that union find = ground Knuth-Bendix, I'd think that the le union find would be ground atomic ordered chaining.

For x < y, a rewrite proof rewrites x up and y down. May still require search though?

Bisubstitution and Biunification https://www.cs.tufts.edu/~nr/cs257/archive/stephen-dolan/thesis.pdf birewriting or ordered chaining is not in the references.

I have vague memories of considering something that felt evocative of trying to maintain only one upper bound or something? When you unioned

A refinement rule looks like pattern <= rhs. Then for any term t we can come up with such that t <= bisubst[pattern],

we can determine that t <= bisubst[rhs].

Pattern matching is t ?= pat -> [subst]

or it may recurse into subterms?

t <=? pat -> [bisubst]

Function symbols are marked as covariant, contravariant, or invariant, argument by argument, corresponding to different forms of monotonicity: x <= y ==> f(x) <= f(y), or x <= y ==> f(x) >= f(y).

_ <= _ is itself contravariant and covariant in some sense in its slots. In the sense that I can replace

https://www.wolframscience.com/metamathematics/beyond-substitution-cosubstitution-and-bisubstitution/ A totally unrelated notion of bisubstitution?

refinement egraph

see also subtyping.

https://microsoft.github.io/z3guide/docs/theories/Special%20Relations/ What about having

refines() == TransitiveClosure(refine0)

def refines(self, t1, t2):
    s.add(refine0(t1, t2))

def is_refinement(self, t1, t2):
    with self.solver:
        s.add(smt.Not(trans_refine(t1, t2)))
        res = s.check()
        return res == smt.unsat

You need to be working in uninterpreted domains with undefined elements. Or in an option monad?

doubling function symbols. f0(x0) == f1(x1) means that f(x) is defined

You can’t really ask if a value/IntSort is defined or not. Makes no sense. You can ask if it must be correlated in two worlds.

squares

def defined(t): return t == twin(t)

Is this good enough? defined(t) and defined(t2) => t = t2

defined(x), defined(y), y != 0 => div(x,y) == div0(x0,y0)

Hmm. Other examples?

Is this a way of modelling a setoid? partial setoid? We get automatic congruence x = x0 -> f(x0) = f(x)

To assert x = x0 -> f(x0) = f1(x), aka forall x, f(x) == f0(x), asserts that f plays nice with the setoid. We automatically get symmetry and transitivity. Nooooo…

No, it's more like a way of establishing a subset of terms without.

Maybe it’s a meta non-det. x == x0 f(0)(x) == f(1)(x)

R(x,y) -> R(y,z) -> R(x,z)

(x == x0) == (y == y0)

whereas R(x,y) ->

Inequality union find

https://github.com/golang/go/blob/master/src/cmd/compile/internal/ssa/poset.go#L100 Bitsets. That seems smart.

from dataclasses import dataclass
from collections import defaultdict
@dataclass
class LEFind():
    parents : dict
    upper : defaultdict(set)
    def __init__(self):
        self.parents = {}
        self.upper = defaultdict(set)
    def assert_le(self, x, y):
        x,y = self.find(x), self.find(y)
        if x == y:
            return
        elif self.is_le(y, x): # antisymmetry
            self.union(x, y)
            # union everything that is squashed between x and y
            for z in self.le_set(x): # actually could stop as soon as we are known above z?
                if self.is_le(z, y):
                    self.union(y, z)
            for z in self.le_set(y): # redundant?
                if self.is_le(z, x):
                    self.union(x, z)
        self.upper[x].add(y)
    def union(self, x, y):
        x, y = self.find(x), self.find(y)
        if x != y:
            self.upper[y].update(self.upper[x]) # merge upper sets
            self.parents[x] = y
    def find(self, x):
        while x in self.parents:
            x = self.parents[x]
        return x
    def is_le(self, x, y):
        # DFS search for y in upper set of x
        x,y = self.find(x), self.find(y)
        if x == y:
            return True
        todo = [x]
        seen = set(todo)
        while todo:
            x = todo.pop() # invariant is that x is already representative
            for z in self.upper[x]:
                z = self.find(z)   # compression could be updating z in place in upper[x]
                if z == y:
                    return True
                elif z not in seen:
                    seen.add(z)
                    todo.append(z)
        return False
    def le_set(self, x):
        x = self.find(x)
        todo = [x]
        seen = set(todo)
        while todo:
            x = todo.pop()
            for z in self.upper[x]:
                z = self.find(z)
                if z not in seen:
                    seen.add(z)
                    todo.append(z)
        return seen
le = LEFind()
le.assert_le(1, 2)
le.assert_le(2, 3)
le.assert_le(3, 4)
assert le.is_le(1,3)
le.assert_le(3, 1)
#le.find(1) == le.find(2)
le       
LEFind(parents={3: 1, 1: 2}, upper=defaultdict(<class 'set'>, {2: {2, 3, 4}, 1: {2, 4}, 3: {1, 4}, 4: set()}))

#le.find(1) == le.find(2)
LEFind(parents={3: 1, 2: 1}, upper=defaultdict(<class 'set'>, {1: {1, 2, 3, 4}, 2: {3}, 3: {1, 4}, 4: set()}), lower=defaultdict(<class 'set'>, {2: {1}, 3: {2}, 4: {3}, 1: {1, 2, 3}}))
from dataclasses import dataclass
from collections import defaultdict
@dataclass
class LEFind():
    parents : list[int]
    upper : list[tuple[int, int] | int] # either a pair of indices into upper or an uf index
    def __init__(self, n):
        self.parents = []   
        self.upper = [] # index into uppertree
        self.upper_tree = [] # a tree
    def makeset(self):
        n = len(self.parents)
        self.parents.append(n)
        self.upper.append(n)
        return n
    def find(self, x):
        while x != self.parents[x]:
            x = self.parents[x]
        return x
    def union(self, x, y):
        x = self.find(x)
        y = self.find(y)
        if x == y:
            return
        self.parents[x] = y
        self.upper.append((self.upper[x], self.upper[y]))
        self.upper[y] = len(self.upper) - 1
    def assert_le(self, x, y):
        x = self.find(x)
        y = self.find(y)
        if self.is_le(x, y):
            return
        else:
            self.upper.append((self.upper[x], self.upper[y]))
from dataclasses import dataclass
from collections import defaultdict
@dataclass
class LEFind():
    parents : dict
    upper : defaultdict(set)
    def __init__(self):
        self.parents = {}
        self.upper = defaultdict(set)
    def assert_le(self, x, y):
        x,y = self.find(x), self.find(y)
        if x == y:
            return
        elif self.is_le(y, x):
            self.union(x, y)
            # union everything that is squashed between x and y
            for z in self.le_set(x): # actually could stop as soon as we are known above z?
                if self.is_le(z, y):
                    self.union(y, z)
            for z in self.le_set(y): # redundant?
                if self.is_le(z, x):
                    self.union(x, z)
        elif not self.is_le(x, y): # avoid redundant additions.
            self.upper[x].add(y)
    def union(self, x, y):
        x, y = self.find(x), self.find(y)
        if x != y:
            self.upper[y].add(x)
            self.parents[x] = y
    def find(self, x):
        while x in self.parents:
            x = self.parents[x]
        return x
    def is_le(self, x, y):
        x,y = self.find(x), self.find(y)
        if x == y:
            return True
        todo = [x]
        seen = set(todo)
        while todo:
            x = todo.pop()
            for z in self.upper[x]: # we don't find on x
                if self.find(z) == y:
                    return True
                elif z not in seen: # crucially we don't find here.
                    seen.add(z)
                    todo.append(z)
        return False
    def le_set(self, x):
        x = self.find(x)
        todo = [x]
        seen = set(todo)
        while todo:
            x = todo.pop()
            for z in self.upper[x]: # we don't find on x
                if z not in seen: # crucially we don't find here.
                    seen.add(z)
                    todo.append(z)
        return {self.find(z) for z in seen}
le = LEFind()
le.assert_le(1, 2)
le.assert_le(2, 3)
assert le.is_le(1,3)
le.assert_le(3, 1)
le.find(1) == le.find(2)
le       
LEFind(parents={3: 1, 1: 2}, upper=defaultdict(<class 'set'>, {2: {1, 3}, 1: {2, 3}, 3: set()}))

Non-normal form of the upper set. So that we follow non-canonized keys to their upper sets. The keys of upper can be "stale" / non-canonized. Kind of like union nodes / splicing in lists.

Nice thing about this is that the union find part is unchanged, mostly. Minus maintaining the eclass enumerator in union.

I think this can be implemented as a pointery thing. which is nice? upper can point to a tree? Yeah you can use a Union tree in an arena like thing if you like. That would be pretty slim and fast.

So 3 things: upper, lower, and union trees. a <= b, b <= c

If you don't expect to state circular a <= b, b <= a stuff, then just don't have the propagation thing. I would kind of anticipate this being true for refinement rewriting.

I think inequality is hopeless in terms of being as good as union finds. That is just how it be. But having a sense of how hopeful or hopeless a problem is gives you a sense of what kinds of solutions to look for. See for example canonization by search, where knowing you're in the graph isomorphism regime tells you there has to be search involved.

The cleverness is not in implementing it. It seems straightforward. The cleverness is in naming an idea: a black box that handles online inequality queries.

2 different goals.

Strict inequality feels like flirting with disequality. Disequality egraphs are a fair comparison though. We are bolting in this extra thing, saying it's important even though it is encodable. I don't think that diseq graphs fit into my EMT framework per se. The point of asserting disequalities is for guarded rules (like a != b, can't alias). We aren't seeking contradictions.

Comparing contextual vs refinement egraphs: contextual says under what conditions something is true. Refinement egraphs just say that things are replaceable in a directional way. They are similar because contexts have an ordering.

Either we want to require rebuilding, not search recursively, and allow not getting info sometimes until rebuilding. Or we want to minimize where things go in upper sets to avoid repetition by guarding.

Reflecting Eq(x,y) as a statement was one of the motivations of egglog. A first class notion of equality / hypothetical equality.

If we have x <= y and y <= x, we probably want to propagate an x == y.

Yes, without disequality, it is impossible to assert a contradiction. That feels right.

The mutual recursion of assert_le and union is really confusing.

https://arxiv.org/abs/2002.00813 Faster Fully Dynamic Transitive Closure in Practice

from collections import defaultdict
from dataclasses import dataclass
@dataclass
class LEFindRebuild():
    parents : dict[int, int]
    upper : defaultdict[int, set[int]]
    def __init__(self):
        self.parents = {}
        self.upper = defaultdict(set)
    def assert_le(self, x, y):
        x, y = self.find(x), self.find(y)
        if x != y:
            self.upper[x].add(y)
    def union(self, x, y):
        x, y = self.find(x), self.find(y)
        if x != y:
            self.parents[x] = y
            self.upper[y].update(self.upper.pop(x, set())) # stale until rebuild
    def find(self, x):
        while x in self.parents:
            x = self.parents[x]
        return x
    def is_le(self, x, y): # only accurate after a rebuild
        return self.find(y) in self.le_set(x)
    def le_set(self, x): # only accurate after a rebuild
        x = self.find(x)
        return self.upper[x] | {x}
    def rebuild(self):
        # canonicalize keys and values, then saturate transitivity to a
        # fixpoint, fully materializing the closure
        upper_old = self.upper
        self.upper = defaultdict(set)
        for x, ups in upper_old.items():
            self.upper[self.find(x)].update(self.find(y) for y in ups)
        changed = True
        while changed:
            changed = False
            for x in list(self.upper):
                new = set().union(*(self.upper[y] for y in list(self.upper[x]))) - self.upper[x]
                if new:
                    self.upper[x].update(new)
                    changed = True

le = LEFindRebuild()
le.assert_le(1, 2)
le.assert_le(2, 3)
le.rebuild()
le.rebuild()
le.le_set(1)
le
assert le.le_set(1) == {1, 2, 3}
assert le.le_set(2) == {2, 3}
assert le.le_set(3) == {3}
assert le.is_le(1,3)
assert not le.is_le(3,1)
le.union(1,3)

old refinement

E derives rules R. R rules express refinement. "More defined than" rewriting. Undef on int overflow can be rewritten to wraparound addition, but not vice versa.

E, R, R<

R< can interreduce to a frontier. That lattice width thing https://en.wikipedia.org/wiki/Antichain . Non-confluent. Irreparably.

When can I remove a rule from R< as redundant? Is R-simplify still what we want? We want the minimal frontier only? We can't infer two things are comparable just because they have a critical pair. But maybe we should ask the question? Take critical pairs, ask an oracle if they are comparable. If so, we can reduce; if not, ok well then nothing. The oracle might be in the form of domain specific rewrite / ordering rules. Or for constants (semantic things) we can just run some function.

But if we use R-simplify for an element that has a non trivial frontier, we need to expand them all. This feels like it needs a closed world assumption.

R<, Rmin as two systems. critical pairs from Rmin become questions into R<.

Two orderings. Definedness and cost. [E + R<] , Rmin

Orient. E we can orient into an Rmin rule (some are unorientable: unfailing completion). R< we can't orient; we can only push into Rmin if its definedness ordering matches the term ordering.

{a -> b, b -> a} in R<, we can move it into E {a = b}. scc compression.

Another intuition says we want the closest frontier (edges, not paths), since the further stuff is implied by transitivity. {a -> c, a -> b, b -> c} ==> {a -> b, b -> c}

Kind of reminds me of a Hilbert basis being the frontier edge. Is that meaningful?

Semi confluence. That one step confluent thing. We're kind of trying to repair semi confluence. Strong confluence is what I meant.

Atomic ground Ground

Both should be decidable and fine under a good term ordering.

Egraph modulo theory style: replace / enhance the union find with a Bellman solver. Even building terms is nondeterministic now. We can now choose to enumerate all terms above and all terms known below (comparable). Structured or layered eid. eq(lt)

GRS simplify edge as refinement. Maybe if refinement is a total ordering?

If bottom can be replaced with anything (whatever is most convenient), which seems to be the case in compilers, it's non-confluent by design. The point of the undefinedness is to make compilation more optimized.

bottom -> 3 bottom -> 4 Different bottoms labelled by context maybe? bottom(ctx1) bottom(ctx2)

Can’t lose sight of this one. The Alive2 toy replication project was aimed at seeing

cvxlean. Relaxations of convex programming problems.

https://x.com/abhi9u/status/1826527029858103501 Posets in Go. Using union find + DAG? Is this not hard actually?

https://mastodon.social/@cfbolz/113017389946004834 https://github.com/golang/go/blob/master/src/cmd/compile/internal/ssa/poset.go#L100

https://www.microsoft.com/en-us/research/wp-content/uploads/2009/01/pentagons.pdf is it a pentagon domain? Kind of jives https://microsoft.github.io/z3guide/docs/theories/Special%20Relations/

https://www.amazon.com/Introduction-Lattice-Computer-Science-Applications/dp/1118914376

a. The brute force is maybe fine; the question is how to make it ergonomic to use. b. Bellman-Ford is probably fine / the canonical answer.

mark function symbols as monotone or anti-monotone

Patterns: does t match p if t <= p, or is t >= p? Rewrites lhs <= rhs or lhs >= rhs. A variable binding says t <= x or t >= x or t == x.

Pavel suggests tsort of SCCs. https://github.com/ruby/tsort TSort implements topological sorting using Tarjan's algorithm for strongly connected components. TSort is designed to be able to be used with any object which can be interpreted as a directed graph.

If patterns have ordered semantics, even matching a ground pattern can be interesting… hmm. Subpattern lhs -> rhs seeks a substitution s: x <= s@lhs implies x <= s@rhs.

All function symbols are monotone?

Given some term ordering, return all terms less than that in databank.

All of this is writeable as regular rules though? And we’re not gonna do anything that special for the order.

Making monotonicity a rebuild rather than a rule is nice. We could do the same for congruence (souffle-egglog) but that stinks.

Directed minimum spanning tree. Union find shows up in Kruskal's algorithm; it is a spanning tree. https://www.cs.princeton.edu/courses/archive/spring13/cos528/ https://www.cs.princeton.edu/courses/archive/spring13/cos528/directed-mst-1.pdf https://en.wikipedia.org/wiki/Edmonds%27_algorithm spanning arborescence of minimum weight.

Maybe this is cheaper to maintain than a full shortest path.

You maintain a union find for things that are possibly comparable (connected through some chain of <= >=). That could prune a lot.

Design choices. Do R< or R<=? R< asserts a sort of strict inequality. Do R< edges get annotated by the conditions under which they are a strict refinement?

Choices of canonizing R<.

  1. subsume redundant edges down to the minimal Hasse diagram
  2. insert all transitive edges
  3. have another simplification ordering < and insert only those edges in order. Gauss elim, kind of.

path(x,z) :- edge(x,y), path(y,z), y > z. path is a kind of refinement ordering, > is a different simplification ordering. But insert path(x,z) only in order. Trying to bake in ordered resolution style thinking.

class EGraph():
    uf : ...     # R=
    refine : ... # R<
    def rebuild(): ...

What if we maintain a strict less than graph and an eq graph. We can then prune less than… Basically seems like rules.

#import scipy.sparse
#import graph_tool
import networkx as nx

class Egraph():
    #ineq : nx.DiGraph
    #enodes : dict[ , int]
    def __init__(self):
        self.ineq = nx.DiGraph()
        self.enodes = {}
        self.mono_nodes = {} # monotonic enodes?
    def makeset(self):
        x = len(self.ineq)
        self.ineq.add_node(x)
        return x
    def union(self, x, y):
        self.ineq.add_edge(x,y)
        self.ineq.add_edge(y,x)
    def assert_le(self, x, y): #hmm. le. vs lt.
        self.ineq.add_edge(x,y)
    
    def rebuild(self):
        pass # TODO: propagate congruence / monotonicity over the ineq graph