## Applicative Bidirectional Programming and Automatic Differentiation

I got referred to an interesting paper by a comment of /u/syrak.

http://www2.sf.ecei.tohoku.ac.jp/~kztk/papers/kztk_jfp_am_2018.pdf

Applicative bidirectional programming (PDF), by Kazutaka Matsuda and Meng Wang

In it, they use a couple interesting tricks to make Lens programming more palatable. Lens often need to be be programmed in a point free style, which is rough, but by using some combinators, they are able to program lenses in a pointful style (with some pain points still left over). It is a really interesting, well-written paper. Lots ‘o’ Yoneda and laws. I’m not doing it justice. Check it out!

A while back I noted that reverse mode auto-differentiation has a type very similar to a lens and in fact you can build a working reverse mode automatic differentiation DSL out of lenses and lens-combinators. Many things about lenses, but not all, transfer over to automatic differentiation. The techniques of Matsuda and Wang do appear to transfer fairly well.

This is interesting to me for another reason. Their lift2 and unlift2 functions remind me very much of my recent approach to compiling to categories. The lift2 function is fanning a lens pair. This is basically what my FanOutput typeclass automated. unlift2 is building the input for a function function by supplying a tuple of projection lenses. This is what my BuildInput typeclass did. I think their style may extend many monoidal cartesian categories, not just lenses.

One can use the function b -> a in many of the situations one can use a in. You can do elegant things by making a Num typeclass of b -> a for example. This little fact seems to extend to other categories as well. By making a Num typeclass for Lens s a when a is a Num, we can use reasonable looking notation for arithmetic.

They spend some time discussing the necessity of a Poset typeclass. For actual lawful lenses, the dup implementation needs a way to recombine multiple adjustments to the same object. In the AD-lens case, dup takes care of this by adding together the differentials. This means that everywhere they needed an Eq typeclass, we can use a Num typeclass. There may be usefulness to building a wrapped type data NZ a = Zero | NonZero a like their Tag type to accelerate the many 0 values that may be propagating through the system.

Unfortunately, as is, the performance of this is abysmal. Maybe there is a way to fix it? Unlifting and lifting destroys a lot of sharing and often necessitates adding many redundant zeros. Why are you doing reverse mode differentiation unless you care about performance? Forward mode is simpler to implement. In the intended use case of Matsuda and Wang, they are working with actual lawful lenses, which have far less computational content than AD-lenses. Good lawful lenses should just be shuffling stuff around a little. Maybe one can hope GHC is insanely intelligent and can optimize these zeros away. One point in favor of that is that our differentiation is completely pure (no mutation). Nevertheless, I suspect it will not without help. Being careful and unlifting and lifting manually may also help. In principle, I think the Lensy approach could be pretty fast (since all it is is just packing together exactly what you need to differentiate into a data type), but how to make it fast while still being easily programmable? It is also nice that it is pretty simple to implement. It is the simplest method that I know of if you needed to port operable reverse mode differentiation to a new library (Massiv?) or another functional language (Futhark?). And a smart compiler really does have a shot at finding optimizations/fusions.

While I was at it, unrelated to the paper above, I think I made a working generic auto differentiable fold lens combinator. Pretty cool. mapAccumL is a hot tip.

For practical Haskell purposes, all of this is a little silly with the good Haskell AD packages around, the most prominent being

It is somewhat interesting to note the similarity of type forall s. Lens s appearing in the Matsuda and Wang approach to those those of the forall s. BVar s monad appearing in the backprop package. In this case I believe that the s type variable plays the same role it does in the ST monad, protecting a mutating Wengert tape state held in the monad, but I haven’t dug much into it. I don’t know enough about backprop to know what to make of this similarity.

The github repo with my playing around and stream of consciousness commentary is here

## A Touch of Topological Quantum Computation in Haskell Pt. II: Automating Drudgery

Last time we built the basic pieces we need to describe anyons in Haskell. Anyon models describe interesting physical systems where a set of particles (Tau and Id in our case) have certain splitting rules and peculiar quantum properties. The existence of anyons in a system are the core physics necessary to support topological quantum computation. In topological quantum computing, quantum gates are applied by braiding the anyons and measurements performed by fusing anyons together and seeing what particle comes out. Applying gates in this way has inherent error correcting properties.

The tree of particle production with particle labelled leaves picks a basis (think the collection $\{\hat{x}, \hat{y}, \hat{z}\}$ ) for the anyon quantum vector space. An individual basis vector (think $\hat{x}$ ) from this basis is specified by labelling the internal edges of the tree. We built a Haskell data type for a basic free vector space and functions for the basic R-moves for braiding two anyons and reassociating the tree into a new basis with F-moves. In addition, you can move around your focus within the tree by using the function lmap and rmap. The github repo with that and what follows below is here.

#### Pain Points

We’ve built the atomic operations we need, but they work very locally and are quite manual. You can apply many lmap and rmap to zoom in to the leaves you actually wish to braid, and you can manually perform all the F-moves necessary to bring nodes under the same parent, but it will be rather painful.

The standard paper-and-pencil graphical notation for anyons is really awesome. You get to draw little knotty squiggles to calculate. It does not feel as laborious. The human eye and hand are great at applying a sequence of reasonably optimal moves to untangle the diagram efficiently. Our eye can take the whole thing in and our hand can zip around anywhere.

To try and bridge this gap, we need to build functions that work in some reasonable way on the global anyon tree and that automate simple tasks.

#### A Couple Useful Functions

Our first useful operation is pullLeftLeaf. This operation will rearrange the tree using F-moves to get the leftmost leaf associated all the way to the root. The leftmost leaf will then have the root as a parent.

Because the tree structure is in the FibTree a b data type, we need the tuple tree type of the pulled tree. This is a slightly non-trivial type computation.

In order to do this, we’ll use a bit of typelevel programming. If this is strange and alarming stuff for you, don’t sweat it too much. I am not the most elegant user of these techniques, but I hope that alongside my prose description you can get the gist of what we’re going for.

(Sandy Maguire has a new book on typelevel programming in Haskell out. Good stuff. Support your fellow Haskeller and toss him some buckos.)

The resulting tree type b is an easily computable function of the starting tree type a. That is what the “functional dependency” notation | a -> b in the typeclass definition tells the compiler.

The first 4 instances are base cases. If you’re all the way at the leaf, you basically want to do nothing. pure is the function that injects the classical tree description into a quantum state vector with coefficient 1.

The meat is in the last instance. In the case that the tree type matches ((a,b),c), we recursively call PullLeftLeaf on (a,b) which returns a new result (a',b'). Because of the recursion, this a' is the leftmost leaf. We can then construct the return type by doing a single reassociation step. The notation ~ forces two types to unify. We can use this conceptually as an assignment statement at the type level. This is very useful for building intermediate names for large expressions, as assert statements to ensure the types are as expected, and also occasionally to force unification of previously unknown types. It’s an interesting operator for sure.

The recursion at the type level is completely reflected in the actual function definition. We focus on the piece (a,b) inside t by using lmap. We do a recursive call to pullLeftLeaf, and finally fmove' performs the final reassociation move. It is all rather verbose, but straightforward I hope.

You can also build a completely similar PullRightLeaf.

#### A Canonical Right Associated Basis

One common way of dealing with larger trees is to pick a canonical basis of fully right associated trees. The fully right associated tree is a list-like structure. Its uniformity makes it easier to work with.

By recursively applying pullLeftLeaf, we can fully right associate any tree.

This looks quite similar to the implementation of pullLeftLeaf. It doesn’t actually have much logic to it. We apply pullLeftLeaf, then we recursively apply rightAssoc in the right branch of the tree.

#### B-Moves: Braiding in the Right Associated Basis

Now we have the means to convert any structure to it’s right associated canonical basis. In this basis, one can apply braiding to neighboring anyons using B-moves, which can be derived from the braiding R-moves and F-moves.

The B-move applies one F-move so that the two neighboring leaves share a parent, uses the regular braiding R-move, then applies the inverse F-move to return back to the canonical basis. Similarly, bmove'  is the same thing except applies the under braiding braid' rather that the over braiding braid.

(Image Source : Preskill’s notes)

#### Indexing to Leaves

We also may desire just specifying the integer index of where we wish to perform a braid. This can be achieved with another typeclass for iterated rmaping. When the tree is in canonical form, this will enable us to braid two neighboring leaves by an integer index. This index has to be a typelevel number because the output type depends on it.

In fact there is quite a bit of type computation. Given a total tree type s and an index n this function will zoom into the subpart a of the tree at which we want to apply our function. The subpart a is replaced by b, and then the tree is reconstructed into t. t is s with the subpart a mapped into b.  I have intentionally made this reminiscent of the type variables of the lens type Lens s t a b .

This looks much noisier that it has to because we need to work around some of the unfortunate realities of using the typeclass system to compute types. We can’t just match on the number n in order to pick which instance to use because the patterns 0 and n are overlapping. The pattern n can match the number 0 if n ~ 0. The pattern matching in the type instance is not quite the same as the regular Haskell pattern matching we use to define functions. The order of the definitions does not matter, so you can’t have default cases. The patterns you use cannot be unifiable. In order to fix this, we make the condition if n is greater than 0 an explicit type variable gte. Now the different cases cannot unify. It is a very common trick to need a variable representing some branching condition.

For later convenience, we define rmapN which let’s us not need to supply the necessary comparison type gte.

#### Parentifying Leaves Lazily

While it is convenient to describe anyon computations in a canonical basis, it can be quite inefficient. Converting an arbitrary  anyon tree into the standard basis will often result in a dense vector. A natural thing to do for the sake of economy is only do reassociation on demand.

The algorithm for braiding two neighboring leaves is pretty straightforward. We need to reassociate these leaves so that they have the same parent. First we need the ability to map into the least common ancestor of the two leaves. To reassociate these two leaves to have a common parent we pullrightLeaf the left subtree and then pullLeftLeaf the left subtree. Finally, there is a bit extra bit of shuffling to actually get them to be neighbors.

As a first piece, we need a type level function to count the number of leaves in a tree. In this case, I am inclined to use type families rather than multi parameter typeclasses as before, since I don’t need value level stuff coming along for the ride.

Next, we make a typeclass for mapping into the least common ancestor position.

We find the least common ancestor position by doing a binary search on the size of the left subtrees at each node. Once the size of the left subtree equals n, we’ve found the common ancestor of leaf n and leaf n+1.

Again, this LCAMap typeclass has a typelevel argument gte that directs it which direction to go down the tree.

The Twiddle typeclass will perform some final cleanup after we’ve done all the leaf pulling. At that point, the leaves still do not have the same parent. They are somewhere between 0 and 2 F-moves off depending on whether the left or right subtrees may be just a leaf or larger trees. twiddle is not a recursive function.

Putting this all together we get the nmap function that can apply a function after parentifying two leaves. By far the hardest part is writing out that type signature.

#### Usage Example

Here’s some simple usage:

Note that rmapN is 0-indexed but nmap is 1-indexed. This is somewhat horrifying, but that is what was natural in the implementation.

Here is a more extended example showing how to fuse some particles.

I started with the tree at the top and traversed downward implementing each braid and fusion. Implicitly all the particles shown in the diagram are Tau particles. The indices refer to particle position, not to the particles “identity” as you would trace it by eye on the page. Since these are identical quantum  particles, the particles don’t have identity as we classically think of it anyhow.

The particle pairs are indexed by the number on the left particle. First braid 1 over 2, then 2 over 3, fuse 1 and 2, braid 2 under 3, fuse 2 and 3, and then fuse 1 and 2. I got an amplitude for the process of -0.618, corresponding to a probability of 0.382. I would give myself 70% confidence that I implemented all my signs and conventions correctly. The hexagon and pentagon equations from last time being correct gives me some peace of mind.

Syntax could use a little spit polish, but it is usable. With some readjustment, one could use the Haskell do notation removing the need for explicit >>=.

#### Next Time

Anyons are often described in categorical terminology. Haskell has a category culture as well. Let’s explore how those mix!