Reverse Mode Auto Differentiation is Kind of Like a Lens

Warning: I’m using sketchy uncompiled Haskell pseudocode.

Auto-differentiation is writing a function that also computes the derivative alongside calculating its value. Function composition is done alongside applying the chain rule to the derivative part.

One way to do this is to use a “dual number”. Functions now take a tuple of values and derivatives.

The Jacobean of a function from R^n \rightarrow R^m is a m by n matrix. The chain rule basically says that you need to compose the matrices via multiplication when you compose the value functions.  This is the composition of the linear maps.

Conceptually, you initialize the process with a NxN identity matrix corresponding to the fact that $latex \partial x_i/\partial x_j=\delta_{ij}

Vectorized versions of scalar functions (maps) will often use diag

A couple points:

  1.  Since the Jacobean j is always going to be multiplied in composition, it makes sense to factor this out into a Monad structure (Applicative maybe? Not sure we need full Monad power).
  2. There is an alternative to using explicit Matrix data types for linear maps. We could instead represent the jacobeans using (Vector Double) -> Vector Double. The downside of this is that you can’t inspect elements. You need explicit matrices as far as I know to do Gaussian elimination and QR decomposition. You can sample the function to reconstitute the matrix if need be, but this is somewhat roundabout. On the other hand, if your only objective is to multiply matrices, one can use very efficient versions. Instead of an explicit dense NxN identity matrix, you can use the function id :: a -> a, which only does some minimal pointer manipulation or is optimized away. I think that since we are largely multiplying Jacobeans, this is fine.


What we’ve shown so far is Forward Mode.

When you multiply matrices you are free to associate them in any direction you like. (D(C(BA))) is the association we’re using right now. But you are free to left associate them. ((DC)B)A). You can write this is right associated form using the transpose ((DC)B)A)^T = (A^T(B^T(C^TD^T)))

This form is reverse mode auto differentiation. Its advantage is the number of computations you have to do and the intermediate values you have to hold. If one is going from many variables to a small result, this is preferable.

It is actually exactly the same in implementation except you reverse the order of composition of the derivatives. We forward compose value functions and reverse compose derivative functions (matrices).


We have CPSed our derivative matrices.

Really a better typed version would not unify all the objects into a. While we’ve chosen to use Vector Double as our type, if we could tell the difference between R^n and R^m at the type level the following would make more sense.

However, this will no longer be a monad. Instead you’ll have to specify a Category instance. The way I got down to this stuff is via reading Conal Elliott’s new Automatic Differentiation paper which heavily uses the category interface.  I was trying to remove the need to use constrained categories (it is possible, but I was bogged down in type errors) and make it mesh nice with hmatrix. Let me also mention that using the Arrow style operators *** and dup and &&& and fst, and clever currying that he mentions also seems quite nice here. The Tuple structure is nice for expressing direct sum spaces in matrices. (Vector a, Vector b) is the direct sum of those vector spaces.

Anyway, the arrows for RD are

This is a form I’ve seen before though. It is a lens. Lens’ have a getter (a -> b) that extracts b from a and a setter (a -> b -> a) that given an a and a new b returns the replaced a.

Is an automatic derivative function in some sense extracting an implicit calculable value from the original vector and returning in a sense how to change the original function? It is unclear whether one should take the lens analogy far or not.

The type of Lens’  (forall f. Functor f => (b -> f b) -> a -> f a) means that it is isomorphic to a type like DFun’. The type itself does imply the lens laws of setters and getters, so these functions are definitely not proper lawful lenses. It is just curious that conceptually they are not that far off.

The lens trick of replacing this function with a quantified rank 1 type (forall f. ) or quantified rank-2 (forall p.) profunctor trick seems applicable here. We can then compose reverse mode functions using the ordinary (.) operator and abuse convenience functions from the lens library.

Neat if true.



Haskell Gloss is awesome

Gloss is a super simple binding to drawing 2d stuff in a window for haskell

Given how relatively bewildering Hello World feels in Haskell, it’s surprising how easy it is to get an animated window going. I think that may be because of expectations. You expect Hello world to be really easy and have no confusion and yet there is an inexplicable IO monad-what? , whereas you expect drawing to involve a bunch of inexplicable boiler plate bullshit.

This simulates a pendulum. Drawing it as a line.

simulate takes a pile of arguments

first thing describes a window, with title text, size and screen position I think?

then background color

frames per second

initial state (0 angle 0 angular velocity)

a drawing function from state to a picture, which is a gloss type of lines and crap

and a state update function taking a time step dt and current state.


I feel like there is room for a completely introductory tutorial to Haskell using gloss. It’s so rewarding to see stuff splash up on the screen.

MAYBE I’LL DO IT. or maybe not.

An ignoramus thinking about Compiling to Categories

I haven’t thoroughly read this (in fact barely at all) and yet I have some thoughts.

Conal Elliott is up to some good shit. Maybe ignore me and just checkout the links.

Simply Typed Lambda Calculus (STLC)

Hask is a category of haskell types with functions as arrows between them, but there is a lot going on in Haskell. Polymorphism for example, which is often modeled as some kind of Natural Transformation of EndoFunctors from Hask to Hask (but I don’t think this covers every possible use of polymorphism. Maybe it does).

It is commonly mentioned that STLC is the more direct mapping. STLC is kind of a subset of haskell with no polymorphism or with the polymorphism stripped out (Every time you use a polymorphic function just monomorphize it. This blows up the number of functions floating around).

STLC is a Cartesian Closed Category (CCC), which means it is always possible to create pairs between any two types and functions between any two types.

data STLCType a = Prim a | Pair (STLCType a) (STLCType a) | Arr (STLCType a) (STLCType a)

data STLCTerm a = Lam Var STLCTerm | App STLCTerm | Var Var | Prim a

data Var = Var String STLCType


which maybe be extendible with a bunch of primitive operations and types (primitives might include addition, integers, bits, bit operations, etc.). It isn’t clear to me where it is most elegant to put type annotations. Maybe it is best to keep them separate and compare them.

Apparently it is possible to compile this in a point free form

data CatTerm a = Comp STLCTerm STLCTerm | App STLCTerm STLCTerm | Prim a

or maybe

data CatTerm a = App STLCTerm STLCTerm | Prim a| Comp

Dealing with the labeling of variables correctly is a real pain in the ass, so this is a win from the compiler standpoint. It is a pain to manually write functions using this style.

The compiling to categories project I think is using Haskell as the language and GHC to do the parsing and some other stuff, but then grabbing Core (the GHC intermediate language) and converting it into the Category form. I don’t see why you couldn’t use an STLC DSL and do the same. It would be less ergonomic to the user but also much less complicated. I wonder. I’ve written interpreters for STLC and they are very simple.

Circuits form a CCC. Basic Prim type is a wire with a Boolean value on it. Pairs of these make a bus. Composition is just attaching wires between subunits. PrimOps might include Ands and Ors and Nands and multiplexers and such. Or maybe you want to work at the level where 32-bit integers are primitives and addition and subtraction and other arithmetic functions are primops.

The Arrow type is more questionable. Can you really do higher order functions in fixed circuitry? In principle, yes. Since every prim type is finite and enumerable, arrows are also finitely enumerable. You could use a varying lookup table for example as the incoming data. This is an exponential blowup though. Maybe you ought to be working in the category where arrows are types of functions that are compactly encodable and runnable. You don’t want really stupidly massive circuits anyhow. Some kind of complexity theory restriction. For example, maybe you want all functions encodable in a tight BDD. You really need to shape of the BDD to be fixed. Maybe BDD whose width is bounded by some polynomial of the depth? If you don’t need the full width, maybe you could leave sections dead. Just spitballin’

Or in many cases I assume the higher order functions will come applied at compile time, in which case you can just substitute the given circuit in to the locations it is needed or share it somehow (probably needs a clock to time multiplex its use) at the multiple locations it is needed to save space.

Also of extreme interest:

He’s using this compiling to categories perspective with the intent to layout parallel circuits.

He uses Generic DataTypes with are very functory, which implies type parameters which tempts one into polymorphism. But I think again he is using polymorphism as a scheme which upon compilation gets turned into a bunch of different legit types. Maybe you’d want

data STLCType f a = Prim (f a) | Pair (STLCType f a) (STLCType f a) | Arr (STLCType f a) (STLCType f a)


You could do well to make way more lightweight operator synonyms for this lambda calculus

Lam String LC | App LC LC | Var String

(–>) = Lam


(\\) = Lam

and some synonyms for common variables

x = “x”

y = “y”



($$) = App

to dereference

(**) = Var  – bind this very close. Trying to look kind of like pointer derferencing?

maybe also add pattern matching into the language

(Pat) =

And some kind of autocurrying

Curry [Var] |

Maybe use Vec instead of list for compile time size.

I guess this is gonna be funky and having the actual lambda syntax compile is super cool and nice. But it is also nice to have a userland haskell concept of this stuff without weird plugins. A OverloadLambda extension would be cool. I don’t know how it would work. Type directed inference of which kind of lambda to use? Is that possible?


Also, could one piggyback on the haskell type system to make the well typing lighter weight a la the well-typed interpeter. Maybe using GADTs.

It does seem unlikely that one could use raw lambdas in a good way


How I Ruined Today Reading Haskell Posts

Lot of gold here (comments included). Graphs are important and I haven’t yet grokked the FGL thing or other methods of handling them

Functional programming with graphs from haskell

A new methodology for Arduino Haskell bindings

Sharing in Haskell using Data.Reify.

Resources on String Diagrams, and Adjunctions, and Kan Extensions

I’ve been trying to figure out Kan Extensions

Ralf Hinze on Kan Extensions


But while doing that I went down a rabbit hole on String Diagrams

This post is the first one on String Diagrams that made sense to me.

I had seen this stuff before, but I hadn’t appreciated it until I saw what Haskell expressions it showed equivalence between. They are not obvious equivalences

This seems like a very useful video on this topic.

In Summary, it is an excellent notation for talking about transformations of a long sequence of composed Functors  F G H … into some other long sequence of Functors. The conversion of functors runs from up to down. The composition of functors procedes left to right.  F eta is the fmap of eta, and eta F is eta with the forall’ed type unified to be of the form F a.

Adjunctions L -| R are asymmetric between cups and caps. L is on the left in cups and on the right in caps. That’s what makes squiggles pull straightable

I think I have an interesting idea for a linear algebra library based on this stuff


John Baez and Mike Stay’s Rosetta Stone (A touch stone I keep returning to)

Dan Piponi gave a talk which is another touch stone of mine that I come back to again and again. There is a set of corresponding blog posts.

Other resources:

NCatLab article

John Baez hosted seminars



Dan Marsden’s Article

Marsden and Hinze have been collaborating

Stephen Diehl on Adjunctions


A Section From an old Oregon Programming Language Summer School (a rich set of resources)


Marsden and Hinze have been collaborating


Mike Stay doing a very interesting series of Category Theory in Javascript. He uses contracts in place of types. Defeats one of the big points of types (static analysis), but still pretty cool



I think that about covers everything I know about.

Oh yeah, there is the whole Coecke and Abramsky categorical quantum mechanics stuff too.

Using the Purescript Servant Bridge

Alright, here is some garbage code. Hope it helps you, sorry if it confuses you.

Checkout the Counter example

that is where I pulled everything from. I ripped out just about everything fancy just so you and I can see the truly bare bones example.

It’s mostly boiler plate on the generator side.

He does a couple fancier things that you may want, like rearranging names and changing folders. Here is a more basic example

This goes in app/ if you’re using a stack template

Mostly things can just be the defaults. Look at the Counter Example for some more config stuff you can do.

You do need to separate out the user json api by itself. If you hand writeAPIModuleWithSettings an api that has the RAW serving the index.html, it freaked out. Maybe there is a way to handle that, but it’s not like that is probably what you want anyhow.

The myTypes Sum Type you want to add to for every type that you want to export over to purescript. frontEndRoot is where the generated files will go.

The Proxy business is a bunch of typelevel programming boilerplate. So is the empty MyBridge type.

There is basically no content to this code.

You also need to add this app/PSGenerator.hs file to your cabal file.


Every time you want to run the generator, you need to run

stack exec psGenerator


This then will put an API file and a Data type file into your purescript source in frontend/src

Using the API is a touch annoying but correct. If you look at the generated signature

There are a lot of constraints you need to satisfy in the monad m in order to call this thing. You need a monad that is Reader-like for getting the SPSettings_, needs to handle a Possible AjaxError, and needs to be an Aff-like monad. Woof.

It makes sense that you’d want to do all of this, but it is a burdensome mental overhead to get started.

Here’s some very basic source that shows how to at least get to the stage where you can log it the resulting request. I’m just dumping the error like a bad boy.

Note that you have to install purescript-servant-support as it tells you when you run psGenerator. I’ve been using psc-package. It is often helpful to go in to the psc-package.json file and update to the latest package-set. Just a little tip.

You see that the ExceptT handles the AjaxError and the ReaderT supplies the settings, which uses the defaults + a baseURL to point the request to

The whole thing is run inside an Aff monad.


Here’s the basic servant code


Again, I started using the stack servant template, whose directory structure I’m complying with.


Edit: Some more comments: Purescript bridge is a seperate project from servant-purescript. Purescript bridge will translate your Haskell types. Servant purescript writes your api calls. The two lines in the main of PSGenerator.hs do these sepearte tasks. the writeAPI writes the API calls and  writePSTypes writes the types.

If you want to transport a parametrized data type like (Maybe a) in the myTypes things, hand the type a special type from here

works like a charm



Elm, Eikonal, and Sol LeWitt

We saw this cool Sol LeWitt wall at MASS MoCA. It did not escape our attention that it was basically an eikonal equation and that the weird junctures were caustic lines.

It was drawn with alternating colored marker lines appearing a cm away from the previous line. This is basically Huygens principal.

So I hacked together a demo in elm. Elm is a Haskell-ish language for the web.




So I made a quick rip and run elm program to do this. This is the output, which I could make more dynamic.

The algorithm is to turn a list of points into their connecting lines. Then move the line perpendicular to itself, then recompute the new intersection points. It’s somewhat reminiscent of Verlet integration. Lines coordinates are momentum-like and points are position like and we alternate them. This is a finite difference version of the geometric Huygen’s principle.

Alternative methods which might work better include the Fast Marching Method or just using the wave equation and then plotting iso surfaces.

I also had to resample the function to get only the maximum y value for each x value in order to duplicate the LeWitt effect.


These are the helper functions with lots of junk in there

And this is the svg main program.



notes on elm

elm is installed with npm


you import packages (including your own) with

import ThisPackage

and you check types by just writing them and hitting enter rather than :t

elm-live is a very handy thing. A live reloading server that watches for changes in your files.

elm-make myfile.elm

will generate the javascript and html

This is a good tutorial and a good snippet to get you going


Differences from Haskell:

elm isn’t lazy which is probably good.

The composition operator (.) is now <<

elm doesn’t have the multiline pattern match of haskell. You need  to use case expressions. I miss them.

typeclass facilities are not emphasized.

The list type is List a rather than [a]


Contracts and Category Theory

Contracts are both less cryptic and less impressive than a type system.

A type system somehow proves that things will never happen at compile time. You’ll never get a string when you expected an int or even more powerful facts. Contracts just guarantee that at least if certain problems occur at least the program will freak out and tell you at runtime.

I had been vaguely aware of contracts which I considered to be kind of a band aid Racket thing that is their solution to type safety (besides typed racket), but I did not go into depth. And I kind of viewed the thing more as a methodology for showing loop invariants and algorithmic correctness rather than type correctness. I do not know if this is an accurate portrayal of what is in Racket and I’m pretty sure contracts do not actually originate there (Eiffel?).

Mike Stay (who you may know as the co-author of the Rosetta Stone paper a bunch of videos which I don’t know how I didn’t come across before (they’re kind of old by skateboarding millennial mountain dew front end developer standards. Did Node even exist? Barely.). Javascript (well maybe after python) was my language of greatest comfort a couple years ago and I would have loved this. I absolutely loved Bartosz Milewski’s Category Theory for Programmer’s series. There is a crap-ton of line noise  that kind of muddies the waters though in javascript. I wonder if it makes sense to me because I mostly already understand what he’s going for from a Haskell context. He has some cool patterns here like using objects as typeclasses/interfaces.

The really neat thing he discusses is higher order contracts which I’d bet is a central piece of contract based programming but I had never gotten that far.

I’m still skimming over the vids, but nice job.


Some simple ST Monad examples

The ST monad is what you use to do real mutable state in Haskell basically.

The State monad is a more pure guy that just automatically threads the state through pure functions for you.

The ST monad, and structures in it, to my understanding is actually backed by computer memory that is changing. Some things that should be fast and easy become actually fast and easy. Like changing a single element in an array without rebuilding the whole damn thing (or a lot of it).

The ST monad is probably bad style. You’re supposed to bend over backward to avoid mutability. It’s a gun. Don’t shoot yourself. Maybe better style is to use the C FFI (foreign function interface) if you really really need mutability.

Unlike the IO monad the ST monad can be escaped. Sometimes this is called thawing and freezing, the process of going into and out of the monad.

Here’s a couple snippets that demo how it works.

I recommend not thinking about the actual monad laws of this thing. The type signature of ST is messed up. It uses some kind of weird type argument to guarantee in the typechecker that the ST monad reference can’t leak outside the monad. In terms of just pretending the do notation is imperative like, it makes sense though.

makeSTRef puts that value you give it into the variable n.

readSTRef pulls out. Modify Let’s you modify.

runST ubiquitously has to be used to rip off the ST monad layer. You don’t want it if you want to combine a bunch of little ST functions. makeArray’ doesn’t have it so if you look at it in ghci you don’t see 10, you see <<ST Action>>. If you haven’t read the reference or frozen the vector, you can’t use runST. You’ll get an error because that would leak the mutable thing out of the monad.

Vectors are how Haskell has actual c style arrays. It’s kind of like numpy if you’re familiar with that. Unboxed means you’re using raw ints and floats and stuff.

M.replicate builds a size 3 vector filled with doubles of value 1.2. Then I modify the second element and freeze it into an immutable vector to escape the monad.

Storable vectors might be the better default. They are the same really, except they can be passed through the C FFI. I believe hmatrix uses them (and other c interfacing libraries) for passing arrays into Haskell.