Garbage Can Compiling to Categories with Inspectable Lambdas

There are a couple kinds of functions that we can turn into totally inspectable data.

Linear functions can be reconstituted into a matrix if you give a basis of vectors.
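
As a minimal sketch of that idea (in Python rather than Haskell, purely for illustration; matrix_of and shear_and_scale are made-up names), feeding each standard basis vector through the function and collecting the outputs as columns recovers the matrix:

```python
def matrix_of(f, n):
    """Recover the matrix of a linear map f on n-dimensional vectors.

    Assumes f really is linear and takes/returns plain lists of floats.
    """
    columns = []
    for j in range(n):
        # j-th standard basis vector
        e_j = [1.0 if i == j else 0.0 for i in range(n)]
        columns.append(f(e_j))
    m = len(columns[0])
    # columns[j][i] is entry (i, j); transpose into row-major form
    return [[columns[j][i] for j in range(n)] for i in range(m)]


def shear_and_scale(v):
    """A hypothetical linear function, written as opaque code."""
    x, y = v
    return [x + 2.0 * y, 3.0 * y]


M = matrix_of(shear_and_scale, 2)
```

Here M comes out as the rows [1, 2] and [0, 3], which is all the information the function ever contained.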

Functions from enumerable types can be turned into a lookup table.
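
A small sketch of the lookup table trick, again in Python for illustration only (Color, tabulate and next_color are hypothetical names): evaluate the function once on every value of the finite domain and keep the results.

```python
from enum import Enum


class Color(Enum):
    RED = 0
    GREEN = 1
    BLUE = 2


def tabulate(f, domain):
    """Turn a function on a finite, enumerable domain into a dict."""
    return {x: f(x) for x in domain}


def next_color(c):
    """A hypothetical opaque function on Color."""
    return Color((c.value + 1) % 3)


table = tabulate(next_color, Color)            # iterating an Enum yields its members
negation = tabulate(lambda b: not b, [False, True])
```

The dict is now plain data that can be printed, compared, or compiled to something else.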

Sufficiently polymorphic functions are another example. forall a. a -> a is commonly known to only be id. The same goes for fst :: forall a b. (a,b) -> a, and for snd, swap, and all the nested versions of these. These functions have exactly one inhabiting value (excluding internal churning and the possibility of going into an infinite loop).

So the type directly tells us the implementation.

forall a. (a,a)->a is similar. It can only be fst or snd. Functions whose input type reuses a type parameter can only select among or permute those inputs.

I’ve been trying to find a way to take a written lambda and convert it to data automatically and have been having trouble.

An opaque type whose constructors we have hidden behaves the same way: (T,T)->T can only be fst or snd specialized to T, since we can’t possibly destructure T.

We can figure out which one by giving a labeled example to that function and then inspecting a single output. This reveals the permutation and duplication that was done.
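
The probing trick can be sketched outside Haskell too. The Python below is only an illustration under the assumption that the probed function treats its inputs as opaque (Python cannot enforce that the way a hidden constructor can); Label, probe and shape are made-up names.

```python
class Label:
    """Opaque token. We can read its index, the probed function should not."""

    def __init__(self, index):
        self.index = index


def shape(result):
    """Replace every Label in a nested tuple with its index."""
    if isinstance(result, Label):
        return result.index
    return tuple(shape(r) for r in result)


def probe(f, arity):
    """Apply f to a tuple of labeled tokens and read the labels back out."""
    labels = tuple(Label(i) for i in range(arity))
    return shape(f(labels))


first = lambda p: p[0]
swap = lambda p: (p[1], p[0])
dup_first = lambda p: (p[0], (p[0], p[1]))
```

With these definitions, probe(swap, 2) gives (1, 0) and probe(dup_first, 2) gives (0, (0, 1)), so one labeled example is enough to recover the whole rearrangement as data.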

Similarly for T -> Either T T.

Once we have this, we can (hopefully) reinterpret this lambda in terms of a monoidal category.

 

 

What about Template Haskell (TH)? The new QuantifiedConstraints extension might also be helpful.

 

 

OK, a different approach. This works much closer to what I had in mind. You can write arbitrary tuple-like lambdas such as (\(x,y) -> (y,x)) and it will convert them to a category. I really had to hack around to get the thing to compile. Like that Pick typeclass, what the heck? Why can I get default values in type families but not in typeclasses?

It is all decidedly not typesafe. You can get totally nonsensical things to compile to something. However if you stick to lambdas, you’ll be ok. Maybe.

No, on further review this does not work. I got tricked because the type seemed OK at a certain point. A couple of problems arise upon actual application. Since the idea is to drive the form based on the type variables, actually applying the function to something whose types have the same form gets all screwed up. Also, tons of instances are overlapping, although I think this is fixable.

Maybe what I need is existential types that can’t ever unify together accidentally.

A couple of thoughts on typelevel programming principles:

  1. It is hard to get default cases with typeclasses. Use type families if that is what you want.
  2. Typeclasses need unique stuff to appear on the right-hand side. Only 1 pattern should match. You might need to add extra parameters to match on, which you can force on the left-hand side of the instance.
  3. ~ type equality is really useful.

 

An alternative to using lambda is to use an explicit Proxy. The type variables are basically just as good for syntactic purposes (a touch more noisy).

 

 

PyTorch Trajectory Optimization

Trajectory optimization is cool. The idea is to treat a dynamical problem as a big ole optimization problem, finding the best actions to take to achieve your goals or maximize a reward.

There are a couple of flavors of trajectory optimization (shooting methods, collocation methods): http://www.matthewpeterkelly.com/tutorials/trajectoryOptimization/

PyTorch gives a pretty low overhead extension to NumPy that also gives automatic differentiation. It is mainly intended as a neural network library, for which it has a number of facilities.

Gradient descent is not the preferred method for these problems (according to Boyd’s Convex Optimization course). Gradient descent has shit convergence compared to Newton iteration, but is very flexible and easy to implement.
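
To make the convergence gap concrete, here is a small pure-Python sketch on an ill-conditioned quadratic. The objective 0.5*(x^2 + 100*y^2), the step size, and the function names are illustrative assumptions, not anything taken from the course or from the trajectory code.

```python
def grad(p):
    """Gradient of the assumed objective 0.5 * (x**2 + 100 * y**2)."""
    x, y = p
    return (x, 100.0 * y)


# The Hessian of this objective is diagonal and constant.
HESSIAN_DIAG = (1.0, 100.0)


def gradient_descent(p, lr, steps):
    """Plain fixed-step gradient descent."""
    for _ in range(steps):
        g = grad(p)
        p = (p[0] - lr * g[0], p[1] - lr * g[1])
    return p


def newton_step(p):
    """One Newton step: scale the gradient by the inverse Hessian."""
    g = grad(p)
    return (p[0] - g[0] / HESSIAN_DIAG[0], p[1] - g[1] / HESSIAN_DIAG[1])


start = (1.0, 1.0)
gd_result = gradient_descent(start, lr=0.01, steps=100)
newton_result = newton_step(start)
```

The step size has to stay below 0.02 to keep the stiff y direction stable, so after 100 steps the x coordinate has only shrunk to about 0.37, while a single Newton step lands on the minimum of a quadratic exactly. The catch is that Newton needs the Hessian, which is where the flexibility of gradient descent wins.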

In addition, using a normal ODE solver from SciPy would be much more stable, but it would require cleverness to have the same code work for both the SciPy and the torch parts. So screw it.

One nicety of this approach is that we don’t even have to have our derivatives solved for. They could be all tied up in a

I thought that maybe I could just weight the dynamics cost enough to have it require the dynamics be satisfied, but that did not seem to work. Maybe with more fiddling? On further review my code had massive bugs in it. I’m not sure that the dynamics cost version wouldn’t work, but the Lagrange multiplier method seems to work well and makes sense too.

In this formulation, we can also train some kind of parametrized controller function f_w(x) by sampling some random initial starting conditions (or even dynamical parameters like mass and length etc, or noise forces). This is quite nice.

Additional bits that may be nice: backtracking line search and a logarithmic potential for inequalities. I wonder if a symplectic-style interleaving of position and momentum might be nice even for this global case. Should definitely just tie up all the vars into a single x. Can we use a Lagrangian or Hamiltonian and then have PyTorch differentiate that? It may in fact be nice to use some combinator to be able to hand the same function to ODEInt for a couple of reasons (getting good initializations of the path, for example).

For a simple system, I’m using \dot{x}=v , \dot{v}=f , where you get to control f at every time point and x starts at 0 and wants to get to 1. I’m using a simple finite difference scheme in time for the time derivative. x and v are defined at t and f, lx, lv are defined at the half time steps t + \frac{1}{2}. You need at least two time steps to get a derivative. I’m adding a square cost to the force, otherwise it would just use a huge force. lx and lv are Lagrange multipliers enforcing the equations of motion at each time step.
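
A rough pure-Python sketch of that discretization, without any autograd, just to pin down the bookkeeping. Two things here are assumptions and not necessarily what the real code does: v at the half step is taken as the average of its two neighbors, and the force cost is an unweighted sum of squares. The function names are made up.

```python
def dynamics_residuals(x, v, f, dt):
    """Finite-difference residuals of xdot = v and vdot = f.

    x, v: N+1 values at whole time steps. f: N values at half steps.
    """
    rx, rv = [], []
    for t in range(len(f)):
        v_mid = 0.5 * (v[t] + v[t + 1])  # assumption: midpoint average of v
        rx.append((x[t + 1] - x[t]) / dt - v_mid)
        rv.append((v[t + 1] - v[t]) / dt - f[t])
    return rx, rv


def lagrangian(x, v, f, lx, lv, dt):
    """Square force cost plus multipliers times the dynamics residuals."""
    rx, rv = dynamics_residuals(x, v, f, dt)
    force_cost = sum(fi * fi for fi in f)  # assumption: no dt weighting
    constraint = sum(l * r for l, r in zip(lx, rx))
    constraint += sum(l * r for l, r in zip(lv, rv))
    return force_cost + constraint


# Constant force f = 2 gives v = 2t and x = t**2, which takes x from 0 to 1.
dt = 0.25
times = [k * dt for k in range(5)]
x = [t * t for t in times]
v = [2.0 * t for t in times]
f = [2.0] * 4
rx, rv = dynamics_residuals(x, v, f, dt)
L = lagrangian(x, v, f, [1.0] * 4, [1.0] * 4, dt)
```

For this hand-built trajectory every residual vanishes, so the multipliers drop out and the Lagrangian is just the force cost. Any feasible trajectory behaves that way, and the optimizer's job is to find the feasible one with the smallest cost.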

Here was an initial pass (just here for historical reasons; look at the updated one below. This one does not work as is).

 

I goofed up a couple of things, including my xres making no sense. You need to explicitly zero gradients, which is pretty annoying. The Lagrange multiplier method makes total sense.

Could we use a Hamiltonian and use autograd to derive equations of motion? Seems plausible and convenient.
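
For the simple system above this at least works on paper. Assuming unit mass (so the momentum p is just v) and treating the control f as an external parameter, Hamilton's equations give back the same dynamics:

```latex
H(x, p) = \frac{p^2}{2} - f x, \qquad
\dot{x} = \frac{\partial H}{\partial p} = p, \qquad
\dot{p} = -\frac{\partial H}{\partial x} = f
```

So the two partial derivatives of H are exactly the right-hand sides of \dot{x}=v and \dot{v}=f, which is the part autograd could in principle compute for us.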

Can I make a custom PyTorch layer for sparse Hessians? The data-oriented viewpoint would have you pump the gradient and Hessian backward. Or could you automatically build an H-matrix structure for the Hessian of convnets?

Put a neural controller in there. Each batch could have randomized parameters, noise, and initial conditions.

Is rebuilding total_cost every time bad?