Giving the Mostly Printed CNC a try (MPCNC)

Declan had the good idea to make a CNC machine. There is a popular plan available here

https://www.v1engineering.com/specifications/

A Doge

The really cute part of it is using electrical conduit as rails, which are shockingly inexpensive. Like a couple bucks for 4 feet! Holy shnikes!

We’ve been printing up a storm for the last couple weeks. A ton of parts!

We already had a lot of motors and stuff lying around. Declan bought a lot of stuff too just for this. Assorted bearings and bolts. The plans have a bill of materials.

Repetier host seemed to work pretty well for controlling the board

Used the RAMPS branch of the mpcnc marlin repo

Edited the header files as described in this post so that we could use both extruders as extra x and y motor drivers. It did not seem like driving two motors from the same driver board was acceptable. Our bearings are gripping the rails a little too tight. It is tough to move.

Some useful links on the thingiverse version of the mpccnc https://www.thingiverse.com/thing:724999

He suggests using this program http://www.estlcam.com/ Seem like windows only? ugh.

The mpcnc plans don’t contain actual tool mounts but here are some examples

A pen holder: https://www.thingiverse.com/thing:1612207/comments

A dewalt mount: https://www.thingiverse.com/thing:944952

This is an interesting web based g-code maker. Ultimately a little to janky though. It works good enough to get started http://jscut.org/jscut.html . Not entirely clear what pocket vs interior vs whatever is. engrave sort of seemed like what I wanted. Went into inkscape with a reasonable png and traced bitmapped it, then object to path. It’s also nice to just find an svg on the internet

The following code was needed to zero repetier and the RAMPS at the touch off point. We added it as a macro. It is doing some confusing behavior though.

pycam is the best I can find for 3d machining. Haven’t actually tried it yet

http://pycam.sourceforge.net/getting-started/

We should probably upgrade the thing to have limit switches. It pains me every time we slam it into the edge.

All in all, a very satisfying project. Hope we build something cool with it.

A horsie

Chile: Nice place

Just got back from Chile from a vacation visiting Will. Nice place.

We took a leisurely amount of time getting to Torres del Paine, which was the main feature of our trip. We travelled through Santiago, Punta Arenas and Puerto Natales. We spent a very tired day in the children’s science museum and rode the funicular. There wasn’t that much to do in the latter two cities, maybe we could have shaved some time from them. Our hostel in Punta Arenas was notably ramshackle. We spent 5 days backpacking in the park. Absolutely gorgeous. The wind on the second day was like nothing I’ve ever experienced. I was a little concerned about staying on my feet. Hiking poles for the win. Day 4 was cold and wet and miserable, but it ended up ok in the end. We were able to get a spot in a refugio when it just was too overwhelming to try to set up our tents in a flooded muddy campsite on that day. I think Beth in particular was at the end of her rope. I basically didn’t poop the entire first week I was there, but one glorious day on the mountain the heavens parted for me, and I was fine from then on. I didn’t quite pack food right. There ended up being camp stores at most of the places we stayed, but if I hadn’t been able to re up on cookies it would have been a lean couple days food wise. Ramen Bombs for the win. We drank right from the streams, which is unusual for us. Usually we filter.

All told we did ok on food. I really like the al pobre thing. What isn’t to like a about steak, onions, and eggs on fries? Chileans seem a little eager on the mayo. Nobody does breakfast right except the US. The street food was good. I love the fried tortilla thing that you can just slather salsa on. It was like 30 cents. The empanadas were also pretty great cheap food. Ceviche was also very tasty. They toss out avocado like it’s nuthin down there. Sandwiches were kind of shitty. Don’t know if that is entirely fair, but that is how it felt. Highlight meal of the trip was at Cafe Artimana in Puerto Natales. Yeah, I got some al pobre. But also basil lemonade stuff

After the hiking, we scheduled an early return to Santiago rather than busting our asses to a glacier viewpoint. In the airport at Punta Arenas, we got the southernmost dominos in the entire world. Ben Will and Declan went on a taxi quest to go get it. Wandered around Santiago, saw some churches and cathedrals, a fort, ate churros, etc. Declan was on a quest for a neck pillow. We did a prison themed Escape Room. People felt like we got a little cheated because some of the puzzles felt like bullshit? I think they really expect to break room records. I suck at escape rooms. We were able to spend a day in Valparaiso, which had a super awesome street art scene.

I spent the last day puking my guts out. So it goes. Not sure how exactly. The street sausage may have put me over the top. I guess I’m a sensitive fellow? I get pretty consistently unwell on trips.

Chile has tons of fluffy street dogs. They’re pretty friendly, although they do chase cars and motorcycles. Idiots.

Chile has a way lower english quotient than other trips I made. I’ve been surprised how common at least some english has been in Europe and Asia, and was now equally surprised how little there was in Chile. It makes sense. A lot of the continent is spanish speaking. It was really useful to have Will around, who has gotten shockingly good at Spanish from an outsiders perspective.

SUCH MEMORIES

Declan’s post on the trip.

tough day

Wait, where are all my BOBBY PICS!?!

o there u r u cutie

overwhelmed

A strange place named Andre’s
boys
beth

Thoughts on Faking Some of GADTs in Rust

I’m a guy who is somewhat familiar with Haskell who is trying to learn Rust. So I thought I’d try to replicate some cool Haskell functionality in Rust. I would love to hear comments, because I’m trying to learn. I have no sense of Rust aesthetics yet and in particular I have no idea how this interacts with the borrow system. What follows is a pretty rough brain dump

GADTs (Generalized algebraic data types) are an extension in Haskell that allows you to write constrained type signatures for your data constructors. They also change how the type checking of pattern matching is processed.

GADTs are sometimes described/faked by being built by making data types that hold equality/unification constraints. Equality constraints in Haskell like a ~ Int are fairly magical and the Rust compiler does not support them in an obvious way. Maybe this is the next project. Figure out how to fake ’em if one can. I don’t think this is promising though, because faking them will be a little wonky, and then GADTs are a little wonky on top of that. See https://docs.rs/refl/0.1.2/refl/ So we’ll go another (related) road.

This is roughly what GADTs look like in Haskell.

And here is one style of encoding using smart constructors and a typeclass for elimination (pattern matching is replicated as a function that takes callbacks for the data held in the different cases). Regular functions can have a restricted type signature than the most general one their implementation implies. The reason to use a typeclass is so that we can write the eliminator as returning the same type that the GADT supplies. There isn’t an explicit equality constraint. A kind of Leibnitz equality

is hiding in the eliminator. The Leibnitz equality can be used in place of (~) constraints at some manual cost. http://code.slipthrough.net/2016/08/10/approximating-gadts-in-purescript/

https://jesper.sikanda.be/files/leibniz-equality.pdf

The

is a problem for Rust. Rust does not have higher kinded types, although they can be faked to some degree. https://gist.github.com/CMCDragonkai/a5638f50c87d49f815b8 There are murmurs of Associated Type Constructors / GATs , whatever those are , that help ease the pain, but I’m pretty sure they are not implemented anywhere yet.

I’m going to do something related, a defunctionalization of the higher kinded types. We make an application trait, that will apply the given type function tag to the argument. What I’m doing is very similar to what happens in the singletons library, so we may be getting some things for free.

https://typesandkinds.wordpress.com/2013/04/01/defunctionalization-for-the-win/

Then in order to define a new typelevel function rip out a quick tag type and an App impl.

It might be possible to sugar this up with a macro. It may also be possible to write typelevel functions in a point free style without defining new function tag names. The combinators Id, Comp, Par, Fst, Snd, Dup, Const are all reasonably definable and fairly clear for small functions. Also the combinator S if you want to talk SKI combinatory calculus, which is unfit for humans. https://en.wikipedia.org/wiki/SKI_combinator_calculus For currying, I used a number for how many arguments are left to be applied (I’m not sure I’ve been entirely consistent with these numbers). You need to do currying quite manually. It may be better to work with tuplized arguments

Anyway, the following is a translation of the above Haskell (well, I didn’t wrap an actual i64 or bool in there but I could have I think). You need to hide the actual constructors labeled INTERNAL in a user inaccessible module.

The smart constructors put the right type in the parameter spot

Then pattern matching is a custom trait per gadtified type. Is it possible to unify the different elimination traits that will come up into a single Elim trait? I’m 50-50 about whether this is possible. What we’re doing is a kind of fancy map_or_else if that helps you.

https://doc.rust-lang.org/std/option/enum.Option.html#method.map_or_else

Usage. You have to explicitly pass the return type function to the eliminator. No inference is done for you. It’s like Coq’s match but worse. BTW the dbg! macro is the greatest thing on earth. Well done, Rust.

You can make helpers that don’t require explicit types to be given

One could also make an Eq a b type with Refl similarly. Then we need typelevel function tags that take two type parameter. Which, with currying or tupling, we may already have.

Questions:

Is this even good? Or is it a road of nightmares? Is this even emulating GADTs or am I just playing phantom type games?

We aren’t at full gadt. We don’t have existential types. Rust has some kind of existential story evolving (already there?), but the state of it is confusing to me. Something to play with. Higher rank functions would help?

Are overlapping typeclasses a problem in Rust?

Again, I have given nearly zero thought to borrowing and how it interacts with this. I’m a Rust n00b. I should think about it. Different eliminators based on whether you own or are borrowing?.

How much of singleton style dependent types do we get from this? It feels like we have already paid the cost of defunctionalizing. http://hackage.haskell.org/package/singletons

My current playground for this is at https://github.com/philzook58/typo

Cvxpy and NetworkX Flow Problems

Networkx outputs scipy sparse incidence matrices

https://networkx.github.io/documentation/networkx-1.9/reference/generated/networkx.linalg.graphmatrix.incidence_matrix.html#networkx.linalg.graphmatrix.incidence_matrix

https://docs.scipy.org/doc/scipy/reference/sparse.html

Networkx also has it’s own flow solvers, but cvxpy gives you some interesting flexibility, like turning the problem mixed integer, quadratic terms, and other goodies. Plus it is very easy to get going as you’ll see.

So here’s a basic example of putting these two together. Very straightforward and cool.

Here was a cool visual from a multi commodity flow problem (nx.draw_networkx_edges)

Nice, huh.

Nand2Tetris in Verilog and FPGA and Coq

Publishing these draft notes because it has some useful info in it and trying to reboot the project. It’s very ambitious. We’ll see where we get with it.

https://github.com/philzook58/nand2coq

 

Old Stuff (Last Edited 6/23/18):

So my friends and I are starting the nand2tetris course. I feel like I have some amount of familiarity with the topics involved, so I’d like to put it into challenge mode for me.

Week 1 is about basic combinatorial logic gate constructions and sort of the ideas of an HDL.

I was trying to keep up in Verilog and failing. The verilog syntax is a little bit more verbose.

Still not so bad.

The easiest thing to use was assign statements.  The difference between = and <= in verilog is still a little opaque to me

I compiled them and ran them using Icarus verilog (iverilog and the vpp the output file).

I started using MyHDL but I’m not sure I saw why it was going to be easier? But the MyHdl docs did help me understand a bit why verilog is the way it is.

 

Here is a big list of interesting projects:

MyHDL – A python hardware description language. Can output VHDL and Verilog. based around python generators and some decorators.

Icarus Verilog – http://iverilog.wikia.com/wiki/Main_Page. iverilog Compiles verilog into a assembly format which can be run with vvp command.

example

 

Verilator – Compiles Verilog to C for simulation

GTKWave – A Waveform viewer

IceStick – A cheap 20$ ish fpga usb board that can be programmed easily

IceStorm http://www.clifford.at/icestorm/ – An OpenSource toolchain for compiling to and programming ice40 fpga chips

IceStudio – a graphical block editor. Last I checked it was still a little clunky

EdaPlayground – online web app for writing code and giving to  simulators

 

Formal tools:

yosys-smtbmc

symbiyosys

http://www.clifford.at/papers/2016/yosys-smtbmc/

http://zipcpu.com/blog/2017/10/19/formal-intro.html

 

icestick floorplan – https://knielsen.github.io/ice40_viewer/ice40_viewer.html

ZipCPU

open source fpga twitter https://twitter.com/ico_tc?lang=en

https://opencores.org/

 

Learning Verilog for FPGAs: The Tools and Building an Adder

 

Upduino – interesting set of boards. Cheap.

http://gnarlygrey.atspace.cc/development-platform.html#upduino

 

Questionable: Clash?

installing icestick on the mac

https://github.com/ddm/icetools

https://github.com/Homebrew/homebrew-core/issues/9229

Had to pip uninstall enum34. Weird.

 

Verilog

Start with module statement

end lines with semicolons.

You need to name instantiated elements

 

 

yosys -p “synth_ice40 -top not1 -blif not.blif” not.v

https://mcmayer.net/first-steps-with-the-icestorm-toolchain/

../icetools/arachne-pnr/bin/arachne-pnr  -d 1k -P tq144 -o not.asc -p not.pcf not.blif

../icetools/icestorm/icepack/icepack not.asc not.bin

iceprog not.bin

The ftdi isn’t working

 

 

 

 

 

More Reinforcement Learning with cvxpy

So I spent thanksgiving doing this and playing Zelda. Even though that sounds like a hell of a day, seems a little sad for thanksgiving :(. I should probably make more of an effort to go home next year.

I tried implementing a more traditional q-learning pipeline using cvxpy (rather than the inequality trick of the last time). Couldn’t get it to work as well. And it’s still kind of slow despite a lot of rearrangement to vectorize operations (through batching basically).

I guess I’m still entranced with the idea of avoiding neural networks. In a sense, that is the old boring way of doing things. The Deep RL is the new stuff. Using ordinary function approximators is way older I think. But I feel like it takes a problem out of the equation (dealing with training neural nets). Also I like modeling languages/libraries.

I kept finding show stopping bugs throughout the day (incorrectly written maxaction functions, typos, cross episode data points, etc.), so I wouldn’t be surprised if there is one still in here. It’s very surprising how one can convince oneself that it is kind of working when it is actually impossible it’s working. All these environments are so simple, that I suspect I could randomly sample controllers out of a sack for the time I’ve been fiddling with this stuff and find a good one.

 

I also did the easy cartpole environment using the inequality trick.  Seems to work pretty well.

 

 

I also have some Work in Progress on getting full swingup cartpole. Currently is not really working. Seems to kind of be pumping about right? The continuous force control easy cartpole does work though.

 

Now I feel that a thing that matters quite a bit is what is your choice of action for the next time step. Hypothetically you want a ton of samples here. I now think that using an action that is just slightly perturbed from the actual action works well because the actual action is tending to become roughly the optimal one. Subsequent time steps have roughly the same data in them.

One advantage of discrete action space is that you can really search it all.

Does that mean I should seriously investigate the sum of squares form? A semidefinite variable per data point sounds bad. I feel like I’d have to seriously limit the amount of data I’m using. Maybe I’ll be pleasantly surprised.

I haven’t even gotten to playing with different polynomials yet. The current implementation is exponentially sized in the number of variables. But in kind of a silly way. I think it would be better to use all terms of a bounded total degree.

 

Q-Learning with Linear Programming (cvxpy, OpenAI Gym Pendulum)

http://web.mit.edu/~pucci/www/discountedLP.pdf

http://underactuated.mit.edu/underactuated.html?chapter=dp

There is a fun idea of using Linear Programming to do dynamic programming I originally saw in the underactuated robotics textbook.

In my experience reinforcement learning is finicky and depressing. It usually doesn’t work and is very hard to troubleshoot. Do you just need to run it for 10 minutes? 10 years? Is there a bug? God knows. I end up wriggling hyperparameters and praying a lot.

One part of this is the relative finickiness of neural network optimization compared to the technology of convex optimization. Convex optimization solvers are quite reliable and fast.

There is a way of phrasing Q learning as a linear programming problem

The linear programming approach relaxes the Bellman equations.

Q(s_t,a_t)=r_t + \gamma \max_a Q(s_{t+1},a)

to

\forall a. Q(s_t,a_t) \ge r_t +\gamma Q(s_{t+1},a)

We can approach this forall in a couple ways, one of which is just sampling actions somehow. To make the constraint tight in places you minimize a weighting of Q

\min \sum w_i * Q(s_i,a_i)

If Q is written as a linear combination of basis functions

Q(s,a)=\sum \alpha_i f_i(s,a)

The all of this put together is a linear program in the variables \alpha_i.

 

For ease, I used cvxpy. I don’t even store my state action pairs, which is quite lazy of me. Even here, compiling the linear program via cvxpy is kind of slow. This preprocessing step takes longer than the actual solve does. You could avoid cvxpy and directly interface a linear programming solver much faster, if that is your thing.

The whole process is still model free. I didn’t plug in pendulum dynamics anywhere. I run openAI gym and use the resulting state-action-state tuples to add inequalities to my cvxpy model. I weight where I want the inequalities to be tightest by using the actual states experienced.

Unfortunately, it still took a couple hours of hyper parameter tuning and fiddling to get the thing to work. So not a grand success on that point.

I made a lot of guesswork for what seemed reasonable

I parametrized the dependence of Q on a by a quadratic so that it is easy to maximize analytically. That is what the polyfit stuff is about. Maximum of ax^2+bx+c is at -b/2a. I really should be checking the sign of the a coefficient. I am just assuming it is positive. Naughty boy.

m assuming that it

Chebyshev polynomials are probably good.

It seemed to help to use a slight perturbation of the actual action used on the right hand side of the Bellman inequality. My reasoning here is that the pendulum is actually a continuous system, so we should be using the differential Bellman equation really.

Should I allow for some kind of slack in the equations? Getting a bad reward or data point or one weird unusual state could ruin things for everyone. Inequalities are unforgiving.

Gamma seemed to matter a decent amount

The regularization of alpha seemed largely irrelevant.

Epsilon greediness seems to not matter much either.

 

 

Future ideas:

Might be good to replace the sampling of a with a Sum of Squares condition over the variable a.

Should I damp the update in some way? Add a cost the changing alpha from it’s previous value. A kind of damped update / using a prior.

 

 

 


Edit:

A improved version. Fixed the bug in my maxaction function. I shouldn’t have been assuming that it was always concave down.

Also vectorized slightly. Fairly significantly improves the solve time. Not much time is spent in cvxpy, now the solve is dominated by about 3 legitimate seconds in OSQP.

You can flip stuff in and out of loops to try different versions. This method is off-policy, so I could keep data around forever. However, it mostly just slowed the solve time.

 

A Touch of Topological Quantum Computation in Haskell Pt. I

Quantum computing exploits the massive vector spaces nature uses to describe quantum phenomenon.

The evolution of a quantum system is described by the application of matrices on a vector describing the quantum state of the system. The vector has one entry for every possible state of the system, so the number of entries can get very, very large. Every time you add a new degree of freedom to a system, the size of the total state space gets multiplied by the size of the new DOF, so you have a vector exponentially sized in the  number  of degrees of freedom.

Now, a couple caveats. We could have described probabilistic dynamics similarly, with a probability associated with each state. The subtle difference is that quantum amplitudes are complex numbers whereas probabilities are positive real numbers. This allows for interference. Another caveat is that when you perform a measurement, you only get a single state, so you are hamstrung by the tiny amount of information you can actually extract out of this huge vector. Nevertheless, there are a handful of situations where, to everyone’s best guess, you get a genuine quantum advantage over classical or probabilistic computation.

Topological quantum computing is based around the braiding of particles called anyons. These particles have a peculiar vector space associated with them and the braiding applies a matrix to this space. In fact, the space associated with the particles can basically only be manipulated by braiding and other states require more energy or very large scale perturbations to access. Computing using anyons has a robustness compared to a traditional quantum computing systems. It can be made extremely unlikely that unwanted states are accessed or unwanted gates applied. The physical nature of the topological quantum system has an intrinsic error correcting power. This situation is schematically similar in some ways to classical error correction on a magnetic hard disk. Suppose some cosmic ray comes down and flips a spin in your hard disk. The physics of magnets makes the spin tend to realign with it’s neighbors, so the physics supplies an intrinsic classical error correction in this case.

The typical descriptions of how the vector spaces associated with anyons work I have found rather confusing. What we’re going to do is implement these vector spaces in the functional programming language Haskell for concreteness and play around with them a bit.

Anyons

In many systems, the splitting and joining of particles obey rules. Charge has to be conserved. In chemistry, the total number of each individual atom on each side of a reaction must be the same. Or in particle accelerators, lepton number and other junk has to be conserved.

Anyonic particles have their own system of combination rules. Particle A can combine with B to make C or D. Particle B combined with C always make A. That kind of stuff. These rules are called fusion rules and there are many choices, although they are not arbitrary. They can be described by a table N_{ab}^{c} that holds counts of the ways to combine a and b into c. This table has to be consistent with some algebraic conditions, the hexagon and pentagon equations, which we’ll get to later.

We need to describe particle production trees following these rules in order to describe the anyon vector space.

Fibonacci anyons are one of the simplest anyon systems, and yet sufficiently complicated to support universal quantum computation. There are only two particles types in the Fibonacci system, the I particle and the \tau  particle. The I particle is an identity particle (kind of like an electrically neutral particle). It combines with \tau to make a \tau. However, two \tau particle can combine in two different ways, to make another \tau particle or to make an I particle.

So we make a datatype for the tree structure that has one constructor for each possible particle split and one constructor (TLeaf, ILeaf) for each final particle type. We can use GADTs (Generalize Algebraic Data Types) to make only good particle production history trees constructible. The type has two type parameters carried along with it, the particle at the root of the tree and the leaf-labelled tree structure, represented with nested tuples.

Free Vector Spaces

We need to describe quantum superpositions of these anyon trees. We’ll consider the particles at the leaves of the tree to be the set of particles that you have at the current moment in a time. This is a classical quantity. You will not have a superposition of these leaf particles. However, there are some quantum remnants of the history of how these particles were made. The exact history can never be determined, kind of like how the exact history of a particle going through a double slit cannot be determined. However, there is still a quantum interference effect left over. When you bring particles together to combine them, depending on the quantum connections, you can have different possible resulting particles left over with different probabilities. Recombining anyons and seeing what results is a measurement of the system .

Vectors can be described in different basis sets. The bases for anyon trees are labelled by both a tree structure and what particles are at the root and leaves. Different tree associations are the analog of using some basis x vs some other rotated basis x’. The way we’ve built the type level tags in the FibTree reflects this perspective.

The labelling of inner edges of the tree with particles varies depending on which basis vector we’re talking about. A different inner particle is the analog of \hat{x} vs \hat{y}.

To work with these bases we need to break out of the mindset that a vector put on a computer is the same as an array. While for big iron purposes this is close to true, there are more flexible options. The array style forces you to use integers to index your space, but what if your basis does not very naturally map to integers?

A free vector space over some objects is the linear combination of those objects. This doesn’t have the make any sense. We can form the formal sum (3.7💋+2.3i👩‍🎨) over the emoji basis for example. Until we attach more meaning to it, all it really means is a mapping between emojis and numerical coefficients. We’re also implying by the word vector that we can add two of the combinations coefficient wise and multiply scalars onto them.

We are going to import free vectors as described by the legendary Dan Piponi as described here: http://blog.sigfpe.com/2007/03/monads-vector-spaces-and-quantum.html

What he does is implement the Free vector space pretty directly. We represent a Vector space using a list of tuples [(a,b)]. The a are basis objects and b are the coefficients attached to them.

 

The Vector monad factors out the linear piece of a computation. Because of this factoring, the type constrains the mapping to be linear, in a similar way that monads in other contexts might guarantee no leaking of impure computations. This is pretty handy. The function you give to bind correspond to a selector columns of the matrix.

We need some way to zoom into a subtrees and then apply operations. We define the operations lmap and rmap.

You reference a node by the path it takes to get there from the root. For example,  (rmap . lmap . rmap) f applies f at the node that is at the right-left-right position down from the root.

Braiding

For Fibonacci anyons, the only two non trivial braidings happen when you braid two \tau particles. 

We only have defined how to braid two particles that were split directly from the same particle. How do we describe the braiding for the other cases? Well we need to give the linear transformation for how to change basis into other tree structures. Then we have defined braiding for particles without the same immediate parent also.

F-Moves

We can transform to a new basis. where the histories differs by association. We can braid two particles by associating the tree until they are together. An association move does not change any of the outgoing leaf positions. It can, however, change a particle in an interior position. We can apply an F-move anywhere inside the tree, not only at the final leaves.

 

Fusion / Dot product

Two particles that split can only fuse back into themselves. So the definition is pretty trivial. This is like \hat{e}_i \cdot \hat{e}_j = \delta_{ij}.

Hexagon and Pentagon equations

The F and R matrices and the fusion rules need to obey consistency conditions called the hexagon and pentagon equations. Certain simple rearrangements have alternate ways of being achieved. The alternative paths need to agree.

Next Time:

With this, we have the rudiments of what we need to describe manipulation of anyon spaces. However, applying F-moves manually is rather laborious. Next time we’ll look into automating this using arcane type-level programming. You can take a peek at my trash WIP repo here

 

RefErences:
A big ole review on topological quantum computation: https://arxiv.org/abs/0707.1889
Ady Stern on The fractional quantum hall effect and anyons: https://www.sciencedirect.com/science/article/pii/S0003491607001674

 

Another good anyon tutorial: https://arxiv.org/abs/0902.3275

Mathematica program that I still don’t get, but is very interesting: http://www.cs.ox.ac.uk/people/jamie.vicary/twovect/guide.pdf

Kitaev huge Paper: https://arxiv.org/abs/cond-mat/0506438

Bonderson thesis: https://thesis.library.caltech.edu/2447/2/thesis.pdf

Bernevig review: https://arxiv.org/abs/1506.05805

More food for thought:

The Rosetta Stone: http://math.ucr.edu/home/baez/rosetta.pdf

http://blog.sigfpe.com/2008/08/hopf-algebra-group-monad.html

http://haskellformaths.blogspot.com/2012/03/what-is-hopf-algebra.html

 

 

Deriving the Chebyshev Polynomials using Sum of Squares optimization with Sympy and Cvxpy

Least squares fitting \sum (f(x_i)-y_i)^2 is very commonly used and well loved. Sum of squared fitting can be solved using just linear algebra. One of the most convincing use cases to me of linear programming is doing sum of absolute value fits \sum |f(x_i)-y_i|  and maximum deviation fits \max_i |f(x_i)-y_i|. These two quality of fits are basically just as tractable as least squares, which is pretty cool.

The trick to turning an absolute value into an LP is to look at the region above the graph of absolute value.

This region is defined by y \ge x and y \ge -x. So you introduce a new variable y. Then the LP \min y subject to those constraints will minimize the absolute value. For a sum of absolute values, introduce a variable y_i for each absolute value you have. Then minimize \sum_i y_i. If you want to do min max optimization, use the same y value for every absolute value function.

\min y

\forall i. -y \le x_i \le y

Let’s change topic a bit. Chebyshev polynomials are awesome. They are basically the polynomials you want to use in numerics.

Chebyshev polynomials are sines and cosines in disguise. They inherit tons of properties from them. One very important property is the equioscillation property. The Chebyshev polynomials are the polynomials that stay closest to zero while keeping the x^n coefficient nonzero (2^(n-2) by convention). They oscillate perfectly between -1 and 1 on the interval x \in [-1,1] just like sort of a stretched out sine. It turns out this equioscillation property defines the Chebyshev polynomials

We can approximate the Chebyshev polynomials via sampling many points between [-1,1]. Then we do min of the max absolute error optimization using linear programming. What we get out does approximate the Chebyshev polynomials.

 

 

Red is the actual Chebyshev polynomials and green is the solved for polynomials. It does a decent job. More samples will do even better, and if we picked the Chebyshev points it would be perfect.

Can we do better? Yes we can. Let’s go on a little optimization journey.

Semidefinite programming allows you to optimize matrix variables with the constraint that they have all positive eigenvalues. In a way it lets you add an infinite number of linear constraints. Another way of stating the eigenvalue constraint is that

\forall q. q^T X q \ge 0

You could sample a finite number of random q vectors and take the conjunction of all these constraints. Once you had enough, this is probably a pretty good approximation of the Semidefinite constraint. But semidefinite programming let’s you have an infinite number of the constraints in the sense that \forall q is referencing an infinite number of possible q , which is pretty remarkable.

Finite Sampling the qs has similarity to the previously discussed sampling method for absolute value minimization.

Sum of Squares optimization allows you to pick optimal polynomials with the constraint that they can be written as a sum of squares polynomials. In this form, the polynomials are manifestly positive everywhere. Sum of Squares programming is a perspective to take on Semidefinite programming. They are equivalent in power. You solve SOS programs under the hood by transforming them into semidefinite ones.

You can write a polynomial as a vector of coefficients \tilde{a}.

\tilde{x} = \begin{bmatrix} 1 \\ x \\ x^2 \\ x^3 \\ \vdots \end{bmatrix}

\tilde{a} = \begin{bmatrix} a_0 \\ a_1 \\ a_2 \\ a_3 \\ \vdots \end{bmatrix}

p(x)=\tilde{a}^T \tilde{x}

Instead we represent the polynomial with the matrix Q

p(x) = \tilde{x}^T Q \tilde{x}

If the matrix is positive semidefinite, then it can be diagonalized into the sum of squares form.

In all honesty, this all sounds quite esoteric, and it kind of is. I struggle to find problems to solve with this stuff. But here we are! We’ve got one! We’re gonna find the Chebyshev polynomials exactly by translating the previous method to SOS.

The formulation is a direct transcription of the above tricks.

\min t

-t \le p(x) \le t  by which I mean p(x) + t is SOS and t - p(x) is SOS.

There are a couple packages available for python already that already do SOS, .

ncpol2sdpa (https://ncpol2sdpa.readthedocs.io/en/stable/)

Irene (https://irene.readthedocs.io/en/latest/index.html)

SumofSquares.jl for Julia and SOSTools for Matlab. YalMip too I think. Instead of using those packages, I want to roll my own, like a doofus.

Sympy already has very useful polynomial manipulation functionality. What we’re going to do is form up the appropriate expressions by collecting powers of x, and then turn them into cvxpy expressions term by term. The transcription from sympy to cvxpy isn’t so bad, especially with a couple helper functions.

One annoying extra thing we have to do is known as the S-procedure. We don’t care about regions outside of x \in [-1,1]. We can specify this with a polynomial inequality (x+1)(x-1) \ge 0. If we multiply this polynomial by any manifestly positive polynomial (a SOS polynomial in particular will work), it will remain positive in the region we care about. We can then add this function into all of our SOS inequalities to make them easier to satisfy. This is very similar to a Lagrange multiplier procedure.

Now all of this seems reasonable. But it is not clear to me that we have the truly best polynomial in hand with this s-procedure business. But it seems to works out.

 

Ooooooh yeah. Those curves are so similar you can’t even see the difference. NICE. JUICY.

There are a couple interesting extension to this. We could find global under or over approximating polynomials. This might be nice for a verified compression of a big polynomial to smaller, simpler polynomials for example. We could also similar form the pointwise best approximation of any arbitrary polynomial f(x) rather than the constant 0 like we did above (replace -t \le p(x) \le t for -t \le p(x) - f(x) \le t in the above). Or perhaps we could use it to find a best polynomial fit for some differential equation according to a pointwise error.

I think we could also extend this method to minimizing the mean absolute value integral just like we did in the sampling case.

\min \int_0^1 t(x)dx

-t(x) \le p(x) \le t(x)

 

More references on Sum of Squares optimization:

http://www.mit.edu/~parrilo/

 

 

Reverse Mode Differentiation is Kind of Like a Lens II

For those looking for more on automatic differentiation in Haskell:

Ed Kmett’s ad package

http://hackage.haskell.org/package/ad

Conal Elliott is making the rounds with a new take on AD (GOOD STUFF).

http://conal.net/papers/essence-of-ad/

Justin Le has been making excellent posts and has another library he’s working on.

https://blog.jle.im/entry/introducing-the-backprop-library.html

 

And here we go:

Reverse mode automatic differentiation is kind of like a lens. Here is the type for a non-fancy lens

When you compose two lenses, you compose the getters (s -> a) and you compose the partially applied setter (b -> t) in the reverse direction.

We can define a type for a reverse mode differentiable function

When you compose two differentiable functions you compose the functions and you flip compose the Jacobian transpose (dy -> dx). It is this flip composition which gives reverse mode it’s name. The dependence of the Jacobian on the base point x corresponds to the dependence of the setter on the original object

The implementation of composition for Lens and AD are identical.

Both of these things are described by the same box diagram (cribbed from the profunctor optics paper www.cs.ox.ac.uk/people/jeremy.gibbons/publications/poptics.pdf ).

 

 

This is a very simple way of implementing a reserve mode automatic differentiation using only non exotic features of a functional programming language. Since it is so bare bones and functional, is this a good way to achieve the vision gorgeous post by Christoper Olah?  http://colah.github.io/posts/2015-09-NN-Types-FP/  I do not know.

Now, to be clear, these ARE NOT lenses. Please, I don’t want to cloud the water, do not call these lenses. They’re pseudolenses or something. A very important part of what makes a lens a lens is that it obeys the lens laws, in which the getter and setter behave as one would expect. Our “setter” is a functional representation of the Jacobian transpose and our getter is the function itself. These do not obey lens laws in general.

Chain Rule AND Jacobian

What is reverse mode differentiation? One’s thinking is muddled by defaulting to the Calc I perspective of one dimensional functions. Thinking is also muddled by  the general conception that the gradient is a vector. This is slightly sloppy talk and can lead to confusion. It definitely has confused me.

The right setting for intuition is R^n \rightarrow R^m functions

If one looks at a multidimensional to multidimensional function like this, you can form a matrix of partial derivatives known as the Jacobian. In the scalar to scalar case this is a 1\times 1 matrix, which we can think of as just a number. In the multi to scalar case this is a 1\times n matrix which we somewhat fuzzily can think of as a vector.

The chain rule is a beautiful thing. It is what makes differentiation so elegant and tractable.

For many-to-many functions, if you compose them you matrix multiply their Jacobians.

Just to throw in some category theory spice (who can resist), the chain rule is a functor between the category of differentiable functions and the category of vector spaces where composition is given by Jacobian multiplication. This is probably wholly unhelpful.

The cost of multiplication for an a \times b matrix A and an b \times c matrix B is O(abc) . If we have 3 matrices ABC, we can associate to the left or right. (AB)C vs A(BC) choosing which product to form first. These two associations have different cost, abc * acd for left association or abd * bcd for right association. We want to use the smallest dimension over and over. For functions that are ultimately many to scalar functions, that means we want to multiply starting at the right.

For a clearer explanation of the importance of the association, maybe this will help https://en.wikipedia.org/wiki/Matrix_chain_multiplication

 

Functional representations of matrices

A Matrix data type typically gives you full inspection of the elements. If you partially apply the matrix vector product function (!* :: Matrix -> Vector -> Vector) to a matrix m, you get a vector to vector function (!* m) :: Vector -> Vector. In the sense that a matrix is data representing a linear map, this type looks gorgeous. It is so evocative of purpose.

If all you want to do is multiply matrices or perform matrix vector products this is not a bad way to go. A function in Haskell is a thing that exposes only a single interface, the ability to be applied. Very often, the loss of Gaussian elimination or eigenvalue decompositions is quite painfully felt. For simple automatic differentiation, it isn’t so bad though.

You can inefficiently reconstitute a matrix from it’s functional form by applying it to a basis of vectors.

One weakness of the functional form is that the type does not constrain the function to actually act linearly on the vectors.

One big advantage of the functional form is that you can intermix different matrix types (sparse, low-rank, dense) with no friction, just so long as they all have some way of being applied to the same kind of vector. You can also use functions like (id :: a -> a) as the identity matrix, which are not built from any underlying matrix type at all.

To match the lens, we need to represent the Jacobian transpose as the function (dy -> dx) mapping differentials in the output space to differentials in the input space.

The Lens Trick

A lens is the combination of a getter (a function that grabs a piece out of a larger object) and a setter (a function that takes the object and a new piece and returns the object with that piece replaced).

The common form of lens used in Haskell doesn’t look like the above. It looks like this.

This form has exactly the same content as the previous form (A non obvious fact. See the Profunctor Optics paper above. Magic neato polymorphism stuff), with the added functionality of being able to compose using the regular Haskell (.) operator.

I think a good case can be made to NOT use the lens trick (do as I say, not as I do). It obfuscates sharing and obfuscates your code to the compiler (I assume the compiler optimizations have less understanding of polymorphic functor types than it does of tuples and functions), meaning the compiler has less opportunity to help you out. But it is also pretty cool. So… I dunno. Edit:

/u/mstksg points out that compilers actually LOVE the van Laarhoven representation (the lens trick) because when f is finally specialized it is a newtype wrappers which have no runtime cost. Then the compiler can just chew the thing apart.

https://www.reddit.com/r/haskell/comments/9oc9dq/reverse_mode_differentiation_is_kind_of_like_a/

One thing that is extra scary about the fancy form is that it makes it less clear how much data is likely to be shared between the forward and backward pass. Another alternative to the lens that shows this is the following.

This form is again the same in end result. However it cannot share computation and therefore isn’t the same performance wise. One nontrivial function that took me some head scratching is how to convert from the fancy lens directly to the regular lens without destroying sharing. I think this does it

 

Some code

I have some small exploration of the concept in this git https://github.com/philzook58/ad-lens

Again, really check out Conal Elliott’s AD paper and enjoy the many, many apostrophes to follow.

Some basic definitions and transformations between fancy and non fancy lenses. Extracting the gradient is similar to the set function. Gradient assumes a many to one function and it applies it to 1.

Basic 1D functions and arrow/categorical combinators

Some List based stuff.

And some functionality from HMatrix

In practice, I don’t think this is a very ergonomic approach without something like Conal Elliott’s Compiling to Categories plugin. You have to program in a point-free arrow style (inspired very directly by Conal’s above AD paper) which is pretty nasty IMO. The neural network code here is inscrutable. It is only a three layer neural network.