The Beauty of the Cone: How Convex Cones Simplify Convex Programming

I watched the Stephen Boyd course to get me started in convex programming. At the beginning, he spends some time talking about convex sets rather than launching straight into convex optimization. I did not appreciate this sufficiently on the first pass. Convex sets are a very geometric topic, and I think that for the most part, convex functions are best thought of as a special case of them. The epigraph of a scalar valued convex function on R^d, the filled-in region above its graph, is a d+1 dimensional convex set. Convex constraints on the domain can be thought of as further cutting this shape. Finding the minimum of the function can then be thought of as the geometric problem of finding the furthest point of this shape in the -y direction.

There is another mathematical topic that I did not appreciate for how powerful and clean it is: convex cones. If you check out this textbook by Fenchel, he starts with the topic of convex cones rather than sets, I now realize for good reason.

I was sketching out a programmatic representation of convex sets and was annoyed at how ugly things were turning out. First off, infinity is a huge problem. Many convex problems have infinite answers.

The simplest problem is \max_x c^T x with no constraints. The answer plunges off to infinity in the direction of c. The next simplest problem is \max_x c^T x subject to a^T x \geq 0. This either goes off to infinity away from the constraint plane, hits the constraint plane and runs off to infinity along it, or, if c and a are anti-parallel, the optimum is an arbitrary location on the constraint plane.

In short, the very simplest convex problems have infinite answers. You actually need a fairly complex problem with many conditions before you can guarantee a finite answer. Only once we have a bounded LP or a positive definite quadratic problem do we start to guarantee boundedness.

In order to work with these problems, it is helpful (necessary?) to compactify your space. There are a couple of options here. One is to make an arbitrary box cutoff. If we limit ourselves to an arbitrary box of side length 1e30, then every answer that came back as infinite before is now finite, albeit huge. This makes me queasy though. It is ad hoc, actually kind of annoying to program all the corner cases, and very likely to have numerical issues. Another possibility is to extend your space with rays. Rays can be thought of as points at infinity. Now any optimization problem that has an infinite answer returns the ray in the direction along which the answer goes off to infinity. It is also annoying to make every function work with either rays or points though.

Another, slightly less bothersome, aesthetic problem is that the natural representation of a half space is a normal vector and an offset, a^T x \geq b. The principles of duality make one want to make this object as similar to our representation of points as possible, but it has one extra dimension and one arbitrary degree of freedom (scalar multiplying a and b by the same positive constant does not change the geometrical half space described). This is ugly.

In the field of projective geometry, there is a very beautiful thing that arises. In projective geometry, all scalar multiples of a ray are considered the same thing. This ray is considered a “point”. The reason this makes sense is that projective geometry is a model of perspective and cameras. Two points are completely equivalent from the perspective of a pinhole camera if they lie on the same ray connecting to the pinhole. This ray continues inside the camera and hits the photographic screen. Hence points on the 2D screen correspond to rays in 3D space. It makes elegant sense to consider the pinhole to be the origin of your space, and hence you come to the previous abstract definition. Points at infinity in 3D (like stars, effectively) are not a problem, since they lie on finitely describable rays. Points at the infinite edge of the 2D screen are not really a problem either. Perfectly reasonable points in 3D can map to the infinite edge of the screen in principle. Someone standing exactly to the side of the pinhole in 3D space has a ray that goes perfectly horizontally into the camera, and in a sense would only hit a hypothetical infinite screen at infinity.

A great many wonderful (and practical!) things fall out of projective homogeneous coordinates. They are ubiquitous in computer graphics, computer vision, and robotics. The mathematical language describing translations and rotations is unified: both can be described using a single matrix. This was not the original intention, but it is a pleasant surprise. Other geometrical questions become simple questions of linear or vector algebra. It is very cool.

Can we use this method for describing the space we want to find convex sets in? I think not. Unfortunately, the topology of projective space is goofy. At the very least, curves in the 2D projective plane, which can be thought of as a sphere with opposite points identified, do not necessarily have an inside and an outside (I’m questioning this idea now). So convex sets, and talking about maximal half planes and such, seem questionable.

But I think we can fix it. Cones are good. In a slight twist on the projective geometry idea, what if you consider only non-negative multiples \lambda \geq 0 of a ray as the same “point”? You can take as a canonical plane x_0 = 1, similar to the pinhole camera. This plane can be thought of as your more ordinary affine space. Now half spaces touching the origin (cones) correspond to affine half spaces. We have a reasonable way of describing points at infinity on this plane, which correspond to rays. Arbitrary convex sets on this plane correspond to cones of rays.

Cones in this context are sets closed under arbitrary non-negative sums of points within them. Hence a cone always includes the origin. Cones are basically convex sets of rays.
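In symbols: K is a cone when x, y \in K and \lambda, \mu \geq 0 imply \lambda x + \mu y \in K.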

By adding an arbitrary-ish degree of freedom to points, we bring points and half spaces much closer into alignment. Now evaluating whether a point is in a half space looks like a^T x \geq 0, with no ugly extra b.
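Concretely, a point x lifts to the homogeneous point (1, x), and the half space a^T x \geq b becomes the cone constraint with normal (-b, a), since (-b, a)^T (1, x) = a^T x - b \geq 0 is exactly the original constraint.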

So in closing, just as convex sets are kind of a cleaner version of convex functions, so convex cones are a cleaner version of convex sets. This is actually useful, since when you’re programming, you’ll have to deal with far fewer infinite corner cases. The theory is also more symmetrical and beautiful.

Lens as a Divisibility Relation: Goofin’ Off With the Algebra of Types

Types have an algebra very analogous to the algebra of ordinary numbers (video). This is the basic table of correspondences. Code with all the extensions available here.
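Roughly, the correspondence goes:

    Type         Number
    ------------------
    Void         0
    ()           1
    Either a b   a + b
    (a, b)       a * b
    a -> b       b ^ a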

One way to see that this makes sense is by counting the cardinality of types built out of these combinators. Unit is the type with 1 inhabitant. Void has 0 inhabitants. If a has n and b has m possible values, then Either a b has n + m inhabitants, (a, b) has n * m, and there are m^n possible tabulations of a -> b. We’re gonna stick to just polynomials for the rest of this, ignoring a -> b.

Another way of looking at this is that if two finitely inhabited types have the same number of inhabitants, then the types can be put into an isomorphism with each other. In other words, types modulo isomorphism can be thought of as representing the natural numbers. Because of this, we can build a curious proof system for the natural numbers using ordinary type manipulation.

In addition, we also get a natural way of expressing and manipulating polynomials. Polymorphic types can be seen as being very similar to polynomial expressions with natural number coefficients, N[x]. The polymorphic type variables have the ability to be instantiated to any value, corresponding to evaluating the polynomial at some number.

The Lens ecosystem gives some interesting combinators for manipulating this algebra. The type Iso' a b contains isomorphisms. Since we’re only considering types up to isomorphism, this Iso' represents equality. We can give identity isomorphisms, compose isomorphisms and reverse isomorphisms.

We can already form a very simple proof.
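Something in this spirit (a sketch; the names are my own), using the built-in swapped iso from Control.Lens:

    import Control.Lens

    -- a = a
    refl_proof :: Iso' a a
    refl_proof = id

    -- composing an iso with its reverse: a * b = b * a = a * b
    there_and_back :: Iso' (a, b) (a, b)
    there_and_back = swapped . from swapped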

Now we’ll add some more combinators, basically the axioms that the types mod isos form a commutative semiring. Semirings have addition and multiplication operators, with multiplication distributing over addition. It is interesting to note that I believe all of these Iso' actually are guaranteed to be isomorphisms ( to . from = id and from . to = id ) because of parametricity. from and to are unique, ignoring any issues with bottoms, because the polymorphic type signature is so constraining. This is not usually guaranteed to be true in Haskell just from saying it is an Iso'. If I give you an Iso' Bool Bool it might actually be the iso (const True) (const True) for example, which is not an isomorphism.
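In sketch form (the names match ones used later in the post; the exact signatures are my reconstruction):

    import Control.Lens
    import Data.Void (Void, absurd)
    import Data.Tuple (swap)

    plus_comm :: Iso' (Either a b) (Either b a)
    plus_comm = iso swapE swapE where swapE = either Right Left

    mul_comm :: Iso' (a, b) (b, a)
    mul_comm = iso swap swap

    id_plus :: Iso' (Either a Void) a     -- a + 0 = a
    id_plus = iso (either id absurd) Left

    id_mul :: Iso' (a, ()) a              -- a * 1 = a
    id_mul = iso fst (\a -> (a, ()))

    mul_zero :: Iso' (a, Void) Void       -- a * 0 = 0
    mul_zero = iso snd absurd

    rdist :: Iso' (a, Either b c) (Either (a, b) (a, c))  -- a*(b+c) = a*b + a*c
    rdist = iso f g where
      f (a, Left b)  = Left (a, b)
      f (a, Right c) = Right (a, c)
      g (Left (a, b))  = (a, Left b)
      g (Right (a, c)) = (a, Right c)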

There are also combinators for lifting isomorphisms into bifunctors: firsting, seconding, and bimapping. These are important for indexing into subexpressions of our types in a point-free style.

Here is a slightly more complicated proof now available to us.
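Something like this (my sketch): proving a + b*1 = a + b by rewriting under the Right branch.

    -- seconding lifts an iso through the second slot of a Bifunctor,
    -- here the Right branch of Either
    proof2 :: Iso' (Either a (b, ())) (Either a b)
    proof2 = seconding id_mul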

We can attempt a more interesting and difficult proof. I developed this iteratively using . _ hole expressions so that GHC would tell me what I had manipulated my type to at that point in my proof.

Artwork Courtesy of David. Sorry for any motion sickness.

The proof here is actually pretty trivial and can be completely automated away. We’ll get to that later.

If Iso' is equality, what about the other members of the Lens family? Justin Le says that Lens s a is a witness to the isomorphism of a type s to the tuple of something and a, i.e. s = a * c for some c. Prisms witness the similar thing for sums: s = a + c for some c. Once we are only considering types mod isos, if you think about it, these are expressions of two familiar relations on the natural numbers: Prism is the inequality relation a \leq s, and Lens is the divisibility relation a | s.
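The standard tuple and Either optics witness the simplest instances of these relations:

    -- _1 witnesses that a divides a * b
    div_ab :: Lens' (a, b) a
    div_ab = _1

    -- _Left witnesses that a <= a + b
    leq_ab :: Prism' (Either a b) a
    leq_ab = _Left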

Mathematically, these relations can be composed with equalities, just like in the lens hierarchy Lens and Prism can be composed with Iso. Both form a category, since they both have id and (.).

Here are a couple identities that we can’t derive from these basic combinators. There are probably others. Woah-ho, my bad. These are totally derivable using id_mul, id_plus, mul_zero, _1, _2, _Left, _Right.

Pretty neat! Random thoughts and questions before we get into the slog of automation:

  • Traversal is the “is polynomial in” relation, which seems a rather weak statement on its own.
  • Implementing automatic polynomial division is totally possible and interesting
  • What is the deal with infinite types like [a]? Fix. I suppose this is a theory of formal power series / combinatorial species. Fun combinatorics, generatingfunctionology. Brent Yorgey did his dissertation on this stuff. Wow. I’ve never really read this. It is way more relevant than I realized.
  • Multivariate polynomial algorithms would also be interesting to explore (Grobner basis, multivariate division)
  • Derivatives of types and zippers – Conor McBride
  • Negative Numbers and Fractions?
  • Lifting to rank-1 types. Define Negative and Fractions via Galois connection?

Edit: /u/carette (wonder who that is šŸ˜‰ ) writes:

“You should dig into
J Carette, A Sabry Computing with semirings and weak rig groupoids, in Proceedings of ESOP 2016, p. 123-148. Agda code in https://github.com/JacquesCarette/pi-dual/tree/master/Univalence. A lot of the algebra you develop is there too.

If you hunt around in my repos, you’ll also find things about lenses, exploring some of the same things you mention here.”

Similar ideas taken further and with more sophistication. Very interesting. Check it out.

Automation

Our factor example above was quite painful, yet the theorem was exceedingly obvious by expansion of the left and right sides. Can we automate that? Yes we can!

Here’s the battle plan:

  • Distribute out all expressions like a*(b+c) so that all multiplication nodes appear at the bottom of the tree.
  • Reduce the expression by absorbing all stupid a*1, a*0, a+0 terms.
  • Reassociate everything to the right, giving a list like format
  • Sort the multiplicative terms by power of the variable

Once we have these operations, we’ll combine them into a canonical operation. From there, most equality proofs can be performed using the rewrite operation, which merely puts both sides into canonical form.

Once we have those, the painful theorem above and harder ones becomes trivial.

Now we’ll build the typeclasses necessary to achieve each of these aims in order. The typeclass system is perfect for what we want to do, as it builds terms by inspecting types. It isn’t perfect in the sense that typeclass pattern matching needs to be tricked into doing what we need. I have traded cleverness and elegance for verbosity.

In order to make our lives easier, we’ll need to tag every variable name with a newtype wrapper. Otherwise we won’t know when we’ve hit a leaf node that is a variable. I’ve used this trick before here in an early version of my faking Compiling to Categories series. These wrappers are easily automatically stripped.
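Something like this (the wrapper name here is my stand-in):

    -- tag leaf variables so the typeclass machinery can recognize them
    newtype V a = V a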

A common pattern I exploit is to use a type family to drive complicated recursion. Closed type families allow more overlap and default patterns, which is very useful for programming. However, type families do not carry values, so we need to flip flop between the typeclass and type family systems to achieve our ends.

Here is the implementation of the distributor Dist. We make RDist and LDist typeclasses that make a sweep of the entire tree, using ldist and rdist as makes sense. It was convenient to separate these into two classes for my mental sanity. I am not convinced even now that I have every case. Then the master control class Dist runs these classes until any node that has a (*) in it has no nodes with (+) underneath, as checked by the HasPlus type family.
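The HasPlus check can be written as a closed type family along these lines (a sketch; the real code may differ):

    {-# LANGUAGE DataKinds, TypeFamilies, TypeOperators #-}
    import Data.Type.Bool

    -- does a type expression contain an Either (i.e. a +) anywhere?
    type family HasPlus a :: Bool where
      HasPlus (Either a b) = 'True
      HasPlus (a, b)       = HasPlus a || HasPlus b
      HasPlus v            = 'False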

Next is the Absorb type class. It is arranged somewhat similarly to the above. Greedily absorb, and keep doing it until no absorptions are left. I think that works.

The Associators are a little simpler. You basically just look for the wrong association pattern and call plus_assoc or mul_assoc until they don’t occur anymore, then recurse. We can be assured we’re always making progress if we either switch some association structure or recurse into subparts.
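The value-level pieces are isos like this one (my reconstruction), which the typeclasses apply until the tree is right-nested:

    -- reassociate a sum to the right
    plus_assoc :: Iso' (Either (Either a b) c) (Either a (Either b c))
    plus_assoc = iso f g where
      f (Left (Left a))  = Left a
      f (Left (Right b)) = Right (Left b)
      f (Right c)        = Right (Right c)
      g (Left a)          = Left (Left a)
      g (Right (Left b))  = Left (Right b)
      g (Right (Right c)) = Right c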

Finally, the SortTerm routine. SortTerm is a bubble sort. The typeclass Bubble does a single sweep of swapping down the type-level list-like structure we’ve built. SortTerm uses the Sorted type family to check if it is finished. If it isn’t, it calls Bubble again.

Hope you thought this was neat!

Giving the Mostly Printed CNC a try (MPCNC)

Declan had the good idea to make a CNC machine. There is a popular plan available here

https://www.v1engineering.com/specifications/

A Doge

The really cute part of it is using electrical conduit as rails, which are shockingly inexpensive. Like a couple bucks for 4 feet! Holy shnikes!

We’ve been printing up a storm for the last couple weeks. A ton of parts!

We already had a lot of motors and stuff lying around. Declan bought a lot of stuff too just for this. Assorted bearings and bolts. The plans have a bill of materials.

Repetier host seemed to work pretty well for controlling the board.

We used the RAMPS branch of the MPCNC Marlin repo.

We edited the header files as described in this post so that we could use both extruders as extra X and Y motor drivers. It did not seem like driving two motors from the same driver board was acceptable. Our bearings are gripping the rails a little too tight; it is tough to move.

Some useful links on the Thingiverse version of the MPCNC: https://www.thingiverse.com/thing:724999

He suggests using this program: http://www.estlcam.com/. Seems like Windows only? Ugh.

The MPCNC plans don’t contain actual tool mounts, but here are some examples:

A pen holder: https://www.thingiverse.com/thing:1612207/comments

A dewalt mount: https://www.thingiverse.com/thing:944952

This is an interesting web-based G-code maker: http://jscut.org/jscut.html. Ultimately a little too janky though, but it works well enough to get started. It’s not entirely clear what pocket vs. interior vs. whatever means; engrave sort of seemed like what I wanted. I went into Inkscape with a reasonable png, traced the bitmap, then did object-to-path. It’s also nice to just find an svg on the internet.

The following G-code was needed to zero Repetier and the RAMPS at the touch-off point. We added it as a macro. It does some confusing things though.
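It was essentially the standard zeroing command, something like (my best guess; the macro itself isn’t shown here):

    G92 X0 Y0 Z0 ; declare the current position to be the origin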

pycam is the best I can find for 3D machining. Haven’t actually tried it yet:

http://pycam.sourceforge.net/getting-started/

We should probably upgrade the thing to have limit switches. It pains me every time we slam it into the edge.

All in all, a very satisfying project. Hope we build something cool with it.

A horsie

Chile: Nice place

Just got back from Chile from a vacation visiting Will. Nice place.

We took a leisurely amount of time getting to Torres del Paine, which was the main feature of our trip. We travelled through Santiago, Punta Arenas and Puerto Natales. We spent a very tired day in the children’s science museum and rode the funicular. There wasn’t that much to do in the latter two cities, maybe we could have shaved some time from them. Our hostel in Punta Arenas was notably ramshackle. We spent 5 days backpacking in the park. Absolutely gorgeous. The wind on the second day was like nothing I’ve ever experienced. I was a little concerned about staying on my feet. Hiking poles for the win. Day 4 was cold and wet and miserable, but it ended up ok in the end. We were able to get a spot in a refugio when it just was too overwhelming to try to set up our tents in a flooded muddy campsite on that day. I think Beth in particular was at the end of her rope. I basically didn’t poop the entire first week I was there, but one glorious day on the mountain the heavens parted for me, and I was fine from then on. I didn’t quite pack food right. There ended up being camp stores at most of the places we stayed, but if I hadn’t been able to re up on cookies it would have been a lean couple days food wise. Ramen Bombs for the win. We drank right from the streams, which is unusual for us. Usually we filter.

All told we did ok on food. I really like the al pobre thing. What isn’t to like about steak, onions, and eggs on fries? Chileans seem a little eager on the mayo. Nobody does breakfast right except the US. The street food was good. I love the fried tortilla thing that you can just slather salsa on. It was like 30 cents. The empanadas were also pretty great cheap food. Ceviche was also very tasty. They toss out avocado like it’s nuthin down there. Sandwiches were kind of shitty. Don’t know if that is entirely fair, but that is how it felt. Highlight meal of the trip was at Cafe Artimana in Puerto Natales. Yeah, I got some al pobre. But also some basil lemonade stuff.

After the hiking, we scheduled an early return to Santiago rather than busting our asses to a glacier viewpoint. In the airport at Punta Arenas, we got the southernmost dominos in the entire world. Ben Will and Declan went on a taxi quest to go get it. Wandered around Santiago, saw some churches and cathedrals, a fort, ate churros, etc. Declan was on a quest for a neck pillow. We did a prison themed Escape Room. People felt like we got a little cheated because some of the puzzles felt like bullshit? I think they really expect to break room records. I suck at escape rooms. We were able to spend a day in Valparaiso, which had a super awesome street art scene.

I spent the last day puking my guts out. So it goes. Not sure how exactly. The street sausage may have put me over the top. I guess I’m a sensitive fellow? I get pretty consistently unwell on trips.

Chile has tons of fluffy street dogs. They’re pretty friendly, although they do chase cars and motorcycles. Idiots.

Chile has a way lower English quotient than other trips I’ve made. I’ve been surprised how common at least some English has been in Europe and Asia, and was now equally surprised how little there was in Chile. It makes sense; a lot of the continent is Spanish speaking. It was really useful to have Will around, who has gotten shockingly good at Spanish, from an outsider’s perspective.

SUCH MEMORIES

Declan’s post on the trip.

tough day

Wait, where are all my BOBBY PICS!?!

o there u r u cutie

overwhelmed

A strange place named Andre’s
boys
beth

Proving Addition is Commutative in Haskell using Singletons

Yesterday morning, I was struck with awe at how amazingly cool the dependently typed approach (Agda, Idris, Coq) to proving programs is. It is great that I can still feel that after tinkering around with it for a couple years. It could feel old hat.

In celebration of that emotion, I decided to write a little introductory blog post about how to do some proving like that in Haskell. Haskell can replicate this ability to prove with varying amounts of pain. For the record, there isn’t anything novel in what follows.

One methodology for replicating the feel of dependent types is to use the singletons pattern. The singleton pattern is a way of building a data type so that there is an exact copy of the value of a variable in its type.

For future reference, I have the following extensions turned on, some of which I’ll mention when they come up.
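At minimum, the code below needs these (my reconstruction of the list):

    {-# LANGUAGE DataKinds #-}
    {-# LANGUAGE GADTs #-}
    {-# LANGUAGE TypeFamilies #-}
    {-# LANGUAGE KindSignatures #-}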

Here’s how the pattern goes.

Step 1: define your ordinary data type. Bool is already defined in the Prelude, but here is what it looks like:
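    data Bool = True | False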

Step 2: turn on the DataKinds extension. This automatically promotes any data constructor like True or False or Just into types that have apostrophes in front of them: 'True, 'False, 'Just. This is mostly a convenience. You could have manually defined something similar like so:
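    -- empty data declarations: types with no values, which is all the
    -- promoted constructors are (primed names here are mine)
    data True'
    data False'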

Step 3: Define your singleton datatype. The singleton datatype is a GADT (generalized algebraic data type). GADT declarations take on a new syntax. It helps to realize that ordinary data constructors like Just are just functions. You can use them anywhere you can use functions. Just has the type a -> Maybe a. It might help to show that you can define a lower case synonym:
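    just :: a -> Maybe a
    just = Just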

just is clearly just a regular function. What makes constructors a special thing (not quite “just functions”) is that you can also pattern match on them. Data constructors are functions that “do nothing”. They hold the data until eventually you get it back again via pattern matching.

So why don’t you write the type signature of the constructors when you define them? Why does using a data statement look so different than a function definition? The GADT syntax brings the two concepts visually closer.

Letting you define the type signature for the constructor lets you define a constrained type signature, rather than the inferred most general one. It’s similar to the following situation. If you define an identity function id, it has the polymorphic type a -> a. However, you can explicitly constrain the function with an identical implementation. If you try to use boolid on an Int, it is a type error.
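    id' :: a -> a      -- a stand-in for Prelude's id
    id' x = x

    boolid :: Bool -> Bool
    boolid x = x

    -- boolid (3 :: Int)  -- type error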

The GADT syntax lets you constrain what the type signature of the constructor is, but also, very interestingly, lets the type checker infer information when you pattern match into the GADT.

With all that spiel, here is the singleton type for Bool as a GADT.
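    data SBool s where
      STrue  :: SBool 'True
      SFalse :: SBool 'False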

You have made an exact copy of the value at the type level. When you pattern match into a variable x of type SBool s in the STrue branch, it knows that s ~ 'True and in the SFalse branch it knows that s ~ 'False.

Here’s the analogous construction for a Peano natural number
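    data Nat = Zero | Succ Nat

    data SNat n where
      SZero :: SNat 'Zero
      SSucc :: SNat n -> SNat ('Succ n)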

Now let’s talk about programming.

Addition is straightforward to define for our Nat.
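    nplus :: Nat -> Nat -> Nat
    nplus Zero m     = m
    nplus (Succ n) m = Succ (nplus n m)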

Let’s replicate this definition at the type level. The extension we’ll want is TypeFamilies. Type families enables a syntax and feature for defining functions on types very similarly to how you define regular functions.
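    type family NPlus (n :: Nat) (m :: Nat) :: Nat where
      NPlus 'Zero m     = m
      NPlus ('Succ n) m = 'Succ (NPlus n m)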

Now finally, we can exactly mirror this definition on singletons
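    snplus :: SNat n -> SNat m -> SNat (NPlus n m)
    snplus SZero m     = m
    snplus (SSucc n) m = SSucc (snplus n m)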

In the type signature SNat is kind of like a weirdo forall. It is a binding form that generates a new type variable you need to express the typelevel connections you want. The type variable n is a typelevel thing that represents the value.

Now let’s talk about proving. Basically, if your intent is proving things, I think it is simplest if you forget that the original data type ever existed. It is just a vestigial appendix that makes the DataKinds you need. Only work with singletons. You will then need to make a safe layer translating into and out of the singletons if you want to interface with non-singleton code.

We’re going to want to prove something about equality. The standard definition of equality is
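    data Eq1 a b where
      Refl :: Eq1 a a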

I put the 1 there just so I wouldn’t clash with the Eq typeclass. It’s ugly, sorry. You can find an identical definition in base http://hackage.haskell.org/package/base-4.12.0.0/docs/Data-Type-Equality.html that uses the extension TypeOperators for a much cleaner syntax.

Why is this a reasonable equality? You can construct using Refl only when you are just showing that a equals itself. When you pattern match on Refl, the type checker is learning that a ~ b. It’s confusing. Let’s just try using it.

We can prove a couple facts about equality. First off that it is a symmetric relation. If a = b then b = a.
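    sym :: Eq1 a b -> Eq1 b a   -- (function names here are my own)
    sym Refl = Refl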

When we pattern match and expose the incoming Refl, the type checker learns that a ~ b in this branch of the pattern match. Now we need to return an Eq1 b a. But we know that a ~ b so this is the same as an Eq1 a a. Well, we can easily do that with a Refl.

Similarly we can prove the transitivity of equality.
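    trans :: Eq1 a b -> Eq1 b c -> Eq1 a c
    trans Refl Refl = Refl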

Pattern matching on the first equality tells the type checker that a ~ b, the second that b ~ c. Now we need to return an Eq1 a c, but this is the same as Eq1 a a because of the ~s we have, so Refl suffices.

It’s all damn clever. I wouldn’t have come up with it.

Now let’s start proving things about our addition operator. Can we prove that 0 + 0 = 0?

This one is straightforward, since obviously 'Zero is 'Zero. How about something slightly more complicated: 1 + 0 = 1.
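    zero_plus_zero :: Eq1 (NPlus 'Zero 'Zero) 'Zero
    zero_plus_zero = Refl

    one_plus_zero :: Eq1 (NPlus ('Succ 'Zero) 'Zero) ('Succ 'Zero)
    one_plus_zero = Refl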

The typechecker can evaluate addition on concrete numbers to confirm this all works out.

Here’s a much more interesting property \forall x. 0 + x = x
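    zero_plus :: SNat x -> Eq1 (NPlus 'Zero x) x
    zero_plus _ = Refl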

This one is also straightforward to prove. Looking at the definition of NPlus knowing that the first argument is 'Zero is enough to evaluate forward.

Here’s our first toughy: \forall x. x + 0 = x. This may seem silly, but our definition of NPlus did not treat the two arguments symmetrically; it only pattern matches on the first. Until we know more about x, we can’t continue. So how do we learn more? By pattern matching and looking at the possible cases of x.

The first case is very concrete and easy to prove. The second case is more subtle. We learn that x ~ 'Succ x1 for some x1 when we expose the SSucc constructor. Hence we now need to prove Eq1 (NPlus ('Succ x1) 'Zero) ('Succ x1). The system now has enough information to evaluate NPlus a bit, so actually we need Eq1 ('Succ (NPlus x1 'Zero)) ('Succ x1). The term (NPlus x1 'Zero) looks very similar to what we were trying to prove in the first place. We can use a recursive call to get an equality proof, which we pattern match to a Refl to learn that (NPlus x1 'Zero) ~ x1, which then makes the required result Eq1 ('Succ x1) ('Succ x1), obviously true via Refl. I learned a neat-o syntax for this second pattern match, called pattern guards. Another way of writing the same thing is
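    plus_zero :: SNat x -> Eq1 (NPlus x 'Zero) x
    plus_zero SZero = Refl
    plus_zero (SSucc n) = case plus_zero n of
      Refl -> Refl

    -- the same proof with a pattern guard
    plus_zero' :: SNat x -> Eq1 (NPlus x 'Zero) x
    plus_zero' SZero = Refl
    plus_zero' (SSucc n) | Refl <- plus_zero' n = Refl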

Follow all that? Haskell is not as friendly a place to do this as Idris or Agda is.

Now finally, the pièce de résistance: the commutativity of addition, which works in a similar but more complicated way.
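Sketched out (the lemma names are my own choosing):

    -- lemma: x + Succ y = Succ (x + y)
    plus_succ :: SNat x -> SNat y -> Eq1 (NPlus x ('Succ y)) ('Succ (NPlus x y))
    plus_succ SZero _ = Refl
    plus_succ (SSucc n) m | Refl <- plus_succ n m = Refl

    comm :: SNat x -> SNat y -> Eq1 (NPlus x y) (NPlus y x)
    comm SZero m | Refl <- plus_zero m = Refl
    comm (SSucc n) m | Refl <- comm n m, Refl <- plus_succ m n = Refl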


A question: to what degree does this prove that nplus or snplus are commutative? The linkage is not perfect. snplus is type constrained to return the same result as NPlus, which feels ok. nplus is syntactically identical to the implementation of the other two, but that is the only link; there is nothing compiler-enforced.

The existence of non-termination in Haskell also makes everything done here much less fool-proof. It wouldn’t be all that hard to accidentally make a recursive call in one of our proofs that is non-terminating, and the compiler would accept it and say nothing.

I recommend you check out the links below for more.

Source available here https://github.com/philzook58/singleberg

Resources:

https://github.com/RyanGlScott/ghc-software-foundations

http://hackage.haskell.org/package/singletons

https://blog.jle.im/entry/introduction-to-singletons-1.html

http://hackage.haskell.org/package/singleton-nats

Casadi – Pretty Damn Slick

Casadi is something I’ve been aware of and not really explored much. It is a C++ / python / matlab library for modelling optimization problems for optimal control with bindings to IPOpt and other solvers. It can produce C code and has differentiation stuff. See below for some examples after I ramble.

I’ve enjoyed cvxpy, but cvxpy is designed specifically for convex problems, of which many control problems are not.

Casadi gives you a nonlinear modelling language and easy access to IPOpt, an interior point solver that works pretty well (along with some other solvers, many of which are proprietary however).

While the documentation visually looks very slick I actually found it rather confusing in contents at first. I’m not sure why. Something is off.

You should download the “example pack” folder. Why they don’t have these in html on the webpage is insane to me. https://github.com/casadi/casadi/releases/download/3.4.4/casadi-example_pack-v3.4.4.zip

It also has a bunch of helper classes for DAE building and other things. They honestly really put me off. The documentation is confusing enough that I am not convinced they give you much.

The integrator classes give you access to external smart ODE solvers from the Sundials suite. They give you good methods for difficult ODEs and DAEs (differential algebraic equations, which are ODEs with weird constraints like x^2 + y^2 == 1). It’s not clear to me if you can plug those into an optimization, other than by a shooting method.

Casadi can also output C which is pretty cool.

I kind of wondered about Casadi vs. Sympy. Sympy has lots of general purpose symbolic abilities: symbolic solving, polynomial smarts, even some differential equation understanding. There might be big dividends to using it. But it is a little harder to get going. I feel like there is an empty space for a mathematical modelling language that uses sympy as its underlying representation. I guess monkey patching sympy expressions into casadi expressions might not be so hard. Sympy can also output fast C code. Sympy doesn’t really have any support for sparseness that I know of.

As a side note, it can be useful to put these other languages’ expression objects into numpy arrays if you need extended reshaping abilities. The other languages often stop at matrices, which is odd to me.

Hmm. Casadi actually does have access to mixed integer programs via Bonmin (and commercial solvers). That’s interesting. Check out the Lotka-Volterra MINLP example:

https://groups.google.com/forum/#!topic/casadi-users/8xCHmP7UmpI

The Opti interface makes some of this look better: opti.minimize and opti.subject_to. Yeah, this is more similar to the interfaces I’m used to. It avoids the manual unpacking of the solution and the funky feel of making everything into implicit == 0 expressions.

https://web.casadi.org/blog/opti/
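In miniature it looks something like this (a toy sketch of mine, not one of the official examples):

    import casadi as ca

    opti = ca.Opti()
    x = opti.variable()
    y = opti.variable()
    opti.minimize((x - 1)**2 + (y - 2)**2)
    opti.subject_to(x + y <= 2)
    opti.solver('ipopt')
    sol = opti.solve()
    print(sol.value(x), sol.value(y))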

Here is a simple harmonic oscillator example using the more raw casadi interface. x is position, v is velocity, u is a control force. I’m using a very basic leapfrog integration. You tend to have to stack things into a single vector with vertcat when building the final problem.
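Roughly like so (a sketch along those lines; the parameter values are arbitrary):

    import casadi as ca
    import numpy as np

    N, dt = 50, 0.1
    x = ca.SX.sym('x', N)   # position
    v = ca.SX.sym('v', N)   # velocity
    u = ca.SX.sym('u', N)   # control force

    # leapfrog-style updates as equality constraints g == 0
    g = [x[0] - 1.0, v[0]]  # initial conditions x(0)=1, v(0)=0
    for i in range(N - 1):
        vnext = v[i] + dt * (u[i] - x[i])   # unit mass, unit spring
        g.append(v[i + 1] - vnext)
        g.append(x[i + 1] - (x[i] + dt * vnext))

    w = ca.vertcat(x, v, u)              # stack all decision variables
    cost = ca.sumsqr(x) + ca.sumsqr(u)   # damp the state, penalize effort
    solver = ca.nlpsol('solver', 'ipopt', {'x': w, 'f': cost, 'g': ca.vertcat(*g)})
    sol = solver(lbg=0, ubg=0)           # all constraints are equalities

    w_opt = np.array(sol['x']).flatten()
    x_opt, v_opt, u_opt = w_opt[:N], w_opt[N:2*N], w_opt[2*N:]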

Let’s use the opti interface, which is pretty slick. Here is a basic cartpole https://web.casadi.org/blog/opti/

Very fast. Very impressive. Relatively readable code; I busted this out in like 15 minutes. IPOpt solves the thing in the blink of an eye (about 0.05s self reported). It might be even faster if I warm start it with a good solution, as I would in online control (which may be feasible at this speed). You can add the initial condition as a parameter to the problem.

I should try this on an openai gym.

Haskell has bindings to casadi.

http://hackage.haskell.org/package/dynobud

Thoughts on Faking Some of GADTs in Rust

I’m a guy who is somewhat familiar with Haskell who is trying to learn Rust. So I thought I’d try to replicate some cool Haskell functionality in Rust. I would love to hear comments, because I’m trying to learn. I have no sense of Rust aesthetics yet, and in particular I have no idea how this interacts with the borrow system. What follows is a pretty rough brain dump.

GADTs (Generalized algebraic data types) are an extension in Haskell that allows you to write constrained type signatures for your data constructors. They also change how the type checking of pattern matching is processed.

GADTs are sometimes described/faked by being built by making data types that hold equality/unification constraints. Equality constraints in Haskell like a ~ Int are fairly magical and the Rust compiler does not support them in an obvious way. Maybe this is the next project. Figure out how to fake ’em if one can. I don’t think this is promising though, because faking them will be a little wonky, and then GADTs are a little wonky on top of that. See https://docs.rs/refl/0.1.2/refl/ So we’ll go another (related) road.

This is roughly what GADTs look like in Haskell.
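Something along these lines (a stand-in example):

    {-# LANGUAGE GADTs #-}
    data Expr a where
      IntLit  :: Int  -> Expr Int
      BoolLit :: Bool -> Expr Bool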

And here is one style of encoding using smart constructors and a typeclass for elimination (pattern matching is replicated as a function that takes callbacks for the data held in the different cases). Regular functions can have a more restricted type signature than the most general one their implementation implies. The reason to use a typeclass is so that we can write the eliminator as returning the same type that the GADT supplies. There isn’t an explicit equality constraint. A kind of Leibniz equality, forall f. f a -> f b, is hiding in the eliminator. The Leibniz equality can be used in place of (~) constraints at some manual cost. http://code.slipthrough.net/2016/08/10/approximating-gadts-in-purescript/

https://jesper.sikanda.be/files/leibniz-equality.pdf

The quantification over the type constructor f (something of kind * -> *) is a problem for Rust. Rust does not have higher kinded types, although they can be faked to some degree. https://gist.github.com/CMCDragonkai/a5638f50c87d49f815b8 There are murmurs of Associated Type Constructors / GATs, whatever those are, that help ease the pain, but I’m pretty sure they are not implemented anywhere yet.

I’m going to do something related, a defunctionalization of the higher kinded types. We make an application trait that will apply the given type function tag to the argument. What I’m doing is very similar to what happens in the singletons library, so we may be getting some things for free.

https://typesandkinds.wordpress.com/2013/04/01/defunctionalization-for-the-win/

Then, in order to define a new typelevel function, rip out a quick tag type and an App impl:
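A minimal sketch of the scheme (the tag name here is mine):

    // App applies a type-level function tag to the argument A
    trait App<A> {
        type T;
    }

    // a tag standing for the type function \a -> Vec<a>
    struct VecTag;
    impl<A> App<A> for VecTag {
        type T = Vec<A>;
    }

    // usage: <VecTag as App<i32>>::T is Vec<i32>
    fn example(v: <VecTag as App<i32>>::T) -> usize {
        v.len()
    }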

It might be possible to sugar this up with a macro. It may also be possible to write typelevel functions in a point free style without defining new function tag names. The combinators Id, Comp, Par, Fst, Snd, Dup, Const are all reasonably definable and fairly clear for small functions. Also the combinator S if you want to talk SKI combinatory calculus, which is unfit for humans. https://en.wikipedia.org/wiki/SKI_combinator_calculus For currying, I used a number for how many arguments are left to be applied (I’m not sure I’ve been entirely consistent with these numbers). You need to do currying quite manually. It may be better to work with tuplized arguments

Anyway, the following is a translation of the above Haskell (well, I didn’t wrap an actual i64 or bool in there but I could have I think). You need to hide the actual constructors labeled INTERNAL in a user inaccessible module.

The smart constructors put the right type in the parameter spot.
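In simplified form (this sketch elides the App machinery and the INTERNAL module hiding):

    use std::marker::PhantomData;

    enum ExprNode {
        IntLit(i64),
        BoolLit(bool),
    }

    // T is a phantom parameter recording what the expression evaluates to
    struct Expr<T> {
        node: ExprNode,
        ghost: PhantomData<T>,
    }

    fn int_lit(i: i64) -> Expr<i64> {
        Expr { node: ExprNode::IntLit(i), ghost: PhantomData }
    }

    fn bool_lit(b: bool) -> Expr<bool> {
        Expr { node: ExprNode::BoolLit(b), ghost: PhantomData }
    }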

Then pattern matching is a custom trait per gadtified type. Is it possible to unify the different elimination traits that will come up into a single Elim trait? I’m 50-50 about whether this is possible. What we’re doing is a kind of fancy map_or_else if that helps you.

https://doc.rust-lang.org/std/option/enum.Option.html#method.map_or_else

Usage. You have to explicitly pass the return type function to the eliminator. No inference is done for you. It’s like Coq’s match but worse. BTW the dbg! macro is the greatest thing on earth. Well done, Rust.

You can make helpers that don’t require explicit types to be given.

One could also make an Eq a b type with Refl similarly. Then we need typelevel function tags that take two type parameters. Which, with currying or tupling, we may already have.

Questions:

Is this even good? Or is it a road of nightmares? Is this even emulating GADTs or am I just playing phantom type games?

We aren’t at full GADTs. We don’t have existential types. Rust has some kind of existential story evolving (already there?), but the state of it is confusing to me. Something to play with. Higher rank functions would help?

Are overlapping typeclasses a problem in Rust?

Again, I have given nearly zero thought to borrowing and how it interacts with this. I’m a Rust n00b. I should think about it. Different eliminators based on whether you own or are borrowing?

How much of singleton style dependent types do we get from this? It feels like we have already paid the cost of defunctionalizing. http://hackage.haskell.org/package/singletons

My current playground for this is at https://github.com/philzook58/typo

Cvxpy and NetworkX Flow Problems

Networkx outputs scipy sparse incidence matrices

https://networkx.github.io/documentation/networkx-1.9/reference/generated/networkx.linalg.graphmatrix.incidence_matrix.html#networkx.linalg.graphmatrix.incidence_matrix

https://docs.scipy.org/doc/scipy/reference/sparse.html

Networkx also has its own flow solvers, but cvxpy gives you some interesting flexibility, like turning the problem mixed integer, adding quadratic terms, and other goodies. Plus it is very easy to get going, as you’ll see.

So here’s a basic example of putting these two together. Very straightforward and cool.
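In the spirit of that example (a sketch; the toy graph and bounds are mine):

    import networkx as nx
    import cvxpy as cvx
    import numpy as np

    G = nx.DiGraph([(0, 1), (1, 2), (0, 2), (2, 3)])
    A = nx.incidence_matrix(G, oriented=True)  # scipy sparse, nodes x edges

    f = cvx.Variable(G.number_of_edges())      # flow on each edge
    b = np.zeros(G.number_of_nodes())          # net inflow at each node
    b[0], b[3] = -1, 1                         # push one unit from node 0 to node 3

    prob = cvx.Problem(cvx.Minimize(cvx.sum(f)),
                       [A @ f == b, f >= 0, f <= 1])
    prob.solve()
    print(f.value)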

Here was a cool visual from a multi commodity flow problem (nx.draw_networkx_edges)

Nice, huh.

A Little Bloop on Typed Template Haskell

I’ve found looking at my statistics that short, to the point, no bull crap posts are the most read and probably most useful.

I’ve been tinkering around with typed template Haskell, which a cursory internet search doesn’t make obvious even exists. The first thing to come up is a GHC implementation wiki that seems like it might be stalled in some development branch. No. Typed Template Haskell has already been in GHC since GHC 7.8. And the first thing that comes up on a Template Haskell google search is the style of Template Haskell where you get deep into the guts of the syntax tree, which is much less fun.

Here’s a basic addition interpreter example.

The typed splice $$ takes an a out of TExpQ a. The typed quote [|| ||] puts it in. I find that you tend to be able to follow the types to figure out what goes where. If you’re returning a TExpQ, you probably need to start a quote. If you need to recurse, you often need to splice. Etc.

You tend to have to put the actual use of the splice in a different file. GHC will complain otherwise.

At the top of your file, put this to have the Template Haskell splices dumped into a file:
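    {-# OPTIONS_GHC -ddump-splices -ddump-to-file #-}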

Or have your package.yaml look something like this
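    ghc-options:
      - -ddump-splices
      - -ddump-to-file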

If you’re using stack, you need to dive into .stack/dist/x86/Cabal/build and then /src or into the executable folder something-exe-tmp/app to find .dump-splices files.

Nice. I don’t know whether GHC might have optimized this thing down anyhow or not, but we have helped the process along.

Some more examples: unrolling the recursion on a power function (a classic).
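    -- power 3 splices to something like \x -> x * (x * (x * 1))
    power :: Int -> TExpQ (Int -> Int)
    power 0 = [|| const 1 ||]
    power n = [|| \x -> x * $$(power (n - 1)) x ||]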

You can unroll a fibonacci calculation
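    fib :: Int -> TExpQ Int
    fib 0 = [|| 0 ||]
    fib 1 = [|| 1 ||]
    fib n = [|| $$(fib (n - 1)) + $$(fib (n - 2)) ||]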

This is blowing up in code size though (no memoization, you silly head). We can implement sharing in the generated code using let expressions (adapted from https://wiki.haskell.org/The_Fibonacci_sequence). Something to think about.

Tinkering around, you’ll occasionally find GHC can’t splice and quote certain things. This is related to cross stage persistence and lifting, which are confusing to me. You should look in the links below for more info. I hope I’ll eventually get a feel for it.

If you want to see more of my fiddling in context to figure out how to get stuff to compile here’s my github that I’m playing around in https://github.com/philzook58/staged-fun

Useful Link Dump:

Simon Peyton Jones has some extremely useful talk slides: https://www.cl.cam.ac.uk/events/metaprog2016/Template-Haskell-Aug16.pptx

Matthew Pickering’s post keyed me into that Typed Template Haskell is even there.

 http://mpickering.github.io/posts/2019-02-14-stage-3.html

MuniHac 2018: Keynote: Beautiful Template Haskell

https://markkarpov.com/tutorial/th.html Mark Karpov has a useful Template Haskell tutorial

Oleg Kiselyov: I’m still trying to unpack the many things that are going on here. Oleg’s site and papers are a very rich resource.

I don’t think that staged metaprogramming requires finally tagless style, as I first thought upon reading some of his articles. It is just very useful.

http://okmij.org/ftp/meta-programming/

http://okmij.org/ftp/meta-programming/tutorial/

Typed Template Haskell is still unsound (at least with the full power of the Q templating monad) http://okmij.org/ftp/meta-programming/#TExp-problem

You can write things in a finally tagless style such that you can write it once and have it interpreted as staged or in other ways. There is a way to also have a subclass of effects (Applicatives basically?) less powerful than Q that is sound.

A Touch of Topological Quantum Computation 3: Categorical Interlude

Welcome back, friend.

In the last two posts, I described the basics of how to build and manipulate the Fibonacci anyon vector space in Haskell.

As a personal anecdote, trying to understand the category theory behind the theory of anyons is one of the reasons I started learning Haskell. These spaces are typically described using the terminology of category theory. I found it very frustrating that anyons were described in an abstract and confusing terminology. I really wondered if people were just making things harder than they have to be. I think Haskell is a perfect playground to clarify these constructions. While the category theory stuff isn’t strictly necessary, it is interesting and useful once you get past the frustration.

Unfortunately, I can’t claim that this article is going to be enough to take you from zero to categorical godhood

but I hope everyone can get something out of it. Give it a shot if you’re interested, and don’t sweat the details.

The Aroma of Categories

I think Steve Awodey gives an excellent nutshell of category theory in the introductory section to his book:

“What is category theory? As a first approximation, one could say that category theory is the mathematical study of (abstract) algebras of functions. Just as group theory is the abstraction of the idea of a system of permutations of a set or symmetries of a geometric object, so category theory arises from the idea of a system of functions among some objects.”

For my intuition, a category is any “things” that plug together. The “in” of a thing has to match the “out” of another thing in order to hook them together. In other words, the requirement for something to be a category is having a notion of composition. The things you plug together are called the morphisms of the category and the matching ports are the objects of the category. The additional requirement of always having an identity morphism (a do-nothing connection wire) is usually there once you have composition, although it is good to take special note of it.

Category theory is an elegant framework for how to think about these composing things in a mathematical way. In my experience, thinking in these terms leads to good abstractions, and useful analogies between disparate things.

It is helpful for any abstract concept to list some examples to expose the threads that connect them. Category theory in particular has a ton of examples connecting to many other fields because it is a science of analogy. These are the examples of categories I usually reach for. Which one feels the most comfortable to you will depend on your background.

  • Hask. Objects are types. Morphisms are functions between those types
  • Vect. Objects are vector spaces, morphisms are linear maps (roughly matrices).
  • Preorders. Objects are values. Morphisms are the inequalities between those values.
  • Sets. Objects are Sets. Morphisms are functions between sets.
  • Cat. Objects are categories, Morphisms are functors. This is a pretty cool one, although complete categorical narcissism.
  • Systems and Processes.
  • The free category of a directed graph. Objects are vertices. Morphisms are paths between vertices.

Generic Programming and Typeclasses

The goal of generic programming is to write programs once and run them in many ways.

There are many ways to approach this generic programming goal, but one way this is achieved in Haskell is by using Typeclasses. Typeclasses allow you to overload names, so that they mean different things based upon the types involved. Adding a vector is different than adding a float or int, but there are programs that can be written that reasonably apply in both situations.

Writing your program in a way that it applies to disparate objects requires abstract ways of talking about things. Mathematics is an excellent place to mine for good abstractions. In particular, the category theory abstraction has demonstrated itself to be a very useful unified vocabulary for mathematical topics. I, and others, find it also to be a beautiful aesthetic by which to structure programs.

In the Haskell base library there is a Category typeclass. In order to use it, you need to import the Prelude in an unusual way:
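    import Prelude hiding (id, (.))
    import Control.Category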

The Category typeclass is defined on the type that corresponds to the morphisms of the category. This type has a slot for the input type and a slot for the output type. In order for something to be a category, it has to have an identity morphism and a notion of composition:
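    class Category cat where
      id  :: cat a a
      (.) :: cat b c -> cat a b -> cat a c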

The most obvious example of this Category typeclass is the instance for the ordinary Haskell function (->). The identity corresponds to the standard Haskell identity function, and composition to ordinary Haskell function composition.

Another example of a category that we’ve already encountered is that of linear operators, which we’ll call LinOp. LinOp is an example of a Kleisli arrow, a category built using monadic composition rather than regular function composition. In this case, the monad Q from my first post takes care of the linear pipework that happens between every application of a LinOp. The fish operator <=< is monadic composition from Control.Monad.

A related category is the FibOp category. This is the category of operations on Fibonacci anyons, which are also linear operations. It is LinOp specialized to the Fibonacci anyon space. All the operations we’ve previously discussed (F-moves, braiding) are in this category.

The “feel” of category theory takes focus away from the objects and tries to place focus on the morphisms. There is a style of functional programming called “point-free” where you avoid ever giving variables explicit names and instead use pipe-work combinators like (.), fst, snd, or (***). This also has a feel of de-emphasizing objects. Many of the combinators that get used in this style have categorical analogs. In order to generically use categorical typeclasses, you have to write your program in this point free style.

It is possible for a program written in the categorical style to be reinterpreted as a program, a linear algebra operation, a circuit, or a diagram, all without changing the actual text of the program. For more on this, I highly recommend Conal Elliott’s compiling to categories, which also puts forth a methodology to avoid the somewhat unpleasant point-free style using a compiler plug-in. This might be an interesting place to mine for a good quantum programming language. YMMV.

Monoidal Categories.

Putting two processes in parallel can be considered a kind of product. A category is monoidal if it has a product of this flavor, plus isomorphisms for reassociating objects and producing or consuming a unit object. This will make more sense when you see the examples.

We can sketch out this monoidal category concept as a typeclass, where we use () as the unit object.
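    -- one way to carve it up; the unitor names are somewhat arbitrary
    class Category k => Monoidal k where
      parC   :: k a c -> k b d -> k (a, b) (c, d)
      assoc  :: k ((a, b), c) (a, (b, c))
      assoc' :: k (a, (b, c)) ((a, b), c)
      leftUnitor   :: k ((), a) a
      leftUnitor'  :: k a ((), a)
      rightUnitor  :: k (a, ()) a
      rightUnitor' :: k a (a, ())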

Instances

In Haskell, the standard monoidal product for regular Haskell functions is (***) from Control.Arrow. It takes two functions and turns them into a function that does the same stuff, but on a tuple of the original inputs. The associators and unitors are fairly straightforward. We can freely dump unit () and get it back because there is only one possible value for it.
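    instance Monoidal (->) where
      parC f g = \(a, b) -> (f a, g b)   -- i.e. (***)
      assoc  ((a, b), c) = (a, (b, c))
      assoc' (a, (b, c)) = ((a, b), c)
      leftUnitor   ((), a) = a
      leftUnitor'  a = ((), a)
      rightUnitor  (a, ()) = a
      rightUnitor' a = (a, ())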

The monoidal product we’ll choose for LinOp is the tensor/outer/Kronecker product.

Otherwise, LinOp is basically a monadically lifted version of (->). The one dimensional vector space Q () is completely isomorphic to just a number. Taking the Kronecker product with it is basically the same thing as scalar multiplying (up to some shuffling).

Now for a confession. I made a misstep in my first post. In order to make our Fibonacci anyons jive nicely with our current definitions, I should have defined our identity particle using type Id = () rather than data Id. We’ll do that now. In addition, we need some new primitive operations for absorbing and emitting identity particles that did not feel relevant at that time.

With these in place, we can define a monoidal instance for FibOp. The extremely important and intriguing F-move operations are the assoc operators for the category. While other categories have assoc that feel nearly trivial, these F-moves don’t feel so trivial.

This is actually useful

The parC operation is extremely useful to explicitly note in a program. It is an opportunity for optimization. It is possible to inefficiently implement parC in terms of other primitives, but it is very worthwhile to implement it in new primitives (although I haven’t here). In the case of (->), parC is an explicit location where actual computational parallelism is available. Once you perform parC, it is no longer obviously apparent that the left and right side of the tuple share no data during the computation. In the case of LinOp and FibOp, parC is a location where you can perform factored linear computations. The matrix vector product (A \otimes B)(v \otimes w) can be performed on the factors individually, (Av) \otimes (Bw). In the first case, where we densify A \otimes B and then perform the multiplication, it costs O((N_A N_B)^2) time, whereas performing them individually on the factors costs O(N_A^2 + N_B^2) time, a significant savings. Applied category theory indeed.

Laws

Judge Dredd courtesy of David

Like many typeclasses, these monoidal morphisms are assumed to follow certain laws. Here is a sketch (for a more thorough discussion check out the wikipedia page):

  • Functions with a tick at the end like assoc' should be the inverses of the functions without the tick like assoc, e.g. assoc . assoc' = id
  • The parC operation is (bi)functorial, meaning it obeys the commutation law parC (f . f') (g . g') = (parC f g) . (parC f' g') i.e. it doesn’t matter if we perform composition before or after the parC.
  • The pentagon law for assoc: applying the left-bottom path of the diagram is the same as applying the top-right path.
  • The triangle law for the unitors:

String Diagrams

String diagrams are a diagrammatic notation for monoidal categories. Morphisms are represented by boxes with lines.

Composition g . f is made by connecting lines.

The identity id is a raw arrow.

The monoidal product of morphisms f \otimes g is represented by placing lines next to each other.

The diagrammatic notation is so powerful because the laws of monoidal categories are built so deeply into it that they can go unnoticed. Identities can be put in or taken away. Association doesn’t even appear in the diagram. The boxes in the notation can naturally be pushed around and commuted past each other.

This corresponds to the property

(id \otimes g) \circ (f \otimes id) = (f \otimes id) \circ (id \otimes g)

What expression does the following diagram represent?

Is it (f \circ f') \otimes (g \circ g') (in Haskell notation parC (f . f') (g . g') )?

Or is it (f \otimes g) \circ (f' \otimes g') (in Haskell notation (parC f g) . (parC f' g'))?

Answer: It doesn’t matter because the functorial requirement of parC means the two expressions are identical.

There are a number of notations you might meet in the world that can be interpreted as string diagrams. A couple that seem particularly pertinent are:

  • Quantum circuits
  • Anyon Diagrams!

Braided and Symmetric Monoidal Categories: Categories That Braid and Swap

Some monoidal categories have a notion of being able to braid morphisms. If so, it is called a braided monoidal category (go figure).
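As a typeclass sketch:

    class Monoidal k => Braided k where
      over  :: k (a, b) (b, a)
      under :: k (a, b) (b, a)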

The over and under morphisms are inverse of each other over . under = id. The over morphism pulls the left morphism over the right, whereas the under pulls the left under the right. The diagram definitely helps to understand this definition.

These over and under morphisms need to play nice with the associator of the monoidal category. These are laws that a valid instance of the typeclass should follow. We actually already met them in the very first post.

If the over and under of the braiding are the same the category is a symmetric monoidal category. This typeclass needs no extra functions, but it is now intended that the law over . over = id is obeyed.

When we draw a braid in a symmetric monoidal category, we don’t have to be careful with which one is over and under, because they are the same thing.

The examples that come soonest to mind have this symmetric property; for example, (->) is a symmetric monoidal category.

Similarly, LinOp has a notion of swapping that is just a lifting of swap.

However, FibOp is not symmetric! This is perhaps at the core of what makes FibOp so interesting.

Automating Association

Last time, we spent a lot of time doing weird typelevel programming to automate the pain of manual association moves. We can do something quite similar to make the categorical reassociation less painful, and more like the carefree ideal of the string diagram if we replace composition (.) with a slightly different operator

Before defining reassoc, let’s define a helper LeftCollect typeclass. Given a typelevel integer n, it will reassociate the tree using a binary search procedure to make sure the left branch l at the root has Count l = n.

Once we have LeftCollect, the typeclass ReAssoc is relatively simple to define. Given a pattern tree, we can count the elements in its left branch and LeftCollect the source tree to match that number. Then we recursively apply reassoc in the left and right branches of the tree. This means that every node has the same number of children in both trees, hence the trees will end up in an identical shape (modulo me mucking something up).

It seems likely that one could write equivalent instances that would work for an arbitrary monoidal category with a bit more work. We are aided somewhat by the fact that FibOp has a finite universe of possible leaf types to work with.

Closing Thoughts

While our categorical typeclasses are helpful and nice, I should point out that they are not going to cover all the things that can be described as categories, even in Haskell. Just like the Functor typeclass does not describe all the conceptual functors you might meet. One beautiful monoidal category is that of Haskell Functors under the monoidal product of Functor Composition. More on this to come, I think. https://parametricity.com/posts/2015-07-18-braids.html

We never even touched the dot product in this post. This corresponds to another doodle in a string diagram, and another power to add to your category. It is somewhat trickier to work with cleanly in familiar Haskell terms, I think because (->) is at least not super obviously a dagger category?

You can find a hopefully compiling version of all my snippets and more in my chaotic mutating Github repo https://github.com/philzook58/fib-anyon

See you next time.

References

The Rosetta Stone paper by Baez and Stay is probably the conceptual daddy of this entire post (and more).

Bartosz Milewski’s Category Theory for Programmers blog (online book, really) and YouTube series are where I learned most of what I know about category theory. I highly recommend them (huge Bartosz fanboy).

Catsters – https://byorgey.wordpress.com/catsters-guide-2/

https://en.wikibooks.org/wiki/Haskell/Category_theory

https://www.math3ma.com/blog/what-is-category-theory-anyway

There are fancier embeddings of category theory and monoidal categories than I’ve shown here. Often you want constrained categories and the ability to choose unit objects. I took a rather simplistic approach here.

http://hackage.haskell.org/package/constrained-categories

http://hackage.haskell.org/package/data-category

