Uniform Continuity is Kind of Like a Lens

A really interesting topic is exact real arithmetic. It turns out, there are systematic ways of calculating numerical results with arbitrarily fine accuracy.

In practice this is not used much as it is complicated and slow.

There are deep waters here.

The problem is made rather difficult by the fact that you can’t compute real numbers strictly; you have to, in some sense, compute better and better finite approximations.

One way of doing this is to compute a stream of arbitrarily good approximations. If someone needs a better approximation than you’ve already given, they pop the next one off.
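For instance, a stream of successively better approximations can be modeled as a generator, with the consumer popping values off until it is satisfied (a toy Python illustration using floats; a real exact-real library would of course use rationals or intervals instead):

```python
# Toy stream of approximations: Newton iterations converging to sqrt(2).
def sqrt2_approximations():
    x = 1.0
    while True:
        yield x
        x = (x + 2.0 / x) / 2.0   # Newton step for x^2 - 2 = 0

gen = sqrt2_approximations()
approx = next(gen)
while abs(approx * approx - 2.0) > 1e-9:
    approx = next(gen)            # pull a better approximation off the stream
print(approx)
```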

Streams give you some inverted control flow. They allow the results to pull on the input, going against the grain of the ordinary direction of computation. If you are interested in a final result of a certain accuracy, they seem somewhat inefficient. You have to search for the right amount to pull the incoming streams, and the intermediate computations may not be helpful.

Haskell chews infinite lists up for breakfast, so it’s a convenient place for such things https://wiki.haskell.org/Exact_real_arithmetic https://hackage.haskell.org/package/exact-real

A related but slightly different set of methods comes in the form of interval arithmetic. Interval arithmetic also gives precise statements of accuracy, maintaining bounds on the accuracy as a number is carried along.

Interval arithmetic is very much like forward mode differentiation. In forward mode differentiation, you compute on dual numbers (x,dx) and carry along the derivatives as you go.
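A minimal dual-number sketch (in Python, with names of my own choosing) makes the analogy concrete: every operation pushes the derivative forward alongside the value, just as interval arithmetic pushes bounds forward.

```python
import math

# Dual numbers (x, dx): forward-mode differentiation carries the derivative along.
class Dual:
    def __init__(self, x, dx=0.0):
        self.x, self.dx = x, dx
    def __add__(self, other):
        return Dual(self.x + other.x, self.dx + other.dx)
    def __mul__(self, other):
        return Dual(self.x * other.x, self.x * other.dx + self.dx * other.x)

def dsin(d):
    return Dual(math.sin(d.x), math.cos(d.x) * d.dx)

# derivative of x*sin(x) at x = 1 is sin(1) + cos(1)
x = Dual(1.0, 1.0)
print((x * dsin(x)).dx)
```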

Conceptually, differentiation and these validated bounds are connected as well. They are both telling you something about how the function is behaving nearby. The derivative is mostly meaningful at exactly the point it is evaluated. It is extremely local. The verified bounds being carried along are sort of a very principled finite difference approximation.

But reverse mode differentiation is often where it is at. This is the algorithm that drives deep learning. Reverse mode differentiation can be modeled functionally as a kind of lens. http://www.philipzucker.com/reverse-mode-differentiation-is-kind-of-like-a-lens-ii/ . The thing that makes reverse mode confusing is the backward pass. This is also inverted control flow, where the output pushes information back to the input. The Lens structure does this too.

It carries a function that goes in the reverse direction, and these reverse functions are composed in the opposite direction of ordinary control flow. These functions are the “setters” in the ordinary usage of the Lens, but they are the backproppers for differentiation.

By analogy one might try
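Perhaps something shaped like x -> (y, eps -> delta): the forward value together with a backwards map from a requested output accuracy eps to a sufficient input accuracy delta. A Python sketch of the idea (the post presumably has a Haskell type in mind):

```python
import math

# Lens-shaped attempt: x -> (y, eps -> delta)
def sin_lens(x):
    y = math.sin(x)
    def delta_of_eps(eps):
        return eps            # |sin'| <= 1, so input accuracy eps is enough
    return y, delta_of_eps
```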

There is something pleasing here compared to interval arithmetic in that the output epsilon drives the input delta. The second function is kind of a Skolemized \delta(\epsilon) from the definition of continuity.

Although it kind of makes sense, there is something unsatisfying about this. How do you compute the x -> y? You already need to know the accuracy before you can build this function.

So it seems to me that actually a better definition is
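Something that takes the output accuracy eps first, and returns both the input accuracy delta you need and an approximate x -> y that is only promised to be good to eps. Again a Python sketch of the shape:

```python
import math

# Uniform-continuity shape: eps -> (delta, x -> y)
def sin_uc(eps):
    delta = eps / 2                 # spend half the error budget on the input
    def approx(x):
        return math.sin(x)          # any eps/2-accurate evaluation of sin would do here
    return delta, approx

delta, f = sin_uc(1e-3)
print(delta, f(1.0))
```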

This type surprised me and is rather nice in many respects. It lets you actually calculate x -> y, has that lazy pull-based feel without infinite streams, and has delta as a function of epsilon.

I have heard, although I don’t understand it, that uniform continuity is the more constructive definition (see constructive analysis by Bridger, and https://en.wikipedia.org/wiki/Uniform_continuity). This definition seems to match that.

In addition, we are able to use approximations of the actual function if we know the accuracy it needs to be computed to. For example, given that we know we need 0.01 accuracy in the output, we know we only need 0.009 accuracy in the input and we only need the x term of a Taylor series of sine (the inaccuracy of the input and the inaccuracy of our approximation of sine combine to give the total inaccuracy of the output). If the needed accuracy allows it, we can work with fast floating point operations. If we need better, we can switch over to mpfr, etc.
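A sketch of that accuracy-driven switching (mpmath standing in for mpfr here; the thresholds and the small-x Taylor shortcut are illustrative only):

```python
import math
import mpmath

def sin_switching(eps):
    delta = eps / 2                          # input error budget
    def approx(x):
        if eps > 1e-2:
            return x                         # one-term Taylor series, adequate near 0 for coarse eps
        elif eps > 1e-15:
            return math.sin(x)               # ordinary double precision
        else:
            mpmath.mp.dps = int(-mpmath.log10(eps)) + 10
            return mpmath.sin(x)             # arbitrary precision when eps is tiny
    return delta, approx
```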

This seems nice for MetaOcaml staging or other compile time macro techniques. If the epsilon required is known at compile time, it makes sense to me that one could use MetaOcaml to produce fast unrolled code. In addition, if you know the needed accuracy you can switch between methods and avoid the runtime overhead. The stream based approach seems to have a lot of context switching and perhaps unnecessary intermediate computations. It isn’t as bad as it seems, since these intermediate computations are usually necessary to compute anyhow, but still.

We can play the same monoidal category games with these lenses as ever. We can use dup, par, add, mul, sin, cos etc. and wire things up in diagrams and what have you.
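A sketch of a few of these combinators for the eps -> (delta, approx) representation (my own Python rendering; the metric on pairs is taken to be the max of the coordinate errors, and mul is omitted because multiplication is not uniformly continuous on all of R without bounds on its arguments):

```python
import math

def compose(f, g):
    # g then f: ask f what accuracy it needs, then ask g to deliver that.
    def res(eps):
        delta_f, f_approx = f(eps)
        delta_g, g_approx = g(delta_f)
        return delta_g, (lambda x: f_approx(g_approx(x)))
    return res

def par(f, g):
    # monoidal product: run two maps side by side on a pair
    def res(eps):
        df, fa = f(eps)
        dg, ga = g(eps)
        return min(df, dg), (lambda xy: (fa(xy[0]), ga(xy[1])))
    return res

def dup(eps):
    return eps, (lambda x: (x, x))

def add(eps):
    # if each summand is within eps/2, the sum is within eps
    return eps / 2, (lambda xy: xy[0] + xy[1])

def sin_uc(eps):
    # |sin'| <= 1, so eps/2 input accuracy plus an eps/2-accurate sin suffices
    return eps / 2, (lambda x: math.sin(x))

# example: x -> sin(x) + sin(x), requested to 1e-3 accuracy
delta, f = compose(compose(add, par(sin_uc, sin_uc)), dup)(1e-3)
print(delta, f(1.0))
```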

This might be a nice type for use in a theorem prover. The Lens type combined with the appropriate properties that the intervals go to zero and stay consistent for arbitrary epsilon seems like enough? { Realf | something something something}

Relation to Backwards error analysis?

Does this have nice properties like backprop when applied to high dimensional inputs? That’s where backprop really shines: high to low dimensional functions.

Computing Syzygy Modules in Sympy

Reading about the methods of computational algebra is really compelling to me because there are domains that seem like natural fits for them.

I used to have no idea that multivariate polynomial equations had guaranteed methods that in some sense solve those systems. It’s pretty cool.

However, when I was poring over the two Cox, Little, O’Shea volumes, the chapter on modules made my eyes glaze over. Who ordered that? From my perspective, modules are vector spaces where you cripple the ability to divide scalars. Fair enough, but the language is extremely confusing and off-putting. Syzygy? Free Resolution? Everything described as homomorphisms and exact sequences? Forget it. Why do this? This is too abstract.

So I’ve been on the lookout for some application to motivate them. And I think I have at least one: capacitor-inductor circuits.

A pure resistive circuit can be treated by linear algebra. The voltages and currents are connected by linear relations. http://www.philipzucker.com/linear-relation-algebra-of-circuits-with-hmatrix/

The common way to describe inductor-capacitor circuits is by using phasor analysis, where the resistances become impedances which have a frequency parameter in them. I’m not entirely convinced that it isn’t acceptable to just use linear algebra over rational functions of the frequency, but I have some reason to believe more carefulness regarding division may bear fruit. I suspect that carefulness with division corresponds to the always-sticky issues of boundary conditions.

On a slightly different front, I was very impressed by Jan Willems’ open dynamical systems. https://homes.esat.kuleuven.be/~sistawww/smc/jwillems/Articles/JournalArticles/2007.1.pdf In it, he talks about differential equations as describing sets of possible trajectories of systems. He uses module theory as a way to manipulate those sets, and conditions from module theory to describe interesting qualitative features of those systems, like controllability.

He sticks to the tools of Hermite and Smith forms of matrices, as he is mostly interested in single variable polynomials as the ring in question. Here are my issues:

  1. I’m not really familiar with these forms
  2. I can’t find good implementations of these. Perhaps here https://desr.readthedocs.io/en/latest/index.html (Differential Equation Symmetry Reduction), which seems like an interesting project for other reasons as well. Maybe I’m a fool, but I’d like to stick to python for the moment.
  3. I also have an inkling that modules over multivariate polynomials will come in handy for playing around with band theory (or partial differential equations for that matter). Maybe something interesting to be said regarding topological materials?

It seems like Groebner basis techniques should acceptably solve these systems as well. Converting between the analog of range and nullspace representations, as I did in my previous post, corresponds to syzygy calculations in the terminology of modules.

Sympy does supply a Groebner basis algorithm, but not much beyond that. The AGCA module that should supply calculations over modules is mostly a lie. The documentation lists many functions that are not implemented. Which is too bad.

However, you can hack syzygy calculations into a Groebner basis calculation. I started poring over chapter 5 of Using Algebraic Geometry again, and it really has everything you need.

When one converts a set of polynomials to a Groebner basis, one is getting an equivalent set of polynomials with excellent properties. A Groebner basis is an analog of reduced echelon form of a matrix (these rows are equivalent to the old rows), and Buchberger’s algorithm is an analog of gaussian elimination. https://mattpap.github.io/masters-thesis/html/src/groebner.html#special-case-1-gauss-algorithm . You can find a decomposition of a polynomial in your ideal by a multivariate division algorithm with respect to the Groebner basis. This is the analog of the ability of an upper triangular matrix to be easily inverted.
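For concreteness, here is a tiny sympy demonstration of both pieces (the polynomials and variable names are my own, purely for illustration):

```python
import sympy as sp

x, y = sp.symbols('x y')
F = [x**2 + y**2 - 1, x - y]               # original generators
G = sp.groebner(F, x, y, order='lex')      # Groebner basis, the analog of echelon form
print(G.exprs)                             # e.g. [x - y, 2*y**2 - 1]

# Multivariate division: a member of the ideal reduces to remainder 0,
# and the quotients express it in terms of the basis.
f = (x - y) * (x + y)
quotients, remainder = sp.reduced(f, G.exprs, x, y, order='lex')
print(quotients, remainder)                # remainder is 0 since f lies in the ideal
```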

You can perform a number of tricks by adding in dummy variables to the Groebner basis algorithm. The first thing you can do with this sort of trick is track how to write the Groebner basis in terms of the original basis. This is the analog of working with an augmented matrix during gaussian elimination. https://en.wikipedia.org/wiki/Augmented_matrix

I found this Maple documentation helpful in this regard (although formatted horrifically)

https://www.maplesoft.com/support/help/Maple/view.aspx?path=Groebner%2fBasis_details

We want to track a matrix A that writes the Groebner basis vector G in terms of the original vector of polynomials F: G = FA, treating F and G as row vectors of polynomials. We do it by attaching to each generator f of F a fresh marker variable m, working with f + m instead. Then the coefficients on the markers in the extended Groebner basis correspond to the matrix A. Think about it.

The other direction matrix can be found via the reduction algorithm with respect to the Groebner basis: F = GB. This is pretty straightforward given that sympy implements reduction for us.
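Continuing the little example above, B (with F = GB in the row vector convention used here) can be read off from repeated calls to reduced():

```python
# The columns of B express each original generator in terms of the Groebner basis.
cols = []
for f in F:
    quotients, remainder = sp.reduced(f, G.exprs, x, y, order='lex')
    assert remainder == 0                  # each f lies in the ideal generated by G
    cols.append(quotients)
B = sp.Matrix(cols).T                      # so that F = G*B with F, G as row vectors
print(B)
```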

From these we determine that

G = GBA
F = FAB

Finding the syzygies of a set of generators is the analog of finding a nullspace of a matrix. A syzygy is a set of coefficients to “dot” onto the generators and get zero. In linear algebra talk, they are sort of orthogonal to the generator set.

The ability to find a nullspace gives you a lot of juice. One can phrase many problems, including solving an Ax=b system of equations, as a nullspace finding problem (a solution of Ax = b corresponds to a nullspace vector of the augmented matrix [A | b] whose last component is -1).

Proposition 3.3 of Using Algebraic Geometry tells us how to calculate the generators of the syzygy module of a Groebner basis. It’s a little strange. The S-polynomial of two generators from the basis reduces to zero with respect to the basis. The equation “S-polynomial minus the combination found by the division algorithm = 0” gives us a very interesting identity on the generators (a syzygy), and it turns out that these actually generate all possible syzygies. This is still not obvious to me, but the book does explain it.
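In symbols (my paraphrase of the standard statement, not a quote from the book): let x^{\gamma_{ij}} be the least common multiple of the leading monomials of g_i and g_j, and let division of the S-polynomial by G give S(g_i, g_j) = \sum_k a_{ijk} g_k with zero remainder. Then

s_{ij} = \frac{x^{\gamma_{ij}}}{\mathrm{LT}(g_i)} e_i - \frac{x^{\gamma_{ij}}}{\mathrm{LT}(g_j)} e_j - \sum_k a_{ijk} e_k

is a syzygy of G, and the collection of all such s_{ij} generates the syzygy module of G.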

Proposition 3.8 of Using Algebraic Geometry tells us how to get the syzygies of the original generators given the previous information. We map the syzygy generators of G back to syzygies of F, and we also append the columns of I - AB.

The columns of I - AB are syzygies of F: F(I - AB) = F - FAB = F - F = 0, using the equation F = FAB from above.

I’m still trying to figure out how to do calculations on modules proper. I think it can be done by using dummy variables to turn module vectors into single expressions. There is an exercise in Using Algebraic Geometry that mentions this.


Categorical Combinators for Convex Optimization and Model Predictive Control using Cvxpy

We’re gonna pump this well until someone MAKES me stop.

This particular example is something that I’ve been trying to figure out for a long time, and I am pleasantly surprised at how simple it all seems to be. The key difference with my previous abortive attempts is that I’m not attempting the heavy computational lifting myself.

We can take pointful DSLs and convert them into a point-free, category theory inspired interface. In this case the pointful DSL is cvxpy, a very excellent DSL for convex optimization. I have some similar and related posts on converting DSLs to categorical form.

A convex optimization problem optimizes a convex objective function with constraints that define a convex set like polytopes or balls. They are polynomial time tractable and shockingly useful. We can make a category out of convex optimization problems. We consider some variables to be “input” and some to be “output”. This choice is somewhat arbitrary as is the case for many “relation” feeling things that aren’t really so rigidly oriented.

Convex programming problems do have a natural notion of composition. Check out the last chapter of Rockafellar, where he talks about the convex algebra of bifunctions. Instead of summing over the inner composition variable like in Vect \sum_j A_{ij}B_{jk}, or existentializing like in Rel \{ (a,c) |\exists b. (a,b)\in A, (b,c) \in B \}, we instead minimize over the inner composition variable \min_y A(x,y) + B(y,z). These are similar operations in that they all produce bound variables.

The identity morphism is just the simple constraint that the input variables equal the output variables, with an objective function of 0. This is an affine constraint, hence convex.

In general, if we ignore the objective part entirely by just setting it to zero, we’re actually working in a very computationally useful subcategory of Rel, ConvexRel, the category of relations which are convex sets. Composition corresponds to an existential operation, which is quickly solvable by convex optimization techniques. In operations research terms, these are feasibility problems rather than optimization problems. Many of the combinators do nothing to the objective.

The monoidal product just stacks variables side by side and adds the objectives and combines the constraints. They really are still independent problems. Writing things in this way opens up a possibility for parallelism. More on that some other day.

We can code this all up in Python with little combinators that return the input, the output, the objective, and the constraint list. We need to hide these in inner lambdas for appropriate fresh generation of variables.
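Here is a sketch of what such combinators might look like (my own reconstruction of the idea, not the post’s exact code): a morphism is a thunk that conjures fresh cvxpy variables when forced and returns (input, output, objective, constraints).

```python
import cvxpy as cvx

def idn(n):
    # identity morphism on R^n: input equals output, zero objective
    def res():
        x, y = cvx.Variable(n), cvx.Variable(n)
        return x, y, 0, [x == y]
    return res

def compose(f, g):
    # g then f: the shared intermediate variable is glued with an equality constraint
    def res():
        x, y1, o1, c1 = g()
        y2, z, o2, c2 = f()
        return x, z, o1 + o2, c1 + c2 + [y1 == y2]
    return res

def par(f, g):
    # monoidal product: stack two independent problems side by side
    def res():
        x1, y1, o1, c1 = f()
        x2, y2, o2, c2 = g()
        return cvx.hstack([x1, x2]), cvx.hstack([y1, y2]), o1 + o2, c1 + c2
    return res

def run(f):
    x, y, obj, cons = f()
    prob = cvx.Problem(cvx.Minimize(obj), cons)
    prob.solve()
    return prob, x, y

# tiny example: the identity on R^2 composed with itself (a pure feasibility problem)
prob, x, y = run(compose(idn(2), idn(2)))
```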

Now for a somewhat more concrete example: Model Predictive control of an unstable (linearized) pendulum.

Model predictive control is where you solve an optimization problem of the finite time rollout of a control system online. In other words, you take a measurement of the current state, update the constraints in an optimization problem, ask the solver to solve it, and then apply the forces or controls that the solver says are best.

This gives the advantage over the LQR controller in that you can set hard inequality bounds on the total force available, or on the positions where you wish to allow the thing to go. You don’t want your system crashing into some wall or falling over some cliff, for example. These really are useful constraints in practice. You can also include possibly time dependent aspects of the dynamics of your system, which can help you model nonlinear dynamics.

I have more posts on model predictive control here http://www.philipzucker.com/model-predictive-control-of-cartpole-in-openai-gym-using-osqp/ http://www.philipzucker.com/flappy-bird-as-a-mixed-integer-program/

Here we model the unstable point of a pendulum and ask the controller to find forces to balance the pendulum.
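A minimal, direct cvxpy sketch of that kind of problem (not written in the combinator style above; the linearization, constants, and horizon are illustrative only):

```python
import numpy as np
import cvxpy as cvx

dt, N, umax = 0.1, 10, 3.0                  # step size, horizon, force bound
A = np.array([[1.0, dt], [dt, 1.0]])        # linearized (theta, omega) dynamics near upright
b = np.array([0.0, dt])

x = cvx.Variable((2, N + 1))                # state trajectory
u = cvx.Variable(N)                         # control trajectory
x0 = np.array([0.3, 0.0])                   # current measured state

cost = cvx.sum_squares(x[0, :]) + 0.1 * cvx.sum_squares(u)
cons = [x[:, 0] == x0, cvx.abs(u) <= umax]
for k in range(N):
    cons.append(x[:, k + 1] == A @ x[:, k] + u[k] * b)

cvx.Problem(cvx.Minimize(cost), cons).solve()
print(u.value[0])                           # apply only the first control, then re-solve next step
```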

We can interpret the controller in GraphCat in order to produce a diagram of the 10 step lookahead controller. This is an advantage of the categorical style, a la compiling to categories.

We can also actually run it in model predictive control configuration in simulation.

And some curves. How bout that.

Bits and Bobbles

LazySets https://github.com/JuliaReach/LazySets.jl

ADMM – It’s a “lens”. I’m pretty sure I know how to do it pointfree. Let’s do it next time.

The minimization can be bubbled out to the top if we are always minimizing. If we mix in maximization, then we can’t, and we’re working on a more difficult problem. This is similar to what happens in Rel when you have relational division, which is a kind of universal quantification \forall. Mixed quantifier problems in general are very tough. These kinds of problems include games, synthesis, and robustness. More on this some other day.

Convex-Concave programming minimax https://web.stanford.edu/~boyd/papers/pdf/dccp_cdc.pdf https://web.stanford.edu/class/ee364b/lectures/cvxccv.pdf

The minimization operation can be related to the summation operation by the method of steepest descent in some cases. The method of steepest descent approximates a sum or integral by evaluating it at its most dominant position and expanding out from there, hence it converts a linear algebra thing into an optimization problem. Examples include the connection between statistical mechanics and thermodynamics, and between classical mechanics and quantum mechanics.

Legendre Transformation ~ Laplace Transformation via steepest descent https://en.wikipedia.org/wiki/Convex_conjugate yada yada, all kinds of good stuff.

Intersection is easy. Join/union is harder. Does MIP help?

Quantifier elimination

MIP via adjunction

Naive Synthesis of Sorting Networks using Z3Py

As a simple extension of verifying the sorting networks from before, we can synthesize optimally small sorting networks. The “program” of the sorting network is specified by a list of tuples of the elements we wish to compare and swap, in order. We just generate all possible sequences of comparison operations and ask z3 to try to verify each one. If z3 says it verifies, we’re done.

Here are some definitions for running the thing
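Something along these lines (my reconstruction, not necessarily the post’s exact code): apply a network of compare-and-swap pairs to symbolic values and ask z3 whether the output is always sorted.

```python
import z3
from itertools import combinations, product

def apply_network(network, xs):
    # network is a list of (i, j) pairs to compare-and-swap in order
    xs = list(xs)
    for i, j in network:
        lo = z3.If(xs[i] <= xs[j], xs[i], xs[j])
        hi = z3.If(xs[i] <= xs[j], xs[j], xs[i])
        xs[i], xs[j] = lo, hi
    return xs

def verifies(network, n):
    xs = [z3.Int('x%d' % i) for i in range(n)]
    ys = apply_network(network, xs)
    is_sorted = z3.And([ys[i] <= ys[i + 1] for i in range(n - 1)])
    s = z3.Solver()
    s.add(z3.Not(is_sorted))          # search for a counterexample input
    return s.check() == z3.unsat      # no counterexample means the network sorts
```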

and here is a simple generating thing for all possible pairs.
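Continuing the sketch above, the brute-force generator just tries every sequence of pairs of each length until one verifies:

```python
def synthesize(n, max_len=8):
    pairs = list(combinations(range(n), 2))
    for length in range(max_len + 1):
        for network in product(pairs, repeat=length):
            if verifies(list(network), n):
                return list(network)
    return None

print(synthesize(3))   # finds a 3-comparator network for 3 inputs
```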

As is, this is astoundingly slow. Truly, truly abysmally slow. The combinatorics of really naively searching through this space is abysmal. I doubt you’re going to get more than a network of size 6 out of this as is.

Some possible optimizations: early pruning of bad networks with testing, and avoiding ever looking at obviously bad networks. A randomized search might be faster if one doesn’t care about optimality. We could also ask z3 to produce networks.

For more on program synthesis, check out Nadia Polikarpova’s sick course here.

https://github.com/nadia-polikarpova/cse291-program-synthesis

Notes on Finally Tagless

For reading group this week we read Finally Tagless, Partially Evaluated http://okmij.org/ftp/tagless-final/JFP.pdf. It took me a couple minutes to regain my footing on my preferred explanation of what tagless is. This is why we write blog posts: to remember that kind of thing.

One thing that is very confusing about finally tagless as presented is that people tend to be talking about DSLs with binding forms, like embedded lambda calculi, or tensor summation and things. This is complicated and I think to some degree orthogonal to the core of the idea. Instead I’ll use Bool as my running example, which is so simple that it perhaps obscures the idea in the opposite direction.

When you define a data type, you define constructors. Constructors are just functions. This is more readily apparent using GADT syntax.

What makes constructors feel like more than just ordinary functions is that you can pattern match out of them too. Applying constructors and pattern matching out of them is a completely lossless process. The two processes are dual in some sense. In some sense, it seems like programming is one big shuffling game. In some sense. In some sense. In some sense.

In some sense. iN SoME SeNSe

Anyway, pattern matching is its own thing that doesn’t feel like the other pieces of the language. But pattern matching can be captured as a first class object with the notion of an eliminator / recursor function. If you think about it, what pattern matching is is a thing that takes a data type and then gives you the stuff inside it. So pattern matching is the same as a function that takes in functions telling it what to do with that stuff for each case.

The Böhm-Berarducci encoding of data types makes the pattern matching function the data type itself.
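As a tiny illustration (a Python sketch, since the post’s examples are in Haskell; the names are mine), a Böhm-Berarducci-style boolean is nothing but its own eliminator:

```python
# A boolean is represented by the function that performs its own case analysis.
true = lambda on_true, on_false: on_true
false = lambda on_true, on_false: on_false

def if_(b, then_val, else_val):
    return b(then_val, else_val)

print(if_(true, "yes", "no"))   # yes
```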

In the final encoding of the datatype, we replace the data keyword with the class keyword. We can witness the isomorphism with an instance for BoolI and an interpretation function from BoolI to BoolF.
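The post’s code here is Haskell with typeclasses; as a rough Python analogue (interpreter objects standing in for typeclass instances), the initial encoding builds a tree you match on, while the final encoding is a term parameterized over its interpreter:

```python
from dataclasses import dataclass

# Initial encoding: an explicit syntax tree plus an evaluator that matches on it.
class BoolI: pass

@dataclass
class TrueI(BoolI): pass

@dataclass
class FalseI(BoolI): pass

@dataclass
class IfI(BoolI):
    cond: BoolI
    then: BoolI
    other: BoolI

def eval_initial(e):
    if isinstance(e, TrueI):
        return True
    if isinstance(e, FalseI):
        return False
    return eval_initial(e.then) if eval_initial(e.cond) else eval_initial(e.other)

# Final encoding: a term just calls whatever interpreter it is handed.
class EvalBool:
    def true(self): return True
    def false(self): return False
    def if_(self, c, t, e): return t if c else e

class ShowBool:
    def true(self): return "True"
    def false(self): return "False"
    def if_(self, c, t, e): return "(if %s then %s else %s)" % (c, t, e)

def term(sem):
    return sem.if_(sem.true(), sem.false(), sem.true())

print(eval_initial(IfI(TrueI(), FalseI(), TrueI())))   # False
print(term(EvalBool()))                                # False
print(term(ShowBool()))                                # (if True then False else True)
```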

However, there are some very nice features of this encoding. Typeclass resolution happens completely at compile time. This means that you can write something once and have it run many ways, with no runtime overhead. This is useful for dsls, especially ones you intend to just immediately interpret out of.

One way you can have something run many ways is by having a syntax tree for the thing you want to do. Then you can write different interpreters. But then you have the runtime cost of interpretation.

A second feature is the openness of typeclasses compared to data types. Suppose you wanted to add another constructor to BoolI. Now you need to correct all your functions. However, in the final encoding you can make the new constructor a new typeclass, and all your old functions still work. You can require only the power you need.

A third thing is that finally tagless does get you some of the type restriction available with GADTs in a language without them. GADTs are IN SOME SENSE just constructors without the most general inferred type. But they also let you recover the type information you hid away upon pattern matching.

We can see the correspondence in a different way. A typeclass constraint corresponds to the implicit supplying of a dictionary whose fields correspond to the methods of the typeclass.

What is finally tagless not so good for? Brains mostly. It is quite a mind bending style to use. If you want to do deep pattern matching in some simplifier, it is possible, yet rather difficult to achieve. I’ve seen this done in some Oleg papers somewhere (on SQL query optimization I think?)

Here’s another example, on lists.
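In the same rough Python analogue, a final-encoded list calls the interpreter’s nil and cons instead of ever building a concrete structure:

```python
class SumList:
    def nil(self): return 0
    def cons(self, x, rest): return x + rest

class BuildList:
    def nil(self): return []
    def cons(self, x, rest): return [x] + rest

def example_list(sem):
    return sem.cons(1, sem.cons(2, sem.cons(3, sem.nil())))

print(example_list(SumList()))    # 6
print(example_list(BuildList()))  # [1, 2, 3]
```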

Going the other direction from finally tagless is interesting as well. If you take a typeclass and replace the keyword class with data, you get something like the “free” version of that class. Two cases that come to mind are the free monoid and the free monad. The usual presentation of these looks different, though. That is because they have been canonicalized. These data types need to be thought of as “modulo” the laws of the typeclass, which would probably show up in a custom Eq instance. I’m a little hazy on exactly how to explain the Pure constructors, but you do need them, I think.

http://okmij.org/ftp/tagless-final/JFP.pdf – tagless final paper. Also some very interesting things related to partial evaluation

https://oleg.fi/gists/posts/2019-06-26-linear-church-encodings.html – interesting explanation of bohm-berarducci

http://okmij.org/ftp/tagless-final/course/lecture.pdf – oleg’s course

I thought reflection without remorse had some related form of free monad http://okmij.org/ftp/Haskell/zseq.pdf