Walk on Spheres Method in Julia

I saw a cool tweet (and corresponding conference paper) by Keenan Crane

http://www.cs.cmu.edu/~kmcrane/Projects/MonteCarloGeometryProcessing/index.html

I was vaguely aware that one can use a Monte Carlo method to solve the Laplace boundary value problem \nabla^2 \phi = 0, but I don’t think I had seen the walk on spheres variant of it before. I think Crane’s point is how similar all this is to stuff graphics people already do and do well. It’s a super cool paper. Check it out.

Conceptually, I think it is plausible that the Laplace equation and a Monte Carlo walk are related because the static diffusion equation \nabla^2 n = 0 from Fick’s law ultimately comes from the Brownian motion of little guys wobbling about, from a microscopic perspective.

Slightly more abstractly, both linear differential equations and random walks can be described by matrices: a finite difference matrix (for concreteness) K and a transition matrix of jump probabilities T. The differential equation is discretized to Kx = b, and the expected boundary value seen by the walk satisfies a very similar linear system, (I - T)u = b up to scaling, where b holds the sources and sinks at the boundary.
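To make the analogy concrete, here is a quick numpy sketch (mine, not from the original post; the grid size is an arbitrary choice): the finite difference system and the random-walk hitting-value system are literally the same linear system, up to a factor of 2.

import numpy as np

n = 9                  # number of interior grid points (arbitrary)
phi0, phi1 = 0.0, 10.0 # boundary values, same as the Julia example below

# finite difference Laplacian K (tridiagonal -1, 2, -1); b carries the boundary values
K = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.zeros(n)
b[0], b[-1] = phi0, phi1

# transition matrix T of a symmetric random walk (jump left or right with probability 1/2)
T = 0.5 * np.eye(n, k=1) + 0.5 * np.eye(n, k=-1)

# the expected boundary value seen by the walk solves (I - T) u = b/2,
# which is exactly the finite difference system, since K = 2 (I - T)
u_fd = np.linalg.solve(K, b)
u_walk = np.linalg.solve(np.eye(n) - T, b / 2)
print(np.allclose(u_fd, u_walk))  # True; both are the straight line from phi0 to phi1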

The mean value property of the Laplace equation allows one to speed this process up. Instead of taking a ton of little steps, you can jump straight out, sampling uniformly on the surface of the biggest sphere that fits in the domain. en.wikipedia.org/wiki/Walk-on-spheres_method. Alternatively, you can think of it this way: every random walk eventually exits the sphere, and when it does, it exits at a uniformly random spot on its surface.

So here’s the procedure. Pick a point you want the value of \phi at. Make the biggest sphere centered there that stays in the domain. Pick a random point on that sphere. If that point is on the boundary, record the boundary value there; otherwise iterate from the new point. Do this many, many times; the average of the boundary values you recorded is the value of \phi.

This seems like a good example for Julia use. It would be somewhat difficult to code this up efficiently in python using vectorized numpy primitives. Maybe in the future we could try to parallelize it or run it on the GPU? Monte Carlo methods like these are quite parallelizable.

The solution of the 1-d Laplace equation is absolutely trivial. If the second derivative is 0, then $\phi = a + b x $. This line is found by fitting it to the two endpoint values.

So we’re gonna get a line out

using LinearAlgebra
using Plots # for the plots below

phi0 = 0  # boundary value at x = 0
phi1 = 10 # boundary value at x = 1

function monte_run(x)
    while true
        l = rand(Bool) # go left?
        if (l && x <= 0.5) # finish at left edge 0
            return phi0
        elseif (!l && x >= 0.5) # finish at right edge 1
            return phi1
        elseif x <= 0.5 # move right, away from 0, by the biggest interval radius x
            x += x
        else # move left, away from 1, by the biggest interval radius 1 - x
            x -= 1 - x
        end
    end
end

monte_runs = [monte_run(x) for run_num =1:100, x=0:0.05:1 ]
import Statistics
avgs = vec(Statistics.mean( monte_runs , dims=1))
stddevs = vec(Statistics.std(monte_runs, dims=1)) ./ sqrt(size(monte_runs)[1]) # something like this right?

plot(0:0.05:1, avgs, yerror=stddevs)
plot!(0:0.05:1,  (0:0.05:1) * 10 )

And indeed we do.

You can do a very similar thing in 2d. Here I use the boundary values on a disc corresponding to x^2 - y^2 (which is a simple exact solution of the Laplace equation).



function monte_run_2d(phi_b, x)
    while true
            r = norm(x)
            if r > 0.95 # good enough
                return phi_b(x)
            else
                dr = 1.0 - r #assuming big radius of 1
                θ = 2 * pi * rand(Float64) #
                x[1] += dr * cos(θ)
                x[2] += dr * sin(θ)
            end
    end
end


monte_run_2d( x -> x[1],  [0.0 0.0] )


monte_runs = [monte_run_2d(x -> x[1]^2 - x[2]^2 ,  [x 0.0] ) for run_num =1:1000, x=0:0.05:1 ]

import Statistics
avgs = vec(Statistics.mean( monte_runs , dims=1))
stddevs = vec(Statistics.std(monte_runs, dims=1)) ./ sqrt(size(monte_runs)[1]) # something like this right?
plot(0:0.05:1, avgs, yerror=stddevs)
plot!(0:0.05:1,  (0:0.05:1) .^2 )

There’s more notes and derivations in my notebook here https://github.com/philzook58/thoughtbooks/blob/master/monte_carlo_integrate.ipynb

A Smattering of Physics in Sympy

Sympy is fun. I’ve been enjoying trying out some simple physics problems and seeing what kind of fun angles sympy brings to the table. It does pretty well on concrete problems, not so well on abstract derivations.

Kinematics

Ah such fond memories! In high school, I was taught by Ric Thompson “the big four”.

x_f = x_i + v_i t + \frac{1}{2} a t^2

v_f = v_i + a t

v_f^2 = v_i^2 + 2 a d

d = \frac{v_f + v_i}{2} t

The equations are, of course, overcomplete. They are all implied by \frac{d^2}{dt^2}x = a, but even with only algebra the second two are derivable from the first two.

Of course a natural way of deriving the equations is to solve one equation for a variable and substitute it into the other equation. sympy makes this pretty dang easy.

from sympy import *
init_printing()
t,a,d,vf,vi = symbols("t a d vf vi")
e1 = Eq(d , vi * t + 1/2 * a * t ** 2)
tsub = solve(Eq(vf , vi + a * t),t)[0]
print(tsub) # This is assuming a is nonzero though.
expand(simplify(e1.subs(t,tsub)))
(vf - vi)/a
Eq(d, 0.5*vf**2/a - 0.5*vi**2/a)

However, there is a more automated approach.

It turns out that a decent chunk of physics equations are or can be well approximated by a system of polynomial equations. There are systematic methods that are guaranteed to solve the problem (albeit maybe not in the lifetime of the universe).

A Gröbner basis is an equivalent set of polynomial equations that has useful properties. For some simple purposes, all you need to know is that if you give the variables you want to eliminate first, the Gröbner basis will contain equations without those variables. Here we specify t as one to eliminate, so we get an equation without t in it.


G = groebner(  [vi * t + 1/2 * a * t ** 2 - d,  
                vi + a * t - vf] , 
                 t,vf,d,a,vi  )
for e in G:
    print(e)
-2.0*d + 1.0*t*vf + 1.0*t*vi
1.0*a*t - 1.0*vf + 1.0*vi
-2.0*a*d + 1.0*vf**2 - 1.0*vi**2

I’ve actually been pleasantly surprised at how many physics problems reduce ultimately to systems of polynomial constraints. Energy and momentum conservation are polynomial constraints (classical Feynman diagrams, kind of). Special relativity questions can be reduced to polynomial constraints using the proper time.

#elephant problem
# elephants give birth at 21 months. On a rocket at velocity v
# how long T until you see it give birth? 
tau , t1, t2, x1, v, c, T = symbols("tau t1 t2 x1 v c T")

eqs = [
    tau**2 - (t1**2 - x1**2 / c**2), # proper time
    x1 - v * t1, # distance away
    c * t2 - x1, # time for light to travel back
    T - t1 - t2, # total time
    tau - 21 # proper time is 21 months
    
]

groebner(eqs, tau , t1, t2, x1, v, T)

Lagrangian Mechanics

The Structure and Interpretation of Classical Mechanics is an interesting book.

It points out that notation we use is extremely imprecise and implicit. This is a source of great confusion.

A great benefit of programming up such examples is that it makes explicit (sometimes painfully so) steps that were implicit before.

In the Euler-Lagrange procedure, one first partially differentiates the Lagrangian, considering q and \dot{q} to be independent parameters. Then a substitution is made for a function x(t), and then we proceed with a differentiation with respect to time.

# simple harmonic oscillator lagrangian style
m, k = symbols("m k", real = True, positive=True)
v, q = symbols("v q")
K = Rational(1,2) * m * v ** 2 #kinetic energy
V = Rational(1,2) * k * q ** 2 # potential energy
L =  K - V  #Lagrangian
F = diff(L,q) # force
p = diff(L,v) # momentum

x_ = Function("x")
t = symbols("t")

x = x_(t)

subst = { v : diff(x,t),
         q : x} # replacement to turn q into a function x

# euler-lagrange equations of motion
eq = F.subs( subst ) - diff( p.subs(subst)  , t )
dsolve(eq) # general solution cosine and sines

Here’s an analogous thing for a pendulum

# pendulum, lagrangian style
m, g, L = symbols("m g L", real = True, positive=True)
theta, w = symbols("theta omega")
K = Rational(1,2) * m * (w * L) ** 2 # kinetic energy
V = - m * g * L * cos(theta) # potential energy. angle is defined as 0 = hanging down
Lag =  K - V  # Lagrangian (named Lag so as not to shadow the length L)
F = diff(Lag,theta) # generalized force
p = diff(Lag,w) # generalized momentum
F
p

x_ = Function("theta")
t = symbols("t")

x = x_(t)

subst = { w : diff(x,t),
         theta : x} # replacement to turn theta into a function of time

# euler-lagrange equations of motion
eq = F.subs( subst ) - diff( p.subs(subst)  , t )
eq
#dsolve(eq) 

Another place where an implicitly stated substitution is absolutely vital is in the Legendre transform going from the Lagrangian to the Hamiltonian.

# legendre transformation to hamiltonian
p = symbols( "p" )
H_ = p * v - L # hamiltonian but we haven't solved out v yet
v_of_pq = solve(diff(H_, v), v)[0] # set derivative to 0 to solve for v.
H = simplify(H_.subs(v, v_of_pq )) # substitute back in. Here is the actual hamiltonian
H

Statistical Mechanics

Sympy can do Gaussian integrals! How convenient. It can also do power series expansions. And differentiate. So it takes the drudgery out of some simple calculations

# ideal gas partition function
beta, m, V, N, kb, T  = symbols("beta m V N k_b T", real=True, positive=True)
p = symbols("p", real=True)
Z = integrate( exp( - beta * Rational(1,2) * p ** 2 / m ), (p,-oo,oo))**(3*N) * V**N #partition function
def avg_energy(Z):
    return - diff(ln(Z), beta).subs(beta, 1/ kb / T)
print(avg_energy(Z)) #
F = (-ln(Z) / beta).subs(beta, 1 / kb / T) # helmholtz free energy
S = -diff(F , T) # entropy is minus the derivative of helmholtz wrt T
S # the functional dependence on T and V I think is correct
P = -diff(F , V) # pressure is minus the derivative wrt V
P
# Neato
# harmonic oscillator partition function
beta, m, k = symbols("beta m k ", real=True, positive=True)
p, x = symbols("p x", real=True)
E = Rational(1,2) * p ** 2 / m  + Rational(1,2) * k * x ** 2
Z = integrate( integrate( exp( - beta * E ), (p,-oo,oo)) , (x,-oo, oo))**N 
diff(-ln(Z),beta)

Perturbation theory of the partition function of an anharmonic oscillator. Pretty easy. It is interesting to note that this is the very simplest schematic of how perturbation theory can be approached for quantum field theory.

# perturbation theory of the anharmonic oscillator
beta, m, k, g = symbols("beta m k g ", real=True, positive=True)
p, x = symbols("p x", real=True)
E = Rational(1,2) * ( p ** 2 / m  +  k * x ** 2) + g * x ** 4
series(exp( - beta * E ), g).removeO()
Z = integrate( integrate( series(exp( - beta * E ), g, n=2).removeO(), (p,-oo,oo)) , (x,-oo, oo))
simplify(diff(-ln(Z),beta)) #E
simplify(diff(-ln(Z),k)/beta) # <x**2>/2

Other things that might be interesting: two oscillators, a chain of oscillators, the virial expansion.

Thermo and Legendre Transformations

Thermodynamics is a poorly communicated topic. Which variables remain in expressions and what things are held constant when differentiating are crucial and yet poorly communicated and the notation is trash. Sympy helps make some things explicit. It’s fun.

u,s,t,p,v,n,r = symbols("u s t p v n r")

du,ds,dt,dp,dv = symbols("du ds dt dp dv")
# taylor series in stuff?

e1 = p * v - n * r * t
e2 = u - Rational(3 , 2) * n * r * t

state = [  (u,du), (s,ds), (t,dt) , (p,dp) , (v,dv) ]

def differential(e):
    return sum( [ diff(e,x) * dx  for x,dx in state]   )


de1 = differential(e1 )
de2 = differential(e2 )

e3 = du - (t * ds - p * dv)

eqs = [e1,e2,de1,de2,e3]
print(eqs)
G = groebner( eqs, u , du,  t, dt, p, dp, v, dv,  ds )
for e in G:
    print(e)
R = Rational
U,S,T,P,V,N, k = symbols("U S T P V N k")

cv = R(3,2) * N * k
e1 = U - cv * T
e2 = P * V - N * k * T
e3 = S - cv * ln(T) + N * k * ln(V)

elim = [P,T]
Ps = solve([e1,e2,e3], P)
Ps
es = [ e.subs(Ps) for e in [e1,e2,e3] ]
Ts = solve(e3, T)[0]
es = [  e.subs(T,Ts) for e in es ]
Usv = solve(es[0],U)[0]
psv = diff(Usv,V)
tsv = diff( Usv , S )

#solve(es[0], V)

Hsv = Usv + P * V  # enthalpy and Legendre transformation
Vps = solve(diff(Hsv, V) , V) 
H =  Hsv.subs(V, Vps[0]) 
simplify(H)

There are so many other things!

What about a Van der Waals equation? Optics (geometrical and wave, paraxial ~ Schrödinger, Fourier optics), GR (exterior derivatives), quantum mechanics (wave matching problems; what can we do about hydrogen? WKB, QHE), Rutherford scattering, Weiss mean field, canonical transformations, rotations, Clebsch-Gordan coefficients.

Fiddling around with validated ODE integration, Sum of Squares, Taylor Models.

As I have gotten more into the concerns of formal methods, I’ve become unsure that ODEs actually exist. These are concerns that did not bother me much when I defined myself as being more in the physics game. How times change. Here’s a rough cut.

A difficulty with ODE error analysis is that it is very confusing how to get the error on something you are having difficulty approximating in the first place.

If I wanted to know the error of using a finite step size dt vs a size dt/10, great. Just compute both and compare. However, no amount of this seems to bootstrap you down to the continuum. And so I thought, you’re screwed in regards to using numerics in order to get true hard facts about the true solution. You have to go to paper and pencil considerations of equations and variables and epsilons and deltas and things. It is now clearer to me that this is not true. There is a field of verified/validated numerics.

A key piece of this seems to be interval arithmetic. https://en.wikipedia.org/wiki/Interval_arithmetic An interval can be concretely represented by its left and right endpoints. If you use rational numbers, you can represent the interval precisely. Interval arithmetic over-approximates operations on intervals in such a way as to keep things easily computable. One way it does this is by ignoring dependencies between different terms. Check out Moore et al’s book for more.
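As a toy illustration of the flavor (my own minimal sketch, not one of the real interval libraries linked below, and ignoring the outward rounding a real implementation needs), here is interval addition, multiplication, and the dependency problem in action:

class Interval:
    """A bare-bones closed interval [lo, hi]."""
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi
    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)
    def __sub__(self, other):
        return Interval(self.lo - other.hi, self.hi - other.lo)
    def __mul__(self, other):
        ps = [self.lo * other.lo, self.lo * other.hi,
              self.hi * other.lo, self.hi * other.hi]
        return Interval(min(ps), max(ps))
    def __repr__(self):
        return f"[{self.lo}, {self.hi}]"

x = Interval(1.0, 2.0)
print(x + x, x * x)  # [2.0, 4.0] [1.0, 4.0]
# the dependency problem: x - x contains 0 but is not 0, because the arithmetic
# forgets that the two operands are the same variable
print(x - x)         # [-1.0, 1.0]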

What switching over to intervals does is make you think about sets, rather than points, as the things you’re operating on. For ODEs (and other things), this shift of perspective means no longer considering individual functions, but instead sets of functions. And not arbitrary, extremely complicated sets, only those which are concretely manipulable and storable on a computer, like intervals. Taylor models are a particular choice of function sets. You are manipulating an interval tube around a finite polynomial. If during integration / multiplication you get higher powers, truncate the polynomial by dumping the excess into the interval term. This keeps the complexity under wraps and closes the loop of the descriptive system.
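Here is a crude sketch of that truncation bookkeeping (mine, not a real Taylor model library; I fix the domain to t in [0,1], keep a symmetric remainder, and use the sloppy bound |sum c_k t^k| <= sum |c_k| on that domain):

import numpy as np

N = 3  # truncation degree (arbitrary)

class TaylorModel:
    # polynomial sum(c[k] t^k) plus a remainder interval [-r, r], valid for t in [0, 1]
    def __init__(self, coeffs, r=0.0):
        self.c = np.zeros(N + 1)
        k = min(len(coeffs), N + 1)
        self.c[:k] = coeffs[:k]
        self.r = r + float(sum(abs(x) for x in coeffs[N + 1:]))  # dump excess degrees into the remainder
    def bound(self):
        # crude range bound on [0,1]: |t^k| <= 1, so |p(t)| <= sum |c_k|, plus the remainder
        return float(np.sum(np.abs(self.c))) + self.r
    def __mul__(self, other):
        prod = np.convolve(self.c, other.c)   # exact polynomial product
        keep, spill = prod[:N + 1], prod[N + 1:]
        # remainder: truncated high-order terms plus cross terms with the remainders
        # (bound() already includes r, which only makes the enclosure looser, never unsound)
        r = float(np.sum(np.abs(spill))) + self.bound() * other.r + other.bound() * self.r
        return TaylorModel(keep, r)

p = TaylorModel([1.0, 1.0, 0.5])  # 1 + t + t^2/2, the start of e^t
q = p * p
print(q.c, q.r)  # kept coefficients [1, 2, 2, 1] and remainder 0.25 absorbing the truncated t^4/4 term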

If we have an iterative, contractive process for getting better and better solutions of a problem (like a newton method or some iterative linear algebra method), we can get definite bounds on the solution if we can demonstrate that a set maps into itself under this operation. If this is the case and we know there is a unique solution, then it must be in this set.

It is wise, if at all possible, to convert an ODE into integral form. \dot{x} = f(x,t) is the same as x(t) = x_0 + \int_0^t f(x(s),s)ds.

For ODEs, the common example of such an operation is known as Picard iteration. In physical terms, this is something like the impulse approximation / Born approximation. One assumes that the ODE evolves according to a known trajectory x_0(t) as a first approximation. Then one plugs the trajectory into the equations of motion f(x_0,t) to determine the “force” it would feel and integrates up all this force. This creates a better approximation x_1(t) (probably), which you can plug back in to create an even better approximation.
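Picard iteration is easy to play with in sympy. Here’s a minimal sketch for \dot{x} = x, x(0) = 1 (the same example used further down): each pass plugs the current guess into the right hand side, integrates it up, and recovers one more term of the Taylor series of e^t.

from sympy import symbols, integrate, S

t, s = symbols("t s")

x = S(1)  # first approximation: the constant trajectory x_0(t) = x(0) = 1
for _ in range(4):
    # plug the trajectory into f(x, t) = x and integrate up the "force"
    x = 1 + integrate(x.subs(t, s), (s, 0, t))
    print(x)
# 1 + t, then 1 + t + t**2/2, ... : successive partial sums of e^t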

If we instead do this iteration on an intervally function set / taylor model thing, and can show that the set maps into itself, we know the true solution lies in this interval. The term to search for is Taylor Models (also some links below).

I was tinkering with whether sum of squares optimization might tie into this. I have not seen SOS used in this context, but it probably has been (or it’s worthless).

An aspect of sum of squares optimization that I thought was very cool is that it gives you a simple numerical certificate that confirms that at the infinitude of points for which you could evaluate a polynomial, it comes out positive. This is pretty cool. http://www.philipzucker.com/deriving-the-chebyshev-polynomials-using-sum-of-squares-optimization-with-sympy-and-cvxpy/

But that isn’t really what makes Sum of squares special. There are other methods by which to do this.

There are very related methods called DSOS and SDSOS https://arxiv.org/abs/1706.02586 which are approximations of the SOS method. They replace the SDP constraint at the core with a more restrictive constraint that can be expressed with LP and SOCP respectively. These methods lose some of the universality of the SOS method and become basis dependent on your choice of polynomials. DSOS in fact is based around the concept of a diagonally dominant matrix, which means that you should know roughly what basis your certificate should be in.

This made me realize there is an even more elementary version of DSOS that perhaps should have been obvious to me from the outset. Suppose we have a set of functions we already know are positive everywhere on a domain of interest. A useful example is the raised Chebyshev polynomials. https://en.wikipedia.org/wiki/Chebyshev_polynomials The appropriate Chebyshev polynomials oscillate between [-1,1] on the interval [-1,1], so if you add 1 to them they are positive over the whole interval [-1,1]. Then nonnegative linear sums of them are also positive. Bing bang boom. And that compiles down into a simple linear program (inequality constraints on the coefficients) with significantly fewer variables than DSOS. What we are doing is restricting ourselves to diagonal matrices with nonnegative entries again, which are of course positive semidefinite. It is less flexible, but it also has more obvious knobs to throw in domain specific knowledge. You can use a significantly overcomplete basis, and finding this basis is where you can insert your prior knowledge.
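Here is a minimal standalone sketch of that elementary LP (not the cvxpy-helpers version mentioned below; the target polynomial and the degree cap are arbitrary choices): certify that a polynomial is nonnegative on [-1,1] by finding a nonnegative combination of the raised Chebyshev polynomials 1 + T_k that reproduces it exactly.

import numpy as np
from scipy.optimize import linprog

deg = 4
p = np.array([3.0, 1.0, 2.0])  # target polynomial 3 + t + 2 t^2 in the power basis

# express the target in the Chebyshev basis, padded out to the working degree
p_cheb = np.polynomial.chebyshev.poly2cheb(p)
p_cheb = np.pad(p_cheb, (0, deg + 1 - len(p_cheb)))

# column k holds the Chebyshev coefficients of 1 + T_k, which is nonnegative on [-1,1]
A = np.zeros((deg + 1, deg + 1))
for k in range(deg + 1):
    A[0, k] += 1.0
    A[k, k] += 1.0

# find c >= 0 with sum_k c_k (1 + T_k) = p; any feasible point is a positivity certificate
res = linprog(c=np.ones(deg + 1), A_eq=A, b_eq=p_cheb, bounds=(0, None))
print(res.success, res.x)  # feasible, so p >= 0 on [-1, 1]

The objective is arbitrary here; feasibility is what matters. Of course this only works when the target actually lies in the cone spanned by the chosen basis, which is exactly the flexibility-for-simplicity trade described above.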

It is not at all clear there is any benefit over interval based methods.

Here is a sketch I wrote for x' = x, which has solution e^t. I used raised Chebyshev polynomials to enforce positive polynomial constraints and tossed in a little Taylor model / interval arithmetic to truncate off the highest terms.

I’m using my helper functions for translating between sympy and cvxpy expressions. https://github.com/philzook58/cvxpy-helpers Sympy is great for collecting up the coefficients on terms and for polynomial multiplication, integration, differentiation, etc. I do it by basically creating sympy matrix variables corresponding to cvxpy variables, which I compile to cvxpy expressions using lambdify with an explicit variable dictionary.

Seems to work, but I’ve been burned before.

man, LP solvers are so much better than SDP solvers


Random junk and links: Should I be more ashamed of dumps like this? I don’t expect you to read this.

https://github.com/JuliaIntervals/TaylorModels.jl

https://github.com/JuliaIntervals

Functional analysis by and large analyzes functions by analogy with more familiar properties of finite dimensional vector spaces. In ordinary 2d space, it is convenient to work with rectangular regions or polytopic regions.

Suppose I had a damped oscillator converging to some unknown point. If we can show that every point in a set maps within the set, we can show that the point it converges to lies somewhere in that set.

One model of a program is that it is some kind of kooky complicated hyper nonlinear discrete time dynamical system. And vice versa, dynamical systems are continuous time programs. The techniques for analyzing either have analogs in the other domain. Invariants of programs are essential for determining correctness properties of loops. Invariants like energy and momentum are essential for determining what physical systems can and cannot do. Lyapunov functions demonstrate that control systems are converging to the set point. Termination metrics show that loops and recursion must eventually end.

If instead you use interval arithmetic for a bound on your solution rather than your best current solution, and if you can show the interval maps inside itself, then you know that the iterative process must converge inside of the interval, hence that is where the true solution lies.

A very simple bound for an integral \int_a^b f(x)dx is \int_a^b \max_{x \in [a,b]}f(x) dx = \max_{x \in [a,b]}f(x) \int_a^b dx = \max_{x \in [a,b]}f(x) (b - a).

The integral is a very nice operator. The result of the integral is a positive linear sum of the values of a function. This means it plays nice with inequalities.

Rigorously Bounding ODE solutions with Sum of Squares optimization – Intervals

Intervals – Moore book. Computational functional analysis. Tucker book. CoqInterval. Fixed point theorems. Hardware acceleration? Interval valued functions. Interval extensions.

  • Banach fixed point – contraction mapping
  • Brouwer fixed point
  • Schauder
  • Knaster Tarski

Picard iteration vs? Allowing flex on boundary conditions via an interval?

Interval book had an interesting integral form for the 2-D

sympy has cool stuff

google scholar search z3, sympy brings up interesting things

https://moorepants.github.io/eme171/resources.html

The pydy guy Moore has a lot of good shit. resonance https://www.moorepants.info/blog/introducing-resonance.html

Lyapunov functions. Piecewise affine Lyapunov functions. Are Lyapunov functions kind of like a PDE? Value functions are PDEs. If the system is piecewise affine we can define a grid on the same piecewise affine thingo. Compositional convexity. Could we use compositional convexity + ReLU style piecewise affinity to get complicated Lyapunov functions? Lyapunov functions don’t have to be continuous, they just have to be decreasing. The Lie derivative wrt the flow is always negative, i.e. the gradient of the function points roughly against the direction of the flow. Triangulate around the equilibrium if you want to avoid a quadratic Lyapunov function. For a guarded system, can relax the Lyapunov constraint outside of the guard if you tighten it inside the guard. Ax >= 0 is the guard. It’s the S-procedure.

Best piecewise approximation with point choice?

http://theory.stanford.edu/~arbrad/papers/lr.ps linear ranking functions

Connection to petri nets?

https://ths.rwth-aachen.de/wp-content/uploads/sites/4/hs_lecture_notes.pdf

https://www.cs.colorado.edu/~xich8622/papers/rtss12.pdf

KoAT, LoAT, AProVE. Integer transition systems. Termination analysis. Loops?

https://lfcps.org/pub/Pegasus.pdf Darboux polynomials. Barrier certificates. Prelle-Singer method. First integrals.

Method 1: arbitrary polynomial p(t). Calculate p'(t). Find coefficients that make p'(t) = 0 by linear algebra. Idea: near invariants? min max |p'(t)|

Lie Algebra method

https://www.researchgate.net/publication/233653257_Solving_Differential_Equations_by_Symmetry_Groups sympy links this paper. Sympy has some lie algebra stuff in there

https://www-users.math.umn.edu/~olver/sm.html Peter Olver tutorial

http://www-users.math.umn.edu/~olver/talk.html olver talks

https://www-sop.inria.fr/members/Evelyne.Hubert/publications/PDF/Hubert_HDR.pdf

https://www.cs.cmu.edu/~aplatzer/logic/diffinv.html andre platzer. Zach says Darboux polynomials?

https://sylph.io/blog/math.html

Books: Birkhoff and Rota, Guggenheimer, different Olver books, a practical guide to invariants https://www.amazon.com/Practical-Invariant-Monographs-Computational-Mathematics/dp/0521857015

Idea: Approximate invariants? At least this ought to make a good coordinate system to work in where the dynamics are slow. Like action-angle and adiabatic transformations. Could also perhaps bound the

Picard Iteration

I have a method that I’m not sure is ultimately sound. The idea is to start with

Error analysis most often uses an appeal to Taylor’s theorem, and Taylor’s theorem is usually derived from the mean value theorem or intermediate value theorem. Maybe that’s fine. But the mean value theorem is some heavy stuff. There are computational doodads that use these bounds + interval analysis to rigorously integrate ODEs. See https://github.com/JuliaIntervals/TaylorModels.jl

The beauty of sum of squares certificates is that they are very primitive proofs of positivity for a function on a domain of infinitely many values. If I give you a way to write an expression as a sum of square terms, it is then quite obvious that it has to be always positive. This is algebra rather than analysis.
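For example (a trivial hand-rolled certificate, just to make the point concrete): the identity 2x^2 + 2x + 1 = x^2 + (x+1)^2 proves positivity everywhere by algebra alone, and sympy can check the identity.

from sympy import symbols, expand

x = symbols("x")
certificate = x**2 + (x + 1)**2           # an explicit sum of squares
target = 2*x**2 + 2*x + 1
print(expand(certificate - target) == 0)  # True: the identity holds, so target >= 0 everywhere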

y(t) = \lambda(t) \land \lambda(t) is SOS \Rightarrow \forall t. y(t) >= 0. Sum of squares is a kind of quantifier elimination method. The reverse direction of the above implication is the subject of the Positivstellensatz, a theorem of real algebraic geometry. At the very least, we can use the SOS constraint as a relaxation of the quantified constraint.

So, I think by using sum of squares, we can turn a differential equation into a differential inequation. If we force the highest derivative to be larger than the required differential equation, we will get an overestimate of the required function.

A function that is dominated by another in derivative will be dominated in value also, given matching initial values. You can integrate over inequalities (I think; you have to be careful about such things): \forall t \ge 0. \frac{dx}{dt} >= \frac{dy}{dt} \Rightarrow x(t) - x(0) >= y(t) - y(0).

The derivative of a polynomial can be thought of as a completely formal operation, with no necessarily implied calculus meaning. It seems we can play a funny kind of shell game to avoid the bulk of calculus.

As an example, let’s take \frac{dx}{dt} = x, x(0) = 1, with the solution x = e^t. e is transcendental.

The S-procedure is a trick by which you can relax a sum of squares inequality so that it only needs to be enforced on a domain. If you build a polynomial that describes the domain, one that is positive inside the domain and negative outside it, you can subtract an SOS multiple of it from the quantity you want to show is nonnegative. Inside the domain you care about, you’ve only made the constraint harder to satisfy, not easier. But outside the domain you have made it easier, because you can have negative slack.

For the domain t \in [0,1] the polynomial (1 - t)t works as our domain polynomial.

We parametrize our solution as an explicit polynomial x(t) = a_0 + a_1 t + a_2 t^2 + .... It is important to note that what follows is always linear in the a_i.

\frac{dx}{dt} - x >= 0 on [0,1] can be relaxed to requiring \frac{dx}{dt} - x(t) - \lambda(t)(1-t)t >= 0 everywhere, with \lambda(t) SOS.

So with that we get a reasonable formulation of finding a polynomial upper bound solution of the differential equation

\min x(1)

x(0) >= 1

\frac{dx}{dt} - x(t) - \lambda_1(t)(1-t)t = \lambda_2(t)

\lambda_{1,2}(t) are SOS.

And here it is written out in python using my cvxpy-helpers which bridge the gap between sympy polynomials and cvxpy.

We can go backwards to figure out sufficient conditions for a bound. We want x_u(t_f) >= x(t_f). It is sufficient that \forall t. x_u(t) >= x(t). For this it is sufficient that \forall t. x_u'(t) >= x'(t) \land x_u(t_i) >= x(t_i). We follow this down in derivative until we get to the lowest derivative in the differential equation. Then we can use the linear differential equation itself, x^{(n)}(t) = \sum_i a_i(t) x^{(i)}(t), and require x_u^{(n)}(t) >= \sum_i \max(a_i(t) x_u^{(i)}(t), a_i(t) x_l^{(i)}(t)).

a_i(t) x^{(i)}(t) <= \max(a_i(t) x_u^{(i)}(t), a_i(t) x_l^{(i)}(t)). This accounts for the possibility of terms changing sign. Or you could separate the terms into regions of constant sign.

The minimization characterization of the bound is useful. For any class of functions that contains our degree-d polynomial, we can show that the minimum of the same optimization problem is less than or equal to our value.

Is the dual value useful? The lower bound on the least upper bound

Doesn’t seem like the method will work for nonlinear ODEs. Maybe it will if you relax the nonlinearity. Or you could perhaps use a MIDSP to make piecewise linear approximations of the nonlinearity?

It is interesting to investigate linear programming models. It is simpler and more concrete to examine how well different step sizes approximate each other rather than worry about the differential case.

We can explicitly compute a finite difference solution in the LP, which is a power that is difficult to achieve in general for differential equations.

We can instead replace the exact solution by a conservative bound.

While we can differentiate through an equality, we can’t differentiate through an inequality. Differentiation involves negation, which plays havoc with inequalities. We can however integrate through inequalities.

\frac{dx}{dt} >= f(t) \land x(0) >= a \Rightarrow x(t) >= \int_0^t f(s)ds + a

As a generalization, we can integrate inequalities against a weight p(x) as long as p(x) >= 0.

In particular \forall t. \frac{dx}{dt} >= \frac{dy}{dt} \Rightarrow x(t) - x(0) >= y(t) - y(0).

We can convert a differential equation into a differential inequation. It is not entirely clear to me that there is a canonical way to do this. But it works to take the biggest.

\frac{dx}{dt} = A(t)x + f(t)

Is there a tightest

We can integrate

Here let’s calculate e

https://tel.archives-ouvertes.fr/tel-00657843v2/document Thesis on ODE bounds in Isabelle


not so good. very small

Solving the Laplace Equations with Linear Relations

The Laplace equation is ubiquitous in physics and engineering.

\nabla^2 \phi = \partial_x^2 \phi + \partial_y^2 \phi = 0

It and slight variants of it describe electrostatics, magnetostatics, steady state heat flow, elastic flex, pressure, and velocity potentials.

There are a couple reasons for that.

  • It results from minimizing the squared gradient of a field |\nabla \phi |^2 which can make sense from an energy minimization perspective.
  • Similarly it results from the combination of a flow conservation law and a linear constitutive relation connecting flow and field (such as Ohm’s law, Fick’s law, or Hooke’s law).
  • It also gets used even if not particularly appropriate because we know how to mathematically deal with it, for example in image processing.

There are a couple of questions we may want to ask about a Laplace equation system

  • Given the field on the boundary, determine the field in the interior (Dirichlet problem)
  • Given the normal derivative of the field on the boundary determine the field in the interior (Neumann problem)
  • Given sources in the interior and 0 boundary condition, determine the field. The Laplace equation is called the Poisson equation when you allow a source term on the right hand side. \nabla^2 \phi = \rho.
  • Given the field at the boundary, determine the derivative at the boundary. Dirichlet-to-Neumann map or Poincare-Steklov operator.

Given the Dirichlet to Neumann map, you do not have to consider the interior of a region to use it. The Dirichlet to Neumann map is sort of the same thing as an effective resistance or scattering matrix. It gives you a black box representation of a region based solely on the variables at its boundary.

This linear relation algebra is useful for many things that I’d have considered a use case for the Schur complement. The Schur complement arises when you do Gaussian elimination on a blocked matrix. It is good that this pattern has a name, because once you know about it, you’ll see it in many places. Domain decomposition, marginalized gaussian distributions, low-rank update, Scattering matrices.
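As a tiny numpy illustration of that connection (my own sketch, not code from the post): take the graph Laplacian of a chain of three 1 ohm resistors, eliminate the interior nodes with a Schur complement, and what pops out is the effective conductance / Dirichlet-to-Neumann matrix of the two boundary terminals.

import numpy as np

# graph Laplacian of the chain 0 - 1 - 2 - 3 of unit resistors (entries are conductances)
L = np.array([[ 1., -1.,  0.,  0.],
              [-1.,  2., -1.,  0.],
              [ 0., -1.,  2., -1.],
              [ 0.,  0., -1.,  1.]])

b, i = [0, 3], [1, 2]  # boundary terminals and interior nodes
Lbb, Lbi = L[np.ix_(b, b)], L[np.ix_(b, i)]
Lib, Lii = L[np.ix_(i, b)], L[np.ix_(i, i)]

# Schur complement = Gaussian elimination of the interior block
dtn = Lbb - Lbi @ np.linalg.solve(Lii, Lib)
print(dtn)
# [[ 0.333 -0.333]
#  [-0.333  0.333]] : effective conductance 1/3, i.e. three 1 ohm resistors in series.
# This matrix maps boundary voltages to boundary currents, a discrete Dirichlet-to-Neumann map.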

By composing the linear relations corresponding to the Dirichlet-to-Neumann relations of regions, we can build the Dirichlet-to-Neumann relations of larger regions.

To make this more concrete, let us take the example of electrical circuits like before. A grid of resistors is a finite difference approximation to the continuous problem

-\nabla \phi = E (the electric field is minus the gradient of the potential)

E = \rho j (continuum Ohm's law)

\nabla\cdot j = 0 (current conservation)

In this post, I mentioned how you can make reasonable 2-dimensional diagrams out of a monoidal category, by sort of arbitrarily flipping one wire up and one wire down as in the diagram below. This defines a horizontal and a vertical composition which have to do the required book-keeping (associations) to keep an arrow in canonical form. I had considered this as the management method for weights in neural networks, but it is way more natural as the actual geometrical layout of a finite difference grid for a Laplace equation.

So we can reuse our categorical circuit combinators to build a finite difference Laplace equation.

Just showing how you can bend a 4-wire monoidal box into a 2-d diagram. Ignore the labels.

This can be implemented in Haskell doing the following. Neato.

Bits and Bobbles

  • Not the tightest post, but I needed to get it out there. I have a short attention span.
  • Homology and defining simplices as categories. One way of describing homology is in terms of linear operators that are analogues of finite difference operators (or better yet, discrete differential geometry operators / exterior derivatives). To some degree, it is analyzing the required boundary conditions to fully define differential equations on weirdo topological surfaces, which correspond to geometrical loops/holes. You can figure this out by looking at subspaces and quotients of the difference operators. We have here a very category theory way of looking at partial differential equations. How does it all connect?
  • Continuous circuit models – https://en.wikipedia.org/wiki/Distributed-element_model Telegrapher’s equation is classic example.
  • Cody mentioned that I could actually build circuits and measure categorical identities in a sense. That’s kind of cool. Or I could draw conductive ink on carbon paper and actually make my string diagrams into circuits? That is also brain tickling
  • Network circuits
  • I really want to get coefficients that aren’t just doubles. allowing rational functions of a frequency \omega would allow analysis of capacitor/inductor circuits, but also tight binding model systems for fun things like topological insulators and the Haldane model http://www.philipzucker.com/topologically-non-trivial-circuit-making-haldane-model-gyrator/ . I may need to leave Haskell. I’m not seeing quite the functionality I need. Use Sympy? https://arxiv.org/abs/1605.02532 HLinear. Flint bindings for haskell? Looks unmaintained. Could also use a grobner basis package as dynamite for a mouse.
  • This is relevant for the boundary element method. Some really cool other stuff relevant here. http://people.maths.ox.ac.uk/martinsson/2014_CBMS/

Blah Blah blah: The subtlest aspect of differential equations is that of boundary conditions. It is more correct usually to consider the system of the interior differential equation and the boundary conditions as equally essential parts of the statement of the problem you are considering.

Sum of Squares optimization for Minimax Optimal Differential Eq Residuals

Huh. This doesn’t embed very well. Maybe you’re better off just clicking into the thing. It’s nice not to let things rot too long though. *shrug*

Other ideas: Can I not come up with some scheme to use Sum of Squares for rigorous upper and lower bound regions like in https://github.com/JuliaIntervals/TaylorModels.jl ? Maybe a next post.

Linear Relation Algebra of Circuits with HMatrix

Oooh this is a fun one.

I’ve talked before about relation algebra and I think it is pretty neat. http://www.philipzucker.com/a-short-skinny-on-relations-towards-the-algebra-of-programming/. In that blog post, I used finite relations. In principle, they are simple to work with. We can perform relation algebra operations like composition, meet, and join by brute force enumeration.

Unfortunately, brute force may not always be an option. First off, the finite relations grow so enormous as to make this infeasible. Secondly, it is not insane to talk about relations or regions with an infinite number of elements, such as some continuous blob in 2D space. In that case, we can’t even in principle enumerate all the points in the region. What are we to do? We need to develop some kind of finite parametrization of regions to manipulate. This parametrization basically can’t possibly be complete in some sense, and we may choose more or less powerful systems of description for computational reasons.

In this post, we are going to be talking about linear or affine subspaces of a continuous space. These subspaces are hyperplanes. Linear subspaces have to go through the origin, while affine spaces can have an offset from the origin.

In the previous post, I mentioned that the finite relations formed a lattice, with operations meet and join. These operations were the same as set intersection and union, so the introduction of the extra terminology meet and join felt a bit unwarranted. Now the join isn’t union anymore (meet is still just intersection). We have chosen not to have the capability to represent the union of two vectors; instead we can only represent the smallest subspace that contains them both, which is the union closed under vector addition. For example, the join of a line and a point will be the plane that goes through both.

Linear/affine stuff is great because it is so computational. Most questions you want to ask are answerable by readily available numerical linear algebra packages. In this case, we’ll use the Haskell package HMatrix, which is something like a numpy/scipy equivalent for Haskell. We’re going to use type-level indices to denote the sizes and partitioning of these spaces, so we’ll need some helper functions.

Matrices are my patronum. They make everything good. Artwork courtesy of David

In case I miss any extensions, make typos, etc, you can find a complete compiling version here https://github.com/philzook58/ConvexCat/blob/master/src/LinRel.hs

type BEnum a = (Enum a, Bounded a) 

-- cardinality. `size` was already taken by HMatrix :(
card :: forall a. (BEnum a) => Int
card = (fromEnum (maxBound @a)) - (fromEnum (minBound @a)) + 1

In analogy with sets of tuples for defining finite relations, we partition the components of the linear spaces to be “input” and “output” indices/variables \begin{bmatrix} x_1 & x_2 & x_3 & ... & y_1 & y_2 & y_3 & ... \end{bmatrix}. This partition is somewhat arbitrary and easily moved around, but the weakening of strict notions of input and output as compared to functions is the source of the greater descriptive power of relations.

Relations are extensions of functions, so linear relations are an extension of linear maps. A linear map has the form y = Ax. A linear relation has the form Ax + By = 0. An affine map has the form y = Ax + b and an affine relation has the form Ax + By = b.

There are at least two useful concrete representation for subspaces.

  1. We can write a matrix A and vector b down that corresponds to affine constraints. Ax = b. The subspace described is the nullspace of A plus a solution of the equation. The rows of A are orthogonal to the space.
  2. We can hold onto generators of subspace. x = A' l+b where l parametrizes the subspace. In other words, the subspace is generated by / is the span of the columns of A'. It is the range of A'.

We’ll call these two representations the H-Rep and V-Rep, borrowing terminology from similar representations of polytopes (describing a polytope by the inequalities that define its faces, or as the convex combination of its vertices). https://inf.ethz.ch/personal/fukudak/lect/pclect/notes2015/PolyComp2015.pdf These two representations are dual in many respects.

-- HLinRel holds A x = b constraint
data HLinRel a b = HLinRel (Matrix Double) (Vector Double) deriving Show

-- x = A l + b. Generator constraint. 
data VLinRel a b = VLinRel (Matrix Double) (Vector Double) deriving Show

It is useful to have both reps and interconversion routines, because different operations are easy in the two representations. Any operations defined on one can be defined on the other by sandwiching between these conversion functions. Hence, we basically only need to define operations for one of the reps (if we don’t care too much about efficiency loss which, fair warning, is out the window for today). The bulk of computation will actually be performed by these interconversion routines. The HMatrix function nullspace performs an SVD under the hood and gathers up the space with 0 singular values.

-- if A x = b then x is in the nullspace + a vector b' solves the equation
h2v :: HLinRel a b -> VLinRel a b
h2v (HLinRel a b) = VLinRel a' b' where
        b' = a <\> b -- least squares solution
        a' = nullspace a

-- if x = A l + b, then A' . x = A' A l + A' b = A' b because A' A = 0
v2h :: VLinRel a b -> HLinRel a b
v2h (VLinRel a' b') = HLinRel a b where
        b = a #> b' -- matrix multiply
        a = tr $ nullspace (tr a') -- orthogonal space to range of a.
-- tr is transpose and not trace? A little bit odd, HMatrix.

These linear relations form a category. I’m not using the Category typeclass because I need BEnum constraints hanging around. The identity relation is x = y, aka Ix - Iy = 0.

hid :: forall a. BEnum a => HLinRel a a
hid =  HLinRel (i ||| (- i)) (vzero s) where 
                            s = card @a
                            i = ident s

Composing relations is done by combining the constraints of the two relations and then projecting out the interior variables. Taking the conjunction of constraints is easiest in the H-Rep, where we just need to vertically stack the individual constraints. Projection is easily done in the V-Rep, where you just need to drop the appropriate section of the generator vectors. So we implement this operation by flipping between the two.

hcompose :: forall a b c. (BEnum a, BEnum b, BEnum c) => HLinRel b c -> HLinRel a b -> HLinRel a c
hcompose (HLinRel m b) (HLinRel m' b') = let a'' = fromBlocks [[       ma',           mb' ,    0       ],
                                                               [         0 ,    mb,        mc          ]] in
                                         let b'' = vjoin [b', b] in 
                                         let (VLinRel q p) = h2v (HLinRel a'' b'') in -- kind of a misuse
                                         let q' = (takeRows ca q)  -- drop rows belonging to @b
                                                       === 
                                                  (dropRows (ca + cb) q) in
                                         let [x,y,z] =  takesV [ca,cb,cc] p in
                                         let p'=  vjoin [x,z] in -- rebuild without rows for @b
                                         v2h (VLinRel q' p') -- reconstruct HLinRel
                                       where 
                                           ca = card @a
                                           cb = card @b 
                                           cc = card @c
                                           sb = size b -- number of constraints in first relation
                                           sb' = size b' -- number of constraints in second relation
                                           ma' = takeColumns ca m'
                                           mb' = dropColumns ca m'
                                           mb = takeColumns cb m
                                           mc = dropColumns cb m

(<<<) :: forall a b c. (BEnum a, BEnum b, BEnum c) => HLinRel b c -> HLinRel a b -> HLinRel a c
(<<<) = hcompose

We can implement the general cadre of relation operators, meet, join, converse. I feel the converse is the most relational thing of all. It makes inverting a function nearly a no-op.

hjoin :: HLinRel a b -> HLinRel a b -> HLinRel a b
hjoin v w = v2h $ vjoin' (h2v v) (h2v w)

-- hmatrix took vjoin from me :(
-- joining means combining generators and adding a new generator
-- Closed under affine combination l * x1 + (1 - l) * x2 
vjoin' :: VLinRel a b -> VLinRel a b -> VLinRel a b
vjoin' (VLinRel a b) (VLinRel a' b') = VLinRel (a ||| a' ||| (asColumn (b - b'))) b

-- no constraints, everything
-- trivially true
htop :: forall a b. (BEnum a, BEnum b) => HLinRel a b 
htop = HLinRel (vzero (1,ca + cb)) (konst 0 1) where 
                                      ca = card @a
                                      cb = card @b 

-- hbottom?
                              
hconverse :: forall a b. (BEnum a, BEnum b) => HLinRel a b -> HLinRel b a 
hconverse (HLinRel a b) = HLinRel ( (dropColumns ca a) |||  (takeColumns ca a)) b where 
    ca = card @a
    cb = card @b  

Relational inclusion is the question of subspace inclusion. It is fairly easy to check whether a V-Rep is in an H-Rep (just plug the generators into the constraints and see if they obey them), and by using the conversion functions we can define it for arbitrary combos of H and V.


-- forall l. A' ( A l + b) == b'
-- is this numerically ok? I'm open to suggestions.
vhsub :: VLinRel a b -> HLinRel a b -> Bool
vhsub (VLinRel a b) (HLinRel a' b') = (naa' <=  1e-10 * (norm_2 a') * (norm_2 a)  ) && ((norm_2 ((a' #> b) - b')) <= 1e-10 * (norm_2 b')  ) where
          naa' = norm_2 (a' <> a)

hsub :: HLinRel a b -> HLinRel a b -> Bool
hsub h1 h2 = vhsub (h2v h1) h2

heq :: HLinRel a b -> HLinRel a b -> Bool
heq a b = (hsub a b) && (hsub b a)


instance Ord (HLinRel a b) where
  (<=) = hsub
  (>=) = flip hsub 

instance Eq (HLinRel a b) where
  (==) = heq

It is useful to use the direct sum of the spaces as a monoidal product.

hpar :: HLinRel a b -> HLinRel c d -> HLinRel (Either a c) (Either b d)
hpar (HLinRel mab v) (HLinRel mcd v') = HLinRel (fromBlocks [ [mab, 0], [0 , mcd]]) (vjoin [v, v']) where

hleft :: forall a b. (BEnum a, BEnum b) => HLinRel a (Either a b)
hleft = HLinRel ( i ||| (- i) ||| (konst 0 (ca,cb))) (konst 0 ca) where 
    ca = card @a
    cb = card @b  
    i = ident ca

hright :: forall a b. (BEnum a, BEnum b) => HLinRel b (Either a b)
hright = HLinRel ( i ||| (konst 0 (cb,ca)) ||| (- i) ) (konst 0 cb) where 
    ca = card @a
    cb = card @b  
    i = ident cb

htrans :: HLinRel a (Either b c) -> HLinRel (Either a b) c 
htrans (HLinRel m v) = HLinRel m v

hswap :: forall a b. (BEnum a, BEnum b) => HLinRel (Either a b) (Either b a)
hswap = HLinRel (fromBlocks [[ia ,0,0 ,-ia], [0, ib,-ib,0]]) (konst 0 (ca + cb)) where 
        ca = card @a
        cb = card @b  
        ia = ident ca
        ib = ident cb


hsum :: forall a. BEnum a => HLinRel (Either a a) a
hsum = HLinRel ( i ||| i ||| - i ) (konst 0 ca)  where 
        ca = card @a 
        i= ident ca

hdup :: forall a. BEnum a => HLinRel a (Either a a)
hdup = HLinRel (fromBlocks [[i, -i,0 ], [i, 0, -i]]) (konst 0 (ca + ca))  where 
        ca = card @a 
        i= ident ca

hdump :: HLinRel a Void
hdump = HLinRel 0 0

hlabsorb ::forall a. BEnum a => HLinRel (Either Void a) a
hlabsorb = HLinRel m v where (HLinRel m v) = hid @a 

A side note: Void causes some consternation. Void is the type with no elements and is the index type of a 0 dimensional space. It is the unit object of the monoidal product. Unfortunately by an accident of the standard Haskell definitions, actual Void is not a BEnum. So, I did a disgusting hack. Let us not discuss it more.

Circuits

Baez and Fong have an interesting paper where they describe building circuits using a categorical graphical calculus. We have the pieces to go about something similar. What we have here is a precise way in which circuit diagrams can be thought of as string diagrams in a monoidal category of linear relations.

An idealized wire has two quantities associated with it, the current flowing through it and the voltage it is at.

-- a 2d space at every wire: current and voltage.
data IV = I | V deriving (Show, Enum, Bounded, Eq, Ord)

When we connect wires, the currents must be conserved and the voltages must be equal. hid and hcompose from above still achieve that. Composing two independent circuits in parallel is achieved by hpar.

Independent resistors in parallel.

We will want some basic tinker toys to work with.

A resistor in series has the same current at both ends and a voltage drop proportional to the current

resistor :: Double -> HLinRel IV IV
resistor r = HLinRel ( (2><4)  [ 1,0,-1,   0,
                                 r, 1, 0, -1]) (konst 0 2) 

Composing two resistors in series adds their resistances: (resistor r1) <<< (resistor r2) == resistor (r1 + r2).

A bridging resistor allows current to flow between the two branches

bridge :: Double -> HLinRel (Either IV IV) (Either IV IV)
bridge r = HLinRel (  (4><8) [ 1,0, 1,  0, -1, 0, -1,  0, -- current conservation
                               0, 1, 0, 0, 0, -1 , 0,  0, --voltage maintained left
                               0, 0, 0, 1, 0,  0,  0, -1, -- voltage maintained right
                               r, 1, 0,-1, -r,  0,  0, 0  ]) (konst 0 4)  
A bridging resistor

Composing two bridge circuits puts the bridge resistors in parallel. The conductance G=\frac{1}{R} of resistors in parallel adds: hcompose (bridge r1) (bridge r2) == bridge (1 / (1/r1 + 1/r2)).

parallel resistors compose

An open circuit allows no current to flow and ends a wire. open ~ resistor infinity

open :: HLinRel IV Void
open = HLinRel (fromLists [[1,0]]) (konst 0 1)

At branching points, the voltage is maintained, but the current splits.

cmerge :: HLinRel (Either IV IV) IV
cmerge = HLinRel (fromLists [[1, 0, 1, 0, -1, 0],
                           [0,1,0,0,0 ,-1  ],
                           [0,0,0,1, 0, -1]])  (konst 0 3)

This cmerge combinator could also be built using a short == bridge 0 , composing a branch with open, and then absorbing the Void away.

We can bend wires up or down by using a composition of cmerge and open.

cap :: HLinRel  (Either IV IV) Void
cap  = hcompose open cmerge

cup :: HLinRel Void (Either IV IV)
cup = hconverse cap

ground :: HLinRel IV Void
ground = HLinRel ( (1><2) [ 0 , 1 ]) (vzero 1) 

Voltage and current sources enforce current and voltage to be certain values

vsource :: Double -> HLinRel IV IV
vsource v = HLinRel ( (2><4) [ 1,0,-1,   0,
                               0, 1, 0, -1]) (fromList [0,v])  

isource :: Double -> HLinRel IV IV
isource i = HLinRel (fromLists [ [1,0, -1,   0], -- current conservation
                                 [1, 0, 0,  0]]) (fromList [0,i])  

Measurements of circuits proceed by probes.

type VProbe = ()
vprobe :: HLinRel IV VProbe
vprobe = HLinRel ( (2><3)  [1,0,0,
                            0,1,-1]) (konst 0 2)  

Inductors and capacitors could be included easily, but would require the entries of the HMatrix values to be polynomials in the frequency \omega, which it does not support (but it could!). We'll leave those off for another day.

We can actually check by computation that the rules suggested above are being followed.

r20 :: HLinRel IV IV
r20 = resistor 20

main :: IO ()
main = do
            print (r20 == (hid <<< r20))
            print (r20 == r20 <<< hid)
            print (r20 == (hmeet r20 r20))
            print $ resistor 50 == r20 <<< (resistor 30)
            print $ (bridge 10) <<< (bridge 10) == (bridge 5)
            print $ v2h (h2v r20) == r20
            print $ r20 <= htop
            print $ hconverse (hconverse r20) == r20
            print $ (open <<< r20) == open

Bits and Bobbles

  • Homogeneous systems are usually a bit more elegant to deal with, although a bit more unfamiliar and abstract.
  • Could make a pandas like interface for linear relations that uses numpy/scipy.sparse for the computation. All the swapping and associating is kind of fun to design, not so much to use. Labelled n-way relations are nice for users.
  • Implicit/Lazy evaluation. We should let the good solvers do the work when possible. We implemented our operations eagerly. We don't have to. By allowing hidden variables inside our relations, we can avoid the expensive linear operations until it is useful to actually compute on them.
  • Relational division = quotient spaces?
  • DSL. One of the beauties of the pointfree/categorical approach is that you avoid the need for binding forms. This makes for a very easily manipulated DSL. The transformations feel like those of ordinary algebra and you don't have to worry about the subtleties of index renaming or substitution under binders.
  • Sparse is probably really good. We have lots of identity matrices and simple rearrangements. It is very wasteful to use dense operations on these.
  • Schur complement https://en.wikipedia.org/wiki/Schur_complement are the name in the game for projecting out pieces of linear problems. We have some overlap.
  • Linear relations -> Polyhedral relations -> Convex Relations. Linear is super computable, polyhedral can blow up. Rearrange a DSL to abuse Linear programming as much as possible for queries.
  • Network circuits. There is an interesting subclass of circuits that is designed to be pretty composable.

https://en.wikipedia.org/wiki/Two-port_network Two port networks are a very useful subclass of electrical circuits. They model transmission lines fairly well, and easily composable for filter construction.

It is standard to describe these networks by giving a linear function between two of the variables and the other two variables. Depending on your choice of which variables depend on which, these are called the z-parameters, y-parameters, h-parameters, scattering parameters, or ABCD parameters. There are tables of formulas for converting from one form to the others. The different parameters suit different use cases for composition and for combining in parallel or series. From the perspective of linear relations this all seems rather silly. The necessity for so many descriptions and the confusing relationships between them come from the unnecessary and overly rigid requirement of having a linear function-like relationship rather than just a general relation, which depending on the circuit may not even be available (there are degenerate configurations where two of the variables do not imply the values of the other two). A functional relationship is always a lie (although a sometimes useful one), as there is always back-reaction from new connections.

-- voltage divider
divider :: Double -> Double -> HLinRel (Either IV IV) (Either IV IV)
divider r1 r2 = hcompose (bridge r2) (hpar (resistor r1) hid) 

The relation model also makes clearer how to build lumped models out of continuous ones. https://en.wikipedia.org/wiki/Lumped-element_model

https://en.wikipedia.org/wiki/Transmission_line https://en.wikipedia.org/wiki/Distributed-element_model

  • Because the type indices have no connection to the actual data types (they are phantom) it is a wise idea to use smart constructors that check that the sizes of the matrices makes sense.

-- smart constructors
hLinRel :: forall a b. (BEnum a, BEnum b) => Matrix Double -> Vector Double -> Maybe (HLinRel a b) 
hLinRel m v | cols m == (ca + cb) &&  (size v == rows m)  = Just (HLinRel m v)
            |  otherwise = Nothing  where 
                 ca = card @a
                 cb = card @b  
  • Nonlinear circuits. Grobner Bases and polynomial relations?
  • Quadratic optimization under linear constraints. Can't get it to come out right yet. Clutch for Kalman filters. Nice for many formulations like least power, least action, minimum energy principles. Edit: I did more in this direction here http://www.philipzucker.com/categorical-lqr-control-with-linear-relations/
  • Quadratic Operators -> Convex operators. See last chapter of Rockafellar.
  • Duality of controllers and filters. It is well known (I think) that for every controller algorithm there is a filter algorithm that is basically the same thing.
    • LQR - Kalman
    • Viterbi filter - Value function table
    • particle filter - Monte Carlo control
    • Extended Kalman - iLQR-ish? Use local approximation of dynamics
    • unscented kalman - ?


Gröbner Bases and Optics

Geometrical optics is a pretty interesting topic. It really is almost pure geometry/math rather than physics.

Huygens principle says that we can compute the propagation of a wave by considering the wavelets produced by each point on the wavefront separately.

In physical optics, this corresponds to the linear superposition of the waves produced at each point by a propagator function \int dx' G(x,x'). In geometrical optics, which was Huygens’ original intent I think (these old school guys were VERY geometrical), this is the curious operation of taking the geometrical envelope of the little waves produced by each point.

The gist of Huygens principles. Ripped from wikipedia

https://en.wikipedia.org/wiki/Envelope_(mathematics) The envelope is an operation on a family of curves. You can approximate it by a finite difference procedure. Take two subsequent curves close together in the family and find their intersection. Keep doing that and then join up all the intersections. This is roughly the approach I took in this post http://www.philipzucker.com/elm-eikonal-sol-lewitt/
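
Here is a quick numerical version of that finite difference procedure (my own sketch, not from the original post), using the circle example that shows up again below: take unit circles centered on a circle of radius 2, intersect consecutive members of the family, and the intersection points trace out the envelope, which should be the circles of radius 1 and 3.

import numpy as np

# family of curves: unit circles centered on the points of a radius-2 circle
ss = np.linspace(0, 2 * np.pi, 200)
centers = 2.0 * np.stack([np.cos(ss), np.sin(ss)], axis=1)

pts = []
for c1, c2 in zip(centers[:-1], centers[1:]):
    # intersection points of two unit circles centered at c1 and c2
    d = np.linalg.norm(c2 - c1)
    mid = (c1 + c2) / 2
    h = np.sqrt(1 - (d / 2) ** 2)                        # distance from the chord midpoint to each intersection
    perp = np.array([-(c2 - c1)[1], (c2 - c1)[0]]) / d   # unit vector perpendicular to the line of centers
    pts.append(mid + h * perp)
    pts.append(mid - h * perp)

radii = np.linalg.norm(np.array(pts), axis=1)
print(np.unique(np.round(radii, 2)))   # ~ [1., 3.]: the envelope is the two offset circles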

Taking the envelope of a family of lines. Ripped from wikipedia

You can describe a geometrical wavefront implicitly with an equation \phi(x,y) = 0. Maybe the wavefront is a circle, or a line, or some wacky shape.

The wavelet produced by the point x,y after a time t is described implicitly by d(\vec{x},\vec{x'})^2 - t^2 = (x-x')^2 + (y-y')^2 - t^2 = 0.

This describes a family of curves, the circles produced by the different points of the original wavefront. If you take the envelope of this family you get the new wavefront at time t.

How do we do this? Grobner bases are one way, if we make \phi a polynomial equation. For today’s purposes, Grobner bases are a method for solving systems of multivariate polynomial equations. Kind of surprising that such a thing even exists. It’s actually a guaranteed-terminating algorithm, albeit with horrific asymptotic complexity. Sympy has an implementation. For more on Grobner bases, the links here are useful http://www.philipzucker.com/dump-of-nonlinear-algebra-algebraic-geometry-notes-good-links-though/. Especially check out the Cox, Little, and O’Shea books.

The algorithm churns on a set of multivariate polynomials and spits out a new set that is equivalent in the sense that the new set is equal to zero if and only if the original set was. However, now (if you ask for the appropriate term ordering) the polynomials are organized in such a way that they have an increasing number of variables in them. So you solve the 1-variable equation (easy), and substitute into the 2 variable equation. Then that is a 1-variable equation, which you solve (easy) and then you substitute into the three variable equation, and so on. It’s analogous to gaussian elimination.
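
As a tiny illustration of that triangular, elimination-like structure (my own toy example, separate from the optics calculation below):

from sympy import symbols, groebner

x, y = symbols('x y')
# intersect a circle with a line; with lex order the last basis element only involves y
G = groebner([x**2 + y**2 - 5, x - y - 1], x, y, order='lex')
print(list(G))   # something like [x - y - 1, y**2 + y - 2]: solve for y, then back-substitute for x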

So check this out

from sympy import *


x1, y1, x2, y2, dx, dy = symbols('x1, y1, x2, y2, dx, dy')

def dist(x,y,d):
    return x**2 + y**2 - d**2

e1 = dist(x1,y1,2) # the original circle of radius 2
e2 = dist(x1-x2 ,y1 - y2 , 1) # the parametrized wavefront after time 1


# The two envelope conditions.
e3 = diff(e1,x1)*dx + diff(e1,y1)*1
e4 = diff(e2,x1)*dx + diff(e2,y1)*1


envelope = groebner([e1,e2,e3,e4], y1, x1, dx, dy, x2, y2, order='lex')[-1]
plot_implicit(e1)        # show the original wavefront
plot_implicit(envelope)  # show the evolved wavefronts

The envelope conditions can be found by introducing two new differential variables dx and dy. They are constrained to lie tangent to the point on the original circle by the differential equation e3, and then we require that two subsequent members of the curve family intersect by the equation e4. Hence we get the envelope. Asking for the Grobner basis with that variable ordering gives us an implicit equation for x2, y2 with no mention of the rest of the variables, if we just look at the last equation of the Grobner basis.

I arbitrarily set dy = 1 because the overall scale of (dx, dy) does not matter, only the direction. If you don’t do this, the final equation is scaled homogeneously in dy.

This does indeed show the two new wavefronts at radius 1 and radius 3.

Original circle, radius = 2:

x1**2 + y1**2 - 4 = 0

Evolved circles found via the Grobner basis:

(x2**2 + y2**2 - 9)*(x2**2 + y2**2 - 1) = 0

Here’s a different example using a parabola, e1 = y1 - x1 + x1**2

Original curve: y1 - x1 + x1**2 = 0

After 1 time step:

16*x2**6 - 48*x2**5 + 16*x2**4*y2**2 + 32*x2**4*y2 + 4*x2**4 - 32*x2**3*y2**2 - 64*x2**3*y2 + 72*x2**3 + 32*x2**2*y2**3 + 48*x2**2*y2 - 40*x2**2 - 32*x2*y2**3 + 16*x2*y2**2 - 16*x2*y2 - 4*x2 + 16*y2**4 + 32*y2**3 - 20*y2**2 - 36*y2 - 11 = 0

The weird lumpiness here is plot_implicit’s inability to cope, not the actual curve shape. Those funky prongs are from a singularity that forms as the wavefront folds over itself.

I tried using a cubic curve x**3 and the grobner basis algorithm seems to crash my computer. 🙁 Perhaps this is unsurprising. https://en.wikipedia.org/wiki/Elliptic_curve ?

I don’t know how to get the wavefront to go in only 1 direction? As is, it is propagating into the past and the future. Would this require inequalities? Sum of squares optimization perhaps?

Edit:

It’s been suggested on reddit that I’d have better luck using other packages, like Macaulay2, MAGMA, or Singular. Good point

Also it was suggested I use the Dixon resultant, for which there is an implementation in sympy, albeit hidden. Something to investigate

https://github.com/sympy/sympy/blob/master/sympy/polys/multivariate_resultants.py

https://nikoleta-v3.github.io/blog/2018/06/05/resultant-theory.html

Another interesting angle might be to try to go numerical with a homotopy continuation method with phcpy

http://homepages.math.uic.edu/~jan/phcpy_doc_html/welcome.html

https://www.semion.io/doc/solving-polynomial-systems-with-phcpy

or pybertini https://ofloveandhate-pybertini.readthedocs.io/en/feature-readthedocs_integration/intro.html

The Classical Coulomb Gas as a Mixed Integer Quadratic Program

The coulomb gas is a model of electrostatics where you take the discreteness of charge into account. That is what makes it hard compared to the continuous charge problem. Previously, I’ve used mixed integer programming to find lowest energy states of the Ising model. This is an even more obvious application of mixed integer programming to a physics model.

We ordinarily consider electric charge to be a continuum, but it isn’t. It comes in chunks of the electron charge. Historically, people didn’t even know that for quite a while. It is usually a reasonable approximation for most purposes to consider electric charge to be continuous.

If you consider a network of capacitors cooled to the level that there is not enough thermal energy to borrow to get an electron to jump, the charges on the capacitors will be observably discretized. With a sufficiently slow cooling to this state, the charges should arrange themselves into the lowest energy state.

The coulomb gas model also is of interest due to its connections to the XY model, which I’ve taken a stab at with mixed integer programming before. The coulomb gas models the energy of vortices in that model. I think the connection between the models actually requires a statistical or quantum mechanical context though, whereas we’ve been looking at the classical energy minimization.

We can formulate the classical coulomb gas problem very straightforwardly as a mixed integer quadratic program. We can easily include an externally applied field and a charge conservation constraint if we so desire within the framework.

We write this down in Python using the cvxpy library, which has a built-in free MIQP solver, albeit not a very good one. Commercial solvers are probably quite a bit better.

import cvxpy as cvx
import numpy as np
#grid size
N = 5
# charge variables
q = cvx.Variable( N*N ,integer=True)

# build our grid
x = np.linspace(0,1,N) 
y = np.linspace(0,1,N) 
X, Y = np.meshgrid(x,y, indexing='ij')
x1 = X.reshape(N,N,1,1)
y1 = Y.reshape(N,N,1,1)
x2 = X.reshape(1,1,N,N)
y2 = Y.reshape(1,1,N,N)
eps = 0.1 / N #regularization factor for self energy. convenience mostly
V = 1. / ((x1-x2)**2 + (y1-y2)**2 + eps**2)** ( 1 / 2)
V = V.reshape( (N*N,N*N) )

U_external = 100 * Y.flatten() # a constant electric field in the Y direction 
energy = cvx.quad_form(q,V) + U_external*q

# charge conservation constraint
prob = cvx.Problem(cvx.Minimize(energy),[cvx.sum(q)==0])
res = prob.solve(verbose=True)

print(q.value.reshape((N,N)))

#plotting junk

import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')

ax.plot_surface(X, Y, q.value.reshape((N,N)))
plt.show()
A plot of charge in a constant external electric field.

The results seem reasonable. It makes sense for charge to go in the direction of the electric field. Going to the corners makes sense because then like charges are far apart. So this might be working. Who knows.

Interesting places to go with this:

Prof Vanderbei shows how you can embed an FFT to enable making statements about both the time and frequency domain while keeping the problem sparse. The low time/memory N log(N) complexity of the FFT is reflected in the sparsity of the resulting linear program.

https://vanderbei.princeton.edu/tex/ffOpt/ffOptMPCrev4.pdf

Here’s a sketch of what this might look like. Curiously, looking at the actual number of nonzeros in the problem matrices, there are way too many. I am not sure what is going on. Something is not performing as I expect in the following code.

import cvxpy as cvx
import numpy as np
import scipy.fftpack # import fft, ifft
def swizzle(x,y):
    assert(x.size == y.size)
    N = x.size
    s =  np.exp(-2.j * np.pi * np.arange(N) / N)
    #print(s)
    #ret = cvx.hstack( [x + s*y, x - s*y]) 
    #print(ret.shape)
    return cvx.hstack( [x - s*y, x + s*y]) 
    

def fft(x):
    N = x.size
    #assert(2**int(log2(N)) == N) # power of 2

    if N == 1:
        return x, []
    else:
        y = cvx.reshape(x,(N//2,2))
        c = []
        even, ce = fft(y[:,0])
        c += ce
        odd, co = fft(y[:,1])
        c += co
        z = cvx.Variable(N, complex=True)
        c += [z == swizzle(even,odd)]
        return z, c

N = 256
x = cvx.Variable(N, complex=True)
z, c = fft(x)
v = np.zeros(N) #np.ones(N) #np.random.rand(N)
v[0]= 1
c += [x == v]
prob = cvx.Problem( cvx.Minimize(1), c)
#print(prob.get_problem_data(cvx.OSQP))
res = prob.solve(verbose=True)
#print(x.value)
print(z.value)
print(scipy.fftpack.fft(v))
print(scipy.fftpack.fft(v) - z.value)

The equivalent dense DFT:

x = cvx.Variable(N, complex=True)
fred = cvx.Variable(N, complex=True)
c = [fred == np.exp(-2.j * np.pi * np.arange(N).reshape((N,1)) * np.arange(N).reshape((1,N)) / N) * x]
prob = cvx.Problem( cvx.Minimize(1), c)
print(prob.get_problem_data(cvx.OSQP))

It would be possible to use a frequency domain solution of the interparticle energy rather than the explicit coulomb law form. Hypothetically this might increase the sparsity of the problem.
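
To sanity check the frequency-domain idea, here is a small numpy sketch (mine, not part of the post’s formulation, and using a made-up periodic kernel rather than the open-boundary Coulomb matrix above): for a translation-invariant periodic interaction, the dense quadratic form q^T V q equals a sum over Fourier modes, so the energy could be expressed with FFTs instead of a dense (N*N) x (N*N) matrix.

import numpy as np

N = 8
q = np.random.randn(N, N)   # stand-in charge configuration

# periodic interaction kernel v(dx, dy); v[0, 0] plays the role of the regularized self energy
ii = np.minimum(np.arange(N), N - np.arange(N))   # periodic distance along one axis
dx, dy = np.meshgrid(ii, ii, indexing='ij')
v = 1.0 / np.sqrt(dx**2 + dy**2 + 0.1**2)

# dense evaluation: V[(x1,y1),(x2,y2)] = v((x1-x2) mod N, (y1-y2) mod N)
X1, Y1, X2, Y2 = np.meshgrid(*(np.arange(N),) * 4, indexing='ij')
V = v[(X1 - X2) % N, (Y1 - Y2) % N].reshape(N * N, N * N)
E_dense = q.flatten() @ V @ q.flatten()

# FFT evaluation of the same quadratic form (convolution theorem + Parseval)
qhat = np.fft.fft2(q)
vhat = np.fft.fft2(v)
E_fft = np.sum(vhat * np.abs(qhat)**2).real / (N * N)

print(E_dense, E_fft)   # should agree up to floating point error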

It seems very possible to me to embed a fast multipole method or Barnes-Hut approximation within an MIQP in a similar manner. Introducing explicit charge summary variables for blocks would create a sparse version of the interaction matrix. So that’s fun.

Annihilating My Friend Will with a Python Fluid Simulation, Like the Cur He Is

Will, SUNDER!
A color version

As part of my random walk through topics, I was playing with shaders. I switched over to python because I didn’t feel like hacking out a linear solver.

There are a number of different methods for simulating fluids. I had seen Dan Piponi’s talk on youtube where he describes Jos Stam’s stable fluids and thought it made it all seem pretty straightforward. Absolutely PHENOMENAL talk. Check it out! We need to

  • 1. apply forces. I applied a gravitational force proportional to the total white of the image at that point
  • 2. project velocity to be divergence free. This makes it an incompressible fluid. We also may want to project the velocity to be zero on boundaries. I’ve done a sketchy job of that. This requires solving a Laplace equation. A sketch:
    • v_{orig} = v_{incomp} + \nabla w
    • \nabla \cdot v_{incomp}=0
    • \nabla ^2 w = \nabla \cdot v_{orig}. Solve for w
    • v_{incomp}=v_{orig} - \nabla w
  • 3. Advect using interpolation. Advect backwards in time. Use v(x,t+dt) \approx v(x-v(x)*dt,t) rather than v(x,t+dt) \approx v(x,t)+dv(x,t)*dt . This is intuitively related to the fact that backward Euler is more stable than forward Euler. Scipy has a very convenient function for this step, scipy.ndimage.map_coordinates: https://docs.scipy.org/doc/scipy/reference/generated/scipy.ndimage.map_coordinates.html#scipy.ndimage.map_coordinates (a minimal sketch of this step follows the list).
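
Here is that minimal sketch of step 3 (my own toy example, separate from the full simulation code further down): advect a scalar field backwards along a constant velocity field with scipy.ndimage.map_coordinates.

import numpy as np
from scipy import ndimage

N = 64
dt = 1.0
x = np.arange(N)
X, Y = np.meshgrid(x, x, indexing='ij')

f = np.exp(-((X - N / 2)**2 + (Y - N / 2)**2) / 50.0)   # a blob to advect
vx = 3.0 * np.ones((N, N))                              # constant velocity, in grid cells per unit time
vy = np.zeros((N, N))

# semi-Lagrangian step: f_new(x) = f(x - v(x)*dt), i.e. look up where each grid point came from
coords = np.stack([X - vx * dt, Y - vy * dt], axis=0)
f_new = ndimage.map_coordinates(f, coords, order=3, mode='wrap')

print(np.unravel_index(f.argmax(), f.shape))          # (32, 32)
print(np.unravel_index(f_new.argmax(), f_new.shape))  # about (35, 32): the blob moved ~3 cells in x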

Given those basic ideas, I was flying very much by the seat of my pants. I wasn’t really following any other codes. I made this to look cool. It is not a scientific calculation. I have no idea what the error is like. With a critical eye, I can definitely spot weird oscillatory artifacts. Maybe a small diffusion term would help?

When you solve for the corrections necessary to the velocity to make it incompressible \nabla \cdot v = 0 , add the correction ONLY to the original field. As part of the incompressibility solving step, you smooth out the original velocity field some. You probably don’t want that. By adding only the correction to the original field, you maintain the details in the original.

When you discretize a domain, there are vertices, edges, and faces in your discretization. It is useful to think about which of these your field values (velocity, pressure, electric field, etc.) should live on. I take it as a rule of thumb that if you do the discretization naturally, you are more likely to get a good numerical method. For example, I discretized my velocity field in two ways. A very natural way is on the edges of the graph. This is because velocity is really a stand-in for material flux. The x component of the velocity belongs on the x oriented edges of the graph and the y component of velocity on the y oriented edges. If you count edges, this means that the two components actually live in arrays with different dimensions. There is one fewer edge than there are vertices along each direction.

This grid is 6×4 of vertices, but the vx edges are 6×3, and the vy edges are 5×4. The boxes are a grid 5×3.

For each box, we want to constrain that the sum of velocities coming out = 0. This is the discretization of the \nabla \cdot v = 0 constraint. I’m basing this on my vague recollections of discrete differential geometry and some other things I’ve seen. That fields sometimes live on the edges of the discretization is very important for gauge fields, if that means anything to you. I did not try it another way, so maybe it is an unnecessary complication.

Since I needed velocities at the vertices of the grid, I do have a simple interpolation step from the vertices to the edges. So I have velocities being computed at both places. The one that is maintained between iterations lives on the vertices.

At small resolutions the code runs in real time. For the videos I made, it is probably running ~10x slower than real time. I guarantee you can make it better. Good enough for me at the moment. An FFT-based Laplace solver would be fast (a rough sketch of that idea is below). Could also go into GPU land? Multigrid? Me dunno.
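
For what it’s worth, here is that rough sketch of the FFT idea (my own toy, assuming periodic boundary conditions rather than the zero Dirichlet conditions used in the code below): the discrete Laplacian is diagonal in Fourier space, so the Poisson solve is two FFTs and a pointwise divide.

import numpy as np

N = 128
rhs = np.random.randn(N, N)
rhs -= rhs.mean()                     # the periodic problem needs a zero-mean right hand side

k = 2 * np.pi * np.fft.fftfreq(N)
KX, KY = np.meshgrid(k, k, indexing='ij')
eig = 2 * (np.cos(KX) - 1) + 2 * (np.cos(KY) - 1)   # eigenvalues of the 5-point Laplacian
eig[0, 0] = 1.0                                     # dodge the zero eigenvalue of the constant mode

w = np.fft.ifft2(np.fft.fft2(rhs) / eig).real

# check: applying the 5-point periodic Laplacian to w recovers rhs
lap = np.roll(w, 1, 0) + np.roll(w, -1, 0) + np.roll(w, 1, 1) + np.roll(w, -1, 1) - 4 * w
print(np.abs(lap - rhs).max())                      # ~ 1e-13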

I tried using cvxpy for the incompressibility solve, which gives a pleasant interface and great power of adding constraints, but I wasn’t getting good results. I may have had a bug.

This is some code just to perform the velocity projection step and plot the field. I performed the projection to 0 on the boundaries using an alternating projection method (as discussed in Piponi’s talk), which is very simple and flexible but inefficient. It probably is a lot more appropriate when you have strange changing boundaries. I could have built the K matrix system to do that too.

The input velocity field is spiraling outwards (not divergence free, there is a fluid source in the center)
We project out the divergence free part of that velocity field, and project it such that the velocity does not point out at the boundary. Lookin good.

Presolving the laplacian matrix vastly sped up each iteration. Makes sense.
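
The presolve pattern is just this (a tiny standalone sketch, not lifted from the code below): factor the sparse matrix once with scipy’s factorized and reuse the factorization for every frame’s solve.

import numpy as np
from scipy import sparse
import scipy.sparse.linalg as linalg

n = 100
# a small symmetric positive definite tridiagonal stand-in for the Laplacian K
K = sparse.diags([2 * np.ones(n), -np.ones(n - 1), -np.ones(n - 1)], [0, 1, -1], format='csc')

solve_K = linalg.factorized(K)      # sparse LU factorization, computed once
for frame in range(3):
    b = np.random.randn(n)
    w = solve_K(b)                  # each later solve is only a forward/back substitution
    print(np.abs(K @ w - b).max())  # ~ 1e-14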

Why in god’s name does sparse.kronsum have the argument ordering it does? I had a LOT of trouble with x vs y ordering. np.meshgrid wasn’t working like I thought it should. Images might have a weird convention? What a nightmare. I think it’s ok now? Looks good enough anyway.

And here is the code to make the video. I converted the image sequence to an mp4 using ffmpeg:

ffmpeg -i ./%06d.jpg will.mp4
import numpy as np
import cv2
from scipy import interpolate
from scipy import ndimage
from scipy import sparse
import scipy.sparse.linalg as linalg # import spsolve

#ffmpeg -i ./%06d.jpg will.mp4

### Setup 

dt = 0.01

img = cv2.imread('will.jpg')
# make image smaller to make run faster if you want
#img = cv2.pyrDown(img)
#img = cv2.pyrDown(img)

Nx = img.shape[0]
Ny = img.shape[1] 


v = np.zeros((Nx,Ny,2))

x = np.linspace(0,1,Nx, endpoint=False) 
y = np.linspace(0,1,Ny, endpoint=False) 
X, Y = np.meshgrid(x,y, indexing='ij')

#v[:,:,0] = -Y + 0.5
#v[:,:,1] = X - 0.5


#### Build necessary derivative and interpolation matrices

def build_grad(N):
    # builds N-1 x N finite difference matrix 
    data = [-np.ones(N), np.ones(N-1)] # ragged lengths, so pass a plain list (np.array would choke on it)
    return sparse.diags(data, np.array([0, 1]), shape= (N-1,N))

# gradient operators
gradx = sparse.kron(build_grad(Nx), sparse.identity(Ny-1))
grady = sparse.kron(sparse.identity(Nx-1), build_grad(Ny))

def build_K(N):
    # builds the (N-1) x (N-1) second derivative matrix K
    data = [-np.ones(N-2), 2*np.ones(N-1), -np.ones(N-2)] # ragged lengths, so pass a plain list
    diags = np.array([-1, 0, 1])
    return sparse.diags(data, diags )

# Laplacian operator . Zero dirichlet boundary conditions
# why the hell is this reversed? Sigh.
K = sparse.kronsum(build_K(Ny),build_K(Nx))
Ksolve = linalg.factorized(K)

def build_interp(N):
    data = [np.ones(N)/2., np.ones(N-1)/2.] # ragged lengths, so pass a plain list
    diags = np.array([0, 1])
    return sparse.diags(data, diags, shape= (N-1,N))
interpy = sparse.kron(sparse.identity(Nx), build_interp(Ny))
interpx = sparse.kron(build_interp(Nx), sparse.identity(Ny))


def projection_pass(vx,vy):
    # alternating projection? Not necessary. In fact stupid. but easy.
    '''
    vx[0,:] = 0
    vx[-1,:] = 0
    vy[:,0] = 0
    vy[:,-1] = 0
    '''
    vx[0,:] /= 2.
    vx[-1,:] /= 2.
    vy[:,0] /= 2.
    vy[:,-1] /= 2.

    div = gradx.dot(vx.flatten()) + grady.dot(vy.flatten()) #calculate divergence

    w = Ksolve(div.flatten())#spsolve(K, div.flatten()) #solve potential

    return gradx.T.dot(w).reshape(Nx,Ny-1), grady.T.dot(w).reshape(Nx-1,Ny)
    
for i in range(300):
    #while True: #
    v[:,:,0] += np.linalg.norm(img,axis=2) * dt * 0.001 # gravity force

    # interpolate onto edges
    vx = interpy.dot(v[:,:,0].flatten()).reshape(Nx,Ny-1)
    vy = interpx.dot(v[:,:,1].flatten()).reshape(Nx-1,Ny)
    # project incomperessible

    dvx, dvy = projection_pass(vx,vy)

    #interpolate change back to original grid
    v[:,:,0] -= interpy.T.dot(dvx.flatten()).reshape(Nx,Ny)
    v[:,:,1] -= interpx.T.dot(dvy.flatten()).reshape(Nx,Ny)

    #advect everything
    coords = np.stack( [(X - v[:,:,0]*dt)*Nx, (Y - v[:,:,1]*dt)*Ny], axis=0)
    print(coords.shape)
    print(v.shape)
    for j in range(3):
        img[:,:,j] = ndimage.map_coordinates(img[:,:,j], coords, order=5, mode='wrap')
    v[:,:,0] = ndimage.map_coordinates(v[:,:,0], coords, order=5, mode='wrap')
    v[:,:,1] = ndimage.map_coordinates(v[:,:,1], coords, order=5, mode='wrap')

    cv2.imshow('image',img)

    cv2.imwrite(f'will_anim3/{i:06}.jpg',img)
    k = cv2.waitKey(30) & 0xFF
    if k == ord(' '):
       break

cv2.destroyAllWindows()

Code to produce the velocity graphs above.

import cvxpy as cvx
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve
import matplotlib.pyplot as plt

Nx = 50
Ny = 30
# velcitites live on the edges
vx = np.zeros((Nx,Ny-1))
vy = np.zeros((Nx-1,Ny))
x = np.linspace(0,1,Nx, endpoint=False) 
y = np.linspace(0,1,Ny, endpoint=False) 
X, Y = np.meshgrid(x,y, indexing='ij')
print(X[0,:])
print(X.shape)
vx[:,:] = Y[:,1:] - 1 + X[:,1:]
vy[:,:] = -X[1:,:]  + Y[1:,:]



data = [-np.ones(Nx), np.ones(Nx-1)] # ragged lengths, so pass a plain list (np.array would choke on it)
diags = np.array([0, 1])
grad = sparse.diags(data, diags, shape= (Nx-1,Nx))
print(grad.toarray())

gradx = sparse.kron(grad, sparse.identity(Ny-1))

data = [-np.ones(Ny), np.ones(Ny-1)] # ragged lengths, so pass a plain list
diags = np.array([0, 1])
grad = sparse.diags(data, diags, shape= (Ny-1,Ny))
print(grad.toarray())

grady = sparse.kron(sparse.identity(Nx-1), grad)
print(gradx.shape)


data = [-np.ones(Nx-2), 2*np.ones(Nx-1), -np.ones(Nx-2)] # ragged lengths, plain list
diags =np.array([-1, 0, 1])
Kx = sparse.diags(data, diags )

data = [-np.ones(Ny-2), 2*np.ones(Ny-1), -np.ones(Ny-2)] # ragged lengths, plain list
diags =np.array([-1, 0, 1])
Ky = sparse.diags(data, diags )

K = sparse.kronsum(Ky,Kx)

plt.quiver(X[1:,1:], Y[1:,1:], vx[1:,:] + vx[:-1,:], vy[:,1:] + vy[:,:-1])

for i in range(60):
    div = gradx.dot(vx.flatten()) + grady.dot(vy.flatten())
    print("div size", np.linalg.norm(div))
    div = div.reshape(Nx-1,Ny-1)

    w = spsolve(K, div.flatten())

    vx -= gradx.T.dot(w).reshape(Nx,Ny-1)
    vy -= grady.T.dot(w).reshape(Nx-1,Ny)
    
    # alternating projection? Not necessary. In fact stupid. but easy.
    div = gradx.dot(vx.flatten()) + grady.dot(vy.flatten())
    print("new div size", np.linalg.norm(div))
    vx[0,:] = 0
    vx[-1,:] = 0
    vy[:,0] = 0
    vy[:,-1] = 0
div = gradx.dot(vx.flatten()) + grady.dot(vy.flatten())
print("new div size", np.linalg.norm(div))

print(vx)
plt.figure()
plt.quiver(X[1:,1:], Y[1:,1:], vx[1:,:] + vx[:-1,:], vy[:,1:] + vy[:,:-1])
plt.show()

I should give a particle-in-cell code a try.


Edit:

GregTJ found this post useful and made an even better simulator! Nice

https://github.com/GregTJ/stable-fluids

Solving the XY Model using Mixed Integer Optimization in Python

There are many problems in physics that take the form of minimizing an energy. Often this energy is taken to be quadratic in the field. The canonical example is electrostatics. The (negative) gradient of the potential \phi gives the electric field E. The energy is given as \int (|\nabla \phi|^2 + \phi \rho) d^3 x . We can encode a finite difference version of this (with boundary conditions!) directly into a convex optimization modelling language like so.

import cvxpy as cvx
import numpy as np
import matplotlib.pyplot as plt
import scipy.linalg
from mpl_toolkits import mplot3d


N = 10

# building a finite difference matrix. It is a rectangle of size (N-1) x N. It maps from the vertices of our grid to the lines in between them, where derivatives live.
col = np.zeros(N)
col[0] = -1
col[1] = 1
delta = scipy.linalg.toeplitz(col, np.zeros(N-1)).T
print(delta)

gradx = np.kron(delta, np.eye(N))
grady = np.kron(np.eye(N), delta)

# a variable for our potential
phi = cvx.Variable((N, N))

# vectorization is useful. It flattens out the x-y 2Dness.
phivec = cvx.vec(phi)
gradxvec = gradx.reshape(-1, N*N)
gradyvec = grady.reshape(-1, N*N)

V = cvx.sum_squares(gradxvec * phivec) + cvx.sum_squares(gradyvec * phivec)

constraints = []
# boundary conditions. Dirichlet
constraints += [phi[:,0] == 0, phi[0,:] == 0, phi[:,-1] == 0, phi[-1,:] == 0 ]

# fixed charge density rho
rho = np.zeros((N,N))
rho[N//2,N//2] = 1
print(rho)

# objective is energy
objective = cvx.Minimize(V + cvx.sum(cvx.multiply(rho,phi)))
prob = cvx.Problem(objective, constraints)
res = prob.solve()
print(res)
print(phi.value)

# Plotting 
x = np.linspace(-6, 6, N)
y = np.linspace(-6, 6, N)

X, Y = np.meshgrid(x, y)
fig = plt.figure()

ax = plt.axes(projection='3d')
ax.plot_surface(X, Y, phi.value, rstride=1, cstride=1,
                cmap='viridis', edgecolor='none')
plt.show()
The resulting logarithmic potential

It is noted rarely in physics, but often in the convex optimization world, that the barrier between easy and hard problems is not linear vs. nonlinear; it is actually more like convex vs. nonconvex. Convex problems are those that are bowl shaped, on round domains. When your problem is convex, you can’t get caught in valleys or on corners, hence greedy local methods like gradient descent and smarter methods work to find the global minimum. When you differentiate the energy above, it results in the linear Poisson equation \nabla^2 \phi = \rho. However, from the perspective of solvability, there is not much difference if we replace the quadratic energy with a convex alternative.

def sum_abs(x):
    return cvx.sum(cvx.abs(x))
V = cvx.sum_squares(gradxvec * phivec) + cvx.sum_squares(gradyvec * phivec)
V = cvx.sum(cvx.huber(gradxvec * phivec)) + cvx.sum(cvx.huber(gradyvec * phivec))
V = cvx.pnorm(gradxvec * phivec, 3) + cvx.pnorm(gradyvec * phivec, 3)

a = 1 
dxphi = gradxvec * phivec
dyphi = gradyvec * phivec
V = cvx.sum(cvx.maximum( -a - dxphi, dxphi - a, 0 )) + cvx.sum(cvx.maximum( -a - dyphi, dyphi - a, 0 ))

Materials do actually have non-linear permittivity and permeability, so this may be useful in modelling that. It is also possible to consider the convex relaxation of truly hard nonlinear problems and hope you get the echoes of the phenomenology that occurs there.

Another approach is to go mixed integer. Mixed Integer programming allows you to force that some variables take on integer values. There is then a natural relaxation problem where you forget the integer variables have to be integers. Mixed integer programming combines a discrete flavor with the continuous flavor of convex programming. I’ve previously shown how you can use mixed integer programming to find the lowest energy states of the Ising model but today let’s see how to use it for a problem of a more continuous flavor.

As I’ve described previously, in the context of robotics, the non-convex constraint that variables lie on the surface of a circle can be approximated using mixed integer programming. We can mix this fairly trivially with the above to make a global solver for the minimum energy state of the XY model. The XY model is a 2d field theory where the value of the field is constrained to lie on a circle. It is a model of a number of physical systems, such as superconductivity, and is the playground for a number of interesting phenomena, like the Kosterlitz-Thouless phase transition. Our encoding is very similar to the above except we make two copies of the field phi and we then force them to lie on a circle. I’m trying to factor out the circle thing into my library cvxpy-helpers, which is definitely a work in progress.
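
I won’t reproduce cvxpy-helpers here, but the flavor of the circle constraint is something like the following sketch (my own guess at a formulation, not the library’s actual code): pick one edge of a regular polygon inscribed in the unit circle with a binary indicator, and place the point on that edge with convex weights.

import numpy as np
import cvxpy as cvx

def circle_mip(k):
    # (x, y) constrained to the boundary of a regular k-gon inscribed in the unit circle,
    # a piecewise linear, mixed integer stand-in for x**2 + y**2 == 1
    thetas = np.linspace(0, 2 * np.pi, k, endpoint=False)
    verts = np.stack([np.cos(thetas), np.sin(thetas)], axis=1)   # the k polygon vertices
    nxt = np.roll(verts, -1, axis=0)                             # the next vertex around

    z = cvx.Variable(k, boolean=True)   # which edge the point sits on
    a = cvx.Variable(k, nonneg=True)    # weight on that edge's first endpoint
    b = cvx.Variable(k, nonneg=True)    # weight on that edge's second endpoint
    x = cvx.Variable()
    y = cvx.Variable()

    c = [a + b == z,                    # weights can only be nonzero on the chosen edge
         cvx.sum(z) == 1,               # exactly one edge is chosen
         x == a @ verts[:, 0] + b @ nxt[:, 0],
         y == a @ verts[:, 1] + b @ nxt[:, 1]]
    return x, y, c

With more sides the polygon hugs the circle more tightly, at the cost of more binary variables.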

import cvxpy as cvx
import numpy as np
import matplotlib.pyplot as plt
import scipy.linalg
from mpl_toolkits import mplot3d
from cvxpyhelpers import cvxpyhelpers as mip

N = 6

# building a finite difference matrix. It is a rectangle of size (N-1) x N. It maps from the vertices of our grid to the lines in between them, where derivatives live.
col = np.zeros(N)
col[0] = -1
col[1] = 1
delta = scipy.linalg.toeplitz(col, np.zeros(N-1)).T
print(delta)

gradx = np.kron(delta, np.eye(N))
grady = np.kron(np.eye(N), delta)

# a variable for our potential
phix = cvx.Variable((N, N))
phiy = cvx.Variable((N, N))

# vectorization is useful. It flattens out the x-y 2Dness.
phixvec = cvx.vec(phix)
phiyvec = cvx.vec(phiy)
gradxvec = gradx.reshape(-1, N*N)
gradyvec = grady.reshape(-1, N*N)

def sum_abs(x):
    return cvx.sum(cvx.abs(x))

#V = cvx.sum_squares(gradxvec * phixvec) + cvx.sum_squares(gradyvec * phixvec) + cvx.sum_squares(gradxvec * phiyvec) + cvx.sum_squares(gradyvec * phiyvec) 
V = sum_abs(gradxvec * phixvec) + sum_abs(gradyvec * phixvec) + sum_abs(gradxvec * phiyvec) + sum_abs(gradyvec * phiyvec) 
constraints = []
# boundary conditions. Nice and vortexy.
constraints += [phix[:,0] >= 0.9, phiy[0,1:-1] >= 0.9, phix[:,-1] <= -0.9, phiy[-1,1:-1] <= -0.9 ]

for i in range(N):
    for j in range(N):
        x, y, c = mip.circle(4)
        constraints += c
        constraints += [phix[i,j] == x]
        constraints += [phiy[i,j] == y]

# fixed charge density rho
rho = np.ones((N,N)) * 0.01
rho[N//2,N//2] = 1
print(rho)

# objective is energy
objective = cvx.Minimize(V + cvx.sum(cvx.multiply(rho,phix)))
prob = cvx.Problem(objective, constraints)
print("solving problem")
res = prob.solve(verbose=True, solver=cvx.GLPK_MI)
print(res)
print(phix.value)

# Plotting 
x = np.linspace(-6, 6, N)
y = np.linspace(-6, 6, N)

X, Y = np.meshgrid(x, y)
fig = plt.figure()

plt.quiver(X,Y, phix.value, phiy.value)
plt.show()

Now, this isn't really an unmitigated success as is. I switched to an absolute value potential because GLPK_MI needs it to be linear. ECOS_BB works with a quadratic potential, but it was not doing a great job. The commercial solvers (Gurobi, CPLEX, Mosek) are supposed to be a great deal better. Perhaps switching to Julia, with its richer ecosystem, might be a good idea too. I don't really like how the solution of the absolute value potential looks. Also, even at such a small grid size it still takes around a minute to solve. When you think about it, it is exploring a ridiculously massive space and still doing ok. There are hundreds of binary variables in this example. But there is a lot of room for tweaking and I think the approach is intriguing.

Musings:

  • Can one do steepest descent style analysis for low energy statistical mechanics or quantum field theory?
  • Is the trace of the mixed integer program search tree useful for perturbative analysis? It seems intuitively reasonable that it visits low lying states
  • The Coulomb gas is a very obvious candidate for mixed integer programming. Let the charge variables on each grid point = integers. Then use the coulomb potential as a quadratic energy. The coulomb gas is dual to the XY model. Does this exhibit itself in the mixed integer formalism?
  • Coulomb Blockade?
  • Nothing special about the circle. It is not unreasonable to make piecewise linear approximations or other convex approximations of the sphere or of Lie groups (circle is U(1) ). This is already discussed in particular about SO(3) which is useful in robotic kinematics and other engineering topics.

Edit: /u/mofo69extreme writes:

"

By absolute value potential, I mean using |del phi| as compared to a more ordinary quadratic |del phi|^2.

This is where I'm getting confused. As you say later, you are actually using two fields, phi_x and phi_y. So I'm guessing your potential is the "L1 norm"

|del phi| = |del phi_x| + |del phi_y|

? This is the only linear thing I can think of.

I don't feel that the exact specifics of the XY model actually matter all the much.

You should be careful here though. A key point in the XY model is the O(2) symmetry of the potential: you can multiply the vector (phi_x,phi_y) by a 2D rotation matrix and the Hamiltonian is unchanged. You have explicitly broken this symmetry down to Z_4 if your potential is as I have written above. In this case, the results of the famous JKKN paper and this followup by Kadanoff suggest that you'll actually get a phase transition of the so-called "Ashkin-Teller" universality class. These are actually closely related to the Kosterlitz-Thouless transitions of the XY model; the full set of Ashkin-Teller phase transitions actually continuously link the XY transition with that of two decoupled Ising models.

You should still get an interesting phase transition in any case! Just wanted to give some background, as the physics here is extremely rich. The critical exponents you see will be different from the XY model, and you will actually get an ordered Z_4 phase at low temperatures rather than the quasi-long range order seen in the low temperature phase of the XY model. (You should be in the positive h_4 region of the bottom phase diagram of Figure 1 of the linked JKKN paper.)"

These are some interesting points and references.