## A Smattering of Physics in Sympy

Sympy is fun. I’ve been enjoying trying out some simple physics problems and seeing what kind of fun angles sympy brings to the table. It does pretty well on concrete problems, not so well at abstract derivations.

### Kinematics

Ah such fond memories! In high school, I was taught “the big four” by Ric Thompson.

$x_f = x_i + v_i t + \frac{1}{2} a t^2$

$v_f = v_i + a t$

$v_f^2 = v_i^2 + 2 a d$

$d = \frac{v_f + v_i}{2} t$

The equations are, of course, overcomplete. They are all implied by $\frac{d^2}{dt^2}x = a$, but even with only algebra the second two are derivable from the first two.
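As a quick check of the claim that everything follows from $\frac{d^2}{dt^2}x = a$, here is a minimal sketch that asks sympy's `dsolve` to integrate the equation with initial position and velocity. The symbol names `x_i` and `v_i` are just labels chosen for this example.

```python
from sympy import symbols, Function, Eq, dsolve

t, a, x_i, v_i = symbols("t a x_i v_i")
x = Function("x")

# Solve x''(t) = a with x(0) = x_i and x'(0) = v_i
sol = dsolve(Eq(x(t).diff(t, 2), a), x(t),
             ics={x(0): x_i, x(t).diff(t).subs(t, 0): v_i})
position = sol.rhs            # first of the big four
velocity = position.diff(t)   # second of the big four
print(position)
print(velocity)
```

The position and velocity that come out are the first two of the big four, and the other two follow by algebra.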

Of course a natural way of deriving the equations is to solve one equation for a variable and substitute it into the other equation. sympy makes this pretty dang easy.

from sympy import *
init_printing()
t,a,d,vf,vi = symbols("t a d vf vi")
e1 = Eq(d , vi * t + 1/2 * a * t ** 2)
tsub = solve(Eq(vf , vi + a * t),t)[0] # solve returns a list, take its single solution
print(tsub) # This is assuming a is nonzero though.
expand(simplify(e1.subs(t,tsub)))
(vf - vi)/a
Eq(d, 0.5*vf**2/a - 0.5*vi**2/a)

However, there is a more automated approach.

It turns out that a decent chunk of physics equations are or can be well approximated by a system of polynomial equations. There are systematic methods that are guaranteed to solve the problem (albeit maybe not in the lifetime of the universe).

A Groebner basis is an equivalent set of polynomial equations that has useful properties. For some simple purposes, all you need to know is that if you list the variables you want to eliminate first, the Groebner basis will contain equations without those variables. Here we specify t as the one to eliminate, so we get an equation without t in it.


G = groebner(  [vi * t + 1/2 * a * t ** 2 - d,
                vi + a * t - vf] ,
                t,vf,d,a,vi  )
for e in G:
    print(e)
-2.0*d + 1.0*t*vf + 1.0*t*vi
1.0*a*t - 1.0*vf + 1.0*vi
-2.0*a*d + 1.0*vf**2 - 1.0*vi**2

I’ve actually been pleasantly surprised at how many physics problems ultimately reduce to systems of polynomial constraints. Energy and momentum conservation are polynomial constraints (classical Feynman diagrams, kind of). Special relativity questions can be reduced to polynomial constraints using the proper time.

#elephant problem
# elephants give birth at 21 months. On a rocket at velocity v
# how long T until you see it give birth?
tau , t1, t2, x1, v, c, T = symbols("tau t1 t2 x1 v c T")

eqs = [
tau**2 - (t1**2 - x1**2 / c**2), # proper time
x1 - v * t1, # distance away
c * t2 - x1, # time for light to travel back
T - t1 - t2, # total time
tau - 21 # proper time is 21 months

]

groebner(eqs, tau , t1, t2, x1, v, T)

### Lagrangian Mechanics

The Structure and Interpretation of Classical Mechanics is an interesting book.

It points out that notation we use is extremely imprecise and implicit. This is a source of great confusion.

A great benefit of programming up such examples is that it makes explicit (sometimes painfully so) steps that were implicit before.

In the Euler-Lagrange equation, one first partially differentiates considering q and $\dot{q}$ to be independent parameters. Then a substitution is made for a function $x(t)$, and then we proceed with a differentiation with respect to time.

# simple harmonic oscillator lagrangian style
m, k = symbols("m k", real = True, positive=True)
v, q = symbols("v q")
K = Rational(1,2) * m * v ** 2 #kinetic energy
V = Rational(1,2) * k * q ** 2 # potential energy
L =  K - V  #Lagrangian
F = diff(L,q) # force
p = diff(L,v) # momentum

x_ = Function("x")
t = symbols("t")

x = x_(t)

subst = { v : diff(x,t),
q : x} # replacement to turn q into a function x

# euler-lagrange equations of motion
eq = F.subs( subst ) - diff( p.subs(subst)  , t )
dsolve(eq) # general solution cosine and sines

Here’s an analogous thing for a pendulum

# pendulum lagrangian style
m, g, L = symbols("m g L", real = True, positive=True)
theta, w = symbols("theta omega")
K = Rational(1,2) * m * (w * L) ** 2 #kinetic energy
V = - m * g * L * cos(theta) # potential energy. angle is defined as 0 = hanging down
Lag =  K - V  #Lagrangian (named Lag so that the length L is not overwritten)
F = diff(Lag,theta) # generalized force
p = diff(Lag,w) # momentum
F
p

x_ = Function("theta")
t = symbols("t")

x = x_(t)

subst = { w : diff(x,t),
theta : x} # replacement to turn q into a function x

# euler-lagrange equations of motion
eq = F.subs( subst ) - diff( p.subs(subst)  , t )
eq
#dsolve(eq) 

Another place where an implicitly stated substitution is absolutely vital is in the Legendre transform going from the Lagrangian to the Hamiltonian.

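# legendre transformation to hamiltonian, for the harmonic oscillator
m, k = symbols("m k", real = True, positive=True)
v, q, p = symbols("v q p")
L = Rational(1,2) * m * v ** 2 - Rational(1,2) * k * q ** 2 # oscillator Lagrangian again
H_ = p * v - L # hamiltonian but we haven't solved out v yet
v_of_pq = solve(diff(H_, v), v)[0] # set derivative to 0 to solve for v. solve returns a list
H = simplify(H_.subs(v, v_of_pq )) # substitute back in. Here is the actual hamiltonian
H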

#### Statistical Mechanics

Sympy can do Gaussian integrals! How convenient. It can also do power series expansions. And differentiate. So it takes the drudgery out of some simple calculations

# ideal gas partition function
beta, m, V, N, kb, T  = symbols("beta m V N k_b T", real=True, positive=True)
p = symbols("p", real=True)
Z = integrate( exp( - beta * Rational(1,2) * p ** 2 / m ), (p,-oo,oo))**(3*N) * V**N #partition function
def avg_energy(Z):
    return - diff(ln(Z), beta).subs(beta, 1/ kb / T)
print(avg_energy(Z)) # average energy, 3/2 N k_b T
F = (-ln(Z) / beta).subs(beta, 1 / kb / T) #helmholtz free energy
S = -diff(F , T) # entropy is minus the derivative of helmholtz wrt T
S # the functional dependence on T and V I think is correct
P = -diff(F , V) # pressure is minus the derivative of helmholtz wrt V
P
# Neato
# harmonic oscillator partition function
beta, m, k = symbols("beta m k ", real=True, positive=True)
p, x = symbols("p x", real=True)
E = Rational(1,2) * p ** 2 / m  + Rational(1,2) * k * x ** 2
Z = integrate( integrate( exp( - beta * E ), (p,-oo,oo)) , (x,-oo, oo))**N
diff(-ln(Z),beta)

Perturbation theory of the partition function of an anharmonic oscillator. Pretty easy. It is interesting to note that this is the very simplest schematic of how perturbation theory can be approached for quantum field theory.

# perturbation theory of anharmonic oscillator
beta, m, k, g = symbols("beta m k g ", real=True, positive=True)
p, x = symbols("p x", real=True)
E = Rational(1,2) * ( p ** 2 / m  +  k * x ** 2) + g * x ** 4
series(exp( - beta * E ), g).removeO()
Z = integrate( integrate( series(exp( - beta * E ), g, n=2).removeO(), (p,-oo,oo)) , (x,-oo, oo))
simplify(diff(-ln(Z),beta)) # average energy E
simplify(diff(-ln(Z),k)/beta) # <x**2>/2

Other things that might be interesting: two oscillators, a chain of oscillators, the virial expansion.

#### Thermo and Legendre Transformations

Thermodynamics is a poorly communicated topic. Which variables remain in expressions and what things are held constant when differentiating are crucial and yet poorly communicated and the notation is trash. Sympy helps make some things explicit. It’s fun.

u,s,t,p,v,n,r = symbols("u s t p v n r")

du,ds,dt,dp,dv = symbols("du ds dt dp dv")
# taylor series in stuff?

e1 = p * v - n * r * t
e2 = u - Rational(3 , 2) * n * r * t

state = [  (u,du), (s,ds), (t,dt) , (p,dp) , (v,dv) ]

def differential(e):
    return sum( [ diff(e,x) * dx  for x,dx in state]   )

de1 = differential(e1 )
de2 = differential(e2 )

e3 = du - (t * ds - p * dv)

eqs = [e1,e2,de1,de2,e3]
print(eqs)
G = groebner( eqs, u , du,  t, dt, p, dp, v, dv,  ds )
for e in G:
    print(e)
R = Rational
U,S,T,P,V,N, k = symbols("U S T P V N k", positive=True) # positivity helps solve with the logs

cv = R(3,2) * N * k
e1 = U - cv * T # energy of a monatomic ideal gas
e2 = P * V - N * k * T # ideal gas law
e3 = S - (cv * ln(T) + N * k * ln(V)) # entropy, up to an additive constant

Ts = solve(e3, T)[0] # T as a function of S and V. solve returns a list
Usv = solve(e1.subs(T,Ts), U)[0] # U(S,V)
psv = -diff(Usv,V) # pressure is minus dU/dV at constant S
tsv = diff( Usv , S ) # temperature is dU/dS at constant V

#solve(es, V)

Hsv = Usv + P * V  # enthalpy and legendre transformation
Vps = solve(diff(Hsv, V) , V)[0] # stationarity in V, i.e. P = -dU/dV
H =  Hsv.subs(V, Vps)
simplify(H)

There are so many other things!

What about a van der Waals equation? Optics (geometrical and wave, paraxial ~ Schrodinger, Fourier optics), GR (exterior derivatives), Quantum (wave matching problems. What can we do about hydrogen? WKB, QHE), Rutherford scattering, Weiss mean field, canonical transformations, rotations, Clebsch-Gordan coefficients.

## Computing Syzygy Modules in Sympy

Reading about the methods of computational algebra is really compelling to me because some domains that seem like natural fits

I used to have no idea that multivariate polynomial equations had guaranteed methods that in some sense solve those systems. It’s pretty cool.

However, when I was poring over the two Cox, Little, O’Shea volumes, the chapter on modules made my eyes glaze over. Who ordered that? From my perspective, modules are vector spaces where you cripple the ability to divide scalars. Fair enough, but the language is extremely confusing and off-putting. Syzygy? Free Resolution? Everything described as homomorphisms and exact sequences? Forget it. Why do this? This is too abstract.

So I’ve been on the lookout for some application to motivate them. And I think I have at least one. Capacitor Inductor circuits.

A pure resistive circuit can be treated by linear algebra. The voltages and currents are connected by linear relations. http://www.philipzucker.com/linear-relation-algebra-of-circuits-with-hmatrix/

The common way to describe inductor-capacitor circuits is by using phasor analysis, where the resistances become impedances which have a frequency parameter in them. I’m not entirely convinced that it isn’t acceptable to just use linear algebra over rational functions of the frequency, but I have some reason to believe more carefulness regarding division may bear fruit. I suspect that carefulness with division corresponds to the always sticky issues of boundary conditions.

On a slightly different front, I was very impressed by Jan Willems Open Dynamical systems. https://homes.esat.kuleuven.be/~sistawww/smc/jwillems/Articles/JournalArticles/2007.1.pdf In it, he talks about differential equations as describing sets of possible trajectories of systems. He uses module theory as a way to manipulate those sets and conditions from module theory to describe interesting qualitative features like controllability of those systems.

He sticks to the tools of Hermite and Smith forms of matrices, as he is mostly interested in single variable polynomials as the ring in question. Here are my issues:

1. I’m not really familiar with these forms
2. I can’t find good implementations of these. Perhaps here https://desr.readthedocs.io/en/latest/index.html (Differential Equation Symmetry Reduction), which seems like an interesting project for other reasons as well. Maybe I’m a fool, but I’d like to stick to python for the moment.
3. I also have an inkling that modules over multivariate polynomials will come in handy for playing around with band theory (or partial differential equations for that matter). Maybe something interesting to be said regarding topological materials?

It seems like Groebner basis techniques should acceptably solve these systems as well. Converting between the analog of range and nullspace representations, as I did in my previous post, corresponds to syzygy calculations in the terminology of modules.

Sympy does supply a Groebner basis algorithm, but not much beyond that. The AGCA module that should supply calculations over modules is mostly a lie. The documentation lists many functions that are not implemented. Which is too bad.

However, you can hack a syzygy calculation into a Groebner basis calculation. I started poring over chapter 5 of Using Algebraic Geometry again, and it really has everything you need.

When one converts a set of polynomials to a Groebner basis, one is getting an equivalent set of polynomials with excellent properties. A Groebner basis is an analog of the reduced echelon form of a matrix (these rows are equivalent to the old rows), and Buchberger’s algorithm is an analog of Gaussian elimination. https://mattpap.github.io/masters-thesis/html/src/groebner.html#special-case-1-gauss-algorithm . You can find a decomposition of a polynomial in your ideal by a multivariate division algorithm with respect to the Groebner basis. This is the analog of the ability of an upper triangular matrix to be easily inverted.
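To see the analogy with row reduction in the smallest possible case, here is a sketch with a system of two linear polynomials that I made up for illustration. The Groebner basis of the linear polynomials and the reduced row echelon form of the augmented matrix carry the same information.

```python
from sympy import symbols, groebner, Matrix

x, y = symbols("x y")

# The linear system x + 2y = 5, 3x - y = 1 written as polynomials
G = groebner([x + 2*y - 5, 3*x - y - 1], x, y, order="lex")
print(G.exprs)

# The same system as an augmented matrix, row reduced
rref, pivots = Matrix([[1, 2, 5], [3, -1, 1]]).rref()
print(rref)
```

Both routes land on x = 1, y = 2. For nonlinear polynomials there is no matrix to reduce, but the Groebner basis computation still goes through.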

You can perform a number of tricks by adding in dummy variables to the Groebner basis algorithm. The first thing you can do with this sort of trick is track how to write the Groebner basis in terms of the original basis. This is the analog of working with an augmented matrix during gaussian elimination. https://en.wikipedia.org/wiki/Augmented_matrix

I found this Maple documentation helpful in this regard (although formatted horrifically)

https://www.maplesoft.com/support/help/Maple/view.aspx?path=Groebner%2fBasis_details

We want to track a matrix A that writes the Groebner basis vector G in terms of the original vector of polynomials F, $G = FA$. We do it by attaching to each generator f of F a fresh marker variable m, giving f + m. Then the coefficients on the markers in the extended Groebner basis correspond to the matrix A. Think about it.
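One way to make the marker trick concrete in sympy is sketched below. It is a variant rather than the exact recipe described above, and the names `m0`, `tagged`, `kill`, and `cofactors` are made up for illustration. An extra marker `m0` tags the polynomial itself, every product of two markers is added to the ideal so the markers act like module basis vectors, and a lex order with the markers listed first plays the role of a position-over-term order. A basis element with an `m0` term reads as $g\, m_0 + \sum_j a_j m_j$ with $g = \sum_j a_j f_j$, and a basis element without an `m0` term is a syzygy of F.

```python
from sympy import symbols, groebner, expand, Poly

x, y = symbols("x y")
F = [x**2 - y, x*y - 1]

# Hypothetical markers: m0 tags the polynomial slot, m1 and m2 tag the cofactor slots
m = symbols("m0:3")

# Encode the module element (f_i, e_i) as the single polynomial f_i*m0 + m_i
tagged = [f * m[0] + mi for f, mi in zip(F, m[1:])]
# Setting all marker products to zero keeps everything linear in the markers
kill = [m[a] * m[b] for a in range(3) for b in range(a, 3)]

basis = groebner(tagged + kill, *m, x, y, order="lex")

G, cofactors, syzygies = [], [], []
for g in basis.exprs:
    p = Poly(g, *m)
    if p.total_degree() != 1:
        continue  # skip the marker products themselves
    coeffs = [p.coeff_monomial(mi) for mi in m]
    if coeffs[0] != 0:
        G.append(coeffs[0])           # a basis polynomial
        cofactors.append(coeffs[1:])  # how it is built from F
    else:
        syzygies.append(coeffs[1:])   # a relation among the f_i

for g, row in zip(G, cofactors):
    print(g, "=", row, ". F")
print("syzygies:", syzygies)
```

Each entry of `cofactors` is one column of A in the $G = FA$ convention. The syzygies falling out for free is a nice side effect of phrasing this as a module computation.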

The matrix for the other direction, $F = GB$, can be found via the reduction algorithm with respect to the Groebner basis. This is pretty straightforward given that sympy implemented reduction for us.
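A minimal sketch of that direction using sympy's `reduced`, which returns the quotients and the remainder of multivariate division. The generators are an arbitrary small example of mine, and `B` here is just a list holding one list of quotients per original generator.

```python
from sympy import symbols, groebner, reduced, expand

x, y = symbols("x y")
F = [x**2 - y, x*y - 1]
G = groebner(F, x, y, order="lex")

B = []
for f in F:
    # f = sum(q_i * g_i) + remainder, and the remainder vanishes because f is in the ideal
    quotients, remainder = reduced(f, G.exprs, x, y, order="lex")
    B.append(quotients)
    print(f, "=", quotients, ". G, remainder", remainder)
```

Each list of quotients is one column of B in the $F = GB$ convention.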

From these we determine that

$G = GBA$

$F = FAB$

Finding the syzygies of a set of generators is the analog of finding a nullspace of a matrix. A syzygy is a set of coefficients to “dot” onto the generators and get zero. In linear algebra talk, they are sort of orthogonal to the generator set.

The ability to find a nullspace gives you a lot of juice. One can phrase many problems, including solving a $Ax=b$ system of equations as a nullspace finding problem.

Proposition 3.3 of Using Algebraic Geometry tells us how to calculate the generators of a syzygy module for a Groebner basis. It’s a little strange. The S-polynomial of two generators from the basis is zero after reduction by the basis. Writing out that the S-polynomial minus its reduction is 0 gives us a very interesting identity on the generators (a syzygy), and it turns out that these actually generate all possible syzygies. This is still not obvious to me but the book does explain it.

Proposition 3.8 of Using Algebraic Geometry tells us how to get the syzygies of the original generators given the previous information. We map the syzygy generators of G back through A and also append the columns of I – AB.

The columns of I – AB are syzygies of F: F (I – AB) = F – FAB = F – F = 0, using the equation F = FAB from above.

I’m still trying to figure out how to do calculations on modules proper. I think it can be done by using dummy variables to turn module vectors into single expressions. There is an exercise in Using Algebraic Geometry that mentions this.

Grobner basis reference suggestions:

## Fiddling around with validated ODE integration, Sum of Squares, Taylor Models.

As I have gotten more into the concerns of formal methods, I’ve become unsure that ODEs actually exist. These are concerns that did not bother me much when I defined myself as being more in the physics game. How times change. Here’s a rough cut.

A difficulty with ODE error analysis is that it is very confusing how to get the error on something you are having difficulty approximating in the first place.

If I wanted to know the error of using a finite step size dt vs a size dt/10, great. Just compute both and compare. However, no amount of this seems to bootstrap you down to the continuum. And so I thought, you’re screwed in regards to using numerics in order to get true hard facts about the true solution. You have to go to paper and pencil considerations of equations and variables and epsilons and deltas and things. It is now clearer to me that this is not true. There is a field of verified/validated numerics.

A key piece of this seems to be interval arithmetic. https://en.wikipedia.org/wiki/Interval_arithmetic An interval can be concretely represented by its left and right point. If you use rational numbers, you can represent the interval precisely. Interval arithmetic over approximates operations on intervals in such a way as to keep things easily computable. One way it does this is by ignoring dependencies between different terms. Check out Moore et al’s book for more.
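Here is a toy sketch of interval arithmetic with exact rational endpoints. The `Interval` class is something I made up for illustration, not an existing library. The subtraction example shows the over-approximation that comes from ignoring dependencies: the class has no idea that both operands are the same x.

```python
from fractions import Fraction

class Interval:
    """Closed interval [lo, hi] with exact rational endpoints."""
    def __init__(self, lo, hi):
        self.lo, self.hi = Fraction(lo), Fraction(hi)
        assert self.lo <= self.hi

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __sub__(self, other):
        # worst case: smallest of self minus largest of other, and vice versa
        return Interval(self.lo - other.hi, self.hi - other.lo)

    def __mul__(self, other):
        products = [a * b for a in (self.lo, self.hi) for b in (other.lo, other.hi)]
        return Interval(min(products), max(products))

    def __repr__(self):
        return f"[{self.lo}, {self.hi}]"

x = Interval(0, 1)
difference = x - x   # the true set {a - a} is just {0}, but we get [-1, 1]
square = x * x       # here the enclosure [0, 1] happens to be tight
print(difference, square)
```

The answer is always a valid enclosure of the true set, just not necessarily the tightest one.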

What switching over to intervals does is you think about sets as the things you’re operating on rather than points. For ODEs (and other things), this shift of perspective is to no longer consider individual functions, but instead sets of functions. And not arbitrary extremely complicated sets, only those which are concretely manipulable and storable on a computer like intervals. Taylor models are a particular choice of function sets. You are manipulating an interval tube around a finite polynomial. If during integration / multiplication you get higher powers, truncate the polynomials by dumping the excess into the interval term. This keeps the complexity under wraps and closes the loop of the descriptive system.

If we have an iterative, contractive process for getting better and better solutions of a problem (like a newton method or some iterative linear algebra method), we can get definite bounds on the solution if we can demonstrate that a set maps into itself under this operation. If this is the case and we know there is a unique solution, then it must be in this set.

It is wise if at all possible to convert an ODE into integral form. $\dot{x}= f(x,t)$ is the same as $x(t) = x_0 + \int f(x,t)dt$.

For ODEs, the common example of such an operation is known as Picard iteration. In physical terms, this is something like the impulse approximation / born approximation. One assumes that the ODE evolves according to a known trajectory $x_0(t)$ as a first approximation. Then one plugs in the trajectory to the equations of motion $f(x_0,t)$ to determine the “force” it would feel and integrate up all this force. This creates a better approximation $x_1(t)$ (probably) which you can plug back in to create an even better approximation.
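A small sketch of plain Picard iteration in sympy for $\dot{x} = x$, $x(0) = 1$, starting from the constant trajectory $x_0(t) = 1$. The helper name `picard_step` is mine. Each step integrates the "force" along the previous trajectory, and for this equation the iterates turn out to be the Taylor polynomials of $e^t$.

```python
from sympy import symbols, integrate, Integer

t, s = symbols("t s")

def picard_step(x_prev):
    # x_next(t) = x(0) + integral from 0 to t of f(x_prev(s), s) ds, with f(x, s) = x
    return 1 + integrate(x_prev.subs(t, s), (s, 0, t))

x = Integer(1)       # first guess: the constant trajectory
iterates = [x]
for _ in range(4):
    x = picard_step(x)
    iterates.append(x)

for it in iterates:
    print(it)
```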

If we instead do this iteration on an intervally function set / taylor model thing, and can show that the set maps into itself, we know the true solution lies in this interval. The term to search for is Taylor Models (also some links below).

I was tinkering with whether sum of squares optimization might tie in to this. I have not seen SOS used in this context, but it probably has or is worthless.

An aspect of sum of squares optimization that I thought was very cool is that it gives you a simple numerical certificate that confirms that at the infinitude of points for which you could evaluate a polynomial, it comes out positive. This is pretty cool. http://www.philipzucker.com/deriving-the-chebyshev-polynomials-using-sum-of-squares-optimization-with-sympy-and-cvxpy/

But that isn’t really what makes Sum of squares special. There are other methods by which to do this.

There are very related methods called DSOS and SDSOS https://arxiv.org/abs/1706.02586 which are approximations of the SOS method. They replace the SDP constraint at the core with a more restrictive constraint that can be expressed with LP and SOCP respectively. These methods lose some of the universality of the SOS method and become basis dependent on your choice of polynomials. DSOS in fact is based around the concept of a diagonally dominant matrix, which means that you should know roughly what basis your certificate should be in.

This made me realize there is an even more elementary version of DSOS that perhaps should have been obvious to me from the outset. Suppose we have a set of functions we already know are positive everywhere on a domain of interest. A useful example is the raised Chebyshev polynomials. https://en.wikipedia.org/wiki/Chebyshev_polynomials The Chebyshev polynomials oscillate between -1 and 1 on the interval [-1,1], so if you add 1 to them they are nonnegative over the whole interval [-1,1]. Then nonnegative linear sums of them are also nonnegative. Bing bang boom. And that compiles down into a simple linear program (inequality constraints on the coefficients) with significantly fewer variables than DSOS. What we are doing is restricting ourselves to completely positive diagonal matrices again, which are of course positive semidefinite. It is less flexible, but it also has more obvious knobs to throw in domain specific knowledge. You can use a significantly overcomplete basis, and finding this basis is where you can insert your prior knowledge.
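A quick numerical illustration of the raised Chebyshev idea using numpy. Sampling on a grid is only a sanity check, not a certificate; the actual guarantee is the fact that $|T_n(t)| \le 1$ on [-1,1]. The random weights stand in for whatever nonnegative coefficients an LP would choose.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

ts = np.linspace(-1, 1, 2001)

# Row n holds T_n(t) + 1 sampled on the grid, for n = 0..5
raised = np.array([C.chebval(ts, np.eye(6)[n]) + 1 for n in range(6)])

rng = np.random.default_rng(0)
weights = rng.uniform(0, 1, size=6)   # any nonnegative coefficients will do
combo = weights @ raised              # nonnegative combination of the raised polynomials

print(raised.min(), combo.min())
```

In the LP version the weights become decision variables constrained to be nonnegative, and everything else is linear in them.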

It is not at all clear there is any benefit over interval based methods.

Here is a sketch I wrote for $x'=x$ which has solution $e^t$. I used raised chebyshev polynomials to enforce positive polynomial constraints and tossed in a little taylor model / interval arithmetic to truncate off the highest terms.

I’m using my helper functions for translating between sympy and cvxpy expressions. https://github.com/philzook58/cvxpy-helpers Sympy is great for collecting up the coefficients on terms and polynomial multiplication integration differentiation etc. I do it by basically creating sympy matrix variables corresponding to cvxpy variables which I compile to cvxpy expressions using lambdify with an explicit variable dictionary.

Seems to work, but I’ve been burned before.

man, LP solvers are so much better than SDP solvers

Random junk and links: Should I be more ashamed of dumps like this? I don’t expect you to read this.

https://github.com/JuliaIntervals/TaylorModels.jl

https://github.com/JuliaIntervals

Functional analysis by and large analyzes functions by analogy with more familiar properties of finite dimensional vector spaces. In ordinary 2d space, it is convenient to work with rectangular regions or polytopic regions.

Suppose I had a damped oscillator converging to some unknown point. If we can show that every point in a set maps within the set, we can show that the function

One model of a program is that it is some kind of kooky complicated hyper nonlinear discrete time dynamical system. And vice versa, dynamical systems are continuous time programs. The techniques for analyzing either have analogs in the other domain. Invariants of programs are essential for determining correctness properties of loops. Invariants like energy and momentum are essential for determining what physical systems can and cannot do. Lyapunov functions demonstrate that control systems are converging to the set point. Terminating metrics are showing that loops and recursion must eventually end.

If instead you use interval arithmetic for a bound on your solution rather than your best current solution, and if you can show the interval maps inside itself, then you know that the iterative process must converge inside of the interval, hence that is where the true solution lies.

A very simple bound for an integral is $\int_a^b f(x)dx \le \int_a^b \max_{x \in [a,b]}f(x) dx = \max_{x \in [a,b]}f(x) \int_a^b dx = \max_{x \in [a,b]}f(x) (b - a)$

The integral is a very nice operator. The result of the integral is a positive linear sum of the values of a function. This means it plays nice with inequalities.

Rigorously Bounding ODE solutions with Sum of Squares optimization – Intervals

Intervals – Moore book. Computational functional analysis. Tucker book. Coqintervals. fixed point theorem. Hardware acceleration? Interval valued functions. Interval extensions.

• Banach fixed point – contraction mapping
• Brouwer fixed point
• Schauder
• Knaster Tarski

Picard iteration vs? Allowing flex on boundary conditions via an interval?

Interval book had an interesting integral form for the 2-D

sympy has cool stuff

google scholar search z3, sympy brings up interesting things

https://moorepants.github.io/eme171/resources.html

The pydy guy Moore has a lot of good shit. resonance https://www.moorepants.info/blog/introducing-resonance.html

Lyapunov functions. Piecewise affine Lyapunov functions. Are Lyapunov functions kind of like a PDE? Value functions are PDEs. If the system is piecewise affine we can define a grid on the same piecewise affine thingo. Compositional convexity. Could we use compositional convexity + ReLU style piecewise affinity to get complicated Lyapunov functions? Lyapunov functions don’t have to be continuous, they just have to be decreasing. The Lie derivative wrt the flow is always negative, i.e. the gradient of the function points roughly against the direction of flow. Triangulate around the equilibrium if you want to avoid quadratic Lyapunov functions. For a guarded system, you can relax the Lyapunov constraint outside of the guard if you tighten it inside the guard. Ax >= 0 is the guard. It’s the S-procedure.

Best piecewise approximation with point choice?

Connection to petri nets?

KoAt, LoAT. AProve. Integer transition systems. Termination analysis. Loops?

https://lfcps.org/pub/Pegasus.pdf Darboux polynomials. barrier certificates. Prelle-Singer method. first integrals.

method 1. arbitrary polynomial p(t). calculate p'(t). find coefficients that make p'(t) = 0 by linear algebra. Idea: near invariants? min max |p'(t)|

Lie Algebra method

https://www.researchgate.net/publication/233653257_Solving_Differential_Equations_by_Symmetry_Groups sympy links this paper. Sympy has some lie algebra stuff in there

https://www-users.math.umn.edu/~olver/sm.html Peter Olver tutorial

https://www-sop.inria.fr/members/Evelyne.Hubert/publications/PDF/Hubert_HDR.pdf

https://www.cs.cmu.edu/~aplatzer/logic/diffinv.html andre platzer. Zach says Darboux polynomials?

Books: Birkhoff and Rota, Guggenheimer, different Olver books, practical guide to invariants https://www.amazon.com/Practical-Invariant-Monographs-Computational-Mathematics/dp/0521857015

Idea: Approximate invariants? At least this ought to make a good coordinate system to work in where the dynamics are slow. Like action-angle and adiabatic transformations. Could also perhaps bound the

Picard Iteration

I have a method that I’m not sure is ultimately sound. The idea is to start with

Error analysis most often uses an appeal to Taylor’s theorem, and Taylor’s theorem is usually derived from the mean value theorem or intermediate value theorem. Maybe that’s fine. But the mean value theorem is some heavy stuff. There are computational doo dads that use these bounds + interval analysis to rigorously integrate ODEs. See https://github.com/JuliaIntervals/TaylorModels.jl

The beauty of sum of squares certificates is that they are very primitive proofs of positivity for a function on a domain of infinitely many values. If I give you a way to write an expression as a sum of square terms, it is then quite obvious that it can never be negative. This is algebra rather than analysis. $y(t) = \lambda(t) \land \lambda(t) \text{ is SOS} \Rightarrow \forall t. y(t) \ge 0$. Sum of squares is a kind of quantifier elimination method. The reverse direction of the above implication is the subject of the Positivstellensatz, a theorem of real algebraic geometry. At the very least, we can use the SOS constraint as a relaxation of the quantified constraint.
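To make "a way to write an expression as a sum of squares" concrete, here is a tiny hand-picked certificate checked in sympy. The polynomial and the Gram matrix `Q` are my own toy example; in practice an SDP solver would find `Q`. If $p = z^T Q z$ for a monomial vector $z$ and $Q$ is positive semidefinite, then $p$ is a sum of squares.

```python
from sympy import symbols, Matrix, expand

t = symbols("t")
p = t**4 + t**2 - 2*t + 1             # equals t**4 + (t - 1)**2

z = Matrix([1, t, t**2])              # monomial basis
Q = Matrix([[ 1, -1, 0],
            [-1,  1, 0],
            [ 0,  0, 1]])             # candidate Gram matrix

residual = expand((z.T * Q * z)[0, 0] - p)   # zero means p = z^T Q z exactly
eigenvalues = Q.eigenvals()                  # {eigenvalue: multiplicity}
print(residual, eigenvalues)
```

Checking the certificate is just polynomial expansion and an eigenvalue computation, no calculus required.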

So, I think by using sum of squares, we can turn a differential equation into a differential inequation. If we force the highest derivative to be larger than the required differential equation, we will get an overestimate of the required function.

A function that is dominated by another in derivative will be dominated in value also. You can integrate over inequalities (I think. You have to be careful about such things.) $\forall t. \frac{dx}{dt} \ge \frac{dy}{dt} \Rightarrow x(t) - x(0) \ge y(t) - y(0)$ for $t \ge 0$.

The derivative of a polynomial can be thought of as a completely formal operation, with no necessarily implied calculus meaning. It seems we can play a funny kind of shell game to avoid the bulk of calculus.

As an example, let’s take $\frac{dx}{dt} = x$, $x(0) = 1$ with the solution $x = e^t$. $e$ is a transcendental number.

The S-procedure is a trick by which you can relax a sum of squares inequality so that it only needs to be enforced in a domain. If you build a polynomial function that describes the domain, in that it is positive inside the domain and negative outside the domain, you can subtract an SOS multiple of it from the expression you require to be SOS. Inside the domain you care about, you’ve only made the constraint harder to satisfy, not easier. But outside the domain you have made it easier because you can have negative slack.

For the domain $t \in [0,1]$ the polynomial $(1 - t)t$ works as our domain polynomial.

We parametrize our solution as an explicit polynomial $x(t) = a_0 + a_1 t + a_2 t^2 + \dots$. It is important to note that what follows is always linear in the $a_i$.

$\frac{dx}{dt} - x \ge 0$ on the domain can be relaxed to $\frac{dx}{dt} - x(t) - \lambda(t)(1-t)t \ge 0$ with $\lambda(t)$ SOS.

So with that we get a reasonable formulation of finding a polynomial upper bound solution of the differential equation:

$\min x(1)$

$x(0) = 1$

$\frac{dx}{dt} - x(t) - \lambda_1(t)(1-t)t = \lambda_2(t)$

$\lambda_{1,2}(t)$ is SOS.

And here it is written out in python using my cvxpy-helpers which bridge the gap between sympy polynomials and cvxpy.

We can go backwards to figure out sufficient conditions for a bound. We want $x_u(t_f) \ge x(t_f)$. It is sufficient that $\forall t. x_u(t) \ge x(t)$. For this it is sufficient that $\forall t. x_u'(t) \ge x'(t) \land x_u(t_i) \ge x(t_i)$. We follow this down in derivative until we get to the highest derivative in the differential equation. Then we can use the linear differential equation itself, $x^{(n)}(t) = \sum_i a_i(t) x^{(i)}(t)$, and require $x_u^{(n)}(t) \ge \sum_i \max(a_i x^{(i)}_u, a_i x^{(i)}_l)$, since $a(t) x(t) \le \max(a(t) x_u(t), a(t) x_l(t))$. This accounts for the possibility of terms changing signs. Or you could separate the terms into regions of constant sign.

The minimization characterization of the bound is useful. For any class of functions that contains our degree-d polynomial, we can show that the minimum of the same optimization problem is less than or equal to our value.

Is the dual value useful? The lower bound on the least upper bound.

Doesn’t seem like the method will work for nonlinear ODEs. Maybe it will if you relax the nonlinearity. Or you could use perhaps a MIDSP to make piecewise linear approximations of the nonlinearity?

It is interesting to investigate linear programming models. It is simpler and more concrete to examine how well different step sizes approximate each other rather than worry about the differential case.

We can explicitly compute a finite difference solution in the LP, which is a power that is difficult to achieve in general for differential equations.

We can instead replace the exact solution by a conservative bound.

While we can differentiate through an equality, we can’t differentiate through an inequality. Differentiation involves negation, which plays havoc with inequalities. We can however integrate through inequalities. $\frac{dx}{dt} \ge f \land x(0) \ge a \Rightarrow x(t) \ge \int^t_0 f \, dt' + a$

As a generalization we can integrate $\int p(x)$ over inequalities as long as $p(x) \ge 0$

In particular $\forall t. \frac{dx}{dt} \ge \frac{dy}{dt} \Rightarrow x(t) - x(0) \ge y(t) - y(0)$

We can convert a differential equation into a differential inequation. It is not entirely clear to me that there is a canonical way to do this. But it works to take the biggest. $\frac{dx}{dt} = A(t)x + f(t)$

Is there a tightest

We can integrate

Here let’s calculate e

https://tel.archives-ouvertes.fr/tel-00657843v2/document Thesis on ODE bounds in Isabelle


## Deriving the Chebyshev Polynomials using Sum of Squares optimization with Sympy and Cvxpy

Least squares fitting $\sum (f(x_i)-y_i)^2$ is very commonly used and well loved. Sum of squared fitting can be solved using just linear algebra. One of the most convincing use cases to me of linear programming is doing sum of absolute value fits $\sum |f(x_i)-y_i|$  and maximum deviation fits $\max_i |f(x_i)-y_i|$. These two quality of fits are basically just as tractable as least squares, which is pretty cool.

The trick to turning an absolute value into an LP is to look at the region above the graph of absolute value. This region is defined by $y \ge x$ and $y \ge -x$. So you introduce a new variable $y$. Then the LP $\min y$ subject to those constraints will minimize the absolute value. For a sum of absolute values, introduce a variable $y_i$ for each absolute value you have. Then minimize $\sum_i y_i$. If you want to do min-max optimization, use the same $y$ for every absolute value function: $\min y$ subject to $\forall i. -y \le x_i \le y$.
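Here is a small sketch of both tricks on a toy problem: fitting a single constant $c$ to three made-up data values. I am using `scipy.optimize.linprog` here rather than cvxpy just to keep the example small; the data and variable layout are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import linprog

data = np.array([1.0, 2.0, 7.0])   # made-up sample values
m = len(data)
b = np.concatenate([data, -data])  # right-hand sides shared by both LPs

# Min-max fit: variables are [c, y]; minimize y subject to -y <= c - d_i <= y.
A_max = np.vstack([np.column_stack([np.ones(m), -np.ones(m)]),    #  c - y <= d_i
                   np.column_stack([-np.ones(m), -np.ones(m)])])  # -c - y <= -d_i
res_max = linprog(c=[0.0, 1.0], A_ub=A_max, b_ub=b,
                  bounds=[(None, None), (0, None)])

# Sum of absolute values fit: one y_i per data point, minimize sum_i y_i.
A_sum = np.vstack([np.column_stack([np.ones(m), -np.eye(m)]),     #  c - y_i <= d_i
                   np.column_stack([-np.ones(m), -np.eye(m)])])   # -c - y_i <= -d_i
res_sum = linprog(c=[0.0] + [1.0] * m, A_ub=A_sum, b_ub=b,
                  bounds=[(None, None)] + [(0, None)] * m)

print("min-max c:", res_max.x[0], "max deviation:", res_max.fun)
print("sum-abs c:", res_sum.x[0], "total deviation:", res_sum.fun)
```

For a constant fit the min-max answer should be the midrange of the data and the sum of absolute values answer should be the median, which makes the result easy to sanity check by hand.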

Let’s change topic a bit. Chebyshev polynomials are awesome. They are basically the polynomials you want to use in numerics.

Chebyshev polynomials are sines and cosines in disguise. They inherit tons of properties from them. One very important property is the equioscillation property. The Chebyshev polynomials are the polynomials that stay closest to zero on $[-1,1]$ while keeping the $x^n$ coefficient fixed at a nonzero value ($2^{n-1}$ by convention for degree $n \ge 1$, which is `2**(N-2)` in the code below, where `N` counts coefficients). They oscillate perfectly between -1 and 1 on the interval $x \in [-1,1]$, sort of like a stretched out sine. It turns out this equioscillation property defines the Chebyshev polynomials.
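The "cosines in disguise" claim is the identity $T_n(\cos\theta) = \cos(n\theta)$. A quick numerical check with numpy's Chebyshev module, as a sketch (the sample grid and the range of degrees are arbitrary choices of mine):

```python
import numpy as np
from numpy.polynomial import chebyshev as C

theta = np.linspace(0, np.pi, 200)
for n in range(1, 6):
    coeffs = np.zeros(n + 1)
    coeffs[n] = 1                                   # select T_n in the Chebyshev basis
    # T_n(cos(theta)) should agree with cos(n * theta)
    assert np.allclose(C.chebval(np.cos(theta), coeffs), np.cos(n * theta))
    power_basis = C.cheb2poly(coeffs)               # coefficients of 1, x, x^2, ...
    extrema = np.cos(np.arange(n + 1) * np.pi / n)  # where cos(n * theta) = +-1
    print(n, power_basis[-1], C.chebval(extrema, coeffs).round(12))
```

The printed leading coefficient is the normalization used in the optimization problems in this post, and the values at the extrema alternate in sign, which is the equioscillation.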

We can approximate the Chebyshev polynomials via sampling many points between [-1,1]. Then we do min of the max absolute error optimization using linear programming. What we get out does approximate the Chebyshev polynomials.

import cvxpy as cvx
import numpy as np
import matplotlib.pyplot as plt

# try finding the degree 2 through 4 Chebyshev polynomials (N = 3..5 coefficients)
for N in range(3,6):
    a = cvx.Variable(N) #polynomial coefficients
    t = cvx.Variable()
    n = np.arange(N) #exponents

    xs = np.linspace(-1,1,100)
    chebcoeff = np.zeros(N)
    chebcoeff[-1] = 1
    plt.plot(xs, np.polynomial.chebyshev.chebval(xs, chebcoeff), color='r')

    constraints = [a[-1]==2**(N-2)] # fix the highest power coefficient
    for k in range(100):
        x = np.random.rand()*2-1 #pick random x in [-1,1]
        c = cvx.sum(cvx.multiply(a, x**n)) #evaluate polynomial at x

        constraints.append(c <= t)
        constraints.append(-t <= c)

    obj = cvx.Minimize(t) #minimize maximum absolute value
    prob = cvx.Problem(obj,constraints)
    prob.solve()
    plt.plot(xs, np.polynomial.polynomial.polyval(xs, a.value), color='g')
    print(a.value)

plt.show()

Found Coefficients:
[-9.95353054e-01  1.33115281e-03  1.99999613e+00]
[-0.01601964 -2.83172979  0.05364805  4.00000197]
[ 0.86388003 -0.33517716 -7.4286604   0.6983382   8.00000211]

Red is the actual Chebyshev polynomials and green is the solved for polynomials. It does a decent job. More samples will do even better, and if we picked the Chebyshev points it would be perfect.

Can we do better? Yes we can. Let’s go on a little optimization journey.

Semidefinite programming allows you to optimize matrix variables with the constraint that all their eigenvalues are nonnegative. In a way it lets you add an infinite number of linear constraints. Another way of stating the eigenvalue constraint is that $\forall q. q^T X q \ge 0$.

You could sample a finite number of random q vectors and take the conjunction of all these constraints. Once you had enough, this is probably a pretty good approximation of the semidefinite constraint. But semidefinite programming lets you have an infinite number of the constraints in the sense that $\forall q$ is referencing an infinite number of possible q, which is pretty remarkable.

Sampling finitely many q vectors is similar to the previously discussed sampling method for absolute value minimization.
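To make the sampling picture concrete, here is a sketch that compares the smallest sampled value of $q^T X q$ against the smallest eigenvalue. The two test matrices, the seed, and the helper name `sampled_min` are all made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sampled_min(X, num_samples=2000):
    """Smallest q^T X q seen over randomly sampled unit vectors q."""
    qs = rng.standard_normal((num_samples, X.shape[0]))
    qs /= np.linalg.norm(qs, axis=1, keepdims=True)
    return np.einsum("ki,ij,kj->k", qs, X, qs).min()

A = rng.standard_normal((4, 4))
X_psd = A.T @ A                          # positive semidefinite by construction
X_bad = np.diag([1.0, 1.0, 1.0, -1.0])   # one negative eigenvalue

for name, X in [("X_psd", X_psd), ("X_bad", X_bad)]:
    print(name, "min eigenvalue:", np.linalg.eigvalsh(X).min(),
          "sampled min of q^T X q:", sampled_min(X))
```

The sampled minimum can never go below the smallest eigenvalue, so finitely many samples only give an outer approximation of the semidefinite constraint. A matrix with a very slightly negative eigenvalue could slip through.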

Sum of Squares optimization allows you to pick optimal polynomials with the constraint that they can be written as a sum of squares of polynomials. In this form, the polynomials are manifestly nonnegative everywhere. Sum of Squares programming is a perspective to take on semidefinite programming. They are equivalent in power. You solve SOS programs under the hood by transforming them into semidefinite ones.

You can write a polynomial as a vector of coefficients $\tilde{a}$. $\tilde{x} = \begin{bmatrix} 1 \\ x \\ x^2 \\ x^3 \\ \vdots \end{bmatrix}$ $\tilde{a} = \begin{bmatrix} a_0 \\ a_1 \\ a_2 \\ a_3 \\ \vdots \end{bmatrix}$ $p(x)=\tilde{a}^T \tilde{x}$

Instead we represent the polynomial with the matrix $Q$ $p(x) = \tilde{x}^T Q \tilde{x}$

If the matrix is positive semidefinite, then it can be diagonalized into the sum of squares form.
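A tiny numerical sketch of that diagonalization step, using the hand-picked example $p(x) = x^4 + 2x^2 + 1 = (x^2+1)^2$ with $\tilde{x} = [1, x, x^2]$. The particular $Q$ is one valid choice among several, since $Q$ is not unique.

```python
import numpy as np

# p(x) = x^4 + 2x^2 + 1 written as x~^T Q x~ with x~ = [1, x, x^2]
Q = np.array([[1.0, 0.0, 1.0],
              [0.0, 0.0, 0.0],
              [1.0, 0.0, 1.0]])
eigvals, eigvecs = np.linalg.eigh(Q)
assert eigvals.min() > -1e-12   # Q is positive semidefinite (up to rounding)

def p_from_Q(x):
    xt = np.array([1.0, x, x**2])
    return xt @ Q @ xt

def p_as_sos(x):
    xt = np.array([1.0, x, x**2])
    # each eigenpair contributes lambda_i * (v_i . x~)^2, a nonnegative square
    return sum(lam * (v @ xt) ** 2 for lam, v in zip(eigvals, eigvecs.T))

for x in np.linspace(-2, 2, 9):
    assert np.isclose(p_from_Q(x), p_as_sos(x))
    assert np.isclose(p_from_Q(x), (x**2 + 1) ** 2)
print(eigvals)
```

Since the eigenvalues are nonnegative, every term $\lambda_i (v_i^T \tilde{x})^2$ is a square with a nonnegative weight, which is the sum of squares form.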

In all honesty, this all sounds quite esoteric, and it kind of is. I struggle to find problems to solve with this stuff. But here we are! We’ve got one! We’re gonna find the Chebyshev polynomials exactly by translating the previous method to SOS.

The formulation is a direct transcription of the above tricks. $\min t$ $-t \le p(x) \le t$  by which I mean $p(x) + t$ is SOS and $t - p(x)$ is SOS.

There are a couple of packages available for Python that already do SOS.

There is also SumOfSquares.jl for Julia and SOSTOOLS for Matlab. YALMIP too, I think. Instead of using those packages, I want to roll my own, like a doofus.

Sympy already has very useful polynomial manipulation functionality. What we’re going to do is form up the appropriate expressions by collecting powers of x, and then turn them into cvxpy expressions term by term. The transcription from sympy to cvxpy isn’t so bad, especially with a couple helper functions.
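As a sketch of the "collect powers of x" step on its own, here is the $N=3$ case with plain symbols `q0..q8` standing in for the matrix entries. Those names are hypothetical, and the real code below uses a `MatrixSymbol` instead.

```python
from sympy import Matrix, Poly, Symbol, expand, symbols

x = Symbol("x")
N = 3
xn = Matrix([x**k for k in range(N)])            # monomial vector [1, x, x^2]
Q = Matrix(N, N, list(symbols(f"q:{N * N}")))    # hypothetical entries q0..q8, row major
p = expand((xn.T * Q * xn)[0, 0])                # scalar polynomial x~^T Q x~
coeffs = Poly(p, x).all_coeffs()                 # highest power of x first

for power, c in zip(range(2 * (N - 1), -1, -1), coeffs):
    print(f"x^{power}: {c}")
```

Each printed coefficient is linear in the entries of `Q`, so setting it equal to a target coefficient gives a linear equality constraint, which is the kind of thing cvxpy is happy to accept.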

One annoying extra thing we have to do is known as the S-procedure. We don’t care about regions outside of $x \in [-1,1]$. We can specify this region with a polynomial inequality $(x+1)(x-1) \le 0$. If we multiply this polynomial by any manifestly nonnegative polynomial (a SOS polynomial in particular will work), the product stays nonpositive in the region we care about and is nonnegative outside of it. We can then add this product into all of our SOS inequalities, which makes them easier to satisfy outside the interval while still implying the original inequality inside it. This is very similar to a Lagrange multiplier procedure.
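A hand-built certificate shows the shape of the argument. This sketch assumes the toy claim $x \le 1$ on $[-1,1]$, with a multiplier and SOS polynomial I picked by hand rather than anything found by a solver.

```python
from sympy import Rational, Symbol, expand

x = Symbol("x")
t = 1                        # claimed bound: p(x) <= 1 on [-1, 1]
p = x                        # toy polynomial
lam = Rational(1, 2)         # multiplier: a nonnegative constant, trivially SOS
sigma = (1 - x) ** 2 / 2     # SOS certificate polynomial

# The identity t - p + lam*(x+1)*(x-1) - sigma == 0 should hold for all x.
residual = expand(t - p + lam * (x + 1) * (x - 1) - sigma)
print(residual)
```

If the residual vanishes identically, then $t - p = \sigma - \lambda (x+1)(x-1)$, and on $[-1,1]$ the multiplier term $\lambda (x+1)(x-1)$ is nonpositive, so $t - p \ge \sigma \ge 0$ there. Outside the interval nothing is promised, which is the point.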

Now all of this seems reasonable. But it is not clear to me that we have the truly best polynomial in hand with this S-procedure business. But it seems to work out.

from sympy import *
import cvxpy as cvx
import matplotlib.pyplot as plt
import numpy as np

#build corresponding cvx variable for sympy variable
def cvxvar(expr, PSD=True):
    if expr.func == MatrixSymbol:
        i = int(expr.shape[0])
        j = int(expr.shape[1])
        return cvx.Variable((i,j), PSD=PSD)
    elif expr.func == Symbol:
        return cvx.Variable()

def cvxify(expr, cvxdict): # replaces sympy variables with cvx variables
    return lambdify(tuple(cvxdict.keys()), expr)(*cvxdict.values())

xs = np.linspace(-1,1,100)

for N in range(3,6):
    #plot actual chebyshev
    chebcoeff = np.zeros(N)
    chebcoeff[-1] = 1
    plt.plot(xs, np.polynomial.chebyshev.chebval(xs, chebcoeff), color='r')

    cvxdict = {}
    # Going to need symbolic polynomials in x
    x = Symbol('x')
    xn = Matrix([x**n for n in range(N)])

    def sospoly(varname):
        Q = MatrixSymbol(varname, N,N)
        p = (xn.T * Matrix(Q) * xn)[0] # scalar entry of the 1x1 matrix
        return p, Q

    t = Symbol('t')
    cvxdict[t] = cvxvar(t)

    #lagrange multiplier polynomial 1
    pl1, l1 = sospoly('l1')
    cvxdict[l1] = cvxvar(l1)

    #lagrange multiplier polynomial 2
    pl2, l2 = sospoly('l2')
    cvxdict[l2] = cvxvar(l2)

    #Equation SOS Polynomial 1
    peq1, eq1 = sospoly('eq1')
    cvxdict[eq1] = cvxvar(eq1)

    #Equation SOS Polynomial 2
    peq2, eq2 = sospoly('eq2')
    cvxdict[eq2] = cvxvar(eq2)

    a = MatrixSymbol("a", N,1)
    pa = Matrix(a).T*xn #sum([polcoeff[k] * x**k for k in range(n)]);
    pa = pa[0] # scalar entry of the 1x1 matrix
    cvxdict[a] = cvxvar(a, PSD=False)

    constraints = []

    # Rough derivation for upper constraint
    # pol <= t
    # 0 <= t - pol + lam * (x+1)(x-1)      # relax constraint with lambda
    # eq1 = t - pol + lam * (x+1)(x-1)     # eq1 is SOS
    # 0 = t - pol + lam * (x+1)(x-1) - eq1 # rearrange to equal zero
    z1 = t - pa + pl1 * (x+1)*(x-1) - peq1
    z1 = Poly(z1, x).all_coeffs()
    constraints += [cvxify(expr, cvxdict) == 0 for expr in z1]

    # Derivation for lower constraint
    # -t <= pol
    # 0 <= pol + t + lam * (x+1)(x-1)      # relax constraint with lambda
    # eq2 = pol + t + lam * (x+1)(x-1)     # eq2 is SOS
    # 0 = pol + t + lam * (x+1)(x-1) - eq2 # rearrange to equal zero
    z2 = pa + t + pl2 * (x+1)*(x-1) - peq2
    z2 = Poly(z2, x).all_coeffs()
    constraints += [cvxify(expr, cvxdict) == 0 for expr in z2]

    constraints += [cvxdict[a][N-1,0] == 2**(N-2) ]
    obj = cvx.Minimize(cvxdict[t]) #minimize maximum absolute value
    prob = cvx.Problem(obj,constraints)
    prob.solve()

    print(cvxdict[a].value.flatten())
    plt.plot(xs, np.polynomial.polynomial.polyval(xs, cvxdict[a].value.flatten()), color='g')

plt.show()


Coefficients:
[-1.00000000e+00 -1.02219773e-15  2.00000001e+00]
[-1.23103133e-13 -2.99999967e+00  1.97810058e-13  4.00001268e+00]
[ 1.00000088e+00 -1.39748880e-15 -7.99999704e+00 -3.96420452e-15
7.99999691e+00]

Ooooooh yeah. Those curves are so similar you can’t even see the difference. NICE. JUICY.

There are a couple of interesting extensions to this. We could find global under or over approximating polynomials. This might be nice for a verified compression of a big polynomial to smaller, simpler polynomials for example. We could also similarly form the pointwise best approximation of any arbitrary polynomial f(x) rather than the constant 0 like we did above (replace $-t \le p(x) \le t$ with $-t \le p(x) - f(x) \le t$ in the above). Or perhaps we could use it to find a best polynomial fit for some differential equation according to a pointwise error.

I think we could also extend this method to minimizing the mean absolute value integral just like we did in the sampling case. $\min \int_0^1 t(x)dx$ $-t(x) \le p(x) \le t(x)$

More references on Sum of Squares optimization:

http://www.mit.edu/~parrilo/