Mathematical Programming
- Optimization
- Linear Programming
- Convex Optimization
- Duality
- Quadratic Programming
- Conic Programming
- Semi-Infinite Programming
- Semidefinite programs
- Convex Optimization
- Mixed Integer
- Extended Mathematical Programming
- Stochastic programming
- Robust Optimization
- Complementarity Problems
- Parametric Programming
- Difference of Convex Programming
- Bilevel programming
- Tools
- Resources
Optimization
I should rename this article to Mathematical Programming / Operations Research / or maybe Convex Optimization. More generally, optimization also includes general nonlinear optimization: things like gradient descent, genetic algorithms, interval methods, etc.
SAT vs SMT vs CSP vs MIP vs NLP
When and why? Hard to know.
Linear Programming
\(\min c^T x\) \(Ax \le b\)
Tricks
Encoding equality in terms of inequality
\(Ax \le b\) and \(Ax \ge b\) together give the equality \(Ax = b\).
Making the objective a single variable
Make a new variable \(t\) and add constraint \(t = c^T x\). Now you can minimize \(t\) instead.
Absolute value encoding
\(\min \sum_i | c_i^T x |\)
You can create a new variable \(t_i\) for each absolute value term and replace the problem with
\(\min \sum_i t_i\) subject to \(\forall i.\ -t_i \le c_i^T x \le t_i\)
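For instance, a minimal cvxpy sketch of this encoding (the data C and the extra constraint are made up for illustration):

import cvxpy as cvx
import numpy as np

C = np.array([[1.0, 2.0], [3.0, -1.0]])   # rows are the c_i
x = cvx.Variable(2)
t = cvx.Variable(2)

constraints = [C @ x <= t, -t <= C @ x, cvx.sum(x) == 1]   # -t_i <= c_i^T x <= t_i
prob = cvx.Problem(cvx.Minimize(cvx.sum(t)), constraints)
prob.solve()
print(prob.value, x.value)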
Sparsity Heuristic
Adding an absolute value regularization term \(\sum_i |c_i|\) has a tendency to force the \(c_i\) exactly to zero. You can tune the amount of sparsity you want. Then you can also enforce the discovered sparsity, either as a new problem or as new linear constraints \(c_i = 0\), and re-solve without the regularization to fine-tune.
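A sketch of that two-phase idea in cvxpy, on made-up regression data (the variable names and the 1e-6 threshold are mine):

import cvxpy as cvx
import numpy as np

np.random.seed(0)
A, b = np.random.randn(20, 10), np.random.randn(20)
x = cvx.Variable(10)

# Phase 1: L1-regularized fit (still an LP). Larger lam pushes more entries of x exactly to zero.
lam = 0.5
cvx.Problem(cvx.Minimize(cvx.norm1(A @ x - b) + lam * cvx.norm1(x))).solve()
zeros = [i for i in range(10) if abs(x.value[i]) < 1e-6]

# Phase 2: fix the discovered zeros as constraints and re-solve without the penalty to fine-tune.
cvx.Problem(cvx.Minimize(cvx.norm1(A @ x - b)), [x[i] == 0 for i in zeros]).solve()
print(zeros, x.value)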
Minimax encoding
A similar feeling trick
\(\min ( \max_i c_i^T x )\)
Create a shared bound \(t\):
\(\min t\) subject to \(\forall i.\ c_i^T x \le t\)
Because minimizing \(t\) pushes against these new constraints, \(t\) will end up being the maximum of them. The constraint corresponding to the maximum will be tight, whereas the others will be slack. This trick does not generalize as much as one might naively hope; nested optimization is often difficult. See bilevel optimization.
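A minimal cvxpy sketch of the shared-bound encoding (the rows of C and the box bounds on x are made up):

import cvxpy as cvx
import numpy as np

C = np.array([[1.0, -1.0], [0.5, 2.0], [-1.0, 0.0]])   # rows are the c_i
x = cvx.Variable(2)
t = cvx.Variable()

constraints = [C[i] @ x <= t for i in range(3)]   # every c_i^T x sits below the shared bound t
constraints += [x >= -1, x <= 1]
prob = cvx.Problem(cvx.Minimize(t), constraints)
prob.solve()
print(prob.value, x.value)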
Linear in what?
Polytope inclusion
Applications
- least absolute value optimization
- Relaxations of discrete optimization problems
- Network flow
- how do strings hang
- Contact constraints
Discrete Optimization Problems
If an optimization problem has a polynomial time algorithm, there's a decent chance it isn't so bad to embed in linear programming.
Control
- cvxpy control demo
- Categorical Combinators for Convex Optimization and Model Predictive Control using Cvxpy
- 2D Robot Arm Inverse Kinematics using Mixed Integer Programming in Cvxpy
- Bouncing a Ball with Mixed Integer Programming
- Trajectory Optimization of a Pendulum with Mixed Integer Linear Programming
- Mixed Integer Programming & Quantization Error
- Model Predictive Control of CartPole in OpenAI Gym using OSQP
- LQR with cvxpy
- Flappy Bird as a MIP
Q Learning
More Reinforcement Learning with cvxpy Q learning with linear programming
PDEs
You can encode some PDEs as a MIP to find the minimal energy ground state. XY model with MIP. Coulomb Gas as Mixed Integer Program. Solving the Ising model with MIP.
Denoising
Total variation reconstruction https://en.wikipedia.org/wiki/Total_variation_denoising
Compressed sensing
https://en.wikipedia.org/wiki/Compressed_sensing
Inverse Problems
https://en.wikipedia.org/wiki/Inverse_problem Inverse problems are about determining stuff from observations. In particular, you're trying to infer the system of equations that defines a system from different solutions of the system. The system of equations of a wave system or Laplace system will include the shape of objects in the system.
Optimal Transport
Discrete optimal transport problems can be encoded as an LP. Optimal transport is an idea with a long history (Monge) and a lot of work (Fields Medals and such). Kind of curious.
Optimal transport is moving hunks of dirt from points A to points B. If these sites are discrete, you can encode how much went along each pathway and give each pathway a cost associated with its geometric distance.
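A small cvxpy sketch of this LP on made-up data (P[i,j] is how much dirt moves from source i to sink j, C[i,j] its unit cost):

import cvxpy as cvx
import numpy as np

a = np.array([0.5, 0.5])              # dirt available at each source
b = np.array([0.25, 0.25, 0.5])       # dirt demanded at each sink
C = np.array([[1.0, 2.0, 3.0],        # cost per unit along each pathway
              [2.0, 1.0, 1.0]])

P = cvx.Variable((2, 3), nonneg=True)
constraints = [cvx.sum(P, axis=1) == a,   # everything leaves the sources
               cvx.sum(P, axis=0) == b]   # everything arrives at the sinks
prob = cvx.Problem(cvx.Minimize(cvx.sum(cvx.multiply(C, P))), constraints)
prob.solve()
print(prob.value, P.value)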
System Identification
https://www.philipzucker.com/system-identification-of-a-pendulum-with-scikit-learn/
Min Path
import cvxpy as cvx
edge = cvx.Variable((3,3), nonneg=True)
path = cvx.Variable((3,3), nonneg=True)
constraints = []
constraints += [path[i,j] >= edge[i,j] for i in range(3) for j in range(3)]
constraints += [path[i,k] >= edge[i,j] + path[j,k] for i in range(3) for j in range(3) for k in range(3)]
constraints += [edge[0,1] == 1, edge[1,2] == 1] # Plus all non-existent edges have weight infinity
prob = cvx.Problem(cvx.Minimize(cvx.sum(path) + cvx.sum(edge)), constraints)
print(prob.solve())
Conversely, min-path algorithms can solve problems of this form.
Max flow
https://en.wikipedia.org/wiki/Maximum_flow_problem
Polynomial Solutions
Fitting floating point polynomials
Floating point up to 32 bits is tractable via complete enumeration of the space. You can design polynomial approximations that directly use a tabulation of the true floating point results. See the Rlibm project.
Quantization
Polyhedral Computation
We have methods for really dominating polyhedra; you can answer a lot of difficult questions about them. However, linear programming is cheap, while polyhedral methods are expensive.
Formalizing the Face Lattice of Polyhedra (in Coq)
The face lattice is a neat idea: face lattice, abstract polytope, Steinitz's theorem, the algorithmic Steinitz problem (recognizing the face lattice of polytopes). https://mathoverflow.net/questions/432636/determining-whether-a-lattice-is-the-face-lattice-of-a-polytope-np-hard-or-und Polyhedral Combinatorics
Ziegler - Lectures on Polytopes. 0-1 polytopes
Volume computation Simplicial decomposition
Fourier Motzkin
Fourier-Motzkin is conceptually simple. It is the inequality analog of Gaussian elimination, but much more computationally complex. Pick a variable x to project out of your inequalities and isolate it. Because of minus signs, sometimes x will sit on the small side of the inequality and sometimes on the large side: \(x \le f_j(y)\) and \(g_i(y) \le x\).
Now plug together the n \(\le\) inequalities with the m \(\ge\) inequalities to produce n*m inequalities that don't contain x. This is the projected set of inequalities: if it is satisfiable, you can produce an x such that the original system is also satisfiable. To do so, find y, take the maximum of the \(g_i(y)\) and the minimum of the \(f_j(y)\), and pick an arbitrary x between these bounds: \(\max_i g_i(y) \le x \le \min_j f_j(y)\).
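A toy numpy sketch of one elimination step (the function and the example system are mine), for a system written as A x <= b:

import numpy as np

def fourier_motzkin(A, b, k):
    # Eliminate variable k from the system A x <= b, returning the projected system.
    A, b = np.asarray(A, float), np.asarray(b, float)
    pos  = [i for i in range(len(b)) if A[i, k] > 0]    # rows giving x_k <= f_j(y)
    neg  = [i for i in range(len(b)) if A[i, k] < 0]    # rows giving g_i(y) <= x_k
    zero = [i for i in range(len(b)) if A[i, k] == 0]   # rows that don't mention x_k
    new_A, new_b = [A[i] for i in zero], [b[i] for i in zero]
    for i in pos:
        for j in neg:
            # scale so the x_k coefficients cancel, then add the two inequalities
            new_A.append(A[i] / A[i, k] + A[j] / -A[j, k])
            new_b.append(b[i] / A[i, k] + b[j] / -A[j, k])
    new_A = np.array(new_A).reshape(-1, A.shape[1])
    return np.delete(new_A, k, axis=1), np.array(new_b)

# eliminate x0 from { x0 + x1 <= 4, -x0 <= 0, -x1 <= 0 }: leaves -x1 <= 0 and x1 <= 4
A = [[1, 1], [-1, 0], [0, -1]]
b = [4, 0, 0]
print(fourier_motzkin(A, b, 0))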
Hrep Vrep
Polyhedra can be dually described as the convex hull of their vertices (Vrep) or the intersection of the halfspaces of their faces (Hrep). Both of these can be concretely represented by collections of vectors. These different representations make different operations easy. Converting between them is hard in general. Some polytopes only have a succinct representation in one rep vs the other. For example, a cube in dimension d has 2d faces but 2^d vertices.
Convex union is easy in Vrep: just take the union of the vertex sets (and cull the interior points?)
Intersection is easy in Hrep: take both sets of half spaces
Zonotopes
Zonotopes are the linear images of cubes. This is a special class of polyhedra that sometimes has nice properties.
Resources
Methods
Simplex
Interior Point
Convex Optimization
Duality
Cody seems to be under an impression that I understand duality in the context of convex optimization and linear programming. I don’t know nothing about it I suppose.
A convex optimization problem is the following
$latex \min \phi(x)$
$latex g(x) \le 0$
Where both $latex \phi$ and $latex g$ have to be convex functions. Convex means bowl shaped.
For smooth functions, convex means that the tangent line globally underestimates the value of the function itself.
More generally it means that $latex 0 \le \lambda \le 1 \rightarrow \phi(\lambda x + (1 - \lambda)y) \le \lambda \phi(x) + (1 - \lambda) \phi(y)$
This matters because corners show up pretty often in convex optimization.
This condition is sort of a sister inequality version of the more familiar condition of linearity. $latex f(\alpha x + \beta y) = \alpha f(x) + \beta f(y)$. It allows you to pull stuff out.
These problems are basically polynomial time solvable (i.e. fast). The reason this is true is that locally good moves are globally good moves because of the convexity condition. Roughly, greedy works, with greedy often being gradient descent or a Newton method (using second derivatives). The simplex algorithm is also a greedy algorithm.
A special but very important case of convex optimization is that of linear programming
$latex \min c^T x $
$latex Ax \ge b$
Geometrically this is trying to find the farthest point in the $latex -c$ direction that is in the polytope described by $latex Ax\ge b$. A polytope is a shape made of a bunch of flat sides, the higher dimensional analog of a polygon. Every row of the matrix A corresponds to a face of the polytope: the row is the normal vector to the face and the corresponding entry in b controls an offset from the origin.
Another way of phrasing this is that we are in a polytope that is the conjunction (And) of the half spaces described by the rows of A.
Linear programming is so important because basically it straddles the discrete and the continuous. Polytopes have a finite number of discrete corners.
There is always a corner of the polytope that is at least as good as any point on a face, as far as the objective is concerned. This is the seed of the simplex algorithm.
In principle, a Bogo-simplex algorithm would be: take every possible selection of d rows of A, find the vertex it describes via Gaussian elimination, check that this vertex lies in the half spaces of all the other rows, and then evaluate the objective on it. This would be an insane combinatorial explosion. The remarkable thing is that this process is not necessary.
Nevertheless, this procedure demonstrates that the solutions are going to be fractions (rational numbers built from the problem data).
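A toy numpy sketch of this enumeration (the helper and the example polytope are mine), for min c^T x subject to A x >= b:

import itertools
import numpy as np

def bogo_simplex(c, A, b):
    m, d = A.shape
    best, best_x = np.inf, None
    for rows in itertools.combinations(range(m), d):
        try:
            x = np.linalg.solve(A[list(rows)], b[list(rows)])   # candidate vertex from d tight faces
        except np.linalg.LinAlgError:
            continue                                            # rows not independent, no unique vertex
        if np.all(A @ x >= b - 1e-9) and c @ x < best:          # keep it if it satisfies every other face
            best, best_x = c @ x, x
    return best, best_x

# minimize x + y over the region x >= 0, y >= 0, x + y >= 1
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
b = np.array([0.0, 0.0, 1.0])
print(bogo_simplex(np.array([1.0, 1.0]), A, b))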
Suppose you have an optimal solution to an LP. How do you know it is optimal? There is a simple proof.
Duals as Proof
I can derive new inequality bounds by combining the rows of A. When you add together inequalities, you have to multiply by positive coefficients only, or you will flip the direction of the inequality. If I left multiply by a vector $latex l \ge 0$, I can derive new bounds from the assumption that you are in the polytope. This is similar to deriving new equations from sums of the rows of a matrix, as occurs during Gaussian elimination.
$latex l^TAx \ge l^Tb$
How can we use that to find a bound on the objective? Well, if we add the additional constraint that $latex A^T l = c$ then we derive the new bound $latex l^TAx = c^T x \ge l^Tb$. So by fiddling with the values of l, we can try to create better and better lower bounds on the objective function derived from the polytope constraints.
If I can exhibit both an $latex l^*$ and an $latex x$ such that $latex c^T x = l^{*T} b$, then I'm golden no matter how I cheated and killed to find them.
It is a remarkable fact that these vectors are in fact guaranteed to exist. That is the content of strong duality and Farkas Lemma.
Sometimes the dual is called a certificate of optimality. But we can also consider it to be a proof object. It is data that somehow encodes a particular kind of linear sum of inequalities proof.
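A tiny cvxpy sketch (my own toy data) that pulls out this proof object: solve an LP, read the dual vector off the constraint, and check that it certifies the optimum via l^T b and A^T l = c.

import cvxpy as cvx
import numpy as np

A = np.array([[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
b = np.array([1.0, 0.0, 0.0])
c = np.array([1.0, 2.0])

x = cvx.Variable(2)
constraint = A @ x >= b
prob = cvx.Problem(cvx.Minimize(c @ x), [constraint])
prob.solve()

l = constraint.dual_value       # the certificate: l >= 0
print(prob.value, l @ b)        # primal optimum equals the dual bound l^T b (strong duality)
print(A.T @ l, c)               # and A^T l = c holds, up to solver tolerance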
Geometrical Perspective
As I said above, each row of the matrix A corresponds to a half space.
More generally, any convex set can be considered as the convex hull of all its points, or dually as the intersection of all the half spaces that fully include it.
Half spaces can be coordinatized as a normal vector l and an offset parameter r such that $latex \{ x \mid l^T x \ge r \}$.
Looking at only the extremal bits (the boundary), a convex set is the convex hull of its boundary, and the set is the envelope of its extremal halfspaces (those that actually touch the set).
$latex \{ x \mid x \in C \}$   $latex \{ (l, b) \mid \forall x \in C,\ l^T x \ge b \}$
$latex \{ l \mid l \in C^* \}$   $latex \{ x \mid \forall l \in C^*,\ l^T x \ge 0 \}$
In the affine geometry we're considering, these two descriptions don't quite line up symmetrically.
The duality becomes much cleaner when we remove the offsets and work in projective geometry. Everything becomes homogeneous coordinates, and instead of talking about convex sets, we talk about conic sets.
We can recover the affine considerations by taking a slice of the cone at z = 1. This is the standard way of going back and forth between homogeneous coordinates and regular coordinates.
A cone is a set that is closed under non-negative combinations of its points. The dual cone is the set of all half spaces at the origin that fully contain the primal cone.
Lagrange Multipliers and Legendre Transformation
Quadratic Programming
Conic Programming
Semi-Infinite Programming
An infinite number of constraints or variables.
EAGO.jl https://github.com/PSORLab/EAGO.jl https://psorlab.github.io/EAGO.jl/dev/semiinfinite/semiinfinite/
Semidefinite programs
Optimization over positive semidefinite matrices. Very powerful. https://arxiv.org/abs/2202.12374 - iterate SDD to get SDP, sounds good. Quantum and SDP.
Tricks
Semidefinite constraints kind of get you multiplication (the matrix entries can stand in for products of variables).
Applications
- Lyapunov matrices. You need to show that something is stable. An interesting example: you have a linear dynamical system with a discrete set of possible dynamics, and you can find a common Lyapunov matrix for all of them (see the sketch after this list)
- density matrices
- sum of squares polynomial optimization
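A minimal cvxpy sketch of the common Lyapunov search (the two system matrices are made up, and strictness is enforced with a small eps margin): find P positive definite with A_i^T P + P A_i negative definite for every mode A_i.

import cvxpy as cvx
import numpy as np

A1 = np.array([[-1.0, 2.0], [0.0, -3.0]])    # two possible dynamics x' = A_i x
A2 = np.array([[-2.0, 0.5], [-0.5, -1.0]])

P = cvx.Variable((2, 2), symmetric=True)
eps = 1e-3
constraints = [P >> eps * np.eye(2)]
constraints += [Ai.T @ P + P @ Ai << -eps * np.eye(2) for Ai in (A1, A2)]
prob = cvx.Problem(cvx.Minimize(cvx.trace(P)), constraints)
prob.solve()                                  # SDP, needs an SDP-capable solver (e.g. SCS)
print(P.value)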
Sum of Squares
Again we should ask linear in what?
$m = [1, x, x^2]$
$m^T Q m$ is a sum of squares polynomial if Q is positive semidefinite. The eigenvectors of Q (scaled by the square roots of its eigenvalues) give the polynomials being squared in the sum.
This is the analog of $a^Tm$ being a linear parametrization of an arbitrary polynomial
Trick: you can find the optimal value of a polynomial by considering $p(x) + t$. You want to minimize t such that $p(x) + t$ stays positive.
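A small cvxpy sketch of this trick on a made-up univariate example $p(x) = x^4 - 3x^2 + x$: minimize t such that $p(x) + t$ equals $m^T Q m$ with Q positive semidefinite, matching coefficients by hand for $m = [1, x, x^2]$. Then -t is a lower bound on the minimum of p (exact in the univariate case, where nonnegative polynomials are SOS).

import cvxpy as cvx

Q = cvx.Variable((3, 3), symmetric=True)
t = cvx.Variable()

# coefficients of m^T Q m: const Q00, x: 2*Q01, x^2: 2*Q02 + Q11, x^3: 2*Q12, x^4: Q22
constraints = [
    Q >> 0,
    Q[0, 0] == t,                  # constant term of p(x) + t is 0 + t
    2 * Q[0, 1] == 1,              # x coefficient
    2 * Q[0, 2] + Q[1, 1] == -3,   # x^2 coefficient
    2 * Q[1, 2] == 0,              # x^3 coefficient
    Q[2, 2] == 1,                  # x^4 coefficient
]
prob = cvx.Problem(cvx.Minimize(t), constraints)
prob.solve()                        # SDP, needs e.g. SCS
print(-t.value)                     # lower bound on min_x p(x)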
Moments: This one always kind of confused me. Linear functionals L such that $L[f] \ge 0$ whenever $f(x)$ is SOS.
Lasserre hierarchy
Sherali-Adams - mentioned in linear programming? Probabilistic moments.
We have a problem that involves x, y, and x*y but is otherwise linear. Suppose we introduce a variable called "xy" and the constraint xy = x * y. As a relaxation we can just forget the constraint. This is a pretty loose relaxation.
Convex Optimization
More general solvers are less specialized.
- Convex sets
- Convex cones
- Epigraphs
- Geometric programming - I don't really know about this one
- Barrier method
Mixed Integer
Big-M
Encoding boolean functions/relations
Any boolean function/relation can be encoded as a convex polytope subset of the unit hypercube plus integrality constraints on the variables.
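For example, here is a sketch (my own choice of relation) of z = x AND y as three inequalities over 0/1 variables; for this particular relation the LP relaxation over the unit cube is already the convex hull of the feasible 0/1 points.

import cvxpy as cvx

x, y, z = (cvx.Variable(boolean=True) for _ in range(3))
and_constraints = [z <= x, z <= y, z >= x + y - 1]   # z = x AND y on {0,1}

# toy use: maximizing z while x + y <= 1 forces z = 0
prob = cvx.Problem(cvx.Maximize(z), and_constraints + [x + y <= 1])
prob.solve()    # needs a MIP-capable solver installed (e.g. GLPK_MI, CBC, SCIP)
print(x.value, y.value, z.value)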
Piecewise linear encodings
Unrolling time
Products of Variables in Mixed Integer Programming - Gurobi tutorial. McCormick relaxation.
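A sketch of the McCormick relaxation of w = x*y on box bounds (the bounds and the toy objective are made up): four linear inequalities that over-approximate the bilinear constraint.

import cvxpy as cvx

xl, xu, yl, yu = 0.0, 2.0, -1.0, 3.0            # assumed box bounds on x and y
x, y, w = cvx.Variable(), cvx.Variable(), cvx.Variable()

mccormick = [
    w >= xl * y + x * yl - xl * yl,
    w >= xu * y + x * yu - xu * yu,
    w <= xu * y + x * yl - xu * yl,
    w <= xl * y + x * yu - xl * yu,
    x >= xl, x <= xu, y >= yl, y <= yu,
]
# a relaxed lower bound on min of x*y - x - y over the box
prob = cvx.Problem(cvx.Minimize(w - x - y), mccormick)
prob.solve()
print(prob.value)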
Techniques papers from the COAT slack
https://pubsonline.informs.org/doi/10.1287/ited.7.2.153
https://link.springer.com/chapter/10.1007/11493853_27
https://dspace.mit.edu/handle/1721.1/96480
https://docs.mosek.com/modeling-cookbook/index.html
Extended Mathematical Programming
extended mathematical programming
disjunctive programming
Stochastic programming
https://en.wikipedia.org/wiki/Stochastic_programming The parameters of the future stage program are unknown. You need to optimize with respect to their expectation
Deterministic equivalent
statistical sampling / monte carlo
Robust Optimization
Complementarity Problems
xy = 0 constraints, which is another way of saying at most one of x and y can be nonzero
Complementarity Book
If you pick which variables are 0, what remains is a convex optimization problem. So there is sort of a layered thing going on.
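A sketch of encoding the xy = 0 condition with a big-M switch in cvxpy (the bound M and the toy objective/constraint are mine):

import cvxpy as cvx

M = 10.0                                   # assumed valid upper bound on x and y
x = cvx.Variable(nonneg=True)
y = cvx.Variable(nonneg=True)
z = cvx.Variable(boolean=True)

complementarity = [x <= M * z, y <= M * (1 - z)]   # z picks which of x, y may be nonzero
prob = cvx.Problem(cvx.Maximize(x + y), complementarity + [x + 2 * y <= 4])
prob.solve()                               # needs a MIP-capable solver installed
print(x.value, y.value)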
Parametric Programming
Difference of Convex Programming
difference of convex programming http://www.lita.univ-lorraine.fr/~lethi/index.php/dca.html https://www.ceremade.dauphine.fr/~carlier/toland.pdf
Bilevel programming
Review on Bilevel Optimization. Gentle and incomplete introduction to bilevel optimization. YALMIP bilevel.
Bilevel programming is a natural idea for modelling nested optimization problems. They occur when you want to optimize given that someone else is optimizing. For example, suppose you want to optimize a trajectory for worst case error forces: you minimize a cost while the errors are trying to maximize it. Or the two problems can have unrelated objectives.
Mathematical programming with equilibrium constraints
modeling bilevel programs in pyomo
Game theory, Nash equilibrium
Leader-follower problems, interdiction problems
pyomo bilevel deprecated?
from pyomo.environ import *
from pyomo.bilevel import *
model = ConcreteModel()
model.x = Var(bounds=(1,2))
model.v = Var(bounds=(1,2))
model.sub = SubModel()
model.sub.y = Var(bounds=(1,2))
model.sub.w = Var(bounds=(-1,1))
model.o = Objective(expr=model.x + model.sub.y + model.v)
model.c = Constraint(expr=model.x + model.v >= 1.5)
model.sub.o = Objective(expr=model.x+model.sub.w, sense=maximize)
model.sub.c = Constraint(expr=model.sub.y + model.sub.w <= 2.5)
Method 1: KKT conditions
Optimal points of the lower problem have KKT conditions ~ gradient = 0. These are equality constraints. This turns it into a nonlinear optimization problem. It is not entirely clear to me that any point that satisfies the KKT conditions is necessarily a global optimum. That sounds ok… hmm
KKT conditions include complementarity conditions, converting into a complementarity problem. This can in turn perhaps be encoded into MIP. Or possibly you might end up with a system of polynomial equations which can be treated (at small scale) using homotopy continuation methods or other algebraic methods.
Method 2: Duality
You can reformulate to the dual problem. This will cause xy terms in the inner objective. Not awesome.
Tools
NEOS server
Online job submission to many solvers. Good place to find out about solvers https://neos-server.org/neos/
COIN-OR
CLP
Free and very fast linear programming solver.
SCIP
many parameters. Fuzz them?
JuMP
CVXPY
PYOMO
YALMIP
YALMIP is a MATLAB toolbox for optimization. Lots of cool little things though. See the tutorials for examples.
Commercial
There's nothing wrong with wanting to make money off your hard work, but commercial solvers are less useful/interesting to me personally. It's annoying enough even to get free licenses for them.
Gams
Ampl
Gurobi
CPLEX
Mosek
Highs
https://ergo-code.github.io/HiGHS/index.html Interesting. Rust, Julia, and JavaScript bindings. MIP solver. Performance seems roughly competitive.
COPT
Interesting new performer. https://www.shanshu.ai/copt LP, MIP, SOCP, SDP, convex QP and convex QCP. Not free.
https://github.com/oxfordcontrol
- Clarabel.rs and julia version
- Cosmo
- OSQP quadratic optimization solver
Resources
- Stephen Boyd
- MIP school
- Operations research stack exchange
- DTU summer school
- Advanced Optimization and Game Theory for Energy Systems
- Hans Mittelmann benchmarks. This was kind of causing some drama.
rosetta-opf AC optimal power flow model. Power flow is a place where mathematical optimization is pretty interesting.
cvxpygen
Yet Another Math Programming Consultant
https://twitter.com/caglar_ee this guy is tweeting a ton of courses
linear optimization UW-Madison
intro to optimization in julia