Optimization

I should rename this article to Mathematical Programming / Operations Research / maybe Convex Optimization. More generally, optimization also includes general nonlinear optimization: things like gradient descent, genetic algorithms, interval methods, etc.

SAT vs SMT vs CSP vs MIP vs NLP

When and why? Hard to know.

Linear Programming

\(\min c^T x\) \(Ax \le b\)

Tricks

Encoding equality in terms of inequality

\(Ax \le b\) and \(Ax \ge b\) together give the equality \(Ax = b\).

Making the objective a single variable

Make a new variable \(t\) and add constraint \(t = c^T x\). Now you can minimize \(t\) instead.

Absolute value encoding

\(\min \sum_i |c_i^T x|\)

You can create a new variable \(t_i\) for each absolute value term and replace the problem with

\(\min \sum_i t_i\) subject to \(\forall i,\ -t_i \le c_i^T x \le t_i\)
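Here is a minimal cvxpy sketch of this encoding (the data matrix C and the extra normalization constraint are made up just to give the problem something to do):

import cvxpy as cvx
import numpy as np

np.random.seed(0)
C = np.random.randn(5, 3)                 # made-up data: rows are the c_i
x = cvx.Variable(3)
t = cvx.Variable(5)                       # one bound t_i per absolute value term

constraints = [C @ x <= t, -t <= C @ x,   # -t_i <= c_i^T x <= t_i
               cvx.sum(x) == 1]           # arbitrary extra constraint so x != 0
prob = cvx.Problem(cvx.Minimize(cvx.sum(t)), constraints)
prob.solve()
print(t.value, np.abs(C @ x.value))       # at the optimum each t_i equals |c_i^T x|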

Sparsity Heuristic

Adding an absolute value regularization term \(\sum |c|\) has a tendency to force some of the \(c\) exactly to zero. You can tune the amount of sparsity you want. Then you can also fix the sparsity pattern, either as a new problem or as new linear constraints \(c = 0\), and re-solve without the regularization to finetune.
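A minimal sketch of this two-phase procedure on a made-up L1 fitting problem (A, b, the penalty weight lam, and the zero threshold are all arbitrary choices here):

import cvxpy as cvx
import numpy as np

np.random.seed(1)
A = np.random.randn(20, 10)                       # made-up overdetermined fitting problem
b = A @ np.array([1.5, -2.0] + [0.0] * 8) + 0.01 * np.random.randn(20)

x = cvx.Variable(10)
lam = 0.5                                          # tune for more or less sparsity

# Phase 1: L1-regularized fit. The |x| penalty tends to push entries exactly to zero.
cvx.Problem(cvx.Minimize(cvx.norm1(A @ x - b) + lam * cvx.norm1(x))).solve()
zero_idx = np.where(np.abs(x.value) < 1e-4)[0]     # entries that were driven to (near) zero

# Phase 2: freeze the sparsity pattern as hard constraints, drop the penalty, re-solve to finetune.
cvx.Problem(cvx.Minimize(cvx.norm1(A @ x - b)), [x[i] == 0 for i in zero_idx]).solve()
print(x.value)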

Minimax encoding

A similar-feeling trick:

\(\min ( \max_i c_i^T x)\)

Create a shared bound \(t\):

\(\min t\) subject to \(\forall i,\ c_i^T x \le t\)

Because minimizing t pushes against these new constraints, t will end up being the maximum of them. The constraint corresponding to the maximum will be tight, whereas the others will be slack. This trick does not generalize as much as one might naively hope; nested optimization is often difficult. See bilevel optimization.
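A small cvxpy sketch (the data C and the normalization constraint are made up):

import cvxpy as cvx
import numpy as np

np.random.seed(2)
C = np.random.randn(4, 3)                 # made-up data: rows are the c_i
x = cvx.Variable(3)
t = cvx.Variable()                        # the shared bound

prob = cvx.Problem(cvx.Minimize(t),
                   [C @ x <= t,           # c_i^T x <= t for every i
                    cvx.sum(x) == 1])     # arbitrary constraint so the problem isn't trivial
prob.solve()
print(t.value, np.max(C @ x.value))       # t ends up equal to the largest c_i^T x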

Linear in what?

Polytope inclusion

Applications

cvxpy examples

  • least absolute value optimization
  • Relaxations of discrete optimization problems
  • Network flow
  • how do strings hang
  • Contact constraints

Discrete Optimization Problems

If an optimization problem has a poly time algorithm, there's a decent chance it isn't so bad to embed in linear programming.

Control

Q Learning

More Reinforcement Learning with cvxpy. Q learning with linear programming.

PDEs

You can encode some PDEs into MIP to find the minimal energy ground state. XY model with MIP. Coulomb Gas as Mixed integer program. Solving the Ising model with MIP.

Denoising

Total variation reconstruction https://en.wikipedia.org/wiki/Total_variation_denoising

Compressed sensing

https://en.wikipedia.org/wiki/Compressed_sensing

Inverse Problems

https://en.wikipedia.org/wiki/Inverse_problem Inverse problems are trying to determine stuff from observations. In particular, you're trying to infer the system of equations that defines a system from different solutions of the system. The system of equations of a wave or Laplace system will include the shape of the objects in the system.

Optimal Transport

Discrete optimal transport problems can be encoded as an LP. Optimal transport is an idea with a long history (Monge) and a lot of work (Fields Medals and such). Kind of curious.

Optimal transport is moving hunks of dirt from points A to points B. If these sites are discrete, you can encode how much dirt went along each pathway and associate a cost with that pathway given by its geometric distance.
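A toy cvxpy sketch of a discrete optimal transport LP (piles, holes, positions, and amounts are all made up):

import cvxpy as cvx
import numpy as np

supply = np.array([0.4, 0.4, 0.2])                # how much dirt sits at each pile
demand = np.array([0.25, 0.25, 0.25, 0.25])       # how much each hole needs
xs = np.array([0.0, 1.0, 2.0])                    # pile positions on a line
ys = np.array([0.5, 1.5, 2.5, 3.5])               # hole positions
cost = np.abs(xs[:, None] - ys[None, :])          # cost of a pathway = geometric distance

plan = cvx.Variable((3, 4), nonneg=True)          # how much moves from pile i to hole j
constraints = [cvx.sum(plan, axis=1) == supply,   # each pile is fully moved
               cvx.sum(plan, axis=0) == demand]   # each hole is fully filled
prob = cvx.Problem(cvx.Minimize(cvx.sum(cvx.multiply(cost, plan))), constraints)
print(prob.solve())                               # the optimal transport cost
print(plan.value)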

System Identification

https://www.philipzucker.com/system-identification-of-a-pendulum-with-scikit-learn/

Min Path

import cvxpy as cvx
import numpy as np

n = 3
INF = 100  # stand-in for infinity: the weight of nonexistent edges
edge = np.full((n, n), float(INF))
np.fill_diagonal(edge, 0)
edge[0, 1] = 1
edge[1, 2] = 1

path = cvx.Variable((n, n), nonneg=True)
constraints = [path[i, j] <= edge[i, j] for i in range(n) for j in range(n)]  # an edge is itself a path
constraints += [path[i, k] <= edge[i, j] + path[j, k]
                for i in range(n) for j in range(n) for k in range(n)]        # going i -> j -> k is a path i -> k

# Maximizing pushes each path[i,k] up against the tightest of these constraints,
# so at the optimum path[i,k] is the shortest-path distance from i to k
# (unreachable pairs just saturate near INF).
prob = cvx.Problem(cvx.Maximize(cvx.sum(path)), constraints)
prob.solve()
print(path.value)  # e.g. path[0,2] == 2

Conversely, min-path algorithms can solve problems of this form.

Max flow

https://en.wikipedia.org/wiki/Maximum_flow_problem

Polynomial Solutions

Fitting floating point polynomials

Floating point up to 32 bits is tractable via complete enumeration of the space. You can design polynomial approximations that directly use the tabulation of the true floating point results. See the RLibm project.

Quantization

Polyhedral Computation

We have methods for really dominating polyhedra. You can answer a lot of difficult questions about them. However, linear programming is cheap; polyhedral methods are expensive.

FORMALIZING THE FACE LATTICE OF POLYHEDRA in coq

The face lattice is a neat idea: face lattice, Abstract polytope, Steinitz's theorem, the algorithmic Steinitz problem (recognizing the face lattice of polytopes). https://mathoverflow.net/questions/432636/determining-whether-a-lattice-is-the-face-lattice-of-a-polytope-np-hard-or-und Polyhedral Combinatorics

Hirsch Conjecture

Ziegler - Lectures on polytopes. 0-1 polytopes

Volume computation Simplicial decomposition

Fourier Motzkin

Fourier-Motzkin is conceptually simple. It is the inequality analog of Gaussian elimination, but much more computationally complex. Pick a variable x to project out of your inequalities. Isolate this variable. Because of minus signs, sometimes you'll have x less than something, sometimes greater: \(x \le f_j(y)\), \(g_i(y) \le x\)

Now plug together the n \(\le\) inequalities with the m \(\ge\) inequalities to produce n*m inequalities that don't involve x. This is the projected set of inequalities: if it is satisfiable, you can produce an x that makes the original system satisfiable. To do so, find y, take the maximum of the \(g_i(y)\) and the minimum of the \(f_j(y)\), and pick an arbitrary x between these bounds: \(\max_i g_i(y) \le x \le \min_j f_j(y)\)
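A naive numpy sketch of one elimination step (the function name and the little unit-square example are mine; no redundancy removal, so the row count can blow up):

import numpy as np

def fourier_motzkin(A, b, k):
    # project variable x_k out of the system A x <= b
    pos = [i for i in range(len(b)) if A[i, k] > 0]    # rows giving x_k <= ...
    neg = [i for i in range(len(b)) if A[i, k] < 0]    # rows giving ... <= x_k
    zero = [i for i in range(len(b)) if A[i, k] == 0]  # rows not involving x_k
    rows, rhs = [A[i] for i in zero], [b[i] for i in zero]
    for i in pos:
        for j in neg:
            # positive combination of the two rows that cancels the x_k coefficient
            rows.append(-A[j, k] * A[i] + A[i, k] * A[j])
            rhs.append(-A[j, k] * b[i] + A[i, k] * b[j])
    return np.array(rows), np.array(rhs)

# project x_1 out of the unit square 0 <= x_0 <= 1, 0 <= x_1 <= 1
A = np.array([[1.0, 0], [-1, 0], [0, 1], [0, -1]])
b = np.array([1.0, 0, 1, 0])
print(fourier_motzkin(A, b, 1))   # leaves 0 <= x_0 <= 1, plus one trivial 0 <= 1 row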

Hrep Vrep

Polyhedra can be dually described as the convex hull of their vertices (Vrep) or the intersection of the halfspaces of their faces (Hrep). Both of these can be concretely represented by collections of vectors. These different representations make different operations easy. Converting between them is hard in general. Some polytopes only have a succinct representation in one rep and not the other. For example, a cube has 2d faces but 2^d vertices in dimension d.

Convex union is easy in Vrep: just take the union of the vertices (and cull the interior points?)

Intersection is easy in Hrep: take both sets of half spaces.

Zonotopes

Zonotopes are the linear images of cubes. This is a special class of polyhedra that sometimes has nice properties.

Resources

Fukuda lecture notes

Methods

Simplex

Interior Point

Convex Optimization

Duality

Cody seems to be under the impression that I understand duality in the context of convex optimization and linear programming. I don't know nothing about it I suppose.

A convex optimization problem is the following

$latex \min \phi(x)$

$latex g(x) \ge 0$

Where $latex \phi$ has to be a convex function and $latex g$ a concave one, so that the feasible set where $latex g(x) \ge 0$ is convex. Convex means bowl shaped.

For smooth functions, convex means that the tangent line globally underestimates the value of the function itself.

More generally it means that $latex 0 \le \lambda \le 1 \rightarrow \phi(\lambda x + (1 - \lambda)y) \le \lambda \phi(x) + (1 - \lambda) \phi(y)$

This matters because corners show up pretty often in convex optimization.

This condition is sort of a sister inequality version of the more familiar condition of linearity. $latex f(\alpha x + \beta y) = \alpha f(x) + \beta f(y)$. It allows you to pull stuff out.

These problems are basically polynomial time solvable (i.e. fast). The reason this is true is that locally good moves are globally good moves because of the convexity condition. Roughly, greedy works, greedy often being gradient descent or a Newton method (using second derivatives). The simplex algorithm is also a greedy algorithm.

A special but very important case of convex optimization is that of linear programming

$latex \min c^T x $

$latex Ax \ge b$

Geometrically this is trying to find the farthest point in the $latex -c$ direction that is in the polytope described by $latex Ax\ge b$. A polytope is a shape made of a bunch of flat sides, the higher dimensional analog of a polygon. Every row of the matrix A corresponds to a face of the polytope. The row is the normal vector to the face and the entry in b controls an offset from the origin.

Another way of phrasing this is that we are in a polytope that is the conjunction (And) of the half spaces described by the rows of A.

Linear programming is so important because basically it straddles the discrete and the continuous. Polytopes have a finite number of discrete corners.

There is always a corner of the polytope that is at least as good as any point on a face as far as the objective is concerned. This is the seed of the simplex algorithm.

In principle, a Bogo-simplex algorithm would be to take every possible selection of d rows of A, find the vertex it describes via Gaussian elimination, check that this vertex is in the half spaces of all the other rows, and then evaluate the objective on it. This would be an insane combinatorial explosion. The remarkable thing is that this process is not necessary.
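A brute-force numpy sketch of exactly that procedure, on a tiny made-up LP over a triangle (min c^T x subject to Ax >= b):

import itertools
import numpy as np

def bogo_simplex(c, A, b):
    # enumerate every d-subset of rows, solve for the candidate vertex,
    # keep the feasible vertex with the best objective
    d = A.shape[1]
    best, best_x = np.inf, None
    for rows in itertools.combinations(range(A.shape[0]), d):
        Asub, bsub = A[list(rows)], b[list(rows)]
        if abs(np.linalg.det(Asub)) < 1e-12:
            continue                                    # these rows don't pin down a vertex
        x = np.linalg.solve(Asub, bsub)                 # the candidate vertex
        if np.all(A @ x >= b - 1e-9) and c @ x < best:  # inside all other half spaces and better
            best, best_x = c @ x, x
    return best, best_x

# min -x - 2y over the triangle x >= 0, y >= 0, x + y <= 3 (written as -x - y >= -3)
A = np.array([[1.0, 0], [0, 1], [-1, -1]])
b = np.array([0.0, 0, -3])
c = np.array([-1.0, -2])
print(bogo_simplex(c, A, b))   # optimum -6 at the vertex (0, 3)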

Nevertheless, this procedure demonstrates that the solutions are going to be fractions (rational if A and b are).

Suppose you have an optimal solution to an LP. How do you know it is optimal? There is a simple proof.

Duals as Proof

I can derive new inequality bounds by combining the rows of A. When you add together inequalities, you have to multiply by positive coefficients only, or you will flip the direction of the inequality. If I left multiply by a vector $latex l \ge 0$ I can derive new bounds from the assumption that x is in the polytope. This is similar to deriving new equations from sums of the rows of a matrix, as occurs during Gaussian elimination.

$latex l^TAx \ge l^Tb$

How can we use that to find a bound on the objective? Well, if we add the additional constraint that $latex A^T l = c$ then we derive the new bound $latex l^TAx = c^T x\ge l^Tb$. So by fiddling with the values of l, we may try to create better and better lower bounds on the objective function derived from the polytope constraints.

If I can exhibit both an $latex l^*$ and an $latex x$ such that $latex c^T x = (l^*)^T b$ then I'm golden no matter how I cheated and killed to find them.

It is a remarkable fact that these vectors are in fact guaranteed to exist. That is the content of strong duality and Farkas Lemma.

Sometimes the dual is called a certificate of optimality. But we can also consider it to be a proof object. It is data that somehow encodes a particular kind of linear sum of inequalities proof.
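As a concrete sketch (the tiny LP below is made up), cvxpy hands back this kind of certificate as the dual value attached to the constraint; up to solver tolerance it satisfies $latex A^T l = c$, $latex l \ge 0$, and $latex l^T b = c^T x$:

import cvxpy as cvx
import numpy as np

# tiny made-up LP: min c^T x  subject to  A x >= b
A = np.array([[1.0, 0], [0, 1], [-1, -1]])
b = np.array([0.0, 0, -3])
c = np.array([-1.0, -2])

x = cvx.Variable(2)
ineq = A @ x >= b
prob = cvx.Problem(cvx.Minimize(c @ x), [ineq])
prob.solve()

l = ineq.dual_value           # the certificate: l >= 0 with A^T l = c
print(l)                      # roughly [1, 0, 2]
print(c @ x.value, b @ l)     # both about -6: the derived bound l^T b is tight
print(A.T @ l)                # recovers c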

Geometrical Perspective

As I said above, each row of the matrix A corresponds to a half space.

More generally, any convex set can be considered as the convex union of all its points or dually as the intersection of all the half spaces that fully include the object.

Half spaces can be coordinatized as a normal vector l and an offset parameter r such that $latex \{ x \mid l^T x \ge r \}$.

Looking at only the extremal bits (the boundary), a convex set is the convex hull of its boundary and the set is the envelope of its extremal halfspaces (those that actually touch the set).

$latex \{ x \mid x \in C \}$ vs $latex \{ (l, b) \mid \forall x \in C,\ l^T x \ge b \}$
$latex \{ l \mid l \in C^* \}$ vs $latex \{ x \mid \forall l \in C^*,\ l^T x \ge 0 \}$

In the affine geometry we're considering, these two descriptions don't line up quite symmetrically.

The duality becomes much cleaner when we remove the offsets and work in projective geometry. Everything becomes homogeneous coordinates and instead of talking about convex sets, we talk about conic sets.

We can recover the affine considerations by taking a slice of the cone z = 1. This is the standard way of going back and forth between homogeneous coordinates and regular coordinates.

A cone is a set that is closed under non-negative combinations of its points. The dual cone is the set of all half spaces at the origin that fully contain the primal cone.

Lagrange Multipliers and Legendre Transformation

Quadratic Programming

Conic Programming

Semi-Infinite Programming

Infinite number of constraints or variables.

EAGO.jl https://github.com/PSORLab/EAGO.jl https://psorlab.github.io/EAGO.jl/dev/semiinfinite/semiinfinite/

Semi definite programs

Optimization over positive semidefinite matrices. Very powerful. https://arxiv.org/abs/2202.12374 - iterate SDD to get SDP, sounds good. Quantum and SDP.

Tricks

Semidefinite kind of gets you multiplication

Applications

  • Lyapunov matrices. You need to show that something is stable. The interesting example is that you have a linear dynamical system and a discrete set of possible dynamics. You can find a common Lyapunov matrix for all of them (see the sketch after this list).
  • density matrices
  • sum of squares polynomial optimization
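Here is a small cvxpy sketch of the common Lyapunov matrix idea from the first bullet (the two stable mode matrices A1, A2 and the margin eps are made up; needs an SDP-capable solver such as SCS):

import cvxpy as cvx
import numpy as np

# made-up switched linear system: xdot = A_i x, where the mode i can change arbitrarily
A1 = np.array([[-1.0, 0.5], [0.0, -1.0]])
A2 = np.array([[-1.0, 0.0], [0.8, -0.5]])

P = cvx.Variable((2, 2), PSD=True)
eps = 1e-2
constraints = [P >> eps * np.eye(2)]                          # P strictly positive definite
for A in [A1, A2]:
    constraints.append(A.T @ P + P @ A << -eps * np.eye(2))   # V(x) = x^T P x decreases in each mode

prob = cvx.Problem(cvx.Minimize(0), constraints)
prob.solve()
print(P.value)   # if feasible, a single quadratic Lyapunov function certifying stability under switching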

Sum of Squares

Again we should ask linear in what?

m = [1 x x^2]

m^T Q m is a sum of squares polynomial if Q is positive semidefinite. Writing the eigendecomposition $Q = \sum_i \lambda_i v_i v_i^T$, the squared terms of the sum are $\lambda_i (v_i^T m)^2$.

This is the analog of $a^Tm$ being a linear parametrization of an arbitrary polynomial

Trick: you can find the optimal value of a polynomial by considering $p(x) + t$. You want to minimize t such that $p(x) + t$ stays positive (i.e. is SOS); the minimum of p is then $-t$.
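A cvxpy sketch of that trick for a made-up univariate quartic (needs an SDP-capable solver; following the text's sign convention, -t is the reported minimum):

import cvxpy as cvx

# p(x) = 1 - 3x^2 + x^4, coefficients in increasing degree order
p = [1, 0, -3, 0, 1]

Q = cvx.Variable((3, 3), PSD=True)   # Gram matrix for the monomials m = [1, x, x^2]
t = cvx.Variable()

# match coefficients of m^T Q m against p(x) + t, degree by degree
constraints = [
    Q[0, 0] == p[0] + t,             # constant term
    2 * Q[0, 1] == p[1],             # x
    2 * Q[0, 2] + Q[1, 1] == p[2],   # x^2
    2 * Q[1, 2] == p[3],             # x^3
    Q[2, 2] == p[4],                 # x^4
]

prob = cvx.Problem(cvx.Minimize(t), constraints)
prob.solve()
print(-t.value)   # lower bound on min p(x); exact for univariate polynomials (about -1.25 here)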

Moments: This one always kind of confused me. Linear functionals such that if f(x) is SOS then L[f] >= 0.

Lasserre hierarchy

Sherali-Adams, mentioned in linear programming? Probabilistic moments.

We have a problem that involves x, y, and x*y but is otherwise linear. Suppose we introduce a variable called “xy”, and the constraint xy = x * y. As a relaxation we can just forget the constraint. This is a pretty loose relaxation.

Convex Optimization

More general solvers are less specific. Convex sets. Convex cones. Epigraphs. Geometric programming - I don't really know about this one. Barrier method.

Mixed Integer

Big-M

Encoding boolean functions/relations

Any boolean function/relation can be encoded as a convex polytope (a subset of the hypercube) plus integrality constraints on the variables.
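For example, z = x AND y can be cut out of the 0/1 cube with three inequalities. A little cvxpy sketch (needs a mixed-integer-capable solver such as GLPK_MI, SCIP, or GUROBI):

import cvxpy as cvx

x = cvx.Variable(boolean=True)
y = cvx.Variable(boolean=True)
z = cvx.Variable(boolean=True)

and_constraints = [z <= x, z <= y,    # z can only be 1 if both x and y are 1
                   z >= x + y - 1]    # z must be 1 when both are 1

# sanity check: force x = 1, y = 1 and see that z is forced to 1
prob = cvx.Problem(cvx.Minimize(0), and_constraints + [x == 1, y == 1])
prob.solve()
print(z.value)   # 1.0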

Piecewise linear encodings

PiecewiseLinearOpt.jl paper

Unrolling time

Products of Variables in Mixed Integer Programming (Gurobi tutorial). McCormick relaxation.

Techniques papers from the COAT slack

https://pubsonline.informs.org/doi/10.1287/ited.7.2.153

https://link.springer.com/chapter/10.1007/11493853_27

https://dspace.mit.edu/handle/1721.1/96480

https://docs.mosek.com/modeling-cookbook/index.html

Extended Mathematical Programming

extended mathematical programming

disjunctive programming

Stochastic programming

https://en.wikipedia.org/wiki/Stochastic_programming The parameters of the future stage program are unknown. You need to optimize with respect to their expectation

Deterministic equivalent

statistical sampling / monte carlo

Robust Optimization

Complementarity Problems

xy = 0 constraints, which is another way of saying at most one of x and y can be nonzero.

Complementarity Book

If you pick which variables are 0, what remains is a convex optimization problem. So there is sort of a layered thing going on.
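One common way to get xy = 0 into a MIP is a big-M encoding with a binary that picks which of the pair may be nonzero. A small cvxpy sketch (the bound M, the extra constraint, and the objective are made up; needs a mixed-integer-capable solver):

import cvxpy as cvx

M = 10.0                                        # assumed known upper bound on x and y
x = cvx.Variable(nonneg=True)
y = cvx.Variable(nonneg=True)
z = cvx.Variable(boolean=True)                  # picks which of the pair may be nonzero

comp = [x <= M * z, y <= M * (1 - z)]           # forces x * y = 0 at any feasible point
prob = cvx.Problem(cvx.Maximize(x + y), comp + [x + 2 * y <= 4])
prob.solve()
print(x.value, y.value)                         # one of the two is zero; here x = 4, y = 0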

Parametric Programming

Difference of Convex Programming

difference of convex programming http://www.lita.univ-lorraine.fr/~lethi/index.php/dca.html https://www.ceremade.dauphine.fr/~carlier/toland.pdf

Bilevel programming

Review on Bilevel optimization. Gentle and incomplete introduction to bilevel optimization. yalmip bilevel.

Bilevel programming is a natural way of modelling nested optimization problems. They occur when you want to optimize given that someone else is optimizing. For example, suppose you want to optimize a trajectory for worst case error forces. You minimize a cost while errors are trying to maximize it. Or the two problems can have unrelated objectives.

GAMS

Mathematical programming with equilibrium constraints

modeling bilevel programs in pyomo

Game Theory, Nash equilibrium

Leader-follower problems, interdiction problems

pyomo bilevel deprecated?

from pyomo.environ import *
from pyomo.bilevel import *   # note: pyomo.bilevel is deprecated in recent pyomo releases

model = ConcreteModel()
# upper-level (leader) variables
model.x = Var(bounds=(1,2))
model.v = Var(bounds=(1,2))
# lower-level (follower) problem lives in a SubModel
model.sub = SubModel()
model.sub.y = Var(bounds=(1,2))
model.sub.w = Var(bounds=(-1,1))
# the leader minimizes its objective, the follower maximizes its own
model.o = Objective(expr=model.x + model.sub.y + model.v)
model.c = Constraint(expr=model.x + model.v >= 1.5)
model.sub.o = Objective(expr=model.x + model.sub.w, sense=maximize)
model.sub.c = Constraint(expr=model.sub.y + model.sub.w <= 2.5)

Method 1: KKT conditions

Optimal points of the lower problem have KKT conditions ~ gradient = 0. These are equality constraints. This turns it into a nonlinear optimization problem. It is not entirely clear to me that any point that satisfies KKT conditions is necessarily a global optimum. That sounds ok… hmm

KKT conditions include complementarity conditions, converting into a complementarity problem. This can in turn perhaps be encoded into MIP. Or possibly you might end up with a system of polynomial equations which can be treated (at small scale) using homotopy continuation methods or other algebraic methods.

Duality

You can reformulate to the dual problem. This will cause xy terms in the inner objective. Not awesome.

Tools

NEOS server

Online job submission to many solvers. Good place to find out about solvers https://neos-server.org/neos/

COIN-OR

CLP

Free and very fast linear programming solver.

SCIP

many parameters. Fuzz them?

JuMP

CVXPY

PYOMO

YALMIP

YALMIP is a MATLAB toolbox for optimization. Lots of cool little things though. See tutorials for examples.

Commercial

There's nothing wrong with wanting to make money off your hard work, but commercial solvers are less useful/interesting to me personally. They are annoying to even get free licences for.

Gams

Ampl

Gurobi

CPLEX

Mosek

HiGHS

https://ergo-code.github.io/HiGHS/index.html Interesting. Rust, Julia, JavaScript bindings. MIP solver. Performance seems roughly competitive.

COPT

Interesting new performer https://www.shanshu.ai/copt LP, MIP, SOCP, SDP, convex QP and convex QCP. Not free.

https://github.com/oxfordcontrol

osqp quadratic optimization solver

Resources

rosetta-opf AC optimal power flow model. Power flow is a place where mathematical optimization is pretty interesting.

Gekko

cvxpygen

Yet Another Math Programming Consultant

Max cut

https://twitter.com/caglar_ee this guy is tweeting a ton of courses

linear optimization UW-Madison

integer optimization

intro to optimization in julia