Sunday, April 29, 2018

Piecewise nonlinear functions

In [1] a question is posed: how can we model the piecewise nonlinear function depicted below?

Model the dashed line h(x)

We want to model that the costs are represented by the function \(h(x)\) (the dashed curve in the picture). Let's make a few assumptions:

  • The function \(f(x)\) is a quadratic function while \(g(x)\) is linear.
  • \(f(x)\) is convex
  • We are minimizing cost (or maximizing profit)
  • Update: I also assume this is part of a larger problem, with multiple items. If this were the only curve, we could just use \(\min f(x) = \min x\) due to monotonicity (see comments below). If we want to minimize \(\displaystyle \sum_i \mathit{Cost}_i\) this is no longer applicable.

Choose cheapest

One way is to observe that for any \(x\) we choose the cheapest curve. I.e. we have \(\mathit{Cost}=\min(f(x),g(x))\). This can be modeled as:
\[\begin{align}\min \> & \mathit{Cost}\\ & \mathit{Cost}\ge f(x)-M\delta\\ &\mathit{Cost}\ge g(x)-M(1-\delta)\\&x \in [0,x_{max}]\\ & \delta \in \{0,1\}\end{align}\] The proper value for \(M\) would be \(M=f(x_{max})-g(x_{max})\). Basically this MIQCP (Mixed-Integer Quadratically Constrained Programming) formulation says: just choose one of the curves and ignore the other one. The objective will make sure the most expensive curve is ignored and the cheapest curve is retained. We did not have to use \(x_0\) at all in this formulation.
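As a quick numerical sanity check (a sketch with a made-up convex quadratic \(f\) and linear \(g\), not taken from [1]), we can verify that minimizing over \(\delta\in\{0,1\}\) in this big-\(M\) formulation reproduces \(\min(f(x),g(x))\):

```python
# Verify the big-M construction: for fixed x and delta, the smallest feasible
# Cost is max(f(x) - M*delta, g(x) - M*(1-delta)); minimizing over delta
# should then give min(f(x), g(x)). f and g here are made up for illustration.
def f(x):
    return 0.5 * x**2 + 1.0      # hypothetical convex quadratic cost

def g(x):
    return 2.0 * x + 3.0         # hypothetical linear cost

xmax = 10.0
M = f(xmax) - g(xmax)            # the big-M value suggested in the text

def cost(x):
    return min(max(f(x) - M * d, g(x) - M * (1 - d)) for d in (0, 1))

for x in (0.0, 1.0, 2.5, 7.0, 10.0):
    assert abs(cost(x) - min(f(x), g(x))) < 1e-9
print("big-M formulation reproduces min(f, g)")
```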

The big-M value can become large in case \(x_{max}\) is large. Sometimes a SOS1 formulation is proposed. For this case this means:\[\begin{align}\min \> & \mathit{Cost}\\ & \mathit{Cost}\ge f(x)-s_1\\ &\mathit{Cost}\ge g(x)-s_2\\&x \in [0,x_{max}]\\ & s_1,s_2 \ge 0\\& \{s_1,s_2\} \in SOS1\end{align}\]

If the solver allows indicator constraints we can write: \[\begin{align}\min \> & \mathit{Cost}\\ & \delta=0\Rightarrow \mathit{Cost}\ge f(x)\\ &\delta=1 \Rightarrow \mathit{Cost}\ge g(x)\\&x \in [0,x_{max}]\\& \delta \in \{0,1\}\end{align}\]

Use Intervals

A more standard approach would be to try to formulate: \[\mathit{Cost}=\begin{cases} f(x) & \text{if $x \le x_0$} \\ g(x) &\text{if $x\gt x_0$}\end{cases}\] Instead of letting the solver decide the best value of \(\delta\), we now use the current value of \(x\) to determine \(\delta\). The rule \[\delta = \begin{cases} 0 & \text{if $x \in [0,x_0]$} \\ 1 & \text{if $x \in [x_0,x_{max}]$}\end{cases}\] can be rewritten as:\[\begin{align} & \delta = 0 \Rightarrow x \in [0,x_0]\\ & \delta = 1 \Rightarrow  x \in [x_0,x_{max}]\end{align}\] (Note that we added some ambiguity for \(x=x_0\). In practice that is no problem. It can be argued this is a good thing [2].) This expression in turn can be formulated as two inequalities: \[x_0 \delta \le x \le x_0 + (x_{max}-x_0) \delta\] We can add this to any of the earlier formulations, e.g.: \[\begin{align}\min \> & \mathit{Cost}\\ & \mathit{Cost}\ge f(x)-M\delta\\ &\mathit{Cost}\ge g(x)-M(1-\delta)\\ &x_0 \delta \le x \le x_0 + (x_{max}-x_0) \delta \\&x \in [0,x_{max}]\\ & \delta \in \{0,1\}\end{align}\]
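The linking inequality is easy to check by enumeration (a sketch with arbitrary numbers for \(x_0\) and \(x_{max}\)):

```python
# Enumerate a grid to confirm that x0*delta <= x <= x0 + (xmax - x0)*delta
# admits exactly x in [0, x0] for delta = 0 and x in [x0, xmax] for delta = 1.
x0, xmax = 3.0, 10.0             # made-up breakpoint and upper bound

def feasible(x, delta):
    return x0 * delta <= x <= x0 + (xmax - x0) * delta

xs = [i * 0.5 for i in range(21)]        # grid on [0, 10]
assert all(feasible(x, 0) == (0 <= x <= x0) for x in xs)
assert all(feasible(x, 1) == (x0 <= x <= xmax) for x in xs)
# the breakpoint itself is feasible for both values of delta (the noted ambiguity)
assert feasible(x0, 0) and feasible(x0, 1)
```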


  1. Piecewise non-linear cost function in Cplex.
  2. Strict inequalities in optimization models.

Wednesday, April 25, 2018

Regular Expression Tester

On a regular basis (pun intended), I am using regular expressions for simple parsing tasks. Writing or debugging a regular expression is not always super easy. I found that web site [1] is quite helpful to experiment with regular expressions and input strings.

Using the default pcre (php) flavor

Groups captured are color coded

The tool supports different flavors (I suspect pcre (php) is the most "standard"). Indeed this flavor checks out my regexp. However, when using the JavaScript version, I see:

No match with Javascript flavor
With the pcre (php) flavor there is even a debugging tool which explains in detail how the regexp algorithm proceeds:



  1. Online regex tester and debugger: PHP, PCRE, Python, Golang and JavaScript.

Thursday, April 19, 2018

Covariance Matrix not positive definite in portfolio models

When solving mean-variance portfolio optimization models, most solvers require that the variance-covariance matrix is positive semi-definite. In theory, any covariance matrix is positive semi-definite. But sometimes in practice it is not, something I saw today.


The sample covariance matrix is \[q_{i,j} = \frac{1}{T-1}\sum_t (r_{t,i} - \mu_i)(r_{t,j} - \mu_j)\] where \(T\) is the number of time periods, and \(r_{t,i}-\mu_i\) are the mean-adjusted returns. This matrix is obviously symmetric. To show it is positive semi-definite, we can prove \(x^TQx\ge 0\) for any \(x\). We have \[\begin{align}z &= x^TQx\\ &=\sum_{i,j} x_i x_j q_{i,j}\\&=\sum_{i,j} x_i x_j \left[\frac{1}{T-1}\sum_t (r_{t,i} - \mu_i)(r_{t,j} - \mu_j)\right]\\&=\frac{1}{T-1}\sum_t \left(\sum_i (r_{t,i}-\mu_i)x_i\right) \left(\sum_j (r_{t,j}-\mu_j)x_j\right)\\&= \frac{1}{T-1}\sum_t w_t^2 \ge 0 \end{align}\] where \[w_t = \sum_i (r_{t,i}-\mu_i)x_i = \sum_i \tilde{r}_{t,i} x_i\]

From this, we know the covariance matrix is in theory positive semi-definite. In practice things are more complicated.
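The identity above is easy to illustrate numerically (a plain-Python sketch with random data): \(x^TQx\) computed from the sample covariance matrix equals \(\frac{1}{T-1}\sum_t w_t^2\), which can never be negative.

```python
# x'Qx from the sample covariance matrix equals sum_t w_t^2 / (T-1) >= 0.
import random
random.seed(1)
T, n = 6, 4
r = [[random.random() for _ in range(n)] for _ in range(T)]          # returns r[t][i]
mu = [sum(r[t][i] for t in range(T)) / T for i in range(n)]          # means
rt = [[r[t][i] - mu[i] for i in range(n)] for t in range(T)]         # mean-adjusted
Q = [[sum(rt[t][i] * rt[t][j] for t in range(T)) / (T - 1)
      for j in range(n)] for i in range(n)]                          # sample covariance
x = [random.random() - 0.5 for _ in range(n)]                        # arbitrary vector
z = sum(x[i] * x[j] * Q[i][j] for i in range(n) for j in range(n))   # x'Qx
w = [sum(rt[t][i] * x[i] for i in range(n)) for t in range(T)]       # w_t
z2 = sum(wt * wt for wt in w) / (T - 1)
assert abs(z - z2) < 1e-12 and z >= 0
```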


In practice covariance matrices can become indefinite. For instance, if the number of instruments \(n\) is larger than the number of observations \(T\), there is a high probability of some slightly negative eigenvalues for \(Q\). When these eigenvalues become more negative, you can expect to see messages like:

*** Objective Q not PSD (diagonal adjustment of 1.0e-05 would be required) 
The quadratic coefficient matrix in the objective is not positive semidefinite as expected for a minimization problem. 
The quadratic coefficient matrix in the objective is not PSD.

Sometimes the solver can fix things up a bit:

Warning:  diagonal perturbation of 6.6e-06 required to create PSD Q. 
Warning: diagonal adjustment of 4.0e-08 performed to make Q PSD

There are different fixes suggested:

  • If \(T\lt n\), the matrix is rank-deficient and we can expect some issues. With this property, some of the eigenvalues will be zero (i.e. the matrix \(Q\) is not strictly positive definite but rather positive semi-definite). It does not take much − just some small inaccuracies in floating-point computations − to make \(Q\) indefinite. A standard fix is: add more data (or reduce the number of instruments \(n\)). 
  • Fix \(Q\) by adding small numbers to the diagonal. You can see from the messages, some solvers try this automatically.
  • Fix \(Q\) by performing an eigen decomposition, replace negative eigenvalues and reassemble the \(Q\) matrix [4].
  • Check the data. Sometimes outliers caused by typos, transcription errors, dropped decimal points, etc., can make things worse.
  • Use some advanced method to get better covariances, e.g. shrinkage [1]

An alternative model 

A different approach is to use a different optimization model, inspired by the piece of theory above. The standard mean variance portfolio optimization problem looks like \[\begin{align}\min\>&x^TQx\\&\sum_i x_i = 1\\&\sum_i \mu_i x_i \ge R\\&x_i \ge 0\end{align}\] Using the original mean adjusted returns \(\tilde{r}_{t,i} = r_{t,i}-\mu_i\), we can develop an alternative formulation: \[\bbox[lightcyan,10px,border:3px solid darkblue]{\begin{align}\min&\sum_t w_t^2\\&w_t = \sum_i \tilde{r}_{t,i} x_i \\ &\sum_i x_i = 1\\&\sum_i \mu_i x_i \ge R\\&x_i \ge 0\end{align}}\] Now we have a totally benign diagonal \(Q\) matrix. With this formulation, you should never see any messages complaining about \(Q\) not being positive semi-definite.

If the model has \(T \ll n\) this model has another advantage: much less data is moved around. The original \(Q\) matrix is \((n\times n)\) which, in this case, is much larger than the \((T\times n)\) matrix \(\tilde{r}_{t,i}\).

The reformulated model should provide the same solution in normal cases (where the calculated \(Q\) matrix is numerically positive semi-definite). The objective is different as I dropped the \(1/(T-1)\) factor, but the optimal values for the variables \(x_i\) should be identical (up to some tolerance).
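A small sketch makes the last point concrete (made-up two-instrument data, and dropping the minimum-return constraint for brevity): on a grid over the budget constraint \(x_1+x_2=1\), the \(Q\)-based objective and the reformulated objective select the same portfolio, since they differ only by the constant factor \(1/(T-1)\).

```python
# Compare argmin of x'Qx and of sum_t w_t^2 on a grid over x1 + x2 = 1, x >= 0.
import random
random.seed(42)
T = 8
r = [[random.gauss(0.01, 0.05) for _ in range(2)] for _ in range(T)]  # returns r[t][i]
mu = [sum(r[t][i] for t in range(T)) / T for i in range(2)]
rt = [[r[t][i] - mu[i] for i in range(2)] for t in range(T)]          # mean-adjusted
Q = [[sum(rt[t][i] * rt[t][j] for t in range(T)) / (T - 1)
      for j in range(2)] for i in range(2)]

def obj_q(x):   # x'Qx
    return sum(x[i] * x[j] * Q[i][j] for i in range(2) for j in range(2))

def obj_w(x):   # sum_t w_t^2 with w_t = sum_i rtilde[t][i] * x[i]
    return sum(sum(rt[t][i] * x[i] for i in range(2)) ** 2 for t in range(T))

grid = [(k / 100, 1 - k / 100) for k in range(101)]   # budget constraint
assert min(grid, key=obj_q) == min(grid, key=obj_w)   # same minimizer
```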

Related: least squares

This reformulation resembles how we can work around convexity issues when dealing with least squares objectives\[\min\>||Ax-b||^2\] This model is also known to occasionally produce errors relating to the quadratic coefficient matrix not being positive semi-definite. In [2] this is solved by the following reformulation:\[\bbox[lightcyan,10px,border:3px solid darkblue]{\begin{align} \min&\sum_i v_i^2\\&v = Ax-b\end{align}}\]

This reformulated model has more variables and equations, but the "easy" quadratic objective can make the difference between being able to solve the problem or not. There can also be performance advantages.
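The equivalence is easy to check numerically (a sketch with made-up data): expanding \(||Ax-b||^2\) gives the QP objective \(x^TA^TAx - 2(A^Tb)^Tx + b^Tb\) with \(Q=A^TA\), and this equals \(\sum_i v_i^2\) for \(v=Ax-b\) at any \(x\), without ever forming \(A^TA\).

```python
# Check that the QP form x'(A'A)x - 2(A'b)'x + b'b of the least-squares
# objective equals sum_i v_i^2 with v = Ax - b (made-up data, arbitrary x).
A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
b = [7.0, 8.0, 9.0]
x = [0.3, -0.2]
m, n = 3, 2
Q = [[sum(A[k][i] * A[k][j] for k in range(m)) for j in range(n)]
     for i in range(n)]                                               # A'A
Ab = [sum(A[k][i] * b[k] for k in range(m)) for i in range(n)]        # A'b
qp = (sum(x[i] * x[j] * Q[i][j] for i in range(n) for j in range(n))
      - 2 * sum(Ab[i] * x[i] for i in range(n))
      + sum(bk * bk for bk in b))
v = [sum(A[k][j] * x[j] for j in range(n)) - b[k] for k in range(m)]  # residuals
assert abs(qp - sum(vk * vk for vk in v)) < 1e-9
```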


  1. Ledoit, Olivier and Wolf, Michael, Honey, I Shrunk the Sample Covariance Matrix (June 2003). UPF Economics and Business Working Paper No. 691.
  2. Least Squares as QP: convexity issues.
  3. The picture is from
  4. Eigen decomposition example. SDP Model: Imputing a covariance matrix.

Friday, April 13, 2018

A difficult MIP construct: counting consecutive 1's

In [1] the following modeling question is posed:

Given binary variables \(x_i \in \{0,1\}\), I want to form an integer variable \(z_i\) that has the length of the preceding string of ones (this is sometimes called the run-length). I.e.:
Given x(i) find z(i)
This is not so easy to do. Here is my attempt. The idea behind my approach is to use another intermediate integer variable \(a_i\) (an accumulator) defined by \[a_i = \begin{cases} a_{i-1}+1 & \text{if $x_i=1$ }\\ 0 & \text{otherwise}\end{cases}\]

Additional variable a(i)

Step 1: form \(a_i\)

The calculation of variable \(a_i\) can be stated as a quadratic constraint: \[a_i = x_i(a_{i-1}+1)\]

In terms of indicator constraints (implications as available in many MIP solvers) this would look as follows: \[\begin{align} &x_i=0 \Rightarrow a_i = 0\\ &x_i = 1 \Rightarrow a_i = a_{i-1}+1\end{align}\]

If the MIP solver does not support indicator constraints, we can write this as a bunch of linear inequalities: \[\begin{align} & (a_{i-1}+1) - (1-x_i)M \le a_i \le (a_{i-1}+1) + (1-x_i)M\\  & 0 \le a_i \le x_i M \end{align}\] Here \(M\) is the length of the largest possible string of consecutive 1's, e.g.\(M=\mathit{card}(i)\).

We can also linearize the quadratic constraint, but that leads to extra variables.

Step 2: form \(z_i\)

To go from \(a_i\) to \(z_i\), we can define: \[z_i = \begin{cases} a_{i-1} & \text{if $x_i=0$ and $x_{i-1}=1$ }\\ 0 & \text{otherwise}\end{cases}\] In implication form this is:\[\begin{align} & x_i=0 \text{ and } x_{i-1}=1 \Rightarrow z_i =a_{i-1}\\ & x_i=1 \text{ or } x_{i-1}=0 \Rightarrow z_i=0 \end{align}\] This is not immediately ready for use with indicator constraints (these constraints want a single binary variable at the left of \(\Rightarrow\)). We can introduce a new binary variable \(y_i \in \{0,1\}\)  and linearize \(y_i = x_{i-1} (1-x_i)\) using: \[y_i = x_{i-1} (1-x_i) \Leftrightarrow \begin{cases} y_i \le x_{i-1}\\ y_i \le 1-x_i\\ y_i \ge x_{i-1} - x_i\\ y_i \in \{0,1\}\end{cases}\] (We can relax \(y_i\) to \(y_i \in [0,1]\)). Now we are able to implement the indicator constraints:\[\begin{align}&y_i \ge x_{i-1}-x_i\\ &y_i \le x_{i-1}\\ & y_i \le 1-x_i\\ & y_i=1 \Rightarrow z_i =a_{i-1}\\ & y_i=0 \Rightarrow z_i=0 \\& y_i \in \{0,1\} \end{align}\] 

Written as linear inequalities, we have: \[\begin{align}  & a_{i-1} - x_i M - (1-x_{i-1})M \le z_i \le a_{i-1} + x_i M + (1-x_{i-1})M\\   & 0 \le z_i \le x_{i-1} M \\& z_i\le (1-x_i)M \end{align}\]

An MINLP formulation is simpler: \[z_i = a_{i-1}(1-x_i) x_{i-1}\]
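Before building the MIP it helps to simulate the definitions directly. The sketch below applies the recursions for \(a_i\) and \(z_i\) to the same 0-1 pattern as in the example results (ones at positions i3-i4 and i7-i9):

```python
# Apply the definitions of a_i (accumulator) and z_i (run-length, recorded
# just after a run ends) to the example 0-1 pattern.
x = [0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0]   # i1..i11
a, z = [], []
prev_a, prev_x = 0, 0                    # boundary: a_0 = 0, x_0 = 0
for xi in x:
    ai = prev_a + 1 if xi == 1 else 0    # a_i = x_i * (a_{i-1} + 1)
    zi = prev_a if (xi == 0 and prev_x == 1) else 0
    a.append(ai)
    z.append(zi)
    prev_a, prev_x = ai, xi
print(a)  # [0, 0, 1, 2, 0, 0, 1, 2, 3, 0, 0]
print(z)  # [0, 0, 0, 0, 2, 0, 0, 0, 0, 3, 0]
```

This matches the GAMS output shown below: \(a\) counts up inside each run, and \(z\) records the run-lengths 2 and 3 at positions i5 and i10.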

Putting things together

To summarize: we can come up with (at least) three different formulations:

big-M inequalitiesIndicator constraintsMINLP formulation
\[\begin{align}&a_i \le (a_{i-1}+1) + (1-x_i)M\\ & a_i \ge (a_{i-1}+1) - (1-x_i)M\\& a_i \le x_i M\\& z_i \le a_{i-1} + (1-x_{i-1}+x_i)M\\& z_i \ge a_{i-1} - (1-x_{i-1}+x_i)M \\ &z_i \le x_{i-1} M\\ & z_i\le (1-x_i)M\\& a_{i} \in \{0,1,2,\dots\}\\ & z_{i} \in \{0,1,2,\dots\} \end{align}\] \[\begin{align} & x_i=0 \Rightarrow a_i = 0\\ &x_i = 1 \Rightarrow a_i = a_{i-1}+1\\ & y_i \le 1-x_i\\& y_i \le x_{i-1}\\ &y_i \ge x_{i-1}-x_i \\ & y_i = 0 \Rightarrow z_i = 0\\& y_i=1 \Rightarrow z_i = a_{i-1}\\ &a_{i} \in \{0,1,2,\dots\}\\ &y_{i} \in \{0,1\} \\ & z_{i} \in \{0,1,2,\dots\} \end{align}\] \[\begin{align} &a_i = x_i(a_{i-1}+1)\\ &z_i = a_{i-1}(1-x_i) x_{i-1}\\& a_{i} \in \{0,1,2,\dots\}\\ & z_{i} \in \{0,1,2,\dots\} \end{align}\]

This is complex enough that we really have to try it out. The results of the big-\(M\) inequality approach look like:

----     71 VARIABLE x.L  

i3 1.000,    i4 1.000,    i7 1.000,    i8 1.000,    i9 1.000

----     71 VARIABLE a.L  

i3 1.000,    i4 2.000,    i7 1.000,    i8 2.000,    i9 3.000

----     71 VARIABLE z.L  

i5  2.000,    i10 3.000

The MINLP formulation is the most elegant: just two equations.

Can we do this without intermediate variables \(a_i\)?

Yes, at the expense of more nonzero elements in the LP matrix. We can define: \[ z_i = \begin{cases} \displaystyle \sum_{k\lt i} x_k - \displaystyle \sum_{k\lt i} z_k & \text{if $x_i=0$ and $x_{i-1}=1$ }\\ 0 & \text{otherwise}\end{cases}\]

z(i10) = green cells − blue cells

In big-\(M\) inequality form we can write: \[\begin{align}  & \sum_{k\lt i} (x_k - z_k)  - (1-x_{i-1}+x_i)M \le z_i \le  \sum_{k\lt i} (x_k -z_k)  + (1-x_{i-1}+x_i)M\\   & 0 \le z_i \le x_{i-1} M \\& z_i\le (1-x_i)M \end{align}\] I prefer the versions with the extra variables \(a_i\).
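We can again check this variant by direct simulation on the same example pattern:

```python
# z_i = sum_{k<i} x_k - sum_{k<i} z_k at positions where a run just ended
# (x_i = 0 and x_{i-1} = 1), and 0 elsewhere.
x = [0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0]
z = [0] * len(x)
for i in range(len(x)):
    prev_x = x[i - 1] if i > 0 else 0    # boundary: x_0 = 0
    if x[i] == 0 and prev_x == 1:
        z[i] = sum(x[:i]) - sum(z[:i])   # ones so far minus runs already counted
print(z)  # [0, 0, 0, 0, 2, 0, 0, 0, 0, 3, 0]
```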

A stronger formulation

In the comments of [1] a much stronger formulation is proposed: no big-\(M\)'s needed. A disadvantage is that we need many variables (\(n^2/2\)) and even more constraints. The idea is to find occurrences of a run starting at position \(i\) and ending at position \(j\).
y(i,j)=1 represents a run from i to j 

Such a run can be found by: \[y_{i,j}=(1-x_{i-1})\left(\prod_{k=i}^j x_k\right) (1-x_{j+1})\] This is a product of binary variables. The basic linearization is: \[y=x_1 x_2 \Leftrightarrow \begin{cases} y \le x_1\\ y \le x_2\\ y \ge x_1+x_2-1\\ y\in \{0,1\}\end{cases}\] This formulation allows us to relax \(y\in \{0,1\}\) to \(y\in [0,1]\). Generalizing this leads to:\[\begin{align}&y_{i,j} \le 1-x_{i-1}\\&y_{i,j} \le x_{k}&& k=i,\dots,j\\ &y_{i,j} \le 1-x_{j+1}\\&y_{i,j}\ge \sum_{k=i}^j x_k - x_{i-1}-x_{j+1} - (j-i)\\ &y_{i,j}\in \{0,1\} && j\ge i\\ \end{align}\] Again, if we want, we can relax \(y_{i,j}\) to be continuous between zero and one. With these \(y_{i,j}\) we can calculate \(z_j\) as:\[ z_j = \sum_{i\le j} (j-i+1)y_{i,j}\]
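A quick simulation of the \(y_{i,j}\) definition on the example pattern (using the boundary convention \(x_0 = x_{n+1} = 0\)):

```python
# Evaluate y[i,j] = (1 - x[i-1]) * prod(x[i..j]) * (1 - x[j+1]) on the example
# pattern, then z_j = sum_{i<=j} (j - i + 1) * y[i,j].
x = [0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0]
n = len(x)

def xb(k):                      # x with zero boundary values
    return x[k] if 0 <= k < n else 0

y = {}
for i in range(n):
    for j in range(i, n):
        prod = 1
        for k in range(i, j + 1):
            prod *= xb(k)
        y[i, j] = (1 - xb(i - 1)) * prod * (1 - xb(j + 1))
runs = [(i, j) for (i, j), v in y.items() if v == 1]
z = [sum((j - i + 1) * y[i, j] for i in range(j + 1)) for j in range(n)]
print(runs)          # [(2, 3), (6, 8)]: the runs i3-i4 and i7-i9 (0-based)
print(z[3], z[8])    # 2 3
```

Note that in this formulation \(z_j\) records the run-length at the last position of the run, matching the GAMS output below (z(i4)=2, z(i9)=3).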

When we try this out we see:

----    134 VARIABLE x.L  

i3 1.000,    i4 1.000,    i7 1.000,    i8 1.000,    i9 1.000

----    134 VARIABLE y.L  

             i4          i9

i3        1.000
i7                    1.000

----    134 VARIABLE z.L  

i4 2.000,    i9 3.000

For our small example with \(n=\mathit{card}(i)=11\) we end up with a total of 88 variables (66 are \(y_{i,j}\)'s) and 495 constraints. If we would have \(n=\mathit{card}(i)=100\) these numbers increase to 5,250 variables and 186,950 equations. The number of equations is roughly \(n^3/6\).

With some invented models, this formulation did not seem competitive with the earlier MIP models. There may well be models where this formulation works better. Modern MIP solvers are very good at generating cuts. This means that "smaller" (fewer variables and equations) but weaker formulations are not so bad anymore: the solver will strengthen the model as it goes.

Maximum run-length

I doubt the user really needs this. Usually if we know more about the rest of the problem, we can simplify things considerably. For instance, suppose the model really only needs to establish an upper bound on the run-length. If we only need \(z_i \le R\), we can drop a boatload of constraints. One formulation can be: \[ \begin{align} & x_i =1 \Rightarrow z_i = z_{i-1} + 1\\ & z_i \le R\end{align}\] A formulation only involving the original \(x_i\) variables can be: \[\sum_{k=i}^{i+R}x_k \le R \>\> \forall i\]
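The last formulation is easy to verify by brute force: for every 0-1 string, the window constraints hold exactly when the maximum run-length is at most \(R\) (a sketch with made-up \(n\) and \(R\)):

```python
# The window constraints sum_{k=i}^{i+R} x_k <= R hold for a 0-1 string
# exactly when its maximum run of ones is at most R.
from itertools import product

def max_run(x):
    best = cur = 0
    for xi in x:
        cur = cur + 1 if xi else 0
        best = max(best, cur)
    return best

R, n = 2, 7
for x in product((0, 1), repeat=n):
    windows_ok = all(sum(x[i:i + R + 1]) <= R for i in range(n - R))
    assert windows_ok == (max_run(x) <= R)
print("verified for all", 2**n, "strings")
```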

This is a good example where we need to look at the whole model to give the best possible advice on how to model things. Just looking at a fragment may lead to formulations that are too general and thus way too complicated.


  1. Linear/Integer Programming: Counting Consecutive Ones.

MATLAB: LP more expensive than MIP?

A strange report:

The model has variables that automatically assume 0-1 values. When we throw the model into intlinprog, it solves much faster than when solving it as an LP (using linprog).

Solve as LP

Optimization terminated.

obj =


Elapsed time is 102.261043 seconds.

Solve as MIP

LP:                Optimal objective value is -1486.439509.                                         

Optimal solution found.

Intlinprog stopped at the root node because the objective value is within a gap tolerance of the optimal value,
options.TolGapAbs = 0 (the default value). The intcon variables are integer within tolerance,
options.TolInteger = 1e-05 (the default value).

obj =


Elapsed time is 3.679962 seconds.

I am not using Matlab extensively (mainly for translating prototype algorithms into "real" code -- sometimes suffering from what can be called "write-only" Matlab code).

So I was thinking: the MIP solver starts with an LP (solving the root). So how can this happen? An exotic reason could be: when using binary variables instead of continuous variables, additional reductions can be applied in the presolve. This sounds somewhat far-fetched, however.

Luckily the reason was much more straightforward. In older versions of  linprog, the default LP solver is an interior point code which is much slower. Adding more logging showed:

Solve as LP: interior point

  Residuals:   Primal     Dual     Upper    Duality     Total
               Infeas    Infeas    Bounds     Gap        Rel
               A*x-b   A'*y+z-w-f {x}+s-ub  x'*z+s'*w   Error
  Iter    0:  1.58e+06 1.07e+05 9.97e+04 1.61e+10 5.01e+04
  Iter    1:  8.61e+03 3.80e-11 5.42e+02 1.42e+08 2.72e+02
  Iter    2:  2.53e+00 1.39e-10 1.59e-01 2.07e+07 1.00e+00
  Iter    3:  2.32e+00 1.40e-10 1.46e-01 2.07e+07 1.00e+00
  Iter    4:  5.58e-14 1.79e-11 3.63e-14 2.18e+05 1.00e+00
  Iter    5:  5.75e-14 6.53e-12 4.10e-14 1.06e+05 1.00e+00
  Iter    6:  5.31e-14 1.82e-12 3.02e-14 7.26e+03 9.96e-01
  Iter    7:  5.41e-14 1.12e-12 2.88e-14 4.87e+03 9.73e-01
  Iter    8:  5.79e-14 5.98e-13 2.92e-14 2.84e+03 9.04e-01
  Iter    9:  4.47e-14 3.66e-13 3.00e-14 1.86e+03 8.04e-01
  Iter   10:  4.37e-14 2.89e-13 3.44e-14 1.63e+03 7.31e-01
  Iter   11:  3.90e-14 1.84e-13 3.58e-14 1.21e+03 6.23e-01
  Iter   12:  3.74e-14 1.66e-13 3.27e-14 8.01e+02 4.62e-01
  Iter   13:  4.10e-14 1.58e-13 2.95e-14 3.65e+02 2.28e-01
  Iter   14:  4.53e-14 1.47e-13 2.82e-14 1.14e+02 7.43e-02
  Iter   15:  4.45e-14 1.42e-13 2.74e-14 4.82e+01 3.19e-02
  Iter   16:  3.82e-14 1.39e-13 2.74e-14 2.66e+01 1.78e-02
  Iter   17:  2.16e-13 1.48e-13 2.79e-14 9.02e+00 6.05e-03
  Iter   18:  7.02e-14 1.66e-13 2.86e-14 8.61e-01 5.79e-04
  Iter   19:  2.31e-14 1.75e-13 2.93e-14 1.21e-03 8.13e-07
  Iter   20:  2.27e-14 1.70e-13 2.98e-14 1.23e-09 9.17e-13
Optimization terminated.

obj =


Elapsed time is 104.291608 seconds.

Solve as LP: dual simplex

LP preprocessing removed 251000 inequalities, 0 equalities,                                         
0 variables, and 251000 non-zero elements in 3.87 secs.                                             

    Iter      Time           Fval  Primal Infeas    Dual Infeas                                     
       0      4.07  -1.002605e+05   7.916327e+03   0.000000e+00                                     
     105      4.13  -7.847379e+04   6.550848e+03   0.000000e+00                                     
     210      4.18  -5.757528e+04   5.213011e+03   0.000000e+00                                     
     315      4.22  -3.717179e+04   3.843284e+03   0.000000e+00                                     
     420      4.26  -1.733225e+04   2.330379e+03   0.000000e+00                                     
     525      4.30  -1.699942e+03   2.700000e+01   0.000000e+00                                     
     630      4.32  -1.665130e+03   2.360085e+01   0.000000e+00                                     
     735      4.37  -1.648131e+03   2.567100e+01   0.000000e+00                                     
     840      4.42  -1.630944e+03   2.863564e+01   0.000000e+00                                     
     945      4.49  -1.615813e+03   3.931921e+01   0.000000e+00                                     
    1050      4.57  -1.598377e+03   5.398148e+01   0.000000e+00                                     
    1155      4.68  -1.580648e+03   1.585970e+02   0.000000e+00                                     
    1260      4.76  -1.513779e+03   4.815600e+01   0.000000e+00                                     
    1365      4.81  -1.502488e+03   4.176123e+01   0.000000e+00                                     
    1470      4.88  -1.496429e+03   3.964846e+01   0.000000e+00                                     
    1575      4.97  -1.493434e+03   2.932576e+01   0.000000e+00                                     
    1680      5.06  -1.489125e+03   1.624808e+01   0.000000e+00                                     
    1785      5.19  -1.486733e+03   2.319483e+01   0.000000e+00                                     
    1831      5.40  -1.486440e+03   0.000000e+00   0.000000e+00                                     

Optimal solution found.

obj =


Elapsed time is 6.533843 seconds.

Sometimes good old Simplex is not so bad. Or maybe it is actually the presolve that makes the difference (I don't see any mention of a presolve in the log of the interior point LP).

Note: newer versions of Matlab have different behavior (the default LP solver is now dual Simplex).



Wednesday, April 11, 2018

FICO Xpress Mosel is now FREE and OPEN

Little bit of old news:

I think this looks a bit better than it really is. You need to hook up your favorite solver. Now, if the Xpress solver would also be open source, that would make a difference.

Sunday, April 8, 2018

K best solutions for an assignment problem

Men's K-2 1000 metres medalist at the 1960 Summer Olympics in Rome
The problem is simple:

Find the \(k\) best solutions for an assignment problem [1]

The answer can be complicated [2].

Assignment Problem

The assignment problem is somewhat special: even for large problems, LP solvers can solve it very quickly. If you write your own version of the Hungarian algorithm, it is likely to be slower than a good LP solver.

A second observation is that complete enumeration of all possible solutions is not feasible. If \(n\) is the size of the problem (i.e., \(n\) = number of source nodes = number of destination nodes), we have \(n!\) possible solutions. The next table always amazes me:


An \(n=100\) assignment problem is considered small, but this number \(100!\) should give you pause.

Lo-Tech Approach

The simplest approach is to add a cut to forbid a previously found solution, and solve again. If we record the optimal solution of round \(k\) in \(\alpha_{k,i,j} := x^*_{i,j}\), then this cut can be written as:\[\sum_{i,j} \alpha_{k,i,j} x_{i,j} \le n-1\] I.e. we need to solve MIP models of the form\[\bbox[lightcyan,10px,border:3px solid darkblue]{\begin{align} \min&\sum_{i,j} c_{i,j} x_{i,j}\\&\sum_j x_{i,j}=1 &\forall i\\&\sum_i x_{i,j}=1 &\forall j\\&\sum_{i,j} \alpha_{k,i,j} x_{i,j} \le n-1 &\forall k\\&x_{i,j}\in \{0,1\}\end{align}}\]Note that we can no longer assume we can solve this as an LP: we destroyed the network structure by adding the cuts.
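The lo-tech loop can be mimicked in a few lines on a toy instance (a sketch: the "solver" here is brute-force enumeration, the membership test plays the role of the cut, and the cost matrix is made up):

```python
# Toy version of the lo-tech loop: "solve" the assignment problem by brute
# force, then forbid the found assignment and solve again, k times. The test
# `p in excluded` plays the role of the cut sum alpha[k]*x <= n-1, which
# forbids exactly one assignment per round.
from itertools import permutations

c = [[4, 1, 3],
     [2, 0, 5],
     [3, 2, 2]]                           # hypothetical 3x3 cost matrix
n, k = 3, 3
excluded, best = [], []
for _ in range(k):
    cand = None
    for p in permutations(range(n)):
        if p in excluded:
            continue                      # the "cut": assignment is forbidden
        val = sum(c[i][p[i]] for i in range(n))
        if cand is None or val < cand[0]:
            cand = (val, p)
    excluded.append(cand[1])
    best.append(cand[0])
print(best)  # [5, 6, 6]: objectives of the 3 best assignments, non-decreasing
```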

For a problem with, say, \(k=3\) and \(n=100\), this looks like a perfectly suitable approach. I solve both a minimization and a maximization version, so we get two data points. I see:

Min/Max | Objective | Iterations | Nodes | Total Time (GAMS+Solver)
… | … | … | … | 1.7 seconds
… | … | … | … | 1.8 seconds

These models solve easily (especially after being scared to death by this number \(100!\)).

There are a number of advantages to using this approach:

  • It is very simple, and works with any MIP solver or modeling system.
  • The cuts are linear and don't add any variables to the model. A single cut is needed to exclude a single solution.
  • The problems can be solved with full presolving (some other methods require reducing presolve levels). Some problems are heavily affected by this.

Hi-Tech Approach

Some solvers have a built-in tool called the "solution pool". This allows us to find the \(k\) best solutions in one swoop. There is some heavy machinery behind this tool [3]. The results with Gurobi are:

Min/Max | Objective | Iterations | Nodes | Total Time (GAMS+Solver)
… | … | … | … | 16.2 seconds
… | … | … | … | 14.0 seconds


  • Interestingly our lo-tech approach outperforms the solution pool by a large margin: lo-tech is almost 10 times as fast. Of course, for difficult MIP models this will likely not be the case.
  • The Gurobi pool settings were: PoolSearchMode=2 (k-best), PoolSolutions=3.
  • I believe the Cplex implementation of the solution pool does not allow to search for the \(k\) best solutions [2]. The pool settings SolnPoolReplace=1, SolnPoolIntensity=4, SolnPoolCapacity=3 look promising, but they don't give the three best solutions.
  • Xpress also has a solution pool. It supports finding the \(k\) best solutions, although some real effort from the modeler is needed (need to turn off a bunch of heuristic and presolve options [4]).  

To be complete, I want to mention: there are specialized implementations for the \(k\)-best assignment problem (e.g. [5,6]).


  1. JGraphT: getting ranked solutions from e.g. KuhnMunkresMinimalWeightBipartitePerfectMatching.
  2. Paul Rubin, K Best Solutions.
  3. Danna E., Fenelon M., Gu Z., Wunderling R. (2007) Generating Multiple Solutions for Mixed Integer Programming Problems. In: Fischetti M., Williamson D.P. (eds) Integer Programming and Combinatorial Optimization. IPCO 2007. Lecture Notes in Computer Science, vol 4513. Springer, Berlin, Heidelberg.
  4. FICO, MIP Solution Pool, Reference Manual, Release 31.01.
  5. C.R. Pedersen, L.R. Nielsen, K.A. Andersen, An algorithm for ranking assignments using reoptimization, Computers & Operations Research, 35 (11) (2008), pp. 3714-3726.
  6. M.L. Miller, H.S. Stone, I.J. Cox, Optimizing Murty's ranked assignment method, IEEE Transactions on Aerospace and Electronic Systems, 33 (1997), pp. 851-862.

Friday, April 6, 2018

Python constraint solver

In [1] a small problem is stated:

  • We need to assign 7 devices to 3 locations
  • The maximum number of devices assigned to a location is: loc1:3, loc2:4, loc3:2
  • Not all assignments are allowed. The allowed assignments are:
       loc1: device1, device3, device7,
       loc2: device1, device3, device4, device5, device6, device7
       loc3: device2, device4, device5, device6
  • Find all possible solutions
There is no objective, and we need to enumerate all solutions. It makes sense to look at a Constraint Programming (CP) solver. A simple one is python-constraint [2].

This looks like an assignment problem so we can define the decision variable: \[x_{L,d} = \begin{cases} 1 & \text{if device $d$ is assigned to location $L$} \\ 0 & \text{otherwise}\end{cases}\]

There are two main approaches to deal with the assignments that are not allowed.

  1. Generate a full matrix \(x_{L,d}\) and add constraints \(x_{L,d}=0\) if an assignment is not allowed.
  2. Generate only the variables \(x_{L,d}\) that are allowed.
I am partial to the second approach. With modern solvers there may not be much reason for this, but I have the feeling I know better what I am doing if I pay attention to such things. It forces me to be less sloppy.

The constraints are not very difficult:\[\begin{align} &\sum_{L|\mathit{allowed}(L,d)} x_{L,d} = 1 &\forall d\\ & \sum_{d|\mathit{allowed}(L,d)} x_{L,d} \le \mathit{MaxDev}_L& \forall L \end{align}\]

The Python code can look like:

from constraint import *

D = 7 # number of devices
L = 3 # number of locations

maxdev = [3,4,2]
allowed = [[1,3,7],[1,3,4,5,6,7],[2,4,5,6]]

problem = Problem()
problem.addVariables(["x_L%d_d%d" %(loc+1,d+1) for loc in range(L) for d in range(D) if d+1 in allowed[loc]],[0,1])
for loc in range(L):
    problem.addConstraint(MaxSumConstraint(maxdev[loc]),["x_L%d_d%d" %(loc+1,d+1) for d in range(D) if d+1 in allowed[loc]])
for d in range(D):
    problem.addConstraint(ExactSumConstraint(1),["x_L%d_d%d" %(loc+1,d+1) for loc in range(L) if d+1 in allowed[loc]])

S = problem.getSolutions()
n = len(S)

As you can see, I use the allowed list everywhere to restrict the tuples \((L,d)\). This solves very fast. There are 25 solutions for this small problem:


  • If you want to solve this as a MIP use a dummy objective, and preferably use the solution pool technology available in solvers like Cplex and Gurobi. This is also very fast. The picture above was from a GAMS/Cplex run.
  • I expect the number of solutions to rise quickly when the problem becomes bigger.
  • The Python code above assumes short lists of allowed devices in each location. If these lists become long, it is better to use a different data structure such as a dict.
  • I don't think the Python code is very easy to read. I don't immediately recognize our mathematical constraints in the code.
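A sketch of the dict-based alternative mentioned above (my own variant, not from the post):

```python
# Store the allowed device sets per location in a dict keyed by location:
# membership tests are O(1) even for long lists, and the variable names can
# be generated directly from the dict.
allowed = {1: {1, 3, 7}, 2: {1, 3, 4, 5, 6, 7}, 3: {2, 4, 5, 6}}
maxdev = {1: 3, 2: 4, 3: 2}
varnames = {(loc, d): "x_L%d_d%d" % (loc, d)
            for loc, devs in allowed.items() for d in devs}
print(len(varnames))  # 13 allowed (location, device) pairs
```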


  1. Python constraints - constraining the amount.
  2. Python-constraint, Constraint Solving Problem resolver for Python.

Speed Dating Scheduling

Speed Networking at Cité Internationale Universitaire de Paris [source]

In [1] a question is posed about a speed dating like problem. Here is my version of this:
There are 140 people. Each of them has stated some preference (or dis-preference) to meet specific persons. There are 5 rounds. Create a schedule such that the total sum of preferences is maximized. 

The obvious encoding would be:\[x_{i,j,t} = \begin{cases} 1 & \text{if persons $i$ and $j$ meet in round $t$}\\ 0 & \text{otherwise}\end{cases}\] Obviously, we have \(i\ne j\). This will give rise to a very large number of variables: \(140\times 139 \times 5= 97,300\), so we want to be as stingy as possible. One way is to exploit symmetry: if person \(i\) meets person \(j\) we don't want to use both \(x_{i,j,t}\) and \(x_{j,i,t}\). Let's make our restriction \(i\ne j\) tighter: we only look at the lower-triangular part \(i \gt j\). This saves about half of the variables. However, it also means that summations become a bit more complicated. E.g. \(\sum_j x_{i,j,t}\) now has to become two summations: \(\sum_{j \lt i} x_{i,j,t} + \sum_{j \gt i} x_{j,i,t}\).
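A quick count (plain Python) confirms the savings; the lower-triangular figure matches the 48,650 discrete variables in the model statistics:

```python
# Variable counts for the full encoding x[i,j,t] with i != j, versus only the
# lower-triangular part i > j.
n, T = 140, 5                     # 140 people, 5 rounds
full = n * (n - 1) * T            # both x[i,j,t] and x[j,i,t]
lower = n * (n - 1) // 2 * T      # only i > j: roughly half
print(full, lower)                # → 97300 48650
```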

The model can look like:

Prefs is just a parameter, and can contain positive and negative values (or zero for "indifferent"). For 140 persons and 5 rounds we get a very large MIP problem:


BLOCKS OF EQUATIONS           3     SINGLE EQUATIONS       10,431
BLOCKS OF VARIABLES           2     SINGLE VARIABLES       48,651
NON ZERO ELEMENTS       155,086     DISCRETE VARIABLES     48,650

Usually this means: forget about it. However, this is an easy MIP; it solves very fast: Cplex Time: 40.64sec. The optimal solution was found while doing preprocessing and generating cuts in node 0.

Of course in a real model we may have additional considerations: are there big winners or losers in this assignment? Do we want more equitable solutions?

Although this was a simple model, cooked up quickly, there are two interesting observations:

  • Exploit symmetry, at the expense of making the constraints more complicated
  • Some very large MIP models solve very quickly.


  1. Excel: Matching names based on a matrix of options (Employees provide a list of people they want to meet with).

Wednesday, April 4, 2018

SOCP reformulations of a min distance problem

In [1] I discussed a geometric problem: find a point (in 2d or 3d space) as close as possible to a number of given lines. In the comments it was suggested to write this as a Second Order Cone Programming (SOCP) model. Let's give that a try.

The lines \(L_j\) are defined by two points \(p_{j,a}\) and \(p_{j,b}\).

Minimize sum of distances to set of lines in 2d

The min sum of distances model was formulated as a general non-linear programming (NLP) model:\[\begin{align}\min&\sum_j d_j\\ &d^2_j = \frac{|p_{j,a}-q|^2 |p_{j,b}-p_{j,a}|^2 - \left[(p_{j,a}-q)\cdot (p_{j,b}-p_{j,a}) \right]^2}{|p_{j,b}-p_{j,a}|^2}\end{align}\] Somewhat simplified, using \(\alpha_j = p_{j,a}\) and \(\beta_j = p_{j,b}-p_{j,a}\): \[\begin{align}\min&\sum_j d_j\\ &d^2_j = \frac{|\alpha_j-q|^2 |\beta_j|^2 - \left[(\alpha_j-q)\cdot \beta_j \right]^2}{|\beta_j|^2}\end{align}\] or as an unconstrained problem: \[\min_q\>\sum_j \sqrt{\frac{|\alpha_j-q|^2 |\beta_j|^2 - \left[(\alpha_j-q)\cdot \beta_j \right]^2}{|\beta_j|^2}}\]

Minimizing the sum of distances in SOCP models can look like:\[\begin{align}\min&\sum_j x_j\\& x_j^2 \ge \sum_i y_{j,i}^2\\& x_j \ge 0\end{align}\]

We need to shoehorn our problem into this format. Let's give this a try. Any point \(\gamma_j\) on a line \(L_j\) can be written as:\[\begin{align}\gamma_j &= (1-t_j)p_{j,a} + t_j p_{j,b}\\ &= p_{j,a} + t_j (p_{j,b}-p_{j,a})\\ & = \alpha_j + t_j \beta_j\end{align}\] The distance between \(q\) and \(\gamma_j\) is obviously \(||q-\gamma_j||_2\) or \(||\delta_j||_2\) where \( \delta_j = \gamma_j-q\).

We are getting close. Let's denote the coordinates of \(\delta_j\) as \(\delta_{j,c}\) where \(c\in\{x,y,z\}\) (or \(c\in\{x,y\}\) for a 2D problem). We can finally write: \[\begin{align} \min_{q,d,\delta,t}& \sum_j d_j\\ & \delta_j = (\alpha_j + t_j \beta_j) - q\\ &\sum_c \delta^2_{j,c} \le d_j^2\\ &d_j \ge 0\end{align}\] This does not look so bad. We have introduced a whole slew of extra variables, but our efforts will be rewarded. We can solve this model with solvers like Cplex, Gurobi, Mosek and Xpress without resorting to a general nonlinear solver.
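As a sanity check on this decomposition, the sketch below (plain numpy; the lines and the candidate point are random made-up data) verifies that with the best \(t_j\) per line, \(||\delta_j||_2\) reproduces exactly the distances from the NLP formula:

```python
import numpy as np

rng = np.random.default_rng(42)
q = rng.normal(size=2)            # an arbitrary candidate point
alpha = rng.normal(size=(5, 2))   # p_a for 5 made-up lines
beta = rng.normal(size=(5, 2))    # p_b - p_a

# Distances via the NLP formula:
# d^2 = ( |a-q|^2 |b|^2 - ((a-q).b)^2 ) / |b|^2
amq = alpha - q
num = (amq**2).sum(1) * (beta**2).sum(1) - (amq * beta).sum(1)**2
d_formula = np.sqrt(num / (beta**2).sum(1))

# Distances via the SOCP decomposition: delta = (alpha + t*beta) - q,
# with the optimal t per line, t* = ((q - alpha).beta) / |beta|^2
t = ((q - alpha) * beta).sum(1) / (beta**2).sum(1)
delta = alpha + t[:, None] * beta - q
d_param = np.linalg.norm(delta, axis=1)

print(np.allclose(d_formula, d_param))  # True: both give the same distances
```

In the model we of course don't compute \(t_j\) ourselves: the solver picks it, and the minimization drives \(d_j\) down to \(||\delta_j||_2\).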

Unfortunately when trying this with GAMS/Cplex, I got:

CPLEX Error  5002: 'QCP_row_for_dist(line1)' is not convex.  

This is a longstanding, known bug (many years, ouch!) in the standard GAMS/Cplex link. There is an alternative, experimental link called cplexd which has some of these bugs fixed: use option qcp=cplexd; to make this work.

Critique (a.k.a. rant)

Formulating proper SOCP models can be somewhat of a pain. We are giving up a more natural formulation just to accommodate the solver. I.e., we are moving away from the problem to be closer to the solver. That looks to me like the wrong direction. As an extra complication: the precise rules for solvers to accept SOCP models are not uniform (some solvers are more strict than others). Solvers may do some easy reformulations. E.g. convex QP models may be reformulated automatically into SOCP models behind the scenes. In general, however, it is not so easy for a software system (either the modeling system or the solver) to recognize natural NLP formulations and automatically reformulate into acceptable SOCP models. Which unfortunately means that the task of devising (sometimes non-obvious) reformulations is off-loaded to the modeler. (Actually AMPL has facilities to do a lot of these transformations [4]).

The example above is not so illustrative, but [4] has a better one: \[\min \sqrt{(x+2)^2+(y+1)^2} + \sqrt{(x+y)^2} \>\Longleftrightarrow \begin{align}\min\>&u+v\\&r^2+s^2\le u^2\\&t^2\le v^2\\& x+2=r \\&y+1=s\\&x+y=t\\ & u,v\ge 0 \end{align} \]
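We can check numerically that the two formulations agree (a small Python sketch; the sample points are arbitrary). At the optimum the cone constraints hold with equality, so substituting the tight values for \(u\) and \(v\) recovers the original objective:

```python
import numpy as np

def original(x, y):
    # min sqrt((x+2)^2 + (y+1)^2) + sqrt((x+y)^2)
    return np.hypot(x + 2, y + 1) + abs(x + y)

def reformulated(x, y):
    # Tight choices for the auxiliary variables: the cone constraints
    # r^2 + s^2 <= u^2 and t^2 <= v^2 hold with equality at the optimum.
    r, s, t = x + 2, y + 1, x + y
    u = np.sqrt(r**2 + s**2)
    v = abs(t)
    return u + v

# The two objectives agree everywhere, so the models share minimizers.
pts = [(-2.0, -1.0), (0.0, 0.0), (1.5, -3.2)]
print(all(np.isclose(original(x, y), reformulated(x, y)) for x, y in pts))
```

This is exactly the kind of mechanical substitution one would hope a modeling system could do for us.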

It would be great if modeling systems could become smarter and reformulate standard mathematical constructs into conic formulations. One would need access to a symbolic representation of the model. I.e., when the solver sees the model, it is probably already too late to do anything but the most basic reformulations. With the advent of more exotic features such as the exponential cone this problem will only become more important.

Likely, I am somewhat in the minority here with my concerns. I believe many developers and academics are of the opinion that SOCP is the best thing since sliced bread (maybe after semi-definite programming). I suspect that the obscurity of SOCP formulations is adding some mystique to the trade. However, when I develop a model, I want to have it as mystique-less as possible. Often the main goal of a model is stated as being a bridge between the problem and the algorithm (or from human to machine). Maybe an even more important role is as a communication tool between people. Clarity and simplicity are then paramount.

Another intriguing point is that we have to unlearn things we have gotten used to when solving LPs and NLPs. Here is an example. If we know some of the optimal values of a problem, we can fix some variables ahead of time. In general this is a good approach: it will make the model smaller and simpler to solve. Especially with smart presolvers: once a variable is fixed a lot of other variables may be removed from the model (something like a cascading effect). So an interesting experiment would be:

I.e.: solve the model, fix the \(d_i\) variables to their optimal values and solve again. We would normally expect the second model to be easier to solve. Well, not so much with conic solvers: they have problems with the second "easy" solve. E.g. one can see things like:

Interior-point solution summary
  Problem status  : ILL_POSED
  Solution status : PRIMAL_ILLPOSED_CER
  Dual.    obj: 2.0486145225e-08    nrm: 2e+02    Viol.  con: 0e+00    var: 1e-13    cones: 0e+00  

Return code - 0 [MSK_RES_OK]: No error occurred.

The message No error occurred is somewhat optimistic: the solver did not return a solution.

With a different solver:

Barrier method finished in 1 seconds
Problem is infeasible after     11 iterations
Uncrunching matrix
   Its         Obj Value      S   Ninf  Nneg        Sum Inf  Time
     0        366.241496      B    999     0       9.758622     1
Problem is infeasible
Presolved QP is infeasible
No solution will be returned
QP is infeasible

It is noted that this happens only for some data sets. For other data these solvers get things right.

Of course, we should be happy with these new solver capabilities. Some models solve extremely fast when stated as a SOCP, and we should take whatever performance and reliability gains we can get.


  1. Geometry lesson: find point closest to set of lines
  2. Mosek Modeling Cookbook. This has a good overview of SOCP formulation tricks.
  3. DCP, Disciplined Convex Programming. Modeling tools based on DCP can help to formulate proper SOCP models.
  4. Victor Zverovich, Robert Fourer, Automatic Reformulation of Second-Order Cone Programming Problems

Appendix: An example of a picky solver

The reason Cplex complains is that GAMS generates the following ugly model:

\Problem name: gamsmodel

 _obj: d(line1) + d(line2) + d(line3) + z
Subject To
 _eline(line1.x)#0: 0.757256448 t(line1) + q(x) + delta(line1.x)
                     = -0.656505736
 _eline(line1.y)#1: - 1.084257608 t(line1) + q(y) + delta(line1.y)
                     = 0.686533416
 _eline(line2.x)#2: 0.115236774 t(line2) + q(x) + delta(line2.x)
                     = -0.415575766
 _eline(line2.y)#3: 1.26443496 t(line2) + q(y) + delta(line2.y)  = -0.551894266
 _eline(line3.x)#4: 1.862007808 t(line3) + q(x) + delta(line3.x)
                     = -0.865772554
 _eline(line3.y)#5: 0.157045418 t(line3) + q(y) + delta(line3.y)
                     = 0.000421337999999993
 dist(line1)#6:     - linear_part_of_dist(line1)  = 0
 dist(line2)#7:     - linear_part_of_dist(line2)  = 0
 dist(line3)#8:     - linear_part_of_dist(line3)  = 0
 obj#9:             z  = 0
 QCP_row_for_dist(line1): linear_part_of_dist(line1) + [ delta(line1.x) ^2
                          + delta(line1.y) ^2 - d(line1) ^2 ] <= 0
 QCP_row_for_dist(line2): linear_part_of_dist(line2) + [ delta(line2.x) ^2
                          + delta(line2.y) ^2 - d(line2) ^2 ] <= 0
 QCP_row_for_dist(line3): linear_part_of_dist(line3) + [ delta(line3.x) ^2
                          + delta(line3.y) ^2 - d(line3) ^2 ] <= 0
      t(line1) Free
      t(line2) Free
      t(line3) Free
      q(x) Free
      q(y) Free
      delta(line1.x) Free
      delta(line1.y) Free
      delta(line2.x) Free
      delta(line2.y) Free
      delta(line3.x) Free
      delta(line3.y) Free
      z Free
      linear_part_of_dist(line1) Free
      linear_part_of_dist(line2) Free
      linear_part_of_dist(line3) Free

This model leads to CPLEX Error  5002: 'QCP_row_for_dist(line1)' is not convex. If we clean this up a bit by hand, we get:

\Problem name: gamsmodel

 _obj: d(line1) + d(line2) + d(line3)
Subject To
 _eline(line1.x)#0: 0.757256448 t(line1) + q(x) + delta(line1.x)
                     = -0.656505736
 _eline(line1.y)#1: - 1.084257608 t(line1) + q(y) + delta(line1.y)
                     = 0.686533416
 _eline(line2.x)#2: 0.115236774 t(line2) + q(x) + delta(line2.x)
                     = -0.415575766
 _eline(line2.y)#3: 1.26443496 t(line2) + q(y) + delta(line2.y)  = -0.551894266
 _eline(line3.x)#4: 1.862007808 t(line3) + q(x) + delta(line3.x)
                     = -0.865772554
 _eline(line3.y)#5: 0.157045418 t(line3) + q(y) + delta(line3.y)
                     = 0.000421337999999993
 QCP_row_for_dist(line1): [ delta(line1.x) ^2
                          + delta(line1.y) ^2 - d(line1) ^2 ] <= 0
 QCP_row_for_dist(line2): [ delta(line2.x) ^2
                          + delta(line2.y) ^2 - d(line2) ^2 ] <= 0
 QCP_row_for_dist(line3): [ delta(line3.x) ^2
                          + delta(line3.y) ^2 - d(line3) ^2 ] <= 0
      t(line1) Free
      t(line2) Free
      t(line3) Free
      q(x) Free
      q(y) Free
      delta(line1.x) Free
      delta(line1.y) Free
      delta(line2.x) Free
      delta(line2.y) Free
      delta(line3.x) Free
      delta(line3.y) Free

Cplex is happy with this model. We see: Barrier - Optimal:  Objective =  1.2652779019e-01.

Tuesday, April 3, 2018

Geometry Lesson: find point closest to set of lines

In [1] a simple optimization problem is posted:

Find a single point \(q\) in 3d space that is closest to a number of given lines. Each line \(L_j\) is defined by two points \(p_{j,a}\) and \(p_{j,b}\).

This looks like a problem for which specialized algorithms have been developed. Let's just ignore that and see if we can use standard mathematical programming models to tackle this. My geometry is a bit rusty, but the necessary formulas can be found in [2]. To recap, the squared distance \(d^2\) between a point \(q\) and a single line through \(p_a\) and \(p_b\) is given by:\[d^2=\frac{|p_a-q|^2 |p_b-p_a|^2 - \left[(p_a-q)\cdot (p_b-p_a) \right]^2}{|p_b-p_a|^2}\] where \(\cdot\) indicates the dot product.

Actually just finding the shortest distance between a given line and a given point can be interpreted as an optimization problem:

Shortest distance between a line and a point

We have multiple lines and the position of the point is not given. An optimization model to find the best place for our point \(q\) can be formulated as:
\[\min_q\>\sum_j \frac{|p_{j,a}-q|^2 |p_{j,b}-p_{j,a}|^2 - \left[(p_{j,a}-q)\cdot (p_{j,b}-p_{j,a}) \right]^2}{|p_{j,b}-p_{j,a}|^2}\] After simplifying some constants we have: \[\min_q\>\sum_j \frac{|\alpha_j-q|^2 |\beta_j|^2 - \left[(\alpha_j-q)\cdot \beta_j \right]^2}{|\beta_j|^2}\] This thing is convex, so it is somewhat easy to solve as a QP (Quadratic Programming) problem.
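In fact, for the sum of squared distances the first-order conditions give a closed-form answer, so no QP solver is even needed. A numpy sketch (the lines are random made-up data; the projection-matrix route is one way to write the same objective):

```python
import numpy as np

rng = np.random.default_rng(0)
pa = rng.normal(size=(10, 3))   # made-up points p_a defining 10 lines
pb = rng.normal(size=(10, 3))   # made-up points p_b
beta = pb - pa

# P_j projects onto the orthogonal complement of the line direction, so the
# squared distance from q to line j is |P_j (q - p_a_j)|^2.
P = np.array([np.eye(3) - np.outer(b, b) / (b @ b) for b in beta])

# Setting the gradient of sum_j |P_j (q - p_a_j)|^2 to zero (and using
# P_j^T P_j = P_j) yields the linear system (sum_j P_j) q = sum_j P_j p_a_j.
A = P.sum(axis=0)
rhs = np.einsum('jkl,jl->k', P, pa)
q = np.linalg.solve(A, rhs)

obj = sum(((P[j] @ (q - pa[j]))**2).sum() for j in range(10))
# Perturbing q cannot improve the convex objective:
worse = sum(((P[j] @ (q + 0.01 - pa[j]))**2).sum() for j in range(10))
print(obj <= worse)  # True
```

The system is non-singular as long as the lines are not all parallel; with a QP solver we would of course get the same point.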

2D example: Minimize sum of squared distances

Actually I cheated a bit: I minimized the sum of the squared distances. When really minimizing the sum of the distances we need to minimize: \[\min_q\>\sum_j \sqrt{\frac{|\alpha_j-q|^2 |\beta_j|^2 - \left[(\alpha_j-q)\cdot \beta_j \right]^2}{|\beta_j|^2}}\] This can be solved as an NLP (Nonlinear Programming) model. Of course the objective with the quadratic distances will put more weight on the larger distances.

How do results compare?  For a 3D problem with 10 random lines I see:

----     54 PARAMETER results  

                       x           y           z         obj

squared dist       0.015      -0.069      -0.282       4.087
dist               0.050      -0.142      -0.413       5.634

Here the \(x,y,z\) values are the coordinates of \(q\). The column marked obj indicates the objective function value. In our 2D example we can see the differences more clearly:

Minimization of sum of distances

The differences are a little bit more pronounced than I would have predicted.

Minimax Model

A different objective can be formed by minimizing the maximum distance:\[\begin{align} \min\> & z\\ &z \ge d_j\end{align}\] We get the same solution if we minimize the maximum of the squared distances (squaring is monotonic for non-negative distances, so the ranking of candidate points is unchanged): \[\begin{align} \min\> & z\\ &z \ge d^2_j\end{align}\] This means we have a convex quadratic model. This model puts all the emphasis on the largest distance and thus will shift our location to the right:

Minimize maximum of distances

A 3d visualization would be interesting.
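The claim that the distance and squared-distance versions share a minimizer can be illustrated with a crude grid search (plain numpy, random made-up 2D lines; this is only to demonstrate the monotonicity argument, not a serious solution method):

```python
import numpy as np

rng = np.random.default_rng(1)
alpha = rng.normal(size=(4, 2))   # made-up p_a for 4 lines (2D)
beta = rng.normal(size=(4, 2))    # made-up directions p_b - p_a

def dists(q):
    """Distances from q to all lines, using the point-line formula."""
    amq = alpha - q
    num = (amq**2).sum(1) * (beta**2).sum(1) - (amq * beta).sum(1)**2
    return np.sqrt(num / (beta**2).sum(1))

# Coarse grid search: x -> x^2 is increasing for nonnegative x, so ranking
# candidate points by max d or by max d^2 picks the same winner.
grid = [np.array([x, y]) for x in np.linspace(-2, 2, 41)
        for y in np.linspace(-2, 2, 41)]
best_d = min(grid, key=lambda q: dists(q).max())
best_d2 = min(grid, key=lambda q: (dists(q)**2).max())
print(np.array_equal(best_d, best_d2))
```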

SOCP Model

In the comments it is suggested to use a Second Order Cone Programming (SOCP) model. This would allow us to solve the minimum sum of distance problem using solvers like Cplex, Gurobi, Mosek and Xpress. There is no need to resort to a general purpose NLP solver. In [3] I have some results.


  1. Non-linear Optimization about Point to Lines
  2. Eric Weisstein, Point-Line Distance -- 3 Dimensional
  3. SOCP reformulations of a min distance problem