Saturday, June 9, 2018

Sparsest solution: a difficult MIP

The problem of finding a solution vector that is as sparse as possible can be formulated as a MIP.

I was looking at solving an underdetermined  problem \[Ax=b\] i.e. the number of rows \(m\) is less than the number of colums \(n\). A solution \(x\) has in general \(m\) nonzero elements. 

The actual system was: \[Ax\approx b\] or to be more precise \[-\epsilon \le Ax-b\le \epsilon \] By adding slack variables \(s_i\) we can write this as: \[\begin{align}&Ax=b+s\\&-\epsilon\le s \le \epsilon\end{align} \] Now we have more wiggle room, and can try to find the vector \(x\) that has as many zero elements as possible.

Big-M formulation

The MIP model seems obvious: \[\bbox[lightcyan,10px,border:3px solid darkblue]{\begin{align} \min & \sum_j \delta_j \\ & Ax=b+s \\ & -M\delta_j \le x_j \le M\delta_j \\ & s_i \in [-\epsilon,\epsilon]\\ & \delta_j \in \{0,1\}\end{align}}\] This turns out to be a very problematic formulation. First we have no good a priori value for \(M\). I used \(M=10,000\) but that gives some real issues. Furthermore the performance is not good. For a very small problem with \(m=20, n=40\) we already see a solution time of about 20 minutes. With just 40 binary variables I expected something like a minute or two. But that is only a minor part of the problem. The results with Cplex look like:

----     82 VARIABLE x.L  

j3  -0.026,    j4   0.576,    j7   0.638,    j11  0.040,    j12 -0.747,    j14  0.039,    j15 -0.169,    j19 -0.088
j23  0.509,    j31 -0.475,    j34 -0.750,    j35 -0.509

----     82 VARIABLE delta.L  

j3  2.602265E-6,    j4        1.000,    j7        1.000,    j11 4.012925E-6,    j12       1.000,    j14 3.921840E-6
j15       1.000,    j19 8.834778E-6,    j23       1.000,    j31       1.000,    j34       1.000,    j35       1.000

----     82 PARAMETER statistics  

m               20.000,    n               40.000,    epsilon          0.050,    big M        10000.000
iterations 5.443661E+7,    nodes      5990913.000,    seconds       1004.422,    nz(x)           12.000
obj              8.000,    gap                EPS

Well, we have some problems here. The optimal objective is reported as 8, so we have 8 \(\delta_j\)'s that are equal to one. But at the same time the number of nonzero values in \(x\) is 12.  This does not match. The reason is: we have a few \(\delta_j\)'s that are very small (approx. \(10^{-6}\)). Cplex considers these small values as being zero, while the model sees opportunities to exploit these small values for what is sometimes called "trickle flow". These results are just not very reliable. We cannot just ignore the small \(\delta_j\)'s and conclude that the optimal number of non-zero elements in \(x\) is 8 (we will see later it is 11). The underlying reason is a relatively large value for \(M\) combined with a Cplex integer feasibility tolerance that is large enough to create leaks.

There a few things we can do to repair this:

  • Reduce the value of \(M\), e.g. reduce it from \(M=10,000\) to \(M=1,000\). 
  • Tighten the integer feasibility tolerance. Cplex has a default tolerance epint=1.0e-5 (it allows it to be set to zero) .
  • Use SOS1 sets instead of big-M's. Special Ordered Sets of Type 1 allow us to model the problem in a slightly different way than with binary variables. 
  • Use indicator constraints instead of big-M's. Indicator constraints can implement the implication \(\delta_j=0 \Rightarrow x_j=0\) directly. 

SOS1 formulation

We can get rid of the big-M problem by using SOS1 variables: \[\bbox[lightcyan,10px,border:3px solid darkblue]{\begin{align} \max & \sum_j y_j \\ & Ax=b+s \\ & (x_j,y_j) \in \text{SOS1} \\ & s_i \in [-\epsilon,\epsilon]\\ & y_j \in [0,1]\end{align}}\] The SOS1 sets say: \(x_j = 0\) or \(y_j=0\) (or both), or in different words: at most one of \(x_j,y_j\) can be non-zero. The objective tries to make as many elements of \(y\) equal to one. The corresponding elements of \(x\) will be zero, making the \(x\) vector sparser. This SOS1 approach is more reliable but can be slower. When we try this on our small problem, we see:

----    119 VARIABLE x.L  

j4   0.601,    j7   0.650,    j11  0.072,    j12 -0.780,    j15 -0.170,    j19 -0.117,    j23  0.515,    j31 -0.483
j34 -0.750,    j35 -0.528,    j36 -0.031

----    119 PARAMETER statistics  

m               20.000,    n               40.000,    epsilon          0.050,    iterations 4.787142E+7
nodes      1.229474E+7,    seconds       3600.156,    nz(x)           11.000,    obj             29.000
gap              0.138

This model could not be solved to optimality within one hour! We stopped on a time limit of 3600 seconds. Remember, we only have 40 discrete structures in this model. The gap is still 13.8% after running for an hour. Otherwise, the results make sense: the objective of 29 corresponds to 11 nonzero values in \(x\).

Indicator constraint formulation

Finally, a formulation using indicator constraints can look like: \[\bbox[lightcyan,10px,border:3px solid darkblue]{\begin{align} \min & \sum_j \delta_j \\ & Ax=b+s \\ & \delta_j = 0 \Rightarrow x_j=0 \\ & s_i \in [-\epsilon,\epsilon]\\ & \delta_j \in \{0,1\}\end{align}}\] This formulation does not use a big-M coefficient either, and the results are consistent.

Performance-wise, this does not help much:

----    178 VARIABLE x.L  

j1   0.158,    j2  -0.459,    j13 -0.155,    j14  0.470,    j17 -0.490,    j22  0.269,    j23  0.211,    j32  1.164
j33  0.147,    j38 -0.604,    j39 -0.563

----    178 VARIABLE delta.L  

j1  1.000,    j2  1.000,    j13 1.000,    j14 1.000,    j17 1.000,    j22 1.000,    j23 1.000,    j32 1.000
j33 1.000,    j38 1.000,    j39 1.000

----    178 PARAMETER statistics  

m               20.000,    n               40.000,    epsilon          0.050,    iterations 7.127289E+7
nodes      1.523042E+7,    seconds       3600.203,    nz(x)           11.000,    obj             11.000
gap              0.273

We are hitting our time limit. The gap is 27.3%.

Optimal solution

The optimal solution is indeed 11 nonzero elements in \(x\). It took me 4 hours to prove this. Here are the MIP bounds:

The optimal solution was found quite early but proving optimality took a long time. The final jump in the best bound (red line : bound on best possible integer solution) can be explained as follows. The objective (blue line) can only assume integer variables, so it jumps by one. You can see this happening early on when jumping from 12 to 11. This also means we are optimal as soon as the best bound (red line) reaches a value larger than 10. At that point we know 11 is the optimal objective value.

NB. To be complete: this run used a big-M formulation with a reduced value of \(M=1,000\) and an integer feasibility tolerance (option epint) of zero. Note that such a value is really providing us with a "paranoia mode" and may result in somewhat longer solution times. Not all solvers allow a integer feasibility tolerance of zero. In addition, I used the option mipemphasis=2 to tell Cplex we want optimality.


Here is a very small MIP model with only 40 binary variables that is just very difficult to solve to optimality. Although we can find the optimal solution relatively quickly, proving optimality proofs to be very challenging. This is not the usual case: for most models with 40 binary variables we can expect very quick turnaround. This model is unusually difficult.

In addition, this models illustrates nicely the dangers of big-M formulations. Even for modest values, like \(M=10,000\), we can be in real trouble.

This is good example that should give us some pause.