Sunday, June 17, 2018

select weights to maximize count

In [1] a problem that is simple at first sight. Looking a bit further there are some interesting angles.

The example data set is:

----     27 PARAMETER a  

            j1          j2          j3

i1       0.870       0.730       0.410
i2       0.820       0.730       0.850
i3       0.820       0.370       0.850
i4       0.580       0.950       0.420
i5       1.000       1.000       0.900

The idea is that we can apply weights \(w_j\) to calculate a final score for each row:\[F_i = \sum_j w_j a_{i,j}\] Weights obey the usual constraints: \(w_j \in [0,1]\) and \(\sum_j w_j=1\). The goal is to find optimal weights such that the number of records with final score in the bucket \([0.9, 1]\) is maximized.

Looking at the data, \(w=(0,1,0)\) is a good choice. This gives us two \(F_i\)'s in our bucket,  Let's see if we can formalize this with a model.

MIP Model

Counting is done with binary variables. So let's define\[\delta_i = \begin{cases} 1 & \text{if $L\le F_i\le U$}\\ 0 & \text{otherwise}\end{cases}\] I used \(L\) and \(U\) to indicate the bucket.

A first model can look like:\[\bbox[lightcyan,10px,border:3px solid darkblue]{\begin{align} \max & \sum_i \delta_i \\  & F_i = \sum_j w_j a_{i,j}\\& \sum_i w_i = 1\\ & L - M(1-\delta_i) \le F_i \le U+M(1-\delta_i)\\ & \delta_i \in \{0,1\} \\ & w_i \in [0,1]\end{align}}\] Here \(M\) is a large enough constant.

The sandwich equation models the implication \[\delta_i=1 \Rightarrow L\le F_i \le U\] The objective will make sure that \(L\le F_i \le U \Rightarrow \delta_i=1 \) holds for the optimal solution.

This is the big picture. This model works: it finds an optimal set of weights so that as many as possible \(F_i \in [L,U]\). In the rest of this post I dive into to some nitty-gritty details. No elegant math, just largely boring stuff. However, this is typically what we need to do when working on non-trivial MIP models.


The data seems to suggest \(0 \le a_{i,j} \le 1\). Which means \(0 \le F_i \le 1\). This follows from \[\min_j a_{i,j} \le F_i \le \min_j a_{i,j}\] We can also assume \(0 \le L \le U \le 1\). We can conclude the largest difference possible between \(F_i\) and \(L\) (and \(F_i\) and \(U\)) is one. So, in this case an obvious value for \(M\) is \(M=1\). More generally, if we first make sure \(U \le \max\{a_{i,j}\}\) and \(L \ge \min\{a_{i,j}\}\) by the preprocessing step: \[\begin{align} & U := \min\{U, \max\{a_{i,j}\}\}\\ & L := \max\{L, \min\{a_{i,j}\}\}\end{align}\] we have: \[M = \max \{a_{i,j}\} - \min \{a_{i,j}\}\] We can do even better by using an \(M_i\) for each record instead of a single, global \(M\).  We will get back to this later.

More preprocessing

We can optimize things further by observing that not always both "branches" of \[L - M(1-\delta_i) \le F_i \le U+M(1-\delta_i)\] are needed. With our small example we have \(U=1\), but we know already that \(F_i \le 1\). So in this case we only need to worry about \(L - M(1-\delta_i) \le F_i\).

We can generalize this as follows. First calculate bounds \(\ell_i \le F_i\le u_i\): \[\begin{align} & \ell_i = \min_j a_{i,j}\\ & u_i = \max_j a_{i,j}\end{align}\] Then generate constraints: \[\begin{align} & L - M(1-\delta_i) \le F_i && \forall i | L > \ell_i\\ & F_i \le U+M(1-\delta_i) && \forall i | U < u_i\end{align}\]

Even more preprocessing

The first three records in the example data set are really no candidates for having \(F_i \in [0.9, 1]\). The reason is that for those records we have \(u_i \lt L\). In general we can skip all records with \(u_i \lt L\) or \(\ell_i \gt U\). These records will never have \(\delta_i=1\).

Combining things

I extended the \(a\) matrix with following columns:

  • lo: \(\ell_i=\min_j a_{i,j}\).
  • up: \(u_i = \max_j a_{i,j}\).
  • cand: boolean, indicates if this row is a candidate. We check: \(u_i \ge L\) and \(\ell_i \le U\).
  • chkL: boolean, indicates if we need to check the left "branch". This is equal to one if \(\ell_i\lt L\).
  • chkR: boolean, indicates if we need to check the right "branch". This is equal to one if \(u_i\gt L\). No records have this equal to 1. 
The matrix now looks like:

----     41 PARAMETER a  

            j1          j2          j3          lo          up        cand        chkL

i1       0.870       0.730       0.410       0.410       0.870                   1.000
i2       0.820       0.730       0.850       0.730       0.850                   1.000
i3       0.820       0.370       0.850       0.370       0.850                   1.000
i4       0.580       0.950       0.420       0.420       0.950       1.000       1.000
i5       1.000       1.000       0.900       0.900       1.000       1.000

The last record is interesting. It has \(\mathit{chkL}=\mathit{chkR}=0\). This is correct: it will always allow \(\delta_{i5}=1\) no matter what weights we choose. Note that the constraints \(L - M(1-\delta_i) \le F_i\) are only generated in case both \(\mathit{cand}=\mathit{chkL}=1\).

For a small data set this all does not make much difference, but for large ones, we make the model much smaller.


The optimal weights for this small data set are not unique. Obviously I get the same optimal number of selected records:

----     71 VARIABLE w.L  weights

j1 0.135,    j2 0.865

----     71 VARIABLE f.L  final scores

i1 0.749,    i2 0.742,    i3 0.431,    i4 0.900,    i5 1.000

----     71 VARIABLE delta.L  selected

i4 1.000,    i5 1.000

----     71 VARIABLE z.L                   =        2.000  objective

Big-M revisited

I did not pay too much attention to my big-M's. I just used the calculation \(M = \max \{a_{i,j}\} - \min \{a_{i,j}\}\), which yielded:

----     46 PARAMETER M                    =        0.630  big-M

We can use a tailored \(M^L_i, M^U_i\) for each inequality. For the lower bounds our equation looks like \[L - M^L_i(1-\delta_i) \le F_i\] This means we have \(M^L_i \ge L - F_i\). We have bounds on \(F_i\), (we stored these in \(a_{i,up}\) and \(a_{i,lo}\)).  So we can set \(M^L_i =L - a_{i,lo}\). This gives:

----     78 PARAMETER ML  

i4 0.480

Similar for the U inequality. For our small example this is not so important, but for larger instances with a wider range of data this may be essential.

Larger problem

For a larger random problem:

----     43 PARAMETER a  data

             j1          j2          j3          lo          up        cand        chkL        chkU

i1        0.737       0.866       0.893       0.737       0.893                   1.000
i2        0.434       0.654       0.606       0.434       0.654                   1.000
i3        0.721       0.987       0.648       0.648       0.987       1.000       1.000
i4        0.800       0.618       0.869       0.618       0.869                   1.000
i5        0.668       0.703       1.177       0.668       1.177       1.000       1.000       1.000
i6        0.656       0.540       0.525       0.525       0.656                   1.000
i7        0.864       1.037       0.569       0.569       1.037       1.000       1.000       1.000
i8        0.942       1.003       0.655       0.655       1.003       1.000       1.000       1.000
i9        0.600       0.801       0.601       0.600       0.801                   1.000
i10       0.619       0.933       1.114       0.619       1.114       1.000       1.000       1.000
i11       1.243       0.675       0.756       0.675       1.243       1.000       1.000       1.000
i12       0.608       0.778       0.747       0.608       0.778                   1.000
i13       0.695       0.593       1.196       0.593       1.196       1.000       1.000       1.000
i14       0.965       1.001       0.650       0.650       1.001       1.000       1.000       1.000
i15       0.951       1.040       0.969       0.951       1.040       1.000                   1.000
i16       0.514       1.220       0.935       0.514       1.220       1.000       1.000       1.000
i17       0.834       1.130       1.054       0.834       1.130       1.000       1.000       1.000
i18       1.161       0.550       0.481       0.481       1.161       1.000       1.000       1.000
i19       0.638       0.640       0.691       0.638       0.691                   1.000
i20       1.057       0.724       0.657       0.657       1.057       1.000       1.000       1.000
i21       0.814       0.775       0.448       0.448       0.814                   1.000
i22       0.800       0.737       0.741       0.737       0.800                   1.000
i23       0.573       0.555       0.789       0.555       0.789                   1.000
i24       0.829       0.679       1.059       0.679       1.059       1.000       1.000       1.000
i25       0.761       0.956       0.687       0.687       0.956       1.000       1.000
i26       0.870       0.673       0.822       0.673       0.870                   1.000
i27       0.304       0.900       0.920       0.304       0.920       1.000       1.000
i28       0.544       0.659       0.495       0.495       0.659                   1.000
i29       0.755       0.589       0.520       0.520       0.755                   1.000
i30       0.492       0.617       0.681       0.492       0.681                   1.000
i31       0.657       0.767       0.634       0.634       0.767                   1.000
i32       0.752       0.684       0.596       0.596       0.752                   1.000
i33       0.616       0.846       0.803       0.616       0.846                   1.000
i34       0.677       1.098       1.132       0.677       1.132       1.000       1.000       1.000
i35       0.454       1.037       1.204       0.454       1.204       1.000       1.000       1.000
i36       0.815       0.605       0.890       0.605       0.890                   1.000
i37       0.814       0.587       0.939       0.587       0.939       1.000       1.000
i38       1.019       0.675       0.613       0.613       1.019       1.000       1.000       1.000
i39       0.547       0.946       0.843       0.547       0.946       1.000       1.000
i40       0.724       0.571       0.757       0.571       0.757                   1.000
i41       0.611       0.916       0.891       0.611       0.916       1.000       1.000
i42       0.680       0.624       1.111       0.624       1.111       1.000       1.000       1.000
i43       1.015       0.870       0.823       0.823       1.015       1.000       1.000       1.000
i44       0.587       0.866       0.691       0.587       0.866                   1.000
i45       0.789       1.090       0.649       0.649       1.090       1.000       1.000       1.000
i46       1.436       0.747       0.805       0.747       1.436       1.000       1.000       1.000
i47       0.791       0.885       0.723       0.723       0.885                   1.000
i48       0.718       1.028       0.869       0.718       1.028       1.000       1.000       1.000
i49       0.876       0.772       0.918       0.772       0.918       1.000       1.000
i50       0.498       0.882       0.599       0.498       0.882                   1.000

we see that we can remove a significant number of records and big-M constraints. The model solves instantaneous and shows:

----     74 VARIABLE w.L  weights

j1 0.007,    j2 0.792,    j3 0.202

----     74 VARIABLE f.L  final scores

i1  0.870,    i2  0.643,    i3  0.917,    i4  0.670,    i5  0.798,    i6  0.538,    i7  0.942,    i8  0.933
i9  0.760,    i10 0.967,    i11 0.695,    i12 0.771,    i13 0.715,    i14 0.930,    i15 1.025,    i16 1.158
i17 1.112,    i18 0.541,    i19 0.650,    i20 0.713,    i21 0.710,    i22 0.738,    i23 0.602,    i24 0.757
i25 0.900,    i26 0.705,    i27 0.900,    i28 0.625,    i29 0.576,    i30 0.629,    i31 0.739,    i32 0.667
i33 0.836,    i34 1.102,    i35 1.067,    i36 0.664,    i37 0.660,    i38 0.665,    i39 0.923,    i40 0.609
i41 0.909,    i42 0.722,    i43 0.861,    i44 0.829,    i45 0.999,    i46 0.764,    i47 0.851,    i48 0.994
i49 0.802,    i50 0.822

----     74 VARIABLE delta.L  selected

i3  1.000,    i7  1.000,    i8  1.000,    i10 1.000,    i14 1.000,    i15 1.000,    i16 1.000,    i17 1.000
i25 1.000,    i27 1.000,    i34 1.000,    i35 1.000,    i39 1.000,    i41 1.000,    i45 1.000,    i48 1.000

----     74 VARIABLE z.L                   =       16.000  objective


A simple model becomes not that simple once we start "optimizing" it. Unfortunately this is typical for large MIP models.


  1. Simulation/Optimization Package in R for tuning weights to achieve maximum allocation for groups,

Saturday, June 9, 2018

Sparsest solution: a difficult MIP

The problem of finding a solution vector that is as sparse as possible can be formulated as a MIP.

I was looking at solving an underdetermined  problem \[Ax=b\] i.e. the number of rows \(m\) is less than the number of colums \(n\). A solution \(x\) has in general \(m\) nonzero elements. 

The actual system was: \[Ax\approx b\] or to be more precise \[-\epsilon \le Ax-b\le \epsilon \] By adding slack variables \(s_i\) we can write this as: \[\begin{align}&Ax=b+s\\&-\epsilon\le s \le \epsilon\end{align} \] Now we have more wiggle room, and can try to find the vector \(x\) that has as many zero elements as possible.

Big-M formulation

The MIP model seems obvious: \[\bbox[lightcyan,10px,border:3px solid darkblue]{\begin{align} \min & \sum_j \delta_j \\ & Ax=b+s \\ & -M\delta_j \le x_j \le M\delta_j \\ & s_i \in [-\epsilon,\epsilon]\\ & \delta_j \in \{0,1\}\end{align}}\] This turns out to be a very problematic formulation. First we have no good a priori value for \(M\). I used \(M=10,000\) but that gives some real issues. Furthermore the performance is not good. For a very small problem with \(m=20, n=40\) we already see a solution time of about 20 minutes. With just 40 binary variables I expected something like a minute or two. But that is only a minor part of the problem. The results with Cplex look like:

----     82 VARIABLE x.L  

j3  -0.026,    j4   0.576,    j7   0.638,    j11  0.040,    j12 -0.747,    j14  0.039,    j15 -0.169,    j19 -0.088
j23  0.509,    j31 -0.475,    j34 -0.750,    j35 -0.509

----     82 VARIABLE delta.L  

j3  2.602265E-6,    j4        1.000,    j7        1.000,    j11 4.012925E-6,    j12       1.000,    j14 3.921840E-6
j15       1.000,    j19 8.834778E-6,    j23       1.000,    j31       1.000,    j34       1.000,    j35       1.000

----     82 PARAMETER statistics  

m               20.000,    n               40.000,    epsilon          0.050,    big M        10000.000
iterations 5.443661E+7,    nodes      5990913.000,    seconds       1004.422,    nz(x)           12.000
obj              8.000,    gap                EPS

Well, we have some problems here. The optimal objective is reported as 8, so we have 8 \(\delta_j\)'s that are equal to one. But at the same time the number of nonzero values in \(x\) is 12.  This does not match. The reason is: we have a few \(\delta_j\)'s that are very small (approx. \(10^{-6}\)). Cplex considers these small values as being zero, while the model sees opportunities to exploit these small values for what is sometimes called "trickle flow". These results are just not very reliable. We cannot just ignore the small \(\delta_j\)'s and conclude that the optimal number of non-zero elements in \(x\) is 8 (we will see later it is 11). The underlying reason is a relatively large value for \(M\) combined with a Cplex integer feasibility tolerance that is large enough to create leaks.

There a few things we can do to repair this:

  • Reduce the value of \(M\), e.g. reduce it from \(M=10,000\) to \(M=1,000\). 
  • Tighten the integer feasibility tolerance. Cplex has a default tolerance epint=1.0e-5 (it allows it to be set to zero) .
  • Use SOS1 sets instead of big-M's. Special Ordered Sets of Type 1 allow us to model the problem in a slightly different way than with binary variables. 
  • Use indicator constraints instead of big-M's. Indicator constraints can implement the implication \(\delta_j=0 \Rightarrow x_j=0\) directly. 

SOS1 formulation

We can get rid of the big-M problem by using SOS1 variables: \[\bbox[lightcyan,10px,border:3px solid darkblue]{\begin{align} \max & \sum_j y_j \\ & Ax=b+s \\ & (x_j,y_j) \in \text{SOS1} \\ & s_i \in [-\epsilon,\epsilon]\\ & y_j \in [0,1]\end{align}}\] The SOS1 sets say: \(x_j = 0\) or \(y_j=0\) (or both), or in different words: at most one of \(x_j,y_j\) can be non-zero. The objective tries to make as many elements of \(y\) equal to one. The corresponding elements of \(x\) will be zero, making the \(x\) vector sparser. This SOS1 approach is more reliable but can be slower. When we try this on our small problem, we see:

----    119 VARIABLE x.L  

j4   0.601,    j7   0.650,    j11  0.072,    j12 -0.780,    j15 -0.170,    j19 -0.117,    j23  0.515,    j31 -0.483
j34 -0.750,    j35 -0.528,    j36 -0.031

----    119 PARAMETER statistics  

m               20.000,    n               40.000,    epsilon          0.050,    iterations 4.787142E+7
nodes      1.229474E+7,    seconds       3600.156,    nz(x)           11.000,    obj             29.000
gap              0.138

This model could not be solved to optimality within one hour! We stopped on a time limit of 3600 seconds. Remember, we only have 40 discrete structures in this model. The gap is still 13.8% after running for an hour. Otherwise, the results make sense: the objective of 29 corresponds to 11 nonzero values in \(x\).

Indicator constraint formulation

Finally, a formulation using indicator constraints can look like: \[\bbox[lightcyan,10px,border:3px solid darkblue]{\begin{align} \min & \sum_j \delta_j \\ & Ax=b+s \\ & \delta_j = 0 \Rightarrow x_j=0 \\ & s_i \in [-\epsilon,\epsilon]\\ & \delta_j \in \{0,1\}\end{align}}\] This formulation does not use a big-M coefficient either, and the results are consistent.

Performance-wise, this does not help much:

----    178 VARIABLE x.L  

j1   0.158,    j2  -0.459,    j13 -0.155,    j14  0.470,    j17 -0.490,    j22  0.269,    j23  0.211,    j32  1.164
j33  0.147,    j38 -0.604,    j39 -0.563

----    178 VARIABLE delta.L  

j1  1.000,    j2  1.000,    j13 1.000,    j14 1.000,    j17 1.000,    j22 1.000,    j23 1.000,    j32 1.000
j33 1.000,    j38 1.000,    j39 1.000

----    178 PARAMETER statistics  

m               20.000,    n               40.000,    epsilon          0.050,    iterations 7.127289E+7
nodes      1.523042E+7,    seconds       3600.203,    nz(x)           11.000,    obj             11.000
gap              0.273

We are hitting our time limit. The gap is 27.3%.

Optimal solution

The optimal solution is indeed 11 nonzero elements in \(x\). It took me 4 hours to prove this. Here are the MIP bounds:

The optimal solution was found quite early but proving optimality took a long time. The final jump in the best bound (red line : bound on best possible integer solution) can be explained as follows. The objective (blue line) can only assume integer variables, so it jumps by one. You can see this happening early on when jumping from 12 to 11. This also means we are optimal as soon as the best bound (red line) reaches a value larger than 10. At that point we know 11 is the optimal objective value.

NB. To be complete: this run used a big-M formulation with a reduced value of \(M=1,000\) and an integer feasibility tolerance (option epint) of zero. Note that such a value is really providing us with a "paranoia mode" and may result in somewhat longer solution times. Not all solvers allow a integer feasibility tolerance of zero. In addition, I used the option mipemphasis=2 to tell Cplex we want optimality.


Here is a very small MIP model with only 40 binary variables that is just very difficult to solve to optimality. Although we can find the optimal solution relatively quickly, proving optimality proofs to be very challenging. This is not the usual case: for most models with 40 binary variables we can expect very quick turnaround. This model is unusually difficult.

In addition, this models illustrates nicely the dangers of big-M formulations. Even for modest values, like \(M=10,000\), we can be in real trouble.

This is good example that should give us some pause. 

Friday, June 1, 2018

The real optimization problem

Dear Erwin,

Good Morning!!.

I am doing my Master's Degree in Computer Science.

Currently, I am doing some research on Facility Location Problem Optimization. I came across your answers on stackoverflow and your blog as well. Your findings are really helpful, I appreciate you for this. Really well structured and clearly understandable. Mathematical models are well explained.

I kindly request you, If you provide small implementation code in JAVA/Python it will be very helpful. So that I can understand the concept deeply.

Thanks for your time and consideration.

Kindest Regards,

This question is about [1]. I did not use Java or Python to solve the models discussed there. So my answer was easy: sorry.

However, I am a bit uncomfortable with questions like this. Is "research" really not more than just emailing and copy/paste from replies? Should a CS student doing his Master's thesis not be able to translate a detailed (but simple) mathematical model to working code? Would you not learn much more by actually implementing the model yourself?

Often, I find the process to reproduce results in a paper very good way to get a good understanding of the issues. I prefer a mathematical model, or pseudo code if an algorithm needs to be implemented. Just copy/paste is hardly ever a good idea.

I have the feeling the real underlying optimization problem is: minimize effort. Or to make it sound better: maximize efficiency.

I am probably a bit too harsh.


  1. Solving a facility location problem as an MIQCP,

Thursday, May 31, 2018

Knapsack + Packing: Combinatorial Benders Decomposition

In [1] a small, but difficult non-convex problem was described:

Given \(n\) circles with radius \(r_i\) and value \(v_i\) try to fill a rectangle of size \(W\times H\) with a subset of the circles (such that they don't overlap) and that maximizes the total value of the "payload".
The data looks like:

----     28 PARAMETER v  value

i1  4.237,    i2  4.163,    i3  2.183,    i4  2.351,    i5  6.302,    i6  8.478,    i7  3.077,    i8  6.992
i9  7.983,    i10 3.733,    i11 1.994,    i12 5.521,    i13 2.442,    i14 8.852,    i15 3.386,    i16 3.572
i17 6.346,    i18 7.504,    i19 6.654,    i20 5.174

----     28 PARAMETER r  radius

i1  1.273,    i2  4.295,    i3  2.977,    i4  1.855,    i5  1.815,    i6  1.508,    i7  2.074,    i8  4.353
i9  0.802,    i10 2.751,    i11 4.992,    i12 3.104,    i13 4.960,    i14 3.930,    i15 1.088,    i16 3.379
i17 1.218,    i18 1.625,    i19 3.510,    i20 2.459

----     28 PARAMETER a  area

i1   5.090,    i2  57.945,    i3  27.837,    i4  10.812,    i5  10.349,    i6   7.146,    i7  13.517,    i8  59.535
i9   2.021,    i10 23.775,    i11 78.274,    i12 30.275,    i13 77.291,    i14 48.525,    i15  3.720,    i16 35.864
i17  4.659,    i18  8.299,    i19 38.709,    i20 18.998

----     28 PARAMETER W                    =       15.000  
            PARAMETER H                    =       10.000  

The area was computed with the familiar formula: \(a_i = \pi r_i^2\). The mathematical model can look like:\[\bbox[lightcyan,10px,border:3px solid darkblue]{\begin{align}\max\>&\sum_i v_i \delta_i\\ & (x_i-x_j)^2 + (y_i-y_j)^2 \ge (r_i+r_j)^2 \delta_i \delta_j & \forall i\lt j \\ & r_i \le x_i \le W - r_i\\ & r_i \le y_i \le H - r_i\\ &\delta_i \in \{0,1\}\end{align}}\] Here the binary variable \(\delta_i\) indicates whether we select circle \(i\) for inclusion in the container. The variables \(x_i\) and \(y_i\) determine the location of the selected circles.

This turns out to be a small but difficult problem to solve. The packing problem is a difficult non-convex problem in its own right and we added a knapsack problem on top of it.

In [1] we found the following solutions:

Local solver (BONMIN)58.57
Improved by global solver (LINDOGLOBAL)60.36

Combinatorial Benders Decomposition

In the comments of [1] a variant of Benders Decomposition called Combinatorial Benders Decomposition is suggested by Rob Pratt. Let's see if we can reproduce his results. I often find it useful to do such an exercise to fully understand and appreciate a proposed solution.

First, lets establish that the total number of solutions in terms of the binary knapsack variables \(\delta_i\) is: \(2^n\). For \(n=20\) we have: \[2^{20} = 1,048,576\] This is small for a MIP. In our case unfortunately we have this non-convex packing sub-problem to worry about.

One constraint that can be added is related to the area: the total area of the selected circles should be less than the total area of the container: \[\sum_i a_i \delta_i \le A\] where \(A=W \times H\). This constraint is not very useful to add to our MIQCP directly, but it will be used in the Benders approach. The number of  solutions for \(\delta_i\)  that are feasible w.r.t. the area constraint is only 60,145. The number of these solutions with a value larger than 60.3 is just 83.

To summarize:

Possible configurations for \(\delta_i\)1,048,576
Of which feasible w.r.t area constraint 60,145
Of which have a value better than 60.3 83

Now we have a number that is feasible to enumerate. We know the optimal objective value should be in this set of 83 problem that are better than 60.3. So a simple approach is just to enumerate solutions starting with the best.

We can give this scheme a more impressive name: Combinatorial Benders Decomposition[2]. The approach can be depicted as:

We add a cut in each iteration to the master. This cut \[\sum_{i\in S} (1-\delta_i)\ge 1\] where \(S = \{i|\delta_i=1\}\), will cut off exactly one solution. In theory it can cut off more solutions, but because of the order we generate proposals, just one solution is excluded each cycle.

The sub model is used only to see if the proposal is feasible w.r.t. to our rectangle. The first sub problem that is feasible gives us an optimal solution to the whole problem.  I implement the sub problem as a multistart NLP problem [3]. A slight rewrite of the sub problem gives a proper optimization problem:\[\begin{align} \max \>& \phi\\& (x_i-x_j)^2 + (y_i-y_j)^2 \ge \phi (r_i+r_j)^2&i\lt j\end{align}\] If the optimal value \(\phi^* \ge 1\) we are feasible w.r.t. to the original problem.

Of course, we could as well generate all possible configurations \(\delta_i\) that are feasible w.r.t. to the area constraint (there are 60,145 of them), order them by value (largest value first) and run through. Stop as soon as we find a feasible solution (i.e. where the packing NLP is feasible). No fancy name for this algorithm: "sorting" does not sound very sophisticated.

We can find a better solution with value 60.6134 using this scheme. Of course it is not easy. We need to solve 69 rather nasty non-convex NLPs to reach this solution.

Best Solution Found
Better than just solving the complete knapsack problem directly, but still not a panacea: this remains a small but surprisingly difficult problem to solve!


  1. Knapsack + packing: a difficult MIQCP,
  2. Codato G., Fischetti M. (2004) Combinatorial Benders’ Cuts. In: Bienstock D., Nemhauser G. (eds) Integer Programming and Combinatorial Optimization. IPCO 2004. Lecture Notes in Computer Science, vol 3064. Springer, Berlin, Heidelberg
  3. Multi-start Non-linear Programming: a poor man's global optimizer,

Friday, May 25, 2018

DIY Presolve

In certain models, we can choose some level of "presolve" at the model level. Presolvers are part of LP/MIP solvers. Their main task is to inspect the model and look for opportunities to make it smaller. Some of this is done by simple operations such as: change a singleton constraint into a bound, remove variables that are not needed, etc [2,3].

Note that the meaning of "smaller" in this context is not totally obvious: some substitutions will decrease the number of decision variables and constraints, while increasing the number of non-zero coefficients in the LP matrix. Usually larger and sparser is better than smaller and denser (most LP and MIP solvers exploit sparsity), so I tend to focus on nonzero counts.

The question I am exploring here: how many reductions do we apply at the modeling level opposed to leave it to solver? If a solver is able to reduce the size by a large amount (loosely defined), I always feel I did not do a good job as a modeler. I just did not pay attention.

The model below demonstrates how we can apply different reduction levels to the model. The model becomes smaller, but at the expense of more complex modeling. What is the right level to choose? Of course there is no objective answer to this. Your "optimal level" may be different from mine.

Problem description

The problem is from [1]. Consider the board:

We need to fill the board with integers: \(x_{i,j} \in Z\). The following rules apply:

  1. Green cells must contain strictly positive values, \(x_{i,j}\ge 1\).
  2. Red cells must contain strictly negative values, \(x_{i,j}\le -1\).
  3. White and blue cells have a value of zero, \(x_{i,j}= 0\).
  4. The red and green cells form a symmetric pattern: if \(x_{i,j}\) is a green cell, \(x_{j,i}\) is a red cell, and the other way around.    
  5. Skew-symmetry or Anti-symmetry: we have the restriction \(x_{i,j} = -x_{j,i}\). Putting it differently: \(X^T = -X\).
  6. Row and column sums are equal to zero: \[ \begin{align}&\sum_i x_{i,j} =0 & \forall j\\& \sum_j x_{i,j} = 0 & \forall i\end{align}\]
There are multiple solutions. We may choose the solutions with the smallest sum of green values:\[\min \sum_{\mathit{Green}(i,j)} x_{i,j}\]

In the board above we have the following statistics:

Cell typeCount
Green cells57
Red cells57
Blue cells20
White cells266

Presolve level 0

A direct formulation for all \(x_{i,j}\) is: \[\begin{align}\min\> & z=\sum_{\mathit{Green}(i,j)}  x_{i,j}\\ &  x_{i,j}\ge 1  & \mathit{Green}(i,j)\\ &  x_{i,j} \le  -1  & \mathit{Red}(i,j)\\  &  x_{i,j} =0  & \mathit{WhiteBlue}(i,j)\\ & x_{i,j} = -x_{j,i} & \forall i,j\\ &\sum_i x_{i,j} =0 & \forall j\\&\sum_j x_{i,j} =0 & \forall i \\ & x_{i,j} \in Z\end{align}\]

This model, with all equations stated as explicit constraints, has the following sizes:

Model SizeCount
nonzero elements1980

The counts here exclude the objective function. Although the solver will automatically convert singleton equations into bounds, I never write these as explicit constraints. I prefer to specify singleton equations as bounds.

Presolve level 1

The first three constraints can be implemented as bounds: \[\ell_{i,j} = \begin{cases} 1 & \mathit{Green}(i,j)\\ -\infty & \mathit{Red}(i,j)\\  0 & \text{otherwise}\end{cases}\] and \[u_{i,j} = \begin{cases} \infty & \mathit{Green}(i,j)\\ -1& \mathit{Red}(i,j)\\ 0 & \text{otherwise}\end{cases}\] Now the model can read: \[\begin{align}\min \> & z=\sum_{\mathit{Green}(i,j)}  x_{i,j}\\ &   x_{i,j} = -x_{j,i} & \forall i\lt j\\ &\sum_i x_{i,j} =0 & \forall j\\&\sum_j x_{i,j} =0 & \forall i\\ & x_{i,j} \in [\ell_{i,j}, u_{i,j}] \\& x_{i,j} \in Z\end{align}\] I also reduced the number of skew-symmetry constraints \(x_{i,j}=-x_{j,i}\): we only need these for \(i\lt j\). This reduces the model size to:

Model SizeCount
nonzero elements1180

All singleton equations have been formulates as bounds. This model has a large number of variables fixed to zero (all variables corresponding to blue and white cells). The solver will presolve those variables away, but I prefer to do this myself.

Presolve level 2

The next level is to remove all \(x_{i,j}\) that are known to be zero from the model. \[\begin{align}\min \> & z= \sum_{\mathit{Green}(i,j)}  x_{i,j}\\ &   x_{i,j} = -x_{j,i} & \forall \mathit{Green}(i,j)\\ &\sum_{i|\mathit{GreenRed}(i,j)} x_{i,j} =0 & \forall j\\&\sum_{j|\mathit{GreenRed}(i,j)} x_{i,j} =0 & \forall i\\ & x_{i,j} \in [\ell_{i,j}, u_{i,j}] & \forall \mathit{GreenRed}(i,j)\\& x_{i,j} \in Z\end{align}\] The only cells we model here are the red and green ones. Our counts are:

Model SizeCount
nonzero elements342

This was my first actual implementation. However, we can go further, and use some more  reductions. Here the model starts to become less intuitive.

Presolve level 3

We can implicitly deal with the red cells: a red cell \(x_{i,j}\) has a corresponding green cell \(-x_{j,i}\). \[\begin{align}\min \> & z=\sum_{\mathit{Green}(i,j)}  x_{i,j}\\  &  \sum_{i|\mathit{Green}(i,j)}  x_{i,j} - \sum_{i|\mathit{Green}(j,i)}x_{j,i} =0 & \forall j\\&\sum_{j|\mathit{Green}(i,j)} x_{i,j} - \sum_{j|\mathit{Green}(j,i)} x_{j,i} =0 & \forall i\\ & x_{i,j} \ge 1 & \forall \mathit{Green}(i,j)\\& x_{i,j} \in Z\end{align}\] We only solve for the green cells here. The value of the red cells can be recovered afterwards.

Model SizeCount
nonzero elements228

The row and column sums now become more difficult to recognize. In addition we need to add some code to recalculate the value of the red cells after the solve.

Presolve level 4

Finally, we can also remove one of the summations because of symmetry. We end up with: \[\begin{align}\min \> & z=\sum_{\mathit{Green}(i,j)}  x_{i,j}\\  &  \sum_{i|\mathit{Green}(i,j)}  x_{i,j} - \sum_{i|\mathit{Green}(j,i)}x_{j,i} =0 & \forall j\\ & x_{i,j} \ge 1 & \forall \mathit{Green}(i,j)\\& x_{i,j} \in Z\end{align}\]
Model SizeCount
nonzero elements114

Again, the value of the red cells need to be calculated after the solve. This model is now very compact, but we moved away from the original problem statement. When reading this model, we would not immediately see the correspondence with the problem.

Does it make a difference?

No. The first model shows:

Presolved: 13 rows, 48 columns, 96 nonzeros
Variable types: 0 continuous, 48 integer (0 binary)

The last model gives:

Presolved: 13 rows, 48 columns, 96 nonzeros
Variable types: 0 continuous, 48 integer (0 binary)
So should we even worry? I still like to generate models that are somewhat small. For me this is not even a performance issue, but rather a question of paying attention. I see sometimes sloppy modeling causing an excessive number of variables, equations and nonzero elements. How far I take this DIY presolve effort is determined by readability and understandability: the limit is when the formulation becomes less obvious and when readability starts to suffer.

In any case, if the solver can presolve large parts of the model away, I want to be able to explain this. May be the model is largely triangular, or there are many singleton equations. Things may be more complex, and such an explanation is not always obvious to find.


The solutions looks like:

The minimum sum of the green cells is 87.

Update: fixed reference: use proper names of authors.


  1. Antisymmetric Table Puzzle where the rows/columns sum to zero.
  2. A.L. Brearley, G. Mitra, H.P. Williams, Analysis of mathematical programming problems prior to applying the simplex algorithm, Mathematical Programming, 8 (1975), pp. 54-83.
  3. E.D. Andersen, K.D. Andersen, Presolving in Linear Programming, Mathematical Programming, 71 (1995), pp. 221-245.

Wednesday, May 23, 2018

Excel Power Query CSV read somewhat slow

Reading a large CSV file into Excel's data model is somewhat slow compared to R: Excel/Power Query takes 100 seconds vs R only 25 seconds on the same CSV file (with 1.8 million records).

Reading CSV into Excel/Power Query data model
I used some VBA code for the timing. The only thing we do is to load data into the data model (i.e. there is no Power Pivot table to create).

R does this quite a bit faster:

Reading the same CSV into R
I expected this to be closer. I think this operation should be IO bound instead of CPU bound, something Microsoft should know how to do super fast.

Monday, May 21, 2018

Find the differences: RStudio local or server

Typically I run RStudio locally. For a project we are experimenting with RStudio Server, which allows you to run Rstudio itself on a server and use it from a web browser. The GUI looks virtually identical. The user experience is very similar (things like filenames are the main difference: the server is Linux based while I run Rstudio locally on Windows).

Sunday, May 20, 2018

Facility location model Q&A

Dear Erwin, 
I always follow and read your interesting posts. Thank you so much for your useful posts on your blogs. For the learning purposes, I was trying to code you MIQCP sample on your blog but not successful.  ( 
I have checked the code, Lp, and your post 10,000 times. Everything seems to be exactly like what you have addressed however not sure why Gams returns the error as 100 nonlinear lines. I see you are too busy and not work as GAMS support. But since it might be a coding problem with your post (or perhaps something is missed), It would be highly appreciated if you could do a favor and double check that. Much obliged.  

The attached GAMS model looks like:

set i /i1*i50/
set j /facility1,facility10/

parameter maxdist /40/;

set c /x,y/

Table dloc(i,c)
           x         y
i1         65        84
i2         8         72
i3         57        28
i4         70        79
i5         82        80
i6         87        95
i7         14        98
i8         28        6
i9         62        22
i10        22        80
i11        31        94
i12        87        74
i13        80        32
i14        92        48
i15        22        74
i16        0         42
i17        80        90
i18        100       56
i19        73        1
i20        82        84
i21        7         15
i22        3         19
i23        99        89
i24        56        94
i25        18        35
i26        8         30
i27        71        19
i28        10        66
i29        7         71
i30        13        66
i31        68        14
i32        95        43
i33        23        50
i34        95        43
i35        4         63
i36        51        13
i37        56        26
i38        41        56
i39        75        72
i40        41        68
i41        57        91
i42        16        63
i43        75        82
i44        44        94
i45        79        32
i46        72        5
i47        4         52
i48        16        20
i49        31        60
i50        92        63

scalar M /1000000/


binary variables


distance(i,j)..          (sum(c,dloc(i,c)- floc(j,c)))**2 =l= Maxdist**2 + M*(1-assign(i,j));
assigndemand(i)..         Sum(j,assign(i,j)) =e= 1;
closed(i,j)..             assign(i,j) =l= isopen(j) ;
Numfacilities..           n =e= sum(j,isopen(j));
order(j+1)..              isopen(j) =g= isopen(j+1);

model location /all/

option limcol = 180;
option limrow =180;

solve location minimizing n using MIQCP;

display n.l,floc.l;

I think this GAMS model as written has a few problems that are interestingly enough to point out.

What is this about?

In [1] a simple facility location problem was discussed. It comprised of two models:

  1. Find the number of facilities \(n\) needed, so all demand locations are within MAXDIST kilometers from the closest facility.
  2. Given that we know \(n\), place these \(n\) facilities such that some total distance measure is minimized. 
The first model is presented as:

What is wrong with the GAMS model in the e-mail?

Quite a few things. Let me try to review them:

  1. The set \(j\) only has two elements. This is not sufficient. We should have more elements in \(j\) than the optimal value of \(n\).  The optimal value for this data set is \(n^* = 3\), so just having two elements in \(j\) will cause the model to be infeasible.
  2. The value for \(M\) needs to get much more attention. The value needs to be large enough that the distance equation becomes non-binding when \(\mathit{assign}_{i,j}=0\). A simple value would be \[M=\sum_c \left(\max_i \{ \mathit{dloc}_{i,c}\} -\min_i \{\mathit{dloc}_{i,c}\}\right)^2 \] This is the square of the length of the diagonal of the box containing all demand locations. For this data this leads to \(M=19409\). 
    We can assume facilities will be placed in the convex hull of the demand locations (sorry, I don't know how to prove that). That means we can use the maximum pairwise squared distance between demand locations \((i,i')\) as a good value for \(M\): \[M= \max_{i,i'} \sum_c \left(\mathit{dloc}_{i,c}-\mathit{dloc}_{i',c}\right)^2\] This sets \(M=14116\).
    With some effort better values can be derived by using individual values \(M_{i,j}\) instead of one \(M\). Big-M values with some computation and just having assigned some large value, are always a point of attention.
    Using just very big values for big-\(M\) constants, is something I see very often. This can lead to all kind of problems when solving the MIP problem. 
  3. The expression (sum(c,dloc(i,c)- floc(j,c)))**2 takes the square of the sum. This is not the same as a sum of squares:\[\sum_i x_i^2 \ne \left(\sum_i x_i\right)^2\] This expression is just malformed.
  4. In GAMS x**y is a general non-linear function, and x**2 is not recognized as quadratic function (GAMS could be much smarter here). This will make the model an MINLP model and not a MIQCP. This will lead to the somewhat cryptic error:

      --- Executing SBB: elapsed 0:00:00.140
      Could not load data
      Detected 100 general nonlinear rows in model

    Of course you can solve it with an MINLP solver. (Confusingly SBB is an MINLP solver).  Of course it is better to use the sqr(x) function to help GAMS. The left-hand side of the distance constraint should be written as: sum(c,sqr(dloc(i,c)- floc(j,c))).
    Side note: if we would try to solve the original expression (sum(c,dloc(i,c)- floc(j,c)))**2 as a MINLP model, we would still be in a heap of problems. The function x**y assumes \(x\ge 0\). This is obviously not the case here. We would see a ton of domain errors. 
  5. For an optimal solution, you will need to add: option optcr=0; GAMS has a default relative gap tolerance of 10%. With this options we set this tolerance to zero.
Modeling is not that easy! We need to pay attention to lots of details to get models working correctly.

Wednesday, May 16, 2018

Coming to Excel: custom R plots

Interesting. Excel's charts are looking a bit dated compared to what we can do with ggplot, or Tableau. This will make a good step in the right direction.

Sankey Chart in Excel


  1. Custom R charts coming to Excel,
  2. Azure Machine Learning, JavaScript Custom Functions, and Power BI Custom Visuals Further Expand Developers Capabilities with Excel,
  3. Excel announces new data visualization capabilities with Power BI custom visuals,

Indicator constraints

In [1] a question is posed about how to implement the implication:\[ x+y\le 5 \Rightarrow \delta = 1\] Here \(\delta \in \{0,1\}\) is a binary variable. Indicator constraints are of the form:\[ \delta = 0 \Rightarrow \text{constraint}\] or \[ \delta = 1 \Rightarrow \text{constraint}\] The binary variable \(\delta\) is sometimes called an indicator variable.

From propositional logic we have: \[\begin{align} & A \Rightarrow B \\ & \Leftrightarrow \\  & \neg B \Rightarrow \neg A\end{align}\] This is called transposition [2].

From this, we can formulate:\[\delta = 0 \Rightarrow x+y\gt 5\] If \(x\) or \(y\) is a continuous variable we can choose to write \[\delta = 0 \Rightarrow x+y\ge 5.001\] or just \[\delta = 0 \Rightarrow x+y\ge 5\] In the latter case we keep things ambiguous for \( x+y=5\) which is in practice often a good choice.

If  \(x\) and \(y\) are integers, we can do: \[\delta = 0 \Rightarrow x+y\ge 6\]

This trick is quite useful to know when dealing with indicator constraints.