Monday, January 15, 2018

Least squares as QP: convexity issues

In [1] a non-negative least squares problem is solved using Cplex's Matlab interface (function cplexlsqnonneglin [2]).

Model 1

The model this function tries to solve is:
\[\begin{align}\min\>&||Cx-d||_2^2\\&x\ge 0\end{align}\]

Unfortunately we see:

Number of nonzeros in lower triangle of Q = 861
Using Approximate Minimum Degree ordering
Total time for automatic ordering = 0.00 sec. (0.02 ticks)
Summary statistics for factor of Q:
  Rows in Factor            = 42
  Integer space required    = 42
  Total non-zeros in factor = 903
  Total FP ops to factor    = 25585
CPLEX Error  5002: objective is not convex.
QP with an indefinite objective can be solved
to local optimality with optimality target 2,
or to global optimality with optimality target 3.
Presolve time = 0.03 sec. (0.06 ticks)
Barrier time = 0.03 sec. (0.06 ticks)

This looks like a numerical or tolerance issue: least-squares problems should be convex.

Model 2

I often propose a different model:
\[\begin{align}\min\>&||r||_2^2\\&r=Cx-d\\&x\ge 0, r\text{ free}\end{align}\]

This looks like a bad idea: we add variables \(r\) and extra constraints. However this model has a big advantage: the \(Q\) matrix is much better behaved. It is basically a (very well-scaled) diagonal matrix. Using this model we see:

Tried aggregator 1 time.
QP Presolve eliminated 1 rows and 1 columns.
Reduced QP has 401 rows, 443 columns, and 17201 nonzeros.
Reduced QP objective Q matrix has 401 nonzeros.
Presolve time = 0.02 sec. (1.21 ticks)
Parallel mode: using up to 8 threads for barrier.
Number of nonzeros in lower triangle of A*A' = 80200
Using Approximate Minimum Degree ordering
Total time for automatic ordering = 0.00 sec. (3.57 ticks)
Summary statistics for Cholesky factor:
  Threads                   = 8
  Rows in Factor            = 401
  Integer space required    = 401
  Total non-zeros in factor = 80601
  Total FP ops to factor    = 21574201
 Itn      Primal Obj        Dual Obj  Prim Inf Upper Inf  Dual Inf          
   0   3.3391791e-01  -3.3391791e-01  9.70e+03  0.00e+00  4.20e+04
   1   9.6533667e+02  -3.0509942e+03  1.21e-12  0.00e+00  1.71e-11
   2   6.4361775e+01  -3.6729243e+02  3.08e-13  0.00e+00  1.71e-11
   3   2.2399862e+01  -6.8231454e+01  1.14e-13  0.00e+00  3.75e-12
   4   6.8012056e+00  -2.0011575e+01  2.45e-13  0.00e+00  1.04e-12
   5   3.3548410e+00  -1.9547176e+00  1.18e-13  0.00e+00  3.55e-13
   6   1.9866256e+00   6.0981384e-01  5.55e-13  0.00e+00  1.86e-13
   7   1.4271894e+00   1.0119284e+00  2.82e-12  0.00e+00  1.15e-13
   8   1.1434804e+00   1.1081026e+00  6.93e-12  0.00e+00  1.09e-13
   9   1.1163905e+00   1.1149752e+00  5.89e-12  0.00e+00  1.14e-13
  10   1.1153877e+00   1.1153509e+00  2.52e-11  0.00e+00  9.71e-14
  11   1.1153611e+00   1.1153602e+00  2.10e-11  0.00e+00  8.69e-14
  12   1.1153604e+00   1.1153604e+00  1.10e-11  0.00e+00  8.96e-14
Barrier time = 0.17 sec. (38.31 ticks)

Total time on 8 threads = 0.17 sec. (38.31 ticks)
QP status(1): optimal
Cplex Time: 0.17sec (det. 38.31 ticks)

Optimal solution found.
Objective :           1.115360

This reformulation is not only useful for Cplex. Some other solvers show the same behavior: the first model is declared to be non-convex while the second model solves just fine. Other solvers handle both models without problems: they add a tiny value to the diagonal elements of the \(Q\) matrix to make it numerically positive semi-definite [3].

A little background in numerics

The \(Q\) matrix is formed by \(C^TC\). We can show that \(C^TC\) is always positive semi-definite. This follows from: \(x^T(C^TC)x=(Cx)^T(Cx)\ge 0\). However, computing the product \(Q=C^TC\) in floating point can accumulate small errors, so the computed \(Q\) may no longer be positive semi-definite. When I calculate \(Q=C^TC\) and print its eigenvalues I see:

 [1]  6.651381e+03  4.436663e+02  3.117601e+02  2.344391e+02  1.377582e+02  4.842287e+01
 [7]  2.471843e+01  4.970353e+00  4.283370e+00  3.878515e+00  9.313360e-01  4.146773e-01
[13]  2.711321e-01  2.437365e-01  2.270969e-01  8.583981e-02  2.784610e-02  2.047456e-02
[19]  1.670033e-02  8.465492e-03  3.948085e-03  2.352994e-03  1.726044e-03  1.287873e-03
[25]  1.095779e-03  2.341448e-04  1.776231e-04  1.266256e-04  4.064795e-05  2.980187e-05
[31]  1.056841e-05  2.409130e-06  2.039620e-06  5.597181e-07  1.367130e-07  2.376693e-08
[37]  3.535393e-09  1.355988e-11  2.996032e-14 -1.468230e-13 -1.622616e-13 -2.217776e-13
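This effect is easy to reproduce with synthetic data (a deliberately ill-conditioned \(C\) built from chosen singular values, not the data from [1]):

```python
import numpy as np

# Synthetic C with singular values from 1e4 down to 1e-8; the exact
# eigenvalues of Q = C'C then range from 1e8 down to 1e-16, and the
# smallest ones drown in floating-point rounding noise (they may even
# come out slightly negative).
rng = np.random.default_rng(42)
m, n = 100, 12
sigma = np.logspace(4, -8, n)
U, _ = np.linalg.qr(rng.standard_normal((m, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
C = U @ np.diag(sigma) @ V.T

Q = C.T @ C                    # forming the inner product
ev = np.linalg.eigvalsh(Q)     # eigenvalues in ascending order
print(ev)

assert ev[-1] > 1e7            # largest eigenvalue is about 1e8
assert ev[0] < 1e-4            # smallest is at (or below) rounding level
```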

The presence of negative eigenvalues indicates this calculated matrix \(Q\) is indeed not pos def. Another verification is to see if we can form a Cholesky factorization [4]. This will fail if the matrix is not positive definite.

This factorization is typically used inside solvers on the \(Q\) matrix. Finally, it is noted that it is also possible that \(Q\) is actually pos def but both tests (eigenvalues and Cholesky decomposition) fail to see this. Of course for all practical purposes that means the matrix is not numerically positive definite.
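A sketch of that test (with a synthetic matrix: a PSD rank-one matrix perturbed by a rounding-sized amount, mimicking the accumulated error in \(C^TC\)):

```python
import numpy as np

# ones((3,3)) is PSD with eigenvalues 3, 0, 0. Subtracting a
# rounding-sized multiple of I pushes two eigenvalues just below zero,
# which is exactly the situation a QP solver may face with Q = C'C.
Q = np.ones((3, 3)) - 1e-12 * np.eye(3)

try:
    np.linalg.cholesky(Q)
    pos_def = True
except np.linalg.LinAlgError:
    pos_def = False

print("numerically positive definite:", pos_def)
assert not pos_def
```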

In Ordinary Least Squares (OLS) it is well known that solving the normal equations is not very numerically stable. I believe this is related (the same inner product is formed).
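The connection is easy to demonstrate: solving the normal equations \((C^TC)x = C^Td\) squares the condition number, while an SVD/QR based routine works on \(C\) directly. A sketch with made-up polynomial-fitting data:

```python
import numpy as np

# Polynomial fitting: a classic source of ill-conditioned least squares.
t = np.linspace(0.0, 1.0, 50)
C = np.vander(t, 10)                  # monomial basis, badly conditioned
x_true = np.ones(10)
d = C @ x_true

x_ne = np.linalg.solve(C.T @ C, C.T @ d)       # normal equations
x_ls = np.linalg.lstsq(C, d, rcond=None)[0]    # SVD-based, works on C

print("cond(C)   =", np.linalg.cond(C))
print("cond(C'C) =", np.linalg.cond(C.T @ C))  # roughly cond(C)^2
print("error normal equations:", np.linalg.norm(x_ne - x_true))
print("error lstsq           :", np.linalg.norm(x_ls - x_true))
```

The squared condition number explains both the accuracy loss and why the explicitly formed \(Q\) may fail the solver's convexity check.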

André-Louis Cholesky (1875-1918)


  2. Cplex, cplexlsqnonneglin
  3. Gurobi, PSDtol
  4. Cholesky decomposition,

Saturday, January 13, 2018

Solving a facility location problem as an MIQCP

Facility location models [1] are one of the most studied areas in optimization. It is still interesting to build a model from scratch.

The question is [2]:

Given \(m\) customer locations, find the minimum number of warehouses \(n\) we need such that each customer is within \(\mathit{MaxDist}\) miles from a warehouse.

Model 1 : minimize number of warehouses

The idea I use here, is that we assign a customer to a warehouse and make sure the distance constraint is met. In the following let \(i\) be the customers, \(j\) be the (candidate) warehouses. We can define the binary variable:\[\mathit{assign}_{i,j} = \begin{cases}1 & \text{if customer $i$ is serviced by warehouse $j$}\\0 &\text{otherwise}\end{cases}\]

A high-level distance constraint is:

\[\begin{align} &\mathit{assign}_{i,j}=1 \Rightarrow \mathit{Distance}(i,j) \le \mathit{MaxDist}&&\forall i,j \\ &\sum_j \mathit{assign}_{i,j} =1 && \forall i\\&\mathit{assign}_{i,j} \in \{0,1\} \end{align}\]

The distance is the usual:

\[ \mathit{Distance}(i,j)  = \sqrt{ \left( \mathit{dloc}_{i,x} - \mathit{floc}_{j,x} \right)^2 + \left( \mathit{dloc}_{i,y} - \mathit{floc}_{j,y} \right)^2 }  \]

Here \(\mathit{dloc}, \mathit{floc}\) are the \((x,y)\) coordinates of the demand (customer) and facility (warehouse) locations. The \(\mathit{dloc}\) quantities are constants, as opposed to \(\mathit{floc}\), which are decision variables: these are calculated by the solver.

This distance function is nonlinear. We can make it more suitable to be handled by quadratic solvers by reformulating the distance constraint as:

\[\left( \mathit{dloc}_{i,x} - \mathit{floc}_{j,x} \right)^2 + \left( \mathit{dloc}_{i,y} - \mathit{floc}_{j,y} \right)^2 \le \mathit{MaxDist}^2 + M (1-\mathit{assign}_{i,j})\]

where \(M\) is a large enough constant.
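For this data set (coordinates in \([0,100]\), \(\mathit{MaxDist}=40\)) a globally valid and fairly tight value is easy to compute: the squared distance can never exceed the squared diagonal of the box. A small sketch:

```python
# Coordinates live in [0,100] x [0,100], so the largest possible squared
# distance between a customer and a warehouse is 100^2 + 100^2 = 20000.
# The constraint  dist2 <= MaxDist^2 + M*(1 - assign)  must be slack
# when assign = 0, so M = (max squared distance) - MaxDist^2 suffices.
box = 100.0
maxdist = 40.0

max_sq_dist = 2 * box**2            # 20000: corner-to-corner, squared
M = max_sq_dist - maxdist**2        # 18400: tight, globally valid big-M

print(M)
```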

The next step is to introduce a binary variable: \[\mathit{isopen}_{j} = \begin{cases}1 & \text{if warehouse $j$ is open for business}\\0 &\text{otherwise}\end{cases}\] We need \(\mathit{isopen}_j=0 \Rightarrow \mathit{assign}_{i,j} =0\). This can be modeled as: \[\mathit{assign}_{i,j}\le \mathit{isopen}_j\] The objective is to minimize the number of open warehouses:\[\min n = \sum_j \mathit{isopen}_j\]

A complete model can look like:

The problem is a convex MIQCP: Mixed Integer Quadratically Constrained Problem. This model type is supported by solvers like Cplex and Gurobi. The last constraint order is designed to make sure the open facilities are the first in the list of candidate facilities. It is also a way to reduce symmetry in the model.

To test this, I generated 50 random points with \(x,y\) coordinates between 0 and 100. The maximum distance is \(\mathit{MaxDist}=40\). The optimal solution is:

----     60 VARIABLE n.L                   =        3.000  number of open facilties

----     60 VARIABLE isopen.L  use facility

facility1 1.000,    facility2 1.000,    facility3 1.000

----     63 PARAMETER locations  

                    x           y

facility1      43.722      25.495
facility2      31.305      59.002
facility3      67.786      51.790

It is noted that the only relevant result in the solution is \(n=3\) (the number of open facilities). The chosen location for the warehouses and the assignment is not of much value. We can see this from a picture of the solution, in particular the variables floc and assign:

Misinterpretation of the solution

Model 2: Optimal location of warehouses

We can solve a second model that, given \(n=3\) warehouses, obtains a solution that:

  • assigns customers to their closest warehouse
  • finds warehouse locations such that the distances are minimized 
If we minimize the sum of the distances:
\[\min \sum_{i,j} \left[ \mathit{assign}_{i,j}\cdot \sqrt{\sum_c  \left( \mathit{dloc}_{i,c} - \mathit{floc}_{j,c} \right)^2 }\right]\]
we achieve both goals in one swoop. Of course this is a nasty nonlinear function. We can make this a little bit more amenable to being solved with standard solvers:
\[\bbox[lightcyan,10px,border:3px solid darkblue]{\begin{align}\min &\sum_{i,j} d_{i,j}\\ &d_{i,j} \ge  \sum_c  \left( \mathit{dloc}_{i,c} - \mathit{floc}_{j,c} \right)^2 - M (1-\mathit{assign}_{i,j})&&\forall i,j\\&\sum_j \mathit{assign}_{i,j} =1 && \forall i\\&\mathit{assign}_{i,j} \in \{0,1\}\\&0\le d_{i,j}\le\mathit{MaxDist}^2 \end{align}}\] Note that now \(j = 1,\dots,n=3\). This is again a convex MIQCP model.

To be a little bit more precise: with this model we minimize the sum of the squared distances, which is slightly different. With this objective we place a somewhat higher weight on large distances. This differs from the first model: there we also used squared distances, but that did not change the meaning of the model one bit. In this second model, summing squared distances gives slightly different results from actually minimizing the sum of distances.
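As a sanity check on the first bullet: once the warehouse locations are fixed, the cost-minimizing assignment is simply "each customer to its nearest warehouse". A sketch using the three warehouse locations found by this model and a few made-up customer points:

```python
import numpy as np

# Made-up customer coordinates (the real model has 50 random points).
dloc = np.array([[10.0, 10.0], [30.0, 60.0], [70.0, 50.0], [45.0, 25.0]])
# Warehouse locations from the model 2 solution listing.
floc = np.array([[24.869, 18.511], [72.548, 46.855], [24.521, 74.617]])

# Squared distances, customers x warehouses, via broadcasting.
d2 = ((dloc[:, None, :] - floc[None, :, :]) ** 2).sum(axis=2)
assign = d2.argmin(axis=1)      # index of closest warehouse per customer
print(assign)
```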

When we solve this model we see a different location for the warehouses:

----     78 PARAMETER locations  

                    x           y

facility1      24.869      18.511
facility2      72.548      46.855
facility3      24.521      74.617

The plot looks much better now:

Optimal Warehouse Locations with Minimized Distances

There are of course many other ways to attack this problem:

  • Instead of using two models we can combine them. We can use a multi-objective formulation \(\min w_1 n  + w_2 \sum_{i,j} d_{i,j}\), with a large weight \(w_1\) to put priority on the first objective.
  • Instead of minimizing the sum of the distances we can choose to minimize the largest distance.
  • Instead of Euclidean distances we can use a Manhattan distance metric. This will give us a linear MIP model.


  1. Zvi Drezner, Horst W. Hamacher, eds., Facility Location, Applications and Theory, Springer, 2001,
  2. Facility Location - Algorithm to Minimize facilities serving customers with distance constraint,

Friday, January 12, 2018

New book: linear programming using matlab

If you want to know more about the nitty-gritty details of linear programming solvers, this hefty book (637 pages) may be of interest. It contains lots of Matlab code implementing different LP solvers.


  1. Introduction
  2. Linear Programming Algorithms
  3. Linear Programming Benchmark and Random Problems
  4. Presolve Methods
  5. Scaling Techniques
  6. Pivoting Rules
  7. Basis Inverse and Update Methods
  8. Revised Primal Simplex Algorithm
  9. Revised Dual Simplex Algorithm
  10. Exterior Point Simplex Algorithm
  11. Interior Point Methods
  12. Sensitivity Analysis

    A. MATLAB's Optimization Toolbox Algorithms
    B. State-of-the-Art Linear Programming Solvers: CLP and Cplex 


Sunday, January 7, 2018

A difficult row selection problem

The problem

In [1] an interesting problem is posted. From a matrix \(A_{i,j}\) (\(i=1,\dots,m\),\(j=1,\dots,n\)) select \(K\) rows such that a cost function is minimized. Let \(S\) be the set of selected rows. The cost function is defined as: \[ \min_S \> \left\{ 0.2 \max_{i \in S} A_{i,1} + 0.4 \sum_{i \in S} A_{i,2} - 0.3 \sum_{i \in S} A_{i,3} - 0.1 \max_{i \in S} A_{i,4} \right\} \] This problem is by no means simple to model (probably unbeknownst to the poster). Let's dive in.

Selecting rows

We define a binary decision variable:
\[\delta_i = \begin{cases} 1 &\text{if row $i$ is selected}\\ 0 &\text{otherwise}\end{cases}\]
We want to select \(K\) rows:
\[\sum_i \delta_i = K \]

A simpler objective

If the cost function was just  \[ \min_S \> \left\{ 0.4 \sum_{i \in S} A_{i,2} - 0.3 \sum_{i \in S} A_{i,3} \right\} \] things are easy. We can write: \[\min\>  0.4 \sum_i \delta_i A_{i,2} - 0.3 \sum_i \delta_i A_{i,3}\]

As this cost function is separable in the rows, we can even use a simple algorithm to solve the problem (instead of using a MIP model):
  1. Create a vector \(c_i =  0.4 A_{i,2} - 0.3 A_{i,3}\)
  2. Sort the vector \(c\) 
  3. Pick rows that correspond to the \(K\) smallest values of \(c\)
The \(\max\) terms in the objective make the objective non-separable in the rows, so this little algorithm is no longer applicable.
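For the separable cost function, the three-step procedure above takes only a few lines (a sketch with a tiny made-up \(A\) and \(K=2\)):

```python
import numpy as np

# Made-up data; columns j2 and j3 (0-based: 1 and 2) drive the cost.
A = np.array([
    [1.0,  2.0,  5.0, 0.0],   # c = 0.4*2  - 0.3*5 = -0.7
    [1.0,  8.0,  1.0, 0.0],   # c = 0.4*8  - 0.3*1 =  2.9
    [1.0, -4.0,  2.0, 0.0],   # c = 0.4*-4 - 0.3*2 = -2.2
    [1.0,  0.0,  0.0, 0.0],   # c = 0.0
])
K = 2

c = 0.4 * A[:, 1] - 0.3 * A[:, 2]   # step 1: per-row cost
rows = np.argsort(c)[:K]            # steps 2+3: K smallest values of c
print(sorted(rows.tolist()))
```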

Review: Modeling a maximum

The maximum \(z = \max(x,y)\) where \(x\) and \(y\) are decision variables, can in general be modeled as: \[\bbox[lightcyan,10px,border:3px solid darkblue]{\begin{align} &z\ge x\\ &z \ge y\\ &z \le x + M \eta\\ & z\le y + M(1-\eta)\\ &\eta \in \{0,1\}\end{align}}\]

Here \(M\) is a large enough constant. This formulation can be interpreted as: \[\begin{align}&z \ge x \text{ and } z \ge y\\  &z \le x \text{ or } z \le y \end{align}\]
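A brute-force check confirms that, for a large enough \(M\), these constraints leave \(z=\max(x,y)\) as the only feasible value:

```python
# For fixed values x, y determine the feasible z intervals of:
#   z >= x, z >= y, z <= x + M*eta, z <= y + M*(1-eta), eta in {0,1}
def feasible_z(x, y, M=1000.0):
    intervals = []
    for eta in (0, 1):
        lo = max(x, y)                            # z >= x and z >= y
        hi = min(x + M * eta, y + M * (1 - eta))  # the "or" pair
        if lo <= hi:
            intervals.append((lo, hi))
    return intervals

for x, y in [(3, 7), (7, 3), (-2, -5), (4, 4)]:
    ivs = feasible_z(x, y)
    assert ivs and all(lo == hi == max(x, y) for lo, hi in ivs)
print("z is forced to max(x, y) in all samples")
```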

The maximum of a set of variables, \(z = \max_k x_k\), is a direct extension of the above: \[\bbox[lightcyan,10px,border:3px solid darkblue]{\begin{align} &z\ge x_k && \forall k\\  &z \le x_k + M (1-\eta_k)&& \forall k\\ & \sum_k \eta_k = 1\\ &\eta_k \in \{0,1\}\end{align}}\] Here \(\eta_k=1\) flags an element that attains the maximum: its upper-bounding constraint is the active one, while the others are relaxed. This formulation is correct in all cases. We can simplify things a little bit depending on how the objective is written.

If we are minimizing the maximum we can actually simplify things considerably:\[\min \left\{\max_k x_k \right\} \Rightarrow \begin{cases}\min &z\\ &z \ge x_k & \forall k\end{cases}\]

Maximizing the maximum does not save much. We can write:
\[\max \left\{  \max_k  x_k \right\}  \Rightarrow \begin{cases}\max   & z\\& z \le x_k+M (1-\eta_k) & \forall k\\ & \displaystyle \sum_k \eta_k = 1\\ &\eta_k \in \{0,1\}\end{cases}\]

A complete model

We generate a small data set with 10 rows. We want to select \(K=2\) rows that maximize our objective. The data part of the model can look like:

The model itself using the most general implementation of the maximum can look like:


  • The parameter range functions as a big-M constant.
  • The variables \(\eta_{i,j}\) are only used for the \(j\)'s corresponding to a "max" column.
  • We needed to make sure that the selected row with the maximum (i.e. \(\eta_{i,j}=1\)) is limited to the selected rows. We do this by requiring \(\delta_i=0 \Rightarrow \eta_{i,j}=0\).
  • We can simplify things for the first objective where \(c_j>0\). For this column we have a simple "minimax" case. This means we can drop equations maxObj2, sum1, and implyZero and the corresponding \(\eta\)'s from the model. We still need them for the last column.
  • Similarly we can drop equation maxObj1 when  \(c_j < 0\) (objective 4).
  • An optimized model can therefore look like:

The results look like:

----     57 PARAMETER A  data

             j1          j2          j3          j4

i1       -6.565       6.865       1.008      -3.977
i2       -4.156      -5.519      -3.003       7.125
i3       -8.658       0.004       9.962       1.575
i4        9.823       5.245      -7.386       2.794
i5       -6.810      -4.998       3.379      -1.293
i6       -2.806      -2.971      -7.370      -6.998
i7        1.782       6.618      -5.384       3.315
i8        5.517      -3.927      -7.790       0.048
i9       -6.797       7.449      -4.698      -4.284
i10       1.879       4.454       2.565      -0.724

----     57 VARIABLE delta.L  select rows

i3 1.000,    i5 1.000

----     57 VARIABLE obj.L  single objectives

j1 -6.810,    j2 -4.994,    j3 13.341,    j4  1.575

----     57 VARIABLE z.L                   =       -7.519  final objective

The solution can also be viewed as:

The data set is small enough to enumerate all possible solutions. There are only 45 different ways to select two rows from ten, so we can just select the best objective:

As an aside, the different configurations were generated by letting the solution pool technology in Cplex enumerate all solutions to the optimization problem: \[ \begin{align} \min\>&0\\ & \sum_i x_i = K\\ & x_i \in \{0,1\} \end{align} \] This explains the somewhat strange ordering of the different solutions. Of course this is overkill: if you have a hammer in the form of a solver, everything looks like an optimization problem.

A larger data set with \(m=10,000\) rows and \(K=100\) took 30 seconds to solve even though it has 20,000 binary variables. The number of combinations for this problem is \[{{m}\choose{K}} = \frac{m!}{(m-K)!K!} \] This is a big number for these values:


A little bit too big to try complete enumeration.
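The count can be computed exactly with Python's `math.comb`; for \(m=10{,}000\) and \(K=100\) it has a few hundred digits:

```python
import math

m, K = 10_000, 100
count = math.comb(m, K)            # m! / ((m-K)! * K!), exact integer

print(len(str(count)), "digits")   # a number with hundreds of digits
assert math.comb(10, 2) == 45      # the small case mentioned earlier
assert count > 10**200             # hopeless for complete enumeration
```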



Wednesday, January 3, 2018

Variant of an Assignment Problem

The standard assignment problem can be stated as:

\[\bbox[lightcyan,10px,border:3px solid darkblue]{\begin{align}\min\>&\sum_{i,j} c_{i,j} x_{i,j}\\ & \sum_j x_{i,j} = 1 && \forall i\\ & \sum_i x_{i,j} = 1 && \forall j\\&x_{i,j} \in \{0,1\} \end{align}}\]

There are highly efficient specialized algorithms for this problem [1]. I often solve these problems just as an LP (the integer variables are automatically integer valued for this particular problem). That sounds like a dumb approach, but there are some reasons:

  • LP solvers are actually very good at solving this quickly (often faster than a specialized method that is not implemented with much care for efficiency [2]).
  • Variants and extensions are (often) easily handled.

The standard model above assumes the sets \(i \in I\) (sources) and \(j \in J\) (destinations) have the same size. Let \(n=|I|\) and \(m=|J|\) be the number of \(i\)'s and \(j\)'s. Thus the standard model has \(n=m\). Now assume we have more \(j\)'s than \(i\)'s: \(m>n\). The model can be updated to:

\[\bbox[lightcyan,10px,border:3px solid darkblue]{\begin{align}\min\>&\sum_{i,j} c_{i,j} x_{i,j}\\ & \sum_j x_{i,j} = 1 && \forall i\\ & \sum_i x_{i,j} \le 1 && \forall j\\&x_{i,j} \in \{0,1\} \end{align}}\]

I.e. each \(i\) is connected to a unique \(j\), but some \(j\)'s are not assigned. A solution for a random data set with \(n=5\) and \(m=10\) can look like:
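For a tiny instance this variant can be checked by brute force: enumerate which \(n\) of the \(m\) destinations are used and in which order (a hypothetical \(2 \times 4\) cost matrix, not the data below):

```python
from itertools import permutations

# n=2 sources, m=4 destinations; c[i][j] = cost of assigning i to j.
c = [
    [4.0, 1.0, 3.0, 2.0],
    [2.0, 0.5, 5.0, 1.0],
]

n, m = len(c), len(c[0])
# Each length-n permutation of the m destinations is one assignment
# in which every source gets a unique destination.
best = min(
    (sum(c[i][p[i]] for i in range(n)), p)
    for p in permutations(range(m), n)
)
print(best)   # (total cost, chosen destinations)
```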

----     41 PARAMETER c  cost coefficients

            j1          j2          j3          j4          j5          j6          j7          j8          j9         j10

i1       0.661       0.756       0.627       0.284       0.086       0.103       0.641       0.545       0.032       0.792
i2       0.073       0.176       0.526       0.750       0.178       0.034       0.585       0.621       0.389       0.359
i3       0.243       0.246       0.131       0.933       0.380       0.783       0.300       0.125       0.749       0.069
i4       0.202       0.005       0.270       0.500       0.151       0.174       0.331       0.317       0.322       0.964
i5       0.994       0.370       0.373       0.772       0.397       0.913       0.120       0.735       0.055       0.576

----     41 VARIABLE x.L  assignment variables

            j2          j5          j6          j9         j10

i1                       1
i2                                   1
i3                                                           1
i4           1
i5                                               1

Let's add a picture to this post (created with tikz):
Original assignment and new variant

In [3] an unusual variant is asked for: the assigned \(j\)'s should be adjacent. The required configuration can be seen in the right picture above.

The first thing we do is keep track of whether a \(j\) node is assigned. For this we introduce new binary variables \(y_j\) and replace the second assignment constraint \(\sum_i x_{i,j} \le 1\) by:

\[\begin{align} &y_j = \sum_i x_{i,j}\\&y_j \in \{0,1\}\end{align}\]

The variable \(y_j\) will be zero for the unassigned nodes (grey in the picture above), and one if connected (red). We can make sure we don't have "holes" in the \(y\) vector by requiring that we see the pattern \([0,\> 1]\)  only once. This we can model as:

\[\begin{align} &z_j \ge y_j - y_{j-1}\\& \sum_j z_j \le 1\\&z_j \in \{0,1\} \end{align}\]

This approach is often used in machine scheduling or power generation: prevent a generator from being turned on too many times.
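In effect the constraint counts the \(0 \to 1\) transitions in the \(y\) vector (with \(y_0=0\) at the boundary) and allows at most one. A quick sketch of that check:

```python
# Count 0 -> 1 transitions in a 0/1 vector y, treating the predecessor
# of y[0] as 0 (the boundary case discussed in the notes below).
def transitions(y):
    prev, t = 0, 0
    for v in y:
        if v == 1 and prev == 0:
            t += 1
        prev = v
    return t

assert transitions([0, 0, 1, 1, 1, 0]) == 1   # contiguous block: allowed
assert transitions([1, 1, 0, 0, 1, 0]) == 2   # a hole: forbidden
assert transitions([0, 0, 0, 0, 0, 0]) == 0   # nothing assigned
print("transition counts verified")
```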

This model has as results:

----     44 VARIABLE x.L  assignment variables

            j5          j6          j7          j8          j9

i1                                                           1
i2                       1
i3                                               1
i4           1
i5                                   1

----     44 VARIABLE y.L  destination is used

j5 1,    j6 1,    j7 1,    j8 1,    j9 1

----     44 VARIABLE z.L  0-1 transition

j5 1

The complete GAMS model looks like:


  • The variables \(y\) and \(z\) can be relaxed to be continuous between zero and one.
  • The \(x\) variables are no longer automatically integer valued. We need to solve this model as a real MIP and cannot use just an LP solver.
  • When using leads and lags we always have to look at the boundary case. Here we have the special case \(z_1 \ge y_1 - 0\). In GAMS when we index out-of-domain we get a zero. That is exactly what we need in this case.
  • The problem can also be attacked by solving \(m-n+1\) assignment problems (and then pick the best one). 


  1. Rainer Burkard,‎ Mauro Dell'Amico,‎ Silvano Martello, Assignment Problems, SIAM, 2009
  2. Assignment problem,
  3. Combinatorial optimization: assignment (matching) with “consecutive” assignments,

Wednesday, December 27, 2017

Filling rectangles with polyominos

I know about dominos, but polyominos [1] are new to me. In [2] a problem is posted: fill a rectangle (or square) with a number of different polyominos. No overlap or rotation allowed. I assume there is no limit on the supply of polyominos.

For example, consider a \(11 \times 11\) grid, and choose from the following building blocks:

Five different polyominos

We can try to fill our grid, but we will not always succeed:

Fail to cover complete grid

I solved this as a MIP as follows. My decision variables are:

\[x_{i,j,k} = \begin{cases} 1 & \text{if we place polyomino $k$  at location $(i,j)$}\\ 0 & \text{otherwise} \end{cases}\]

I used as a rule that the left-upper corner of each polyomino is its anchor. I.e. in the picture above we have \(x_{1,1,4} = 1\), \(x_{2,1,2}=1\), \(x_{1,3,5}=1\) etc.

To formulate a non-overlap constraint I populated a set \(cover_{i,j,k,i',j'}\), with elements that exist if cell \((i',j')\) is covered when we place polyomino \(k\) in cell \((i,j)\). To require that each cell \((i',j')\) is covered we can say:

\[ \forall i',j':  \sum_{i,j,k|cover_{i,j,k,i',j'}} x_{i,j,k} = 1\]

This constraint is infeasible if we cannot cover each cell \((i',j')\) exactly once. In order to make sure we can show a meaningful solution when we cannot cover each cell, we formulate the following optimization model:

\[\begin{align} \max\>&\sum_{i,j} y_{i,j}\\&y_{i',j'} = \sum_{i,j,k|cover_{i,j,k,i',j'}} x_{i,j,k}\\&x_{i,j,k}\in \{0,1\}\\&y_{i,j} \in \{0,1\}\end{align}\]

Here \(y_{i,j}=1\) indicates cell \((i,j)\) is covered exactly once, and \(y_{i,j}=0\) says the cell is not covered.
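Populating the \(cover\) set is a straightforward enumeration over anchors and cell offsets. A sketch in Python, with a hypothetical 3-cell L-tromino (not one of the shapes used above):

```python
# Each polyomino is a list of (row, col) offsets from its upper-left
# anchor cell. "L3" is a made-up 3-cell L-tromino for illustration.
shapes = {
    "L3": [(0, 0), (1, 0), (1, 1)],
}

def build_cover(n, shapes):
    """cover[(i, j, k)]: cells covered when shape k is anchored at (i, j)."""
    cover = {}
    for k, offsets in shapes.items():
        for i in range(n):
            for j in range(n):
                cells = [(i + di, j + dj) for di, dj in offsets]
                # keep only placements that stay inside the n x n grid
                if all(0 <= r < n and 0 <= c < n for r, c in cells):
                    cover[(i, j, k)] = cells
    return cover

cover = build_cover(3, shapes)
print(len(cover), "feasible placements on a 3x3 grid")
```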

With a little bit of effort we can produce the following:

61 x 61 board with 9 different polyominos
A SAT solver may be better suited for this [2], but MIP works for all but very large instances.


  1. Polyomino,
  2. 2D bin packing on a grid,

Tuesday, December 19, 2017

SOS1 variables and Wikipedia

In Wikipedia [1]  the following definition of a Special Ordered Set of Type 1 (SOS1) [2] is given:

Special Ordered Sets of type 1 (SOS1 or S1): are a set of variables, at most one of which can take a strictly positive value, all others being at 0.

This is not completely correct. A SOS1 member can be a negative variable or a free variable. Such a variable may take a negative value.

An example:

Model with SOS1 variables with a negative lower bound

I.e. we want to minimize \(z=\sum_i x_i\) with \(x_i \in [-i,\infty)\) while the \(x_i\)'s form a SOS1 set. The results are:

---- VAR x  

          LOWER          LEVEL          UPPER         MARGINAL

i1        -1.0000          .            +INF            1.0000      
i2        -2.0000          .            +INF            1.0000      
i3        -3.0000        -3.0000        +INF            1.0000      

                           LOWER          LEVEL          UPPER         MARGINAL

---- VAR z                 -INF           -3.0000        +INF             .          

Indeed, we see a single negative value; the others remain zero.
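Because at most one \(x_i\) may be nonzero, this little model can also be solved by enumeration: either all \(x_i=0\), or exactly one of them moves to its lower bound. A sketch:

```python
# x_i in [-i, +inf) for i = 1, 2, 3; minimize z = x1 + x2 + x3,
# with the x_i forming a SOS1 set (at most one nonzero).
lower = [-1.0, -2.0, -3.0]

# Candidates: all zero, or one variable pushed to its negative bound
# (the objective value then equals that bound).
z = min([0.0] + lower)

print(z)
assert z == -3.0   # matches the solver listing: only x3 is nonzero
```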


The correct definition is found in most manuals; e.g., the documentation of the MPS format in OSL (a discontinued IBM product) mentions [3]:

A special ordered set (SOS) defines restrictions on the integer variables in the set. During branch-and-bound processing, branching on these sets consists of setting all variables before a certain point in the set to zero on one branch and setting all of them after a certain point to one on the alternative branch.
Three different types of special ordered sets are recognized:
  • Type 1, where at most one variable in the set may take a nonzero value.
  • Type 2, where at most two variables in the set may take nonzero values. If two variables are nonzero, they must be adjacent in the set. This allows the modeling of nonconvex separable functions, such as economies of scale.
  • Type 3, where at most one variable in the set may take a nonzero value, and there was at least one equality row with an RHS value of 1.0 in which all the variables in the set had a coefficient of 1.0. In this case, it is possible to say that all the variables are 0-1 variables. This allows the modeling of decisions.
The SOS3 type is an extension that is not much used.


Looks like there is no complete uniformity here. When doing some further tests, we found that a few solvers (GAMS/BDMLP, GAMS/LINDO) return something different:

---- VAR x  

          LOWER          LEVEL          UPPER         MARGINAL

i1        -1.0000        -1.0000        +INF            1.0000      
i2        -2.0000        -2.0000        +INF            1.0000      
i3        -3.0000        -3.0000        +INF            1.0000      

                           LOWER          LEVEL          UPPER         MARGINAL

---- VAR z                 -INF           -6.0000        +INF             .          

Using the democratic rule: "one solver, one vote" they are on the losing side of this debate. Of course it would be much better if all solvers do the same thing.

Update: LINDO has confirmed this as a bug and fixed it. The behavior of BDMLP was acknowledged as "as designed".

Notes on using SOS1 sets

These days we should probably replace SOS1 sets by formulations with binary variables; often binary variable formulations perform better. For this we need good bounds, so that the big-M constants have tight values. Some solvers support indicator constraints and general constraints (like max, min, etc.). These constructs may introduce SOS1 variables behind the scenes.

SOS2 variables

The Wikipedia entry for SOS1 [1] has been fixed by now (thank you!). However, a similar issue is present in the section on SOS2 variables. The Wikipedia text says:

Special Ordered Sets of type 2 (SOS2 or S2): an ordered set of non-negative variables, of which at most two can be non-zero, and if two are non-zero these must be consecutive in their ordering.

Using the same model as before but replacing sos1 variable by sos2 variable we should see as solution:

---- VAR x  

          LOWER          LEVEL          UPPER         MARGINAL

i1        -1.0000          .            +INF            1.0000      
i2        -2.0000        -2.0000        +INF            1.0000      
i3        -3.0000        -3.0000        +INF            1.0000      

                           LOWER          LEVEL          UPPER         MARGINAL

---- VAR z                 -INF           -5.0000        +INF             .          

As we can see SOS2 variables can be negative. Of course in virtually all practical cases SOS2 variables will be non-negative, but it is not a requirement.

I am not sure if the emphasis in the Wikipedia text on the ordering is so useful. This may be read as \(x_i \le x_{i+1}\), which is not what SOS2 is about. I think just saying the two non-zero values must be consecutive or adjacent should be enough. Writing is not so easy!


  1. Special Ordered Set,
  2. E. M. L. Beale and J. A. Tomlin, Special Facilities in a General Mathematical Programming System for Nonconvex Problems Using Ordered Sets of Variables, Operational Research 69, pp 447–454, 1970
  3. IBM Optimization Subroutine Library, Guide and Reference, Release 2, 1991
  4. Wilhelm Hummeltenberg, Implementations of special ordered sets in MP software, European Journal of Operational Research, 17 (1), 1984, pp. 1-15.

Sunday, December 17, 2017

A production scheduling problem


A post in [1] describes an interesting problem:

I have read about cutting stock problem, but this is a bit different.
We have orders in different variants, and a maximum sized machine to produce it.
Variants would be X,S,XL,L and ordered quantities are 100,40,40,80. 
Say, the machine width is 6. This means, we can put 6 different variants
together and produce it. 
We can put 2 X,1 S,1 XL,2 L, this means if we produce it 50 times,
output is :
X = 100 (0 waste)
S = 50 (10 waste)
XL = 50 (10 waste)
L = 100 (20 waste)

Total of 40 waste in 300 produced. 
Another aproach to reduce waste would be creating 2 different variation. We
can put 4 X, 2 S and produce it 25 times, with 10 waste and make another
setup and put 2 XL,4 L and produce it 20 times with no waste. With total
10 waste we handled this production in 2 setups. 
Since setup has a price, we would prefer first setup, or depending on
quantities, we may choose the second one. 
I have read about cutting stock and it looks similar to this one, but
ability to divide quantities between different setups, this has more
potential to optimize and therefore more complex.

High level model


We can formulate this as a mixed-integer quadratic model. The data looks like:

The optimization model is as follows:


  1. The variables \(run_r\) indicate if we execute a production run or not. Within a production run we use the same pattern.
  2. We have \(run_r=0 \Rightarrow pattern_{v,r}=0, runlen_r=0\). We could drop one of these constraints but not both.
  3. Equation \(edemand\) has a nasty quadratic term. It causes the model to be non-linear and non-convex. Of course we can solve this with a global solver. 
  4. The \(order\) constraint makes sure all active runs (i.e. \(run_r=1\)) are before the inactive ones (\(run_r=0\)). This makes the solution look nicer and will reduce some symmetry.
  5. The funny \(\forall r-1\) means we generate the \(order\) constraint for all \(r\) except the first one. 
  6. The model type is actually MIQCP (Mixed Integer Quadratically Constrained Program). There are commercial solvers for convex problems of this type, but for this non-convex problem I used a global MINLP solver (Baron, Couenne, Antigone).


The input data can be summarized as:

----     62 PARAMETER costwaste  cost of waste

X  1.000,    S  2.000,    XL 3.000,    L  4.000

----     62 PARAMETER costsetup            =      100.000  fixed cost of a run

----     62 PARAMETER demand  

X  100,    S   40,    XL  40,    L   80

The solution looks like:

----     62 VARIABLE runlen.L  length of a run

run1 20,    run2 40

----     62 VARIABLE pattern.L  production pattern of a run

          run1        run2

X            1           2
S                        1
XL           2
L            2           1

----     62 VARIABLE waste.L  waste per variant

                      ( ALL          0. )

----     62 VARIABLE wastecost.L           =        0.000  cost of waste
            VARIABLE setupcost.L           =      200.000  cost related to setups
            VARIABLE cost.L                =      200.000  total cost

We see that two production runs are used, and the machine is under-utilized (patterns of 5 and 4 items). Note that this solution is not unique: a different solver will likely return a different but equally good solution.
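We can check that the reported patterns and run lengths indeed cover demand with zero waste (numbers copied from the solution listing above):

```python
# Check the reported solution: production must cover demand exactly.
demand = {"X": 100, "S": 40, "XL": 40, "L": 80}
runs = [
    ({"X": 1, "XL": 2, "L": 2}, 20),  # run1: pattern of 5 items, run length 20
    ({"X": 2, "S": 1, "L": 1}, 40),   # run2: pattern of 4 items, run length 40
]
production = {v: sum(p.get(v, 0) * n for p, n in runs) for v in demand}
print(production == demand)  # True: zero waste, as in the listing
```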

Shorter run lengths

I believe we are missing some objective in this model. Suppose we have a demand of X=60 only (zero demand for the other variants). We can organize in a single run in at least the following ways:

  • Pattern X=1. This gives a run length of 60. Total cost = 100 (cost of one setup).
  • Pattern X=6. This gives a run length of 10. Total cost = 100 (cost of one setup). 
I would guess the second configuration is better. Probably we should add some cost for the total sum of the run lengths. 

If we add a unit cost of 1 for operating the machine, the total operating cost can be expressed as \(\sum_r runlen_r\). If we add this to the total cost we get the following solution:

----     68 VARIABLE runlen.L  length of a run

run1  4,    run2 40

----     68 VARIABLE pattern.L  production pattern of a run

          run1        run2

X            5           2
S                        1
XL                       1
L                        2

----     68 VARIABLE waste.L  waste per variant

                      ( ALL          0. )

----     68 VARIABLE wastecost.L           =        0.000  cost of waste
            VARIABLE setupcost.L           =      200.000  cost related to setups
            VARIABLE opcost.L              =       44.000  operating cost
            VARIABLE cost.L                =      244.000  total cost

A visualization of these two production schedules can look like:

Production Schedules


If we want to use a high-performance MIP solver, we need to linearize equation \(edemand\). I don't know how to linearize an integer variable times an integer variable. However, if we can replace an integer variable by a series of binary variables, then we are in more favorable territory.

The machine can handle up to 6 items at a time. This means \(pattern_{v,r}\le 6\), which makes \(pattern\) a good candidate for a binary expansion:

\[\begin{align}&pattern_{v,r} = \sum_k k \cdot \delta_{v,r,k}\\&\sum_k  \delta_{v,r,k} \le 1\\& \delta_{v,r,k} \in \{0,1\}\end{align}\]
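The expansion is easy to check in isolation: every value \(pattern_{v,r}\in\{0,\dots,6\}\) corresponds to exactly one valid setting of the \(\delta\) variables (a small sketch):

```python
cap = 6  # machine width: pattern values lie in {0, ..., 6}

# For each possible pattern value p there is exactly one valid delta setting:
# delta[k] = 1 iff k == p (all zeros when p == 0).
for p in range(cap + 1):
    delta = {k: int(k == p) for k in range(1, cap + 1)}
    assert sum(delta.values()) <= 1                    # at most one delta on
    assert sum(k * d for k, d in delta.items()) == p   # expansion recovers p
print("ok")
```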

where \(k\) runs from 1 to \(cap=6\). Now we can write the \(edemand\) equation as:

\[\sum_{r,k}  k \cdot ( runlen_r \cdot \delta_{v,r,k}) = demand_v + waste_v\]

The multiplication of a binary variable and a continuous variable can be written as linear inequalities [2]. This results in:

\[\begin{align} & \sum_{r,k}  k \cdot w_{v,r,k} = demand_v + waste_v\\ & w_{v,r,k} \le \delta_{v,r,k} \cdot maxrunlen\\ & w_{v,r,k} \le runlen_r\\ & w_{v,r,k} \ge runlen_r - maxrunlen \cdot (1-\delta_{v,r,k}) \\ &w_{v,r,k}\ge 0 \end{align} \]
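These four inequalities pin \(w_{v,r,k}\) to the product \(runlen_r\cdot\delta_{v,r,k}\). A small sketch that enumerates both values of \(\delta\) (with an assumed bound \(maxrunlen=100\)):

```python
maxrunlen = 100  # assumed upper bound on run lengths (the big-M constant)

def w_interval(delta, runlen):
    """Feasible interval for w implied by the four linear inequalities."""
    upper = min(delta * maxrunlen, runlen)
    lower = max(0, runlen - maxrunlen * (1 - delta))
    return lower, upper

for runlen in (0, 20, 40, maxrunlen):
    assert w_interval(1, runlen) == (runlen, runlen)  # delta=1 forces w=runlen
    assert w_interval(0, runlen) == (0, 0)            # delta=0 forces w=0
print("w = runlen * delta in all cases")
```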

We can also go the other route: expand \(runlen_r\). This leads to:

\[\begin{align} &\sum_{r,\ell} \ell \cdot v_{v,r,\ell} = demand_v + waste_v\\ &\sum_{\ell} \ell \cdot \rho_{r,\ell} = runlen_r\\ &\sum_{\ell} \rho_{r,\ell} \le 1\\ & v_{v,r,\ell} \le  cap \cdot \rho_{r,\ell}\\ & v_{v,r,\ell} \le pattern_{v,r}\\ & v_{v,r,\ell} \ge pattern_{v,r} - cap \cdot (1-\rho_{r,\ell})\\ & \rho_{r,\ell} \in \{0,1\}\\ &  v_{v,r,\ell} \ge 0\end{align}\]

With these lengthy reformulations we end up with a linear mixed-integer program, which means we can use standard MIP solvers instead of a global nonlinear solver. In addition, MIP formulations often perform better than their nonlinear counterparts. Here the first MIP formulation works much better than the second:

----    136 PARAMETER report  

                    obj        time       nodes

minlp.baron     244.000       2.120      58.000
mip1.cplex      244.000       1.218     673.000
mip2.cplex      244.000      12.063    3382.000

The linearizations are somewhat convoluted, so having a smaller nonlinear model available helps in constructing and debugging them. As in database design (logical vs. physical design) and programming (avoid premature optimization), starting with a simpler model that focuses on the concepts is often helpful when developing optimization models.


  1. Similar to cutting stock problem but not quite,
  2. Multiplication of a continuous and a binary variable,
  3. Gislaine Mara Melega, Silvio Alexandre de Araujo and Raf Jans, Comparison of MIP Models for the Integrated Lot-Sizing and One-Dimensional Cutting Stock Problem, Pesquisa Operacional (2016) 36(1): 167-196.

Thursday, December 14, 2017

Permuted Matrix Balancing

In [1] the following problem is posed:


I am trying to generate a matrix of numbers with 7 rows and 4 columns. Each row must sum to 100 and each column must have an even spread (if permitted) between a min and max range (specified below).
       C1      C2    C3    C4   sum   range 
 1     low                      100    ^
 2     ..                              |  
 3     ..                              |
 4     ..                              | 
 5     ..                              |
 6     ..                              |
 7     high                            _

c1_high = 98
c1_low = 75
c2_high = 15
c2_low = 6
c3_high = 8
c3_low = 2
c4_low = 0.05
c4_high =0.5
In addition to this, i need the spread of each row to be as linear as possible, though a line fitted to the data with a second order polynomial would suffice (with an r^2 value of >0.98).
Note: this description seems to hint that the values in each column must be ordered from \(L_j\) to \(U_j\). That is actually not the case: we do not assume any ordering.

Solution Outline

This problem somewhat resembles a matrix balancing problem. Let's see how we can model this.

One approach to attack this problem is:
Step 1: Create data with perfect spread
It is easy to generate data with a perfect spread while ignoring the requirement that the row sums equal 100. We just start at \(L_j\) and increase by a fixed step size:

----     40 PARAMETER L  low

c1 75.000,    c2  6.000,    c3  2.000,    c4  0.050

----     40 PARAMETER U  high

c1 98.000,    c2 15.000,    c3  8.000,    c4  0.500

----     40 PARAMETER init  initial matrix

             c1          c2          c3          c4      rowsum

r1       75.000       6.000       2.000       0.050      83.050
r2       78.833       7.500       3.000       0.125      89.458
r3       82.667       9.000       4.000       0.200      95.867
r4       86.500      10.500       5.000       0.275     102.275
r5       90.333      12.000       6.000       0.350     108.683
r6       94.167      13.500       7.000       0.425     115.092
r7       98.000      15.000       8.000       0.500     121.500

This step is just data manipulation.
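This step can be reproduced with a few lines of numpy (a sketch; the values match the table above):

```python
import numpy as np

low = np.array([75.0, 6.0, 2.0, 0.05])   # L
high = np.array([98.0, 15.0, 8.0, 0.5])  # U

# Perfect spread: 7 equidistant values per column, from low to high.
init = np.linspace(low, high, num=7)     # shape (7, 4): rows r1..r7
rowsum = init.sum(axis=1)
print(rowsum[0], rowsum[-1])  # approx 83.05 and 121.5, as in the table
```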
Step 2: Permute values in each column
The second step is to permute the values within each column to get closer to the required row sums. This gives:

----     75 VARIABLE y.L  after permutation

            c1          c2          c3          c4      rowsum

r1      86.500      10.500       5.000       0.275     102.275
r2      94.167       7.500       3.000       0.125     104.792
r3      98.000       6.000       2.000       0.050     106.050
r4      90.333       9.000       4.000       0.200     103.533
r5      82.667      12.000       6.000       0.350     101.017
r6      75.000      15.000       8.000       0.500      98.500
r7      78.833      13.500       7.000       0.425      99.758

As expected, we are now closer to the target row sums.

One way to model a permutation of a column vector \(a\) inside an optimization model is to find a square permutation matrix \(P\) [2] with

\[\begin{align}&y_i = \sum_k P_{i,k}a_k\\ &\sum_k P_{i,k}=1&\forall i \\&\sum_i P_{i,k} = 1&\forall k\\& P_{i,k}\in\{0,1\}\end{align}\]

A permutation matrix has exactly one 1 in each row and in each column; all other elements are zero. One could say a permutation matrix is a (row- or column-) permuted identity matrix. The constraints can also be interpreted as assignment constraints (related to the assignment problem).
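In code, applying a permutation matrix is just a matrix-vector product. A small sketch, using the c1 ordering that appears in the y.L table above:

```python
import numpy as np

a = np.array([75.0, 78.833, 82.667, 86.5, 90.333, 94.167, 98.0])  # column c1

# A permutation matrix is a row-permuted identity: row i selects a[perm[i]].
perm = [3, 5, 6, 4, 2, 0, 1]   # ordering read off the y.L table above
P = np.eye(7)[perm]

y = P @ a                      # y_i = sum_k P[i,k] * a[k]
assert np.allclose(np.sort(y), np.sort(a))  # y is a permutation of a
assert np.allclose(P.sum(axis=0), 1) and np.allclose(P.sum(axis=1), 1)
print(y[:3])  # 86.5, 94.167, 98.0
```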

Step 3: Add minimal relative deviations
The final step is to find small deviations from \(y\) such that the row sums are exactly 100.

----     75 VARIABLE x.L  final values

            c1          c2          c3          c4      rowsum

r1      84.265      10.467       4.993       0.275     100.000
r2      89.460       7.431       2.984       0.125     100.000
r3      92.057       5.912       1.980       0.050     100.000
r4      86.863       8.949       3.988       0.200     100.000
r5      81.668      11.985       5.997       0.350     100.000
r6      76.473      15.022       8.005       0.500     100.000
r7      79.071      13.503       7.001       0.425     100.000

----     75 VARIABLE d.L  deviations

            c1          c2          c3          c4

r1      -2.235      -0.033      -0.007 -2.25855E-5
r2      -4.707      -0.069      -0.016 -4.75702E-5
r3      -5.943      -0.088      -0.020 -6.00626E-5
r4      -3.471      -0.051      -0.012 -3.50779E-5
r5      -0.999      -0.015      -0.003 -1.00932E-5
r6       1.473       0.022       0.005 1.489155E-5
r7       0.237       0.003 7.931220E-4 2.399194E-6

For this I used a relative measure: minimize the sum of squared relative deviations:

\[\min \sum_{i,j} \left( \frac{d_{i,j}}{\mu_j} \right)^2 \]

where \(\mu_j = (L_j +U_j)/2\) is the average magnitude of column \(j\). This scaling distributes the deviations according to each column's magnitude, so that column 1 absorbs larger absolute deviations than column 4.
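A quick sketch of these column scales and their effect on the objective (bounds taken from the tables above):

```python
# Column scales mu_j = (L_j + U_j) / 2 used to weight the deviations.
low = [75.0, 6.0, 2.0, 0.05]
high = [98.0, 15.0, 8.0, 0.5]
mu = [(l + u) / 2 for l, u in zip(low, high)]  # approx [86.5, 10.5, 5.0, 0.275]

def objective(dev_row):
    """Sum of squared relative deviations for one row of deviations."""
    return sum((d / m) ** 2 for d, m in zip(dev_row, mu))

# The same absolute deviation is far more expensive in a "small" column:
print(objective([1.0, 0, 0, 0]) < objective([0, 0, 0, 1.0]))  # True
```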

A complete model

I believe we need to do steps 2 and 3 simultaneously to find a global optimal solution. Here is an attempt:

This is a convex MIQP model, so we can solve this with high-performance solvers like Cplex and Gurobi.


  1. We could split the model into two phases (steps 2 and 3 in the description above). I am not sure how this would affect the optimal solution, and it would not really simplify things.
  2. This model can be linearized by using a different objective: minimize the sum of the absolute values of the (relative) deviations:
    \[\min\>\sum_{i,j}\left|  \frac{d_{i,j}}{\mu_j} \right|\]
    You would end up with a linear MIP model. Possible formulations are variable splitting
    \[\begin{align}\min \> & \sum_{i,j} v_{i,j}^{\oplus} + v_{i,j}^{\ominus}\\&v_{i,j}^{\oplus} - v_{i,j}^{\ominus} = \frac{d_{i,j}}{\mu_j}\\&v_{i,j}^{\oplus}, v_{i,j}^{\ominus}\ge 0 \end{align}\]
    or bounding:
    \[\begin{align}\min\>&\sum_{i,j} v_{i,j}\\&-v_{i,j} \le \frac{d_{i,j}}{\mu_j}\le v_{i,j}\end{align}\] 
  3. Instead of dividing by the average column value \(\mu_j\), we can divide by the "real" value \(y_{i,j}\). Unfortunately, that would turn this into a general MINLP model. Note that we should "protect" the division, e.g. using the bound \(y_{i,j}\ge L_j\).


  1. Algorithm for generation of number matrix with specified boundaries,
  2. Permutation Matrix,