Friday, June 21, 2019

Special assignment problem

Problem Description

This is adapted from [1].

Assume we have data \(a_i\) and \(b_j\):

----     11 PARAMETER a  

i1  0.172,    i2  0.843,    i3  0.550,    i4  0.301,    i5  0.292,    i6  0.224,    i7  0.350,    i8  0.856,    i9  0.067
i10 0.500

----     11 PARAMETER b  

j1  0.998,    j2  0.621,    j3  0.992,    j4  0.786,    j5  0.218,    j6  0.676,    j7  0.244,    j8  0.325,    j9  0.702
j10 0.492,    j11 0.424,    j12 0.416,    j13 0.218,    j14 0.235,    j15 0.630

Try to find an assignment of each \(a_i\) to a unique \(b_j\) such that the quantities \[ \frac{a_i}{b_j}\] are as close to each other as possible.

This is more or less an assignment problem. The objective of making all ratios \(a_i/b_j\) as equal as possible is what makes it interesting. I'll discuss some formulations for this problem.

A first model

We first introduce assignment variables: \[x_{i,j} = \begin{cases} 1 & \text{if $a_i$ is assigned to $b_j$} \\ 0 & \text{otherwise}\end{cases}\] The standard assignment constraints apply \[\begin{align}&\sum_i x_{i,j}\le 1&&\forall j \\&\sum_j x_{i,j} = 1&&\forall i  \end{align}\] This is an "unbalanced" assignment: we have more \(j\)'s than \(i\)'s.

To model "all about equal", we introduce a continuous variable \(c\) that represents the common value. I.e. \[\frac{a_i}{b_j} \approx c \>\>\text{for all  $(i,j)$ such that $x_{i,j}=1$}\]

The objective is \[\min \sum_{i,j}  \left[ x_{i,j} \left( \frac{a_i}{b_j} - c\right) \right]^2 \] or  \[\min \sum_{i,j} x_{i,j} \left( \frac{a_i}{b_j} - c\right)^2 \] These two are equivalent: \(x_{i,j}\) is binary, so \(x_{i,j}^2 = x_{i,j}\).
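To get a feel for the problem, here is a tiny brute-force sketch (with made-up data, not the \(a,b\) values above). It uses the fact that, for a fixed assignment, the least-squares optimal \(c\) is simply the mean of the selected ratios:

```python
from itertools import permutations

# Tiny illustrative instance (hypothetical values, not the data above).
a = [0.2, 0.8, 0.5]
b = [1.0, 0.6, 0.9, 0.4]

best = None
# Assign each a[i] to a unique b[j]: enumerate injective maps i -> j.
for perm in permutations(range(len(b)), len(a)):
    ratios = [a[i] / b[j] for i, j in enumerate(perm)]
    c = sum(ratios) / len(ratios)            # optimal c for this assignment
    sse = sum((r - c) ** 2 for r in ratios)  # sum of squared deviations
    if best is None or sse < best[0]:
        best = (sse, perm, c)

print(best)
```

Of course this enumeration only works for toy instances; the MINLP and MIP models below scale much better.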

A complete model can look like:

\[\begin{align} \min & \sum_{i,j} \color{darkred}x_{i,j} \left( \frac{\color{darkblue}a_i}{\color{darkblue}b_j} - \color{darkred}c\right)^2 \\ & \sum_j \color{darkred} x_{i,j} = 1 &&\forall i\\ & \sum_i \color{darkred} x_{i,j} \le 1 &&\forall j\\ & \color{darkred}x_{i,j} \in \{0,1\}\\ & \color{darkred}c \text{ free} \end{align}\]

When we run this model with an MINLP solver, we see:

----     33 VARIABLE x.L  assignment

             j1          j3          j4          j5          j7          j8          j9         j10         j11         j12

i1                                                        1.000
i2                    1.000
i3                                1.000
i4                                                                                                        1.000
i5                                                                                                                    1.000
i6                                                                    1.000
i7                                                                                            1.000
i8        1.000
i9                                            1.000
i10                                                                               1.000

----     33 VARIABLE c.L                   =        0.695  common value
            VARIABLE z.L                   =        0.201  objective

The assignment can be depicted as follows.

Solution. Matrix values are a(i)/b(j).
In the above matrix the cell values are \[\text{cell}_{i,j} = \frac{a_i}{b_j}\]

A linear model

We can approximate the quadratic weighting of the residuals by absolute values: \[\sum_{i,j}  \left| x_{i,j} \left( \frac{a_i}{b_j} - c \right) \right| \] The first thing to do is to linearize the term \(x_{i,j} c\). We introduce variables \[y_{i,j} = x_{i,j} c\] We can form the implications: \[\begin{align} & x_{i,j} = 0 \Rightarrow y_{i,j} = 0\\ &x_{i,j} = 1 \Rightarrow y_{i,j} = c\end{align}\] Most solvers support indicator constraints, which allow us to implement these implications as is. If we have a solver without this, we can formulate this as a bunch of big-\(M\) constraints: \[\begin{align} & - M x_{i,j} \le y_{i,j} \le M x_{i,j} \\ & c-M(1-x_{i,j}) \le y_{i,j} \le c+M(1-x_{i,j})\end{align}\] From the data we can assume \(a_{i}\ge 0\) and \(b_{j}\gt 0\). We can exploit this: \[\begin{align} & 0 \le y_{i,j} \le M_1 x_{i,j} \\ & c-M_2(1-x_{i,j}) \le y_{i,j} \le c+M_2(1-x_{i,j})\end{align}\] We can think a bit about good values for \(M_1\) and \(M_2\). I suggest: \[M_1 = M_2 = \max c = \max_{i,j} \frac{a_i}{b_j}\]
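Using the data shown above, this big-\(M\) value is easy to compute: since all \(a_i\) and \(b_j\) are positive, the maximum ratio is attained at the largest \(a_i\) and the smallest \(b_j\). A quick check in Python:

```python
# a and b values from the parameter listings above
a = {"i1": 0.172, "i2": 0.843, "i3": 0.550, "i4": 0.301, "i5": 0.292,
     "i6": 0.224, "i7": 0.350, "i8": 0.856, "i9": 0.067, "i10": 0.500}
b = {"j1": 0.998, "j2": 0.621, "j3": 0.992, "j4": 0.786, "j5": 0.218,
     "j6": 0.676, "j7": 0.244, "j8": 0.325, "j9": 0.702, "j10": 0.492,
     "j11": 0.424, "j12": 0.416, "j13": 0.218, "j14": 0.235, "j15": 0.630}

# M1 = M2 = max_{i,j} a[i]/b[j] = max(a)/min(b), as all values are positive
M = max(a.values()) / min(b.values())   # 0.856 / 0.218
print(round(M, 3))
```

So a big-\(M\) value just below 4 suffices here, much tighter than an arbitrary large constant.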

The complete model can look like:

MIP Model
\[\begin{align} \min\> & \sum_{i,j} \color{darkred}r_{i,j} \\ & \sum_j \color{darkred} x_{i,j} = 1 && \forall i\\ & \sum_i \color{darkred} x_{i,j} \le 1 && \forall j\\ & -\color{darkred}r_{i,j} \le \color{darkred} x_{i,j} \frac{\color{darkblue} a_i}{\color{darkblue} b_j} - \color{darkred} y_{i,j} \le \color{darkred}r_{i,j} &&\forall i,j\\ & \color{darkred} y_{i,j} \le \color{darkblue} M_1 \color{darkred} x_{i,j}  &&\forall i,j\\ & \color{darkred} c - \color{darkblue} M_2 (1- \color{darkred} x_{i,j}) \le \color{darkred} y_{i,j} \le \color{darkred} c + \color{darkblue} M_2 (1- \color{darkred} x_{i,j})&&\forall i,j \\ & \color{darkred}x_{i,j} \in \{0,1\}\\ & \color{darkred}c \ge 0 \\ & \color{darkred}y_{i,j} \ge 0\\ & \color{darkred}r_{i,j} \ge 0 \end{align}\]

The results look like:

----     72 VARIABLE x.L  assignment

             j1          j3          j4          j5          j7          j8          j9         j10         j11         j12

i1                                                        1.000
i2                    1.000
i3                                1.000
i4                                                                                                        1.000
i5                                                                                                                    1.000
i6                                                                    1.000
i7                                                                                            1.000
i8        1.000
i9                                            1.000
i10                                                                               1.000

----     72 VARIABLE c.L                   =        0.711  common value

----     72 VARIABLE y.L  c*x(i,j)

             j1          j3          j4          j5          j7          j8          j9         j10         j11         j12

i1                                                        0.711
i2                    0.711
i3                                0.711
i4                                                                                                        0.711
i5                                                                                                                    0.711
i6                                                                    0.711
i7                                                                                            0.711
i8        0.711
i9                                            0.711
i10                                                                               0.711

----     72 VARIABLE r.L  abs(residuals)

             j1          j3          j4          j5          j7          j8          j9         j10         j12

i1                                                        0.006
i2                    0.139
i3                                0.010
i5                                                                                                        0.009
i6                                                                    0.021
i7                                                                                      6.137042E-4
i8        0.147
i9                                            0.402
i10                                                                               0.002

The two models give very similar results. The assignments are identical; only the \(c\) value and the resulting sums of squared and absolute residuals differ slightly.

----     77 PARAMETER sumdev  sum of squared/absolute deviations

                  dev^2       |dev|           c

minlp model       0.201       0.784       0.695
mip model         0.204       0.737       0.711

A different approach

We can look at the problem in a different way. Instead of minimizing the spread around a central value \(c\), we can minimize the bandwidth directly. I.e. we can write: \[\begin{align} \min\> & \mathit{maxv} - \mathit{minv} \\ & \mathit{minv} \le \frac{a_i}{b_j} && \forall x_{i,j}=1\\& \mathit{maxv} \ge \frac{a_i}{b_j} && \forall x_{i,j}=1 \end{align}\] These constraints can be implemented as indicator constraints \[\begin{align} & x_{i,j}=1 \Rightarrow \mathit{minv} \le \frac{a_i}{b_j}\\ & x_{i,j}=1 \Rightarrow \mathit{maxv} \ge \frac{a_i}{b_j}\end{align}\] If we don't have indicator constraints available, we need to resort to big-\(M\) constraints:

Alternative MIP Model
\[\begin{align} \min\> & \color{darkred}{\mathit{maxv}} - \color{darkred}{\mathit{minv}} \\ & \sum_j \color{darkred} x_{i,j} = 1 && \forall i \\ & \sum_i \color{darkred} x_{i,j} \le 1 && \forall j \\ & \color{darkred}{\mathit{minv}} \le \frac{\color{darkblue}a_i}{\color{darkblue}b_j} + M (1-\color{darkred} x_{i,j}) && \forall i,j \\ & \color{darkred}{\mathit{maxv}} \ge \frac{\color{darkblue}a_i}{\color{darkblue}b_j} - M (1-\color{darkred} x_{i,j}) && \forall i,j \\ & \color{darkred}x_{i,j} \in \{0,1\} \\ & \color{darkred}{\mathit{maxv}},\color{darkred}{\mathit{minv}} \text{ free} \end{align}\]
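On a tiny instance the bandwidth objective can be brute-forced directly (made-up data, not the \(a,b\) values from the problem):

```python
from itertools import permutations

# Tiny illustrative instance (hypothetical values, not the data above).
a = [0.2, 0.8, 0.5]
b = [1.0, 0.6, 0.9, 0.4]

best = None
for perm in permutations(range(len(b)), len(a)):
    ratios = [a[i] / b[j] for i, j in enumerate(perm)]
    bandwidth = max(ratios) - min(ratios)   # maxv - minv
    if best is None or bandwidth < best[0]:
        best = (bandwidth, perm)

print(best)
```

Note that the brute force only evaluates the objective; the MIP above achieves the same with big-\(M\) constraints that are inactive when \(x_{i,j}=0\).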

The solution looks like:

----    103 VARIABLE x.L  assignment

             j1          j2          j3          j4          j5          j6          j9         j11         j14         j15

i1                                                                                                        1.000
i2                                1.000
i3                                                                                1.000
i4                                                                                                                    1.000
i5                                                                    1.000
i6                                                                                            1.000
i7                    1.000
i8        1.000
i9                                                        1.000
i10                                           1.000

----    103 VARIABLE minv.L                =        0.308  minimum value
            VARIABLE maxv.L                =        0.858  maximum value

A picture of this solution:

Minimize bandwidth solution

The disadvantage of this method is that it does not care about assignments as long as they are inside our optimal bandwidth. We can see this if we recalculate an optimal central \(c\) value afterwards. The results with this reconstructed \(c\) value:

----    114 PARAMETER sumdev  sum of squared/absolute deviations

                  dev^2       |dev|           c

minlp model       0.201       0.784       0.695
mip model         0.204       0.737       0.711
mip model 2       0.313       1.548       0.617

Indeed, we see that the total sum of squared or absolute deviations is not as good as we saw before. The same thing can be seen by just calculating the standard deviation for the selected ratios \(a_i/b_j\). This gives:

----    130 PARAMETER stdev  

minlp model 0.149,    mip model   0.149,    mip model 2 0.186

Again we see this last approach leads to more variability.

Distribution of selected a(i)/b(j)

This model is simpler than the earlier MIP model, but we pay a price: the solution is not as highly concentrated around the central point.

Non-unique \(b_j\)

If we allow the \(b_j\) to be reused, only a small change in the models is needed. We just need to drop the constraint \[\sum_i x_{i,j} \le 1 \] When we do this, we may see a solution like:

Solution. Allow a column to be selected multiple times.
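With reuse allowed, a brute-force sketch over a tiny instance simply enumerates all (possibly repeating) column choices instead of permutations. Made-up data; the bandwidth can only improve, or stay the same, when we drop the uniqueness constraint:

```python
from itertools import product

# Tiny illustrative instance (hypothetical values); columns may be reused.
a = [0.2, 0.8, 0.5]
b = [1.0, 0.6, 0.9, 0.4]

best = None
for choice in product(range(len(b)), repeat=len(a)):  # j may repeat
    ratios = [a[i] / b[j] for i, j in enumerate(choice)]
    bandwidth = max(ratios) - min(ratios)
    if best is None or bandwidth < best[0]:
        best = (bandwidth, choice)

print(best)
```

In the MIP models, the only change needed is dropping the \(\sum_i x_{i,j} \le 1\) constraint, exactly as described above.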


This turned out to be an interesting problem. A high-level MINLP model was presented first. We can linearize this into a MIP, but that becomes a bit messy. Modern MIP solvers offer tools (indicator constraints, a built-in absolute value function) that can help here. Finally, an alternative, simpler MIP model was proposed. However, this model produces solutions that are more dispersed.


Thursday, June 20, 2019

Assignment: Scipy vs Cplex


I have never used this routine [1], so let's give it a try. It is a Python implementation of the Hungarian method [2]. The call is exceedingly simple: just give it the cost matrix.

The algorithm allows unbalanced assignment problems by just specifying a rectangular cost matrix. Typically, unbalanced problems are transformed into balanced ones by adding source or destination nodes.
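That classical balancing transformation can be sketched as follows (a hypothetical helper for illustration; linear_sum_assignment itself accepts the rectangular matrix directly, so this padding is only needed for solvers that require a square matrix):

```python
# Pad a rectangular cost matrix (more columns than rows) to a square one
# by adding dummy rows with zero cost: the dummy sources "absorb" the
# destinations that remain unassigned.
def pad_to_square(cost):
    m, n = len(cost), len(cost[0])
    assert m <= n, "expects at least as many columns as rows"
    return cost + [[0.0] * n for _ in range(n - m)]

square = pad_to_square([[4.0, 2.0, 8.0],
                        [4.0, 3.0, 7.0]])
print(square)  # one all-zero dummy row appended
```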

I believe the description is slightly inaccurate:

Description from [1]

This would allow zero assignments, i.e. \(x_{i,j}=0\). It is not always easy to create a watertight English description.

Let's try a problem.

1000x1000 dense assignment problem

When we feed a random problem with 1,000 sources and 1,000 destinations, we see the following timings:

We first read in the data (organized as lists). The second step is to convert this to a NumPy array (otherwise this is done automatically by linear_sum_assignment). Then we solve the problem. On my laptop this takes about 325 seconds.

We can formulate this problem as an LP:\[\begin{align}\min \> & \sum_{i,j} c_{i,j} x_{i,j}\\ & \sum_j x_{i,j} = 1 && \forall i \\ & \sum_i x_{i,j} =1 && \forall j \\ & x_{i,j} \ge 0 \end{align}\]
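For intuition, the objective of this LP can be checked by hand on a toy cost matrix: pick one cell per row and per column, minimizing total cost. A brute-force sketch (hypothetical 3x3 data):

```python
from itertools import permutations

# Tiny 3x3 cost matrix (made-up numbers). Each permutation p assigns
# row i to column p[i]; we pick the cheapest one.
cost = [[4.0, 2.0, 8.0],
        [4.0, 3.0, 7.0],
        [3.0, 1.0, 6.0]]

best = min(
    (sum(cost[i][p[i]] for i in range(3)), p)
    for p in permutations(range(3))
)
print(best)  # minimum total cost is 12.0
```

The LP relaxation of the assignment problem has integral extreme points (the constraint matrix is totally unimodular), which is why we can solve it as an LP without integrality constraints.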

When we feed this into Cplex we see:

Parallel mode: deterministic, using up to 4 threads for concurrent optimization.
Tried aggregator 1 time.
LP Presolve eliminated 2 rows and 1 columns.
Reduced LP has 1999 rows, 1000000 columns, and 1999000 nonzeros.
Presolve time = 5.95 sec. (770.92 ticks)
Initializing dual steep norms . . .

Iteration log . . .
Iteration:     1   Dual objective     =             1.007485
Iteration:  1113   Dual objective     =             1.661162

Dual simplex solved model.

LP status(1): optimal
Cplex Time: 14.91sec (det. 2309.10 ticks)

Cplex tries several LP algorithms in parallel (concurrent optimization). It turns out Dual Simplex wins. The total solution time is about 15 seconds. So Cplex is 20 times as fast.

We can actually improve on this a bit. This is a network problem, so we can use the Network solver. In addition, we disable the presolver (it is somewhat expensive and does little for this problem).

Extracted network with 2002 nodes and 1000001 arcs.
Extraction time = 0.63 sec. (49.15 ticks)

Iteration log . . .
Iteration:     0   Infeasibility     =          2000.000000 (2000)
Iteration: 10000   Infeasibility     =             0.000000 (7.62808)
Iteration: 20000   Objective         =             5.108089
Iteration: 30000   Objective         =             3.842909
Iteration: 40000   Objective         =             3.147366
Iteration: 50000   Objective         =             2.622157
Iteration: 60000   Objective         =             2.275971
Iteration: 70000   Objective         =             1.990956
Iteration: 80000   Objective         =             1.790689
Elapsed time = 1.33 sec. (82.13 ticks)
Iteration: 90000   Objective         =             1.684441

Network - Optimal:  Objective =    1.6701729950e+00
Network time = 1.67 sec. (119.82 ticks)  Iterations = 92769 (10000)
LP status(1): optimal
Cplex Time: 2.66sec (det. 236.01 ticks)

Now we need less than 3 seconds (Cplex is about 120 times as fast as Scipy).

We would normally expect a specialized assignment solver to do better than a general-purpose LP solver. However, Cplex is a highly polished product running close to the metal, while a simpler implementation in Python just cannot compete. Note that there are specialized solvers that are much faster still.

Some solvers in Scipy are a little bit underwhelming. Examples are the linear programming simplex solver and, as we saw here, the assignment problem solver.


  1. scipy.optimize.linear_sum_assignment
  2. Harold W. Kuhn, "The Hungarian Method for the Assignment Problem," Naval Research Logistics Quarterly, 2:83-97, 1955.

Wednesday, June 12, 2019

Machine Scheduling

Here [1] is a problem with multiple machines:
  • There are 3 machines and 4 jobs in this example
  • Not all machines can do all jobs
  • We have a few precedence constraints
  • Given known processing times, find a schedule that minimizes the makespan (the time the last job is finished)
Assume the following data:

----     63 --------------- data ---

----     63 SET j  jobs

job1,    job2,    job3,    job4

----     63 SET m  machines

machine1,    machine2,    machine3

----     63 SET ok  allowed job/machine combinations

        machine1    machine2    machine3

job1         YES         YES
job2         YES         YES
job3         YES                     YES
job4                                 YES

----     63 PARAMETER proctime  processing time

job1  4.000,    job2  2.000,    job3 10.000,    job4 12.000

----     63 SET prec  precedence constraints

            job2        job4

job1         YES         YES

There are many ways to model this. Some approaches can be:

  • Discrete time using time slots. This leads to binary variables \[x_{j,m,t} = \begin{cases} 1 & \text{if job $j$ executes on machine $m$ at time $t$}\\ 0 & \text{otherwise}\end{cases}\]
  • Continuous time with a binary variable indicating if job \(j_1\) is executed before job \(j_2\) (on the same machine)
  • Continuous time with a binary variable indicating if job \(j_1\) immediately precedes job \(j_2\) (on the same machine)  
Let's try the second approach. First we need a planning horizon. A simplistic way to find a planning horizon \(T\) is to just schedule the jobs one after another: \[T = \sum_j \mathit{proctime}_j \] For our small data set, this gives:

----     63 PARAMETER T                    =       28.000  max time

This \(T\) is used as a big-\(M\) constant, so we would like it to be as small as possible. For very large problems, we often use some heuristic to find an initial solution, and use this to set the planning horizon \(T\). If that is not possible, we may want to use indicator constraints, so we don't need any \(T\).
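The horizon computation is trivial; using the processing times from the data listing above:

```python
# Processing times from the proctime parameter listing above.
proctime = {"job1": 4, "job2": 2, "job3": 10, "job4": 12}

# Simplistic planning horizon: schedule all jobs back to back.
T = sum(proctime.values())
print(T)  # 28
```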

Decision Variables

We have two blocks of binary variables. One dealing with assigning jobs to machines and a second one about ordering of jobs on the same machine. The latter is handled by implementing no-overlap constraints. For this we need a binary variable indicating if job \(j_1\) is before or after job \(j_2\).

Binary variables
\[\color{DarkRed}{\mathit{assign}}_{j,m} = \begin{cases} 1 & \text{if job $j$ is placed on machine $m$}\\ 0 & \text{otherwise}\end{cases}\] \[\color{DarkRed}{\mathit{after}}_{j_1,j_2} = \begin{cases} 1 & \text{if job $j_1$ is executed after job $j_2$ when placed on the same machine}\\ 0 & \text{if job $j_1$ is executed before job $j_2$ when placed on the same machine}\end{cases}\]

Note that the variable \(\mathit{after}_{j_1,j_2}\) will be used only for \(j_1 \lt j_2\). This is to avoid double checking the same pair.
We also have a few continuous variables:

Continuous variables
\[\begin{align} & \color{DarkRed}{\mathit{start}}_{j} \ge 0 && \text{start time of job $j$}\\ & \color{DarkRed}{\mathit{finish}}_{j} \ge 0 && \text{finish time of job $j$}\\ & \color{DarkRed}{\mathit{makespan}} \ge 0 && \text{last finish time} \end{align}\]


The model can look like:

Mixed Integer Programming Model
\[\begin{align} \min\> & \color{DarkRed}{\mathit{makespan}} \\ & \color{DarkRed}{\mathit{makespan}} \ge \color{DarkRed}{\mathit{finish}}_j && \forall j \\ & \color{DarkRed}{\mathit{finish}}_j = \color{DarkRed}{\mathit{start}}_j + \color{DarkBlue}{\mathit{proctime}}_j && \forall j\\ & \sum_m \color{DarkRed}{\mathit{assign}}_{j,m} = 1 && \forall j\\ & \color{DarkRed}{\mathit{start}}_{j_1} \ge \color{DarkRed}{\mathit{finish}}_{j_2} - \color{DarkBlue}T (1-\color{DarkRed}{\mathit{after}}_{j_1,j_2}) - \color{DarkBlue}T (1-\color{DarkRed}{\mathit{assign}}_{j_1,m})- \color{DarkBlue}T (1-\color{DarkRed}{\mathit{assign}}_{j_2,m}) && \forall m, j_1 \lt j_2 \\ & \color{DarkRed}{\mathit{start}}_{j_2} \ge \color{DarkRed}{\mathit{finish}}_{j_1} - \color{DarkBlue}T \color{DarkRed}{\mathit{after}}_{j_1,j_2} - \color{DarkBlue}T (1-\color{DarkRed}{\mathit{assign}}_{j_1,m})- \color{DarkBlue}T (1-\color{DarkRed}{\mathit{assign}}_{j_2,m}) && \forall m, j_1 \lt j_2 \\ & \color{DarkRed}{\mathit{start}}_{j_2} \ge \color{DarkRed}{\mathit{finish}}_{j_1} && \forall \color{DarkBlue}{\mathit{prec}}_{j_1,j_2} \\ & \color{DarkRed}{\mathit{assign}}_{j,m} = 0 && \forall \text{ not } \color{DarkBlue}{\mathit{ok}}_{j,m} \end{align}\]

The no-overlap constraints are the most complicated in this model. If the problem was a single machine scheduling problem, the no-overlap constraints would look like: \[\mathit{start}_{j_1} \ge \mathit{finish}_{j_2} \text{ or } \mathit{start}_{j_2} \ge \mathit{finish}_{j_1} \>\> \forall j_1 \lt j_2\] Note that we only need to compare jobs \(j_1 \lt j_2\) (each pair needs to be checked only once). The OR condition can be implemented using a binary variable, and big-\(M\) constraints: \[\begin{align} & \mathit{start}_{j_1} \ge \mathit{finish}_{j_2} - T (1-{\mathit{after}}_{j_1,j_2}) && \forall j_1 \lt j_2 \\ & \mathit{start}_{j_2} \ge \mathit{finish}_{j_1} - T {\mathit{after}}_{j_1,j_2} && \forall j_1 \lt j_2 \end{align}\] i.e. job \(j_1\) executes before or after job \(j_2\), but not in parallel. When we have multiple machines, we only want to keep this constraint active when jobs \(j_1\) and \(j_2\) are assigned to the same machine. That is: jobs are allowed to execute in parallel when they are on different machines. In a sense, we have built some nested big-\(M\) constraints.
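The no-overlap logic can be checked against the solution reported below. A small sketch (start times reconstructed from the listing; GAMS suppresses zero values, so job1 and job3 start at 0):

```python
# Schedule taken from the solution listing in this post.
start   = {"job1": 0, "job2": 4, "job3": 0, "job4": 4}
finish  = {"job1": 4, "job2": 6, "job3": 10, "job4": 16}
machine = {"job1": "machine2", "job2": "machine2",
           "job3": "machine1", "job4": "machine3"}

def overlaps(j1, j2):
    # Jobs conflict only when on the same machine; parallel execution
    # on different machines is fine.
    if machine[j1] != machine[j2]:
        return False
    return not (finish[j1] <= start[j2] or finish[j2] <= start[j1])

jobs = sorted(start)
conflicts = [(j1, j2) for j1 in jobs for j2 in jobs
             if j1 < j2 and overlaps(j1, j2)]
print(conflicts)  # []
```

An empty conflict list confirms that the big-\(M\) no-overlap constraints did their job for the pairs on the same machine.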


After running this model we get as results:

            --------------- solution -----

----     66 VARIABLE assign.L  assign job to machine

        machine1    machine2    machine3

job1                   1.000
job2                   1.000
job3       1.000
job4                               1.000

----     66 VARIABLE start.L  job start time

job2 4.000,    job4 4.000

----     66 VARIABLE finish.L  job finish time

job1  4.000,    job2  6.000,    job3 10.000,    job4 16.000

----     66 VARIABLE makespan.L            =       16.000  time last job is finished

----     66 VARIABLE after.L  j1 is scheduled after j2 on machine m

            job3

job1       1.000

How should we interpret the variable \(\mathit{after}\)? Only jobs 1 and 2 are on the same machine and for this combination we have \(\mathit{after}_{1,2}=0\). This means job1 is not after job2, and thus: job2 must be after job1. This is indeed what is happening. The values for \(\mathit{after}\) for jobs not on the same machine are basically just random: they are not used in active constraints. This means: the printed value \(\mathit{after}_{1,3}=1\) is not relevant as jobs 1 and 3 are on different machines.

Graphically this looks like:

We see that this solution obeys the precedence constraints (jobs 2 and 4 after job 1). Furthermore, it only assigns jobs to allowed machines (e.g. job 4 can only run on machine 3).

Larger data set

The above data set is very small. Such a data set is great during development. A beginner's mistake I often see is using a large data set during model development. It is much more convenient and efficient to use a very small data set while things are in flux and you do many runs just to get the model right.

Now, it is important to see how the model behaves with a large data set. I used random data with 8 machines and 50 jobs. This problem needed only a few seconds to solve to guaranteed optimality. The model had about 20k constraints and 1,700 variables (of which 1,400 binary).

solution of 8 machine, 50 job problem with random data

The performance seems to be quite reasonable. The main problem is that for large models the number of constraints becomes big, and the big-\(M\) constraints may not be as tight as we want. But for a first formulation, not so bad.


Tuesday, May 28, 2019

Bad question?

In [1] a user asks:

I need to build a MILP (Mixed integer linear programming) constraint form this if-else statement: with beta is a constant.
if (a > b) then c = beta else c = 0
How can I build the statement to MILP constraint. Are there any techniques for solving this problem. Thank you.

In my opinion this is a difficult question to answer. To put it bluntly: this is a poor question because we lack lots of information.

  1. We are not sure which identifiers are decision variables and which ones are parameters. It is stated that beta is a constant, so probably we can deduce that all other identifiers refer to variables. In general, it is important to state these things explicitly.
  2. For the variables: we don't know the types (binary, integer, positive, free).
  3. We don't see the rest of the model. In its most general form, this if-construct is a somewhat difficult thing to handle. In practice however, we can almost always exploit knowledge from the model to simplify things considerably. I almost never encounter models where we have to handle this construct in its most general form. 
  4. Expanding on the previous point: one thing to look at is how the objective behaves. It may push variables in a certain direction, which we may exploit. Often this leads to substantial simplifications. In many cases we can drop an implication (e.g. drop the else part).
  5. We don't know what solver or modeling system is being used. Some tools have good support for implications (a.k.a. indicator constraints), which can make things much easier. In other cases, we may need to use big-M constraints.
  6. If \(a\), \(b\) and \(c\) are (free) continuous variables, the condition \(\gt\) is a bit ambiguous. Do we really mean \(\gt\), or can we use \(\ge\)? In my models, I tend to use \(\ge\) and leave the case \(a=b\) ambiguous: the solver can pick the best decision for this case (I don't want to skip a more profitable solution because of some epsilon issue). 
  7. Using assumptions in the previous point, we can write \[\begin{align} &\delta = 1 \Rightarrow a \ge b\\ & \delta=0 \Rightarrow a \le b\\ & c = \delta \cdot \beta\\ & \delta \in \{0,1\}\end{align}\] Note that the \(c\) constraint is linear: \(\beta\) is a parameter.
  8. This can be translated into  \[\begin{align} & a \ge b - M(1-\delta)\\ & a \le b + M\delta \\ & c = \delta \cdot \beta\\ & \delta \in \{0,1\}\end{align}\] To determine good values for big-\(M\) constants, again, we need to know more about the model.
  9. If you insist on \(a\gt b\), we can introduce some tolerance \(\varepsilon>0\) and write: \[\begin{align} &\delta = 1 \Rightarrow a \ge b + \varepsilon \\ & \delta=0 \Rightarrow a \le b\\ & c = \delta \cdot \beta\\ & \delta \in \{0,1\}\end{align}\] Here \(\varepsilon\) should be larger than the feasibility tolerance of the solver (scaling may make this not completely obvious). Note that we effectively create a "forbidden region". The variable \(a\) can not assume any value in the interval \((b,b+\varepsilon)\) (again, subject to feasibility tolerances).
  10. Of course when we have integer variables \(a \gt b\) is much more well-defined and we can interpret that as \(a \ge b+1\).
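The big-\(M\) construction in point 8 can be sanity-checked numerically. A minimal sketch (the bound \(M=100\) is hypothetical; in a real model it should be derived from bounds on \(a-b\)):

```python
# Numeric sanity check of the big-M formulation in point 8.
# M = 100 is a hypothetical bound on |a - b|; beta is the constant.
M, beta = 100.0, 5.0

def feasible(a, b, c, delta):
    return (a >= b - M * (1 - delta)
            and a <= b + M * delta
            and c == delta * beta)

# a > b forces delta = 1 and hence c = beta:
assert     feasible(a=3.0, b=1.0, c=beta, delta=1)
assert not feasible(a=3.0, b=1.0, c=0.0, delta=0)
# a < b forces delta = 0 and hence c = 0:
assert     feasible(a=1.0, b=3.0, c=0.0, delta=0)
assert not feasible(a=1.0, b=3.0, c=beta, delta=1)
print("ok")
```

Note that at \(a=b\) both values of \(\delta\) are feasible, which is exactly the deliberate ambiguity discussed in point 6.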

So the best answer is: I would need to look at the whole model to give a good answer.

These types of questions are quite common, and answering them is just very difficult. You cannot expect a reply that enumerates all possible answers for all possible cases.


  1. Build MILP constraint from if-else statements,

Bad naming?

Example C++ code

The above fragment is from [1]. I never write loops like this. I use \(n\) for limits or counts, but never for a loop index.

Looking at this, I realized I have many of these "rules". Such as:

  1. \(x\), \(y\), \(z\), and \(v\), \(w\) are always double precision variables. (I used to subtract points if a student wrote
        for (int x=0; ... ).)
  2. \(i\), \(j\), \(k\) and \(m\), \(n\) are always integer variables.
  3. Never use \(l\) (i.e. \(\ell\)) as variable name, it is too close to the digit 1 (one).
  4. Don't use short integers (unless for a specific reason) or single precision variables.
  5. Use \(i\),\(j\),\(k\) as loop indices in a predictable way (e.g. for a (sparse) matrix: \(i\) for rows, \(j\) for columns, \(k\) for things like nonzero elements).
  6. The previous rule also applies to AMPL which uses local index names. E.g. after declaring
       param f_max {j in FOOD} >= f_min[j]; 
    I always use j for FOOD
  7. Use short names for items (variables) that are often used and for locally declared indices. Use long names for items that are sparsely used. I call this Huffman-code [2] naming. 
I am so used to this, that code that disobeys this in a flagrant way, just hurts my eyes. I find that, if I follow these simple rules, reading code is easier. It minimizes the surprise factor. Of course, writing code is for consumption by a compiler (or other tool), but more importantly: for consumption by a human reader.

So, that loop should look like:

const int n = 10;
for (int i = 0; i < n; ++i) {...}



Tuesday, May 14, 2019

Ordering of variables / constraints in an LP/MIP

In [1] a user asks about how to force a certain ordering of variables for an LP/MIP model. This is an intriguing question for me because I really never worry about this.

  • The user asks only about variable ordering. Of course there is a symmetry here: the same question can be asked for equations. Well, equations correspond to slack variables, so this is essentially the same.
  • A different ordering may (will?) cause a different solution path, so you can expect different solution times and iteration counts. Of course, this happens in a very unpredictable way, so it is not something we can exploit easily. Especially in MIPs we have the concept of performance variability [2]: minor changes in the model can cause significantly different solution times.
  • Why would you really worry about the ordering? There are three cases I can think of: (1) access variables by index number [I tend to use higher-level tools where this is not needed], (2) when implementing some decomposition scheme [same: in higher-level tools we can index by names], or (3) when plotting the non-zero elements in the A matrix [I never do this; I find this is not adding much insight into the model].  In practice, I never worry about the ordering. We have already many things to worry about when building LP/MIP models; this should not be one of them. 

I don't understand why one would want to spend time on this.


  2. Andrea Lodi and Andrea Tramontani, "Performance Variability in Mixed-Integer Programming," INFORMS TutORials in Operations Research, published online 14 Oct 2014, pp. 1-12.

Cutting Application

Version 1 of cutting application.

The algorithms are a small part of the whole application.  Some of the issues:

  1. Users are not always able to "specify" in detail what they want in advance. Building prototypes can help: giving feedback about a demo is easier than writing detailed specs in advance.
  2. Most code is not dedicated to the algorithm itself, but rather to surrounding things. I estimate that the algorithms cover about 20% of the code. For instance, reporting was a substantial effort here.

In the application, I draw on a canvas, but we can export to Excel:

Excel version of output

or print to a PDF file:

PDF version of output

Anaconda Python Install

The installation of 64-bit Anaconda on my Windows laptop worked fine. But then all conda and pip commands came back with the fatal error:

Can't connect to HTTPS URL because the SSL module is not available.

I found lots of messages about this. This bug seems to have plagued users for a long time. Even though some of these reports are closed as "Resolved" by the developers, I encountered the problem just today. Here was my solution (from [1]):

I.e. two DLLs were in the wrong directory.

Immediately after this I noticed a second problem:

A third problem is that conda install sometimes takes forever.

All these problems are not unique to me. Using google, I see lots of other users having the same or similar problems.

I had hoped this would be a bit less painful. This was a standard install on a standard Windows machine; it really should not cause this much trouble.



Monday, April 8, 2019

Sometimes a simple heuristic is the best

While demoing a cutting stock application, I realized I was overthinking things. If the material is cheap, optimal cutting patterns are too complicated: simplified cutting patterns are better, even when they cause some additional waste. So now I am using a simplistic Skyline heuristic, and even then I add constraints to simplify the cutting further.
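To make the idea concrete, here is a minimal Python sketch of a skyline-style placement heuristic. This is purely illustrative and not the application's actual code: it assumes integer rectangle sizes and a fixed strip width, tracks a per-column height profile, and drops each rectangle where the resulting top edge stays lowest.

```python
def skyline_pack(rects, W):
    """Place rectangles (w, h) with integer sizes in a strip of width W.

    Simplistic skyline/bottom-left heuristic: track the height of every
    unit-wide column and put each rectangle where the resulting top edge
    is lowest (ties broken by the leftmost x). Returns the placements
    (x, y, w, h) and the total strip height used.
    """
    heights = [0] * W                      # current skyline, per column
    placements = []
    for w, h in rects:
        # candidate x positions where the rectangle still fits in the strip
        best_x = min(range(W - w + 1),
                     key=lambda x: (max(heights[x:x + w]), x))
        y = max(heights[best_x:best_x + w])
        placements.append((best_x, y, w, h))
        for x in range(best_x, best_x + w):
            heights[x] = y + h
    return placements, max(heights)
```

For example, packing the rectangles (3,2), (2,2), (2,1) in a strip of width 5 puts the second rectangle next to the first and the third on top, for a total height of 3.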

Now, this is really small potatoes compared to [1]:


  1. Dominique Thiebaut, 2D-Packing Images on a Large Scale, INFOCOMP 2013: The Third International Conference on Advanced Communications and Computation.

Generating Pareto optimal points (part 2)

In [1] a small multi-objective problem was described. In part 1 of this post [2], I showed how to generate the set of Pareto optimal solutions using a complete enumeration scheme.

Summary of the problem

We want to find all Pareto optimal (i.e. non-dominated) solutions, each consisting of a set of items (rows in the table below). We have the following data:

----     42 PARAMETER data  data set with 12 items

               cost       hours      people

Rome           1000           5           5
Venice          200           1          10
Torin           500           3           2
Genova          700           7           8
Rome2          1020           5           6
Venice2         220           1          10
Torin2          520           3           2
Genova2         720           7           4
Rome3          1050           5           5
Venice3         250           1           8
Torin3          550           3           8
Genova3         750           7           8

We want to optimize the following objectives:

  1. Maximize number of items selected
  2. Minimize total cost
  3. Minimize total hours
  4. Minimize total number of people needed
In addition we have a number of simple bound constraints:
  1. The total cost cannot exceed $10000
  2. The total number of hours we can spend is limited to 100
  3. There are only 50 people available

Non-dominated solutions

When comparing two feasible solution points \(a\) and \(b\), we say that \(a\) dominates \(b\) if:

  • all the objectives of \(a\) are at least as good as the corresponding objectives of \(b\), and
  • for at least one objective \(k\), \(f_k(a)\) is strictly better than \(f_k(b)\).
This means \(a\) does not dominate \(b\) if \(f_k(b)\) is strictly better than \(f_k(a)\) in at least one objective \(k\), or if all objectives are the same.
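To make this definition concrete, here is a small Python sketch (not part of the original models) of the dominance test for this problem. It uses the same sign convention as the sign(k) parameter in the GAMS code later on: +1 for minimized objectives, -1 for maximized ones.

```python
# Sign convention: +1 = minimize, -1 = maximize (as in the GAMS sign(k) parameter).
SIGN = {"cost": 1, "hours": 1, "people": 1, "items": -1}

def dominates(a, b):
    """True if objective vector a (a dict of values) dominates b."""
    at_least_as_good = all(SIGN[k] * a[k] <= SIGN[k] * b[k] for k in SIGN)
    strictly_better = any(SIGN[k] * a[k] < SIGN[k] * b[k] for k in SIGN)
    return at_least_as_good and strictly_better

# Example: a and b agree on hours, people and items, but a is cheaper,
# so a dominates b; b does not dominate a, and a never dominates itself.
a = {"cost": 900, "hours": 5, "people": 7, "items": 3}
b = {"cost": 1000, "hours": 5, "people": 7, "items": 3}
```

Note that `dominates(a, a)` is False: two solutions with identical objective values do not dominate each other.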


The idea is to generate MIP solutions with a preference for moving toward the efficient frontier (using some weights), and then add constraints that say: we should be better in at least one objective than each previously found solution. I.e., suppose we have already collected solutions \((\bar{x}_{i,s}, \bar{f}_{k,s})\) in previous cycles \(s\); then we require: \[ \begin{align} & \delta_{k,s} = 1 \Rightarrow f_k \text{ is better than } \bar{f}_{k,s} \\ & \sum_k \delta_{k,s} \ge 1 && \forall s\\ & \delta_{k,s}\in \{0,1\}\end{align}\] I.e. our new solution is not dominated by any of the previous record solutions. 

This approach is documented in [3].

Note: I think we are handwaving a bit about the possibility of two solutions with all the same objective values: these are mutually non-dominated, so both should be kept. For our problem this turns out not to be an issue.

Approach 2: MIP models with non-dominated constraints

In this part I will show the GAMS code that implements this approach.

2.1 Organize the data

We need to specify the problem data:

sets
   i 'items' /Rome,Venice,Torin,Genova,Rome2,Venice2,
              Torin2,Genova2,Rome3,Venice3,Torin3,Genova3/
   k 'objectives' /cost,hours,people,items/
;

table data(i,k)
          cost   hours  people
Rome      1000    5       5
Venice     200    1      10
Torin      500    3       2
Genova     700    7       8
Rome2     1020    5       6
Venice2    220    1      10
Torin2     520    3       2
Genova2    720    7       4
Rome3     1050    5       5
Venice3    250    1       8
Torin3     550    3       8
Genova3    750    7       8
;
data(i,'items') = 1;

parameter upperLimit(k) /
   cost  10000
   hours   100
   people   50
/;
* if zero or not listed use INF
upperLimit(k)$(upperLimit(k)=0) = INF;

parameter sign(k) 'sign: +1:min, -1:max' /
   (cost,hours,people) +1
   items               -1
/;

2.2 Form Box

I will implement the implications as big-M constraints, which means we want small values for our big-M's. To obtain these, it is useful to compute tight bounds on the objectives \(f_k\).

* Basic Model

parameter c(k) 'objective coefficient';

variable z 'total single obj';
positive variable f(k) 'obj values';
binary variable x(i) 'select item';

equations
   scalobj 'scalarized objective'
   objs(k) 'calculate objective values'
;

scalobj.. z =e= sum(k, c(k)*f(k));
objs(k).. f(k) =e= sum(i, data(i,k)*x(i));

* some objs have an upper limit
f.up(k) = upperLimit(k);

option optcr=0,limrow=0,limcol=0,solprint=silent,solvelink=5;

* find lower and upper limits of each objective

model m0 'initial model to find box' /all/;

parameter fbox(k,*) 'new bounds on objectives';

alias (k,obj);
loop(obj,
   c(obj) = 1;
   solve m0 minimizing z using mip;
   fbox(obj,'min') = z.l;
   solve m0 maximizing z using mip;
   fbox(obj,'max') = z.l;
   c(obj) = 0;
);

display upperLimit,fbox;

* range = max - min
fbox(k,'range') = fbox(k,'max') - fbox(k,'min');

* tighten objectives
f.lo(k) = fbox(k,'min');
f.up(k) = fbox(k,'max');

The first part is just the linear model: we have a scalarizing objective function, and the linear constraint that calculates the objectives \(f_k\).

In the second part we perform a loop, where we both minimize and maximize each \(f_k\).

It is interesting to see the differences between the implicit bounds we calculated here and the original constraints:

----     81 PARAMETER UpperLimit  

cost   10000.000,    hours    100.000,    people    50.000,    items       +INF

----     81 PARAMETER fbox  new bounds on objectives

             max

cost    6810.000
hours     45.000
people    50.000
items      9.000

We first observe that all lower bounds stay at zero. Indeed, the solution with \(x_i=f_k=0\) is feasible and even non-dominated. The upper bounds, however, were tightened significantly.
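For an instance this small, these bounds are easy to cross-check by brute force. The following Python sketch (an independent check, not part of the GAMS model) enumerates all \(2^{12}\) item subsets, keeps the feasible ones, and takes the maximum of each objective; it reproduces the fbox upper bounds above.

```python
# Data transcribed from the 12-item table above: name -> (cost, hours, people).
DATA = {
    "Rome":    (1000, 5,  5), "Venice":  (200,  1, 10), "Torin":   (500, 3, 2),
    "Genova":  (700,  7,  8), "Rome2":   (1020, 5,  6), "Venice2": (220, 1, 10),
    "Torin2":  (520,  3,  2), "Genova2": (720,  7,  4), "Rome3":   (1050, 5, 5),
    "Venice3": (250,  1,  8), "Torin3":  (550,  3,  8), "Genova3": (750, 7, 8),
}
CAPS = (10000, 100, 50)  # limits on cost, hours, people

def feasible_points():
    """Yield (cost, hours, people, items) for every feasible item subset."""
    items = list(DATA.values())
    for mask in range(1 << len(items)):
        chosen = [items[i] for i in range(len(items)) if mask >> i & 1]
        cost = sum(c for c, _, _ in chosen)
        hours = sum(h for _, h, _ in chosen)
        people = sum(p for _, _, p in chosen)
        if cost <= CAPS[0] and hours <= CAPS[1] and people <= CAPS[2]:
            yield cost, hours, people, len(chosen)

# Maximum of each objective over all feasible subsets.
fbox_max = tuple(max(col) for col in zip(*feasible_points()))
# fbox_max == (6810, 45, 50, 9), matching the fbox display
```

Only the people limit really bites here: all 12 items together need 76 people, and shedding the 26 surplus as cheaply as possible (the three Venice items) is what produces the 6810 cost bound and the 45-hour bound.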

2.3 MOIP algorithm

* MOIP algorithm

set maxsols 'maximum number of solutions' /sol1*sol1000/;
set sols(maxsols) 'found solutions';

binary variable delta(maxsols,k) 'count solutions that are better';

parameter solstore(maxsols,*) 'objectives and x values of found solutions';

parameter M(k) 'big-M values';
M(k) = fbox(k,'range');
scalar tol 'tolerance: improve obj by this much' /1/;

* initialize
solstore(maxsols,i) = 0;

equations
   nondominated1(maxsols,k) 'improve on earlier solution (min objective)'
   nondominated2(maxsols,k) 'improve on earlier solution (max objective)'
   improve(maxsols)         'improve in at least one objective'
   cut(maxsols)             'forbid earlier integer solutions'
;

nondominated1(sols,k)$(sign(k)=+1).. f(k) =l= solstore(sols,k)-tol + (M(k)+tol)*(1-delta(sols,k));
nondominated2(sols,k)$(sign(k)=-1).. f(k) =g= solstore(sols,k)+tol - (M(k)+tol)*(1-delta(sols,k));
improve(sols)..  sum(k, delta(sols,k)) =g= 1;
cut(sols)..      sum(i, (2*solstore(sols,i)-1)*x(i)) =l= sum(i, solstore(sols,i)) - 1;

* initialize to empty set
sols(maxsols) = no;

* set weights to 1 (adapt for min/max)
c(k) = sign(k);

scalar ok /1/;

model m1 /all/;

loop(maxsols$ok,
   solve m1 minimizing z using mip;
*  optimal or integer solution found?
   ok = 1$(m1.modelstat=1 or m1.modelstat=8);
   if(ok,
      sols(maxsols) = yes;
      solstore(maxsols,k) = round(f.l(k));
      solstore(maxsols,i) = round(x.l(i));
   );
);

display solstore;

In this algorithm we look for up to 1000 solutions. We implement the constraints that make sure \(f_k\) is not dominated by earlier solutions, and we use an objective that guides the solver towards the efficient frontier. In addition, although not strictly needed, I also added a constraint that works in the \(x\) space: forbid previously found integer solutions.

Note that in each cycle the MIP problem becomes bigger:

  • we add non-dominated constraints
  • we add an integer cut
  • we add binary variables \(\delta_{s,k}\) related to the non-dominated constraints.

2.4 Solution

This gives the same 83 solutions as we found in [2].

(Solution 0, with all zeroes, is not printed here.)
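For an instance this small, the Pareto set can also be recomputed from scratch as a sanity check. The Python sketch below (independent of the GAMS runs) enumerates all feasible objective vectors and keeps the non-dominated ones; among other things it confirms that the all-zero solution sits on the frontier and that the best achievable item count there is 9.

```python
# Data from the table in this post: (cost, hours, people) per item.
DATA = [
    (1000, 5, 5), (200, 1, 10), (500, 3, 2), (700, 7, 8),
    (1020, 5, 6), (220, 1, 10), (520, 3, 2), (720, 7, 4),
    (1050, 5, 5), (250, 1, 8),  (550, 3, 8), (750, 7, 8),
]
CAPS = (10000, 100, 50)   # limits on cost, hours, people
SIGN = (1, 1, 1, -1)      # min cost, hours, people; max items

def dominates(a, b):
    """True if objective vector a dominates b under the SIGN convention."""
    return (all(s * x <= s * y for s, x, y in zip(SIGN, a, b)) and
            any(s * x < s * y for s, x, y in zip(SIGN, a, b)))

# All distinct feasible objective vectors (cost, hours, people, items).
points = set()
for mask in range(1 << len(DATA)):
    chosen = [DATA[i] for i in range(len(DATA)) if mask >> i & 1]
    c, h, p = (sum(t[j] for t in chosen) for j in range(3))
    if c <= CAPS[0] and h <= CAPS[1] and p <= CAPS[2]:
        points.add((c, h, p, len(chosen)))

# Incremental Pareto filter: keep a vector only if nothing kept so far
# dominates it, and drop anything it dominates in turn.
pareto = []
for p in sorted(points):
    if any(dominates(q, p) for q in pareto):
        continue
    pareto = [q for q in pareto if not dominates(p, q)] + [p]
```

Note that this works in objective space: two distinct item selections with identical objective values collapse into one point here, while the MIP algorithm above enumerates them in \(x\) space.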


  1. Best Combination Algorithm of Complex Data with Multiple Constraints
  2. Generating Pareto optimal points (part 1)
  3. John Sylva, Alejandro Crema, A method for finding the set of non-dominated vectors for multiple objective integer linear programs, European Journal of Operational Research 158 (2004), pp. 46-55.