## Sunday, June 30, 2019

### Maximum Dispersion

#### Problem statement

Given $$n$$ points with their distances  $$d_{i,j}$$, select $$k$$ points such that the sum of the distances between the selected points is maximized .

A simple MIQP (Mixed-Integer Quadratic Programming) model is:

Non-convex MIQP Model
\begin{align} \max & \sum_{i \lt j} \color{darkblue} d_{i,j} \color{darkred}x_{i} \color{darkred}x_{j} \\ & \sum_i \color{darkred} x_{i} = \color{darkblue} k \\ & \color{darkred}x_{i} \in \{0,1\} \end{align}

Notice we only need to look at distances $$d_{i,j}$$ with $$i\lt j$$ as we can assume symmetry. If not, just make $$d_{i,j}$$ symmetric by: $d_{i,j} = \frac{d_{i,j} + d_{j,i}}{2}$

As we shall see this is not such an easy problem to solve to optimality.

This problem is called the $$p$$-dispersion-sum problem.

#### Example data

I generated random $$n=50$$ $$2d$$ points:

----     24 PARAMETER coord  random coordinates

x           y

i1        1.717       8.433
i2        5.504       3.011
i3        2.922       2.241
i4        3.498       8.563
i5        0.671       5.002
i6        9.981       5.787
i7        9.911       7.623
i8        1.307       6.397
i9        1.595       2.501
i10       6.689       4.354
i11       3.597       3.514
i12       1.315       1.501
i13       5.891       8.309
i14       2.308       6.657
i15       7.759       3.037
i16       1.105       5.024
i17       1.602       8.725
i18       2.651       2.858
i19       5.940       7.227
i20       6.282       4.638
i21       4.133       1.177
i22       3.142       0.466
i23       3.386       1.821
i24       6.457       5.607
i25       7.700       2.978
i26       6.611       7.558
i27       6.274       2.839
i28       0.864       1.025
i29       6.413       5.453
i30       0.315       7.924
i31       0.728       1.757
i32       5.256       7.502
i33       1.781       0.341
i34       5.851       6.212
i35       3.894       3.587
i36       2.430       2.464
i37       1.305       9.334
i38       3.799       7.834
i39       3.000       1.255
i40       7.489       0.692
i41       2.020       0.051
i42       2.696       4.999
i43       1.513       1.742
i44       3.306       3.169
i45       3.221       9.640
i46       9.936       3.699
i47       3.729       7.720
i48       3.967       9.131
i49       1.196       7.355
i50       0.554       5.763


Let's try to find $$k=10$$ points that are most dispersed. The distance matrix is formed by calculating Euclidean distances. 10 out of 50 most dispersed points

#### Solve non-convex problem

The above MIQP problem is non-convex. We can solve the non-convex MIQP problem in different ways:

• Throw this into a global solver such as Baron, Couenne or Antigone
• Use a solver like Cplex (option qtolin) or Gurobi (option preQlinearize) that can linearize the problem automatically for us. See further down how we can do this ourselves.
• Instruct Cplex to use a QP formulation (option qtolin=0) and tell it to use the global QP solver (option optimalitytarget=3)

#### Convexification of the quadratic model

There is a trick to make this problem convex. For maximization, we require that the $$Q$$ matrix is negative definite.  In our case the $$Q$$ matrix is our distance matrix. The following algorithm will make the problem convex:

1. Calculate the largest eigenvalue $$\lambda_{max}$$ of the distance matrix $$0.5 D$$. We multiply by 0.5 because we only use half of the matrix in the objective to prevent double counting.
2. If  $$\lambda_{max} \lt 0$$: we are done (matrix is negative-definite)
3. Form the objective: $\max \sum_{i \lt j} d_{i,j} x_{i} x_{j} - \lambda_{max} \sum_i x_i^2 + \lambda_{max} \sum_i x_i$
This essentially imposes a possible large diagonal perturbation of the $$Q$$ matrix. We compensate with a linear term. This uses the fact that $$x_i = x_i^2$$ when $$x_i \in \{0,1\}$$.

#### Linearization

The quadratic model is easily linearized by observing that a well-known trick to handle $$y_{i,j} = x_i x_j$$ is to write: \begin{align} & y_{i,j} \le x_i \\ & y_{i,j} \le x_j \\ & y_{i,j} \ge x_i +x_j-1\\& y_{i,j}, x_i \in \{0,1\}\end{align} A complete model can look like:

Linearized Model 1
\begin{align} \max & \sum_{i \lt j} \color{darkblue} d_{i,j} \color{darkred}y_{i,j} \\ & \sum_i \color{darkred} x_{i} = \color{darkblue} k \\ & \color{darkred}y_{i,j} \le \color{darkred} x_i && \forall i\lt j \\ & \color{darkred}y_{i,j} \le \color{darkred} x_j && \forall i\lt j \\ & \color{darkred}x_{i} \in \{0,1\} \\ & \color{darkred}y_{i,j} \in [0,1] \end{align}

There are two things that may need some attention:

• The inequality $$y_{i,j}\ge x_i +x_j -1$$ was dropped. The objective will take care of this.
• The variables $$y_{i,j}$$ were relaxed to be continuous between 0 and 1. The $$y$$ variables will be automatically integer (well, where it matters). Often models with fewer integer variables are easier to solve. Modern solvers, however, may reintroduce binary variables for this particular model. In Cplex you can see a message like shown below.

Tried aggregator 1 time.
MIP Presolve eliminated 1 rows and 1 columns.
Reduced MIP has 2451 rows, 1275 columns, and 4950 nonzeros.
Reduced MIP has 50 binaries, 0 generals, 0 SOSs, and 0 indicators.
Presolve time = 0.06 sec. (2.79 ticks)
Found incumbent of value 0.000000 after 0.06 sec. (4.54 ticks)
Probing time = 0.02 sec. (0.51 ticks)
Tried aggregator 1 time.
Reduced MIP has 2451 rows, 1275 columns, and 4950 nonzeros.
Reduced MIP has 1275 binaries, 0 generals, 0 SOSs, and 0 indicators.


#### A tighter linearization

In  a tighter formulation is proposed:

Linearized Model 2
\begin{align} \max & \sum_{i \lt j} \color{darkblue} d_{i,j} \color{darkred}y_{i,j} \\ & \sum_i \color{darkred} x_{i} = \color{darkblue} k \\ & \color{darkred}y_{i,j} \le \color{darkred} x_i && \forall i\lt j \\ & \color{darkred}y_{i,j} \le \color{darkred} x_j && \forall i\lt j \\ & \color{darkred}y_{i,j} \ge \color{darkred} x_i +\color{darkred} x_j -1 && \forall i\lt j \\ & \sum_{i\lt j} \color{darkred}y_{i,j} + \sum_{i\gt j} \color{darkred}y_{j,i} = (\color{darkblue} k-1) \color{darkred}x_j && \forall j \\ & \color{darkred}x_{i} \in \{0,1\} \\ & \color{darkred}y_{i,j} \in \{0,1\} \end{align}

Essentially, we multiplied the constraint $$\sum_i x_{i} = k$$ by $$x_j$$, and added these as constraints. The derivation is as follows: \begin{align} & \left( \sum_i x_i\right) x_j = k x_j\\ \Rightarrow & \sum_{i\lt j} x_i x_j + x_j^2 + \sum_{i\gt j} x_i x_j = k x_j\\ \Rightarrow & \sum_{i\lt j} y_{i,j} + \sum_{i\gt j} y_{j,i} = (k-1) x_j \end{align} We used here that $$x_j^2 = x_j$$.

My intuition is as follows. If $$x_j=1$$ then exactly $$k-1$$ other $$x_i$$'s should be 1. That means $$k-1$$ $$y_{i,j}$$'s should be 1. As we only use the upper-triangular part, the picture becomes: Picture of y layout
Given that $$x_j=1$$, we need to have $$k-1$$ ones in the orange region.

#### Aggregation

We can aggregate the previous cut, which leads to: $\sum_{i\lt j} y_{i,j} = \frac{k(k-1)}{2}$ The advantage of this version is that we only need one extra constraint :

Linearized Model 3
\begin{align} \max & \sum_{i \lt j} \color{darkblue} d_{i,j} \color{darkred}y_{i,j} \\ & \sum_i \color{darkred} x_{i} = \color{darkblue} k \\ & \color{darkred}y_{i,j} \le \color{darkred} x_i && \forall i\lt j \\ & \color{darkred}y_{i,j} \le \color{darkred} x_j && \forall i\lt j \\ & \color{darkred}y_{i,j} \ge \color{darkred} x_i +\color{darkred} x_j -1 && \forall i\lt j \\ & \sum_{i\lt j} \color{darkred}y_{i,j} = \frac{ \color{darkblue}k (\color{darkblue} k-1)}{2} \\ & \color{darkred}x_{i} \in \{0,1\} \\ & \color{darkred}y_{i,j} \in \{0,1\} \end{align}

The intuition is easy: if we have $$k$$ $$x_j$$'s equal to one, we need to have $$k(k-1)/2$$ $$y_{i,j}$$'s equal to one.

#### Numerical results

I tried these formulations using Cplex with a time limit of 1800 seconds (half an hour).

ModelTimeObjGapNotes
options: qtolin=0, optimalitytarget=3
Non-convex MIQP1800332.912957.65%Automatically converted to linear model.
options: qtolin=1
Convex MIQP 1800332.912992.22%option: qtolin=0
MIP 1 1800332.912957.65%Defaults
MIP 2 8332.9129optimalDefaults (roughly same performance whether or not including $$y_{i,j}\ge x_i+x_j-1$$)
MIP 3 400332.9129optimalDefaults. Excludes $$y_{i,j}\ge x_i+x_j-1$$.
MIP 3 11332.9129optimalDefaults. Includes $$y_{i,j}\ge x_i+x_j-1$$.

Just one observation, but I think these results are representative for other (small) instances.

We dropped the constraint $$y_{i,j}\ge x_i+x_j-1$$ from MIP model 1: the objective will push $$y$$ upwards on its own. For models MIP 2 and 3 it is wise to reintroduce them.

It is noted that although the gaps are terrible for all models without extra cuts. However, some of these methods find the best solution very quickly. They are just not able to prove optimality in a reasonable time. Here is an example:

        Nodes                                         Cuts/
Node  Left     Objective  IInf  Best Integer    Best Bound    ItCnt     Gap

0     0     1235.6817    50                   1235.6817        7
*     0+    0                          332.9129     1235.6817           271.17%
Found incumbent of value 332.912894 after 0.02 sec. (4.67 ticks)
0     2     1235.6817    50      332.9129     1162.0829        7  249.07%
Elapsed time = 0.03 sec. (11.72 ticks, tree = 0.02 MB, solutions = 1)


#### Better dispersion

From the picture of the solution (earlier in this post), we see that this model does not prevent selected points to be close to each other.  A better model may be to maximize the minimum distance between selected points. This can be modeled as:

Maximize Minimum Distance
\begin{align} \max\> & \color{darkred} {\Delta} \\ & \color{darkred} \Delta \le \color{darkblue} d_{i,j} + \color{darkblue} M (1- \color{darkred}x_{i} \color{darkred}x_{j}) && \forall i\lt j \\ & \sum_i \color{darkred} x_{i} = \color{darkblue} k \\ & \color{darkred}x_{i} \in \{0,1\} \\ \end{align}

Here $$M$$ is a large enough constant, e.g. $M = \max_{i\le j} d_{i,j}$ The model can be easily linearized by formulating the distance constraint as  $\Delta \le d_{i,j} + M (1- x_{i}) + M(1- x_{j})$ This model quickly solves our 10 out of 50 problem. It gives as solution: Maximization of minimum distance

This model has as disadvantage that it does not care about points being closer to each other as long as they are further away than the minimum. For this example, visual inspection seems to indicate this model does good job.

#### Conclusion

The $$p$$-dispersion-sum problem is very difficult to solve to optimality, even for very small data sets. Extra cuts can enormously help the linearized version of the model. A main drawback is that the optimal solutions do not provide us with well-dispersed points (points can be very close).

The maximin model solves much easier, and it gives better dispersion: selected points are never close to each other.

#### References

1. How to select n objects from a set of N objects, maximizing the sum of pairwise distances between them, https://stackoverflow.com/questions/56761115/how-to-select-n-objects-from-a-set-of-n-objects-maximizing-the-sum-of-pairwise
2. Warren P. Adams and Hanif D. Sherali, A Tight Linearization and an Algorithm for Zero-One Quadratic Programming Problems, Management Science, Vol. 32, No. 10 (Oct., 1986), pp. 1274-1290
3. Michael J. Kuby, Programming Models for Facility Dispersion: The p‐Dispersion and Maxisum Dispersion Problems, Geographical Analysis, vol. 19, pp.315-329, 1987.
4. Ed Klotz, Performance Tuning for Cplex’s Spatial Branch-and-Bound Solver for Global Nonconvex (Mixed Integer) Quadratic Programs, http://orwe-conference.mines.edu/files/IOS2018SpatialPerfTuning.pdf

## Friday, June 21, 2019

### Special assignment problem

#### Problem Description

Assume we have data $$a_i$$ and $$b_j$$:

----     11 PARAMETER a

i1  0.172,    i2  0.843,    i3  0.550,    i4  0.301,    i5  0.292,    i6  0.224,    i7  0.350,    i8  0.856,    i9  0.067
i10 0.500

----     11 PARAMETER b

j1  0.998,    j2  0.621,    j3  0.992,    j4  0.786,    j5  0.218,    j6  0.676,    j7  0.244,    j8  0.325,    j9  0.702
j10 0.492,    j11 0.424,    j12 0.416,    j13 0.218,    j14 0.235,    j15 0.630


Try to find an assignment of each $$a_i$$ to a unique $$b_j$$ such that the quantities $\frac{a_i}{b_j}$ are as close to each other as possible.

This is more or less an assignment problem. The objective to make all ratios $$a_i/b_j$$ as equal as possible, makes it interesting. I'll discuss some formulations for this problem.

#### A first model

We first introduce assignment variables: $x_{i,j} = \begin{cases} 1 & \text{if a_i is assigned to b_j} \\ 0 & \text{otherwise}\end{cases}$ The standard assignment constraints apply \begin{align}&\sum_i x_{i,j}\le 1&&\forall j \\&\sum_j x_{i,j} = 1&&\forall i \end{align} This is an "unbalanced" assignment: we have more $$j$$'s than $$i$$'s.

To model "all about equal", we introduce a continuous variable $$c$$ that represents the common value. I.e. $\frac{a_i}{b_j} \approx c \>\>\text{for all (i,j) such that x_{i,j}=1}$

The objective is $\min \sum_{i,j} \left[ x_{i,j} \left( \frac{a_i}{b_j} - c\right) \right]^2$ or  $\min \sum_{i,j} x_{i,j} \left( \frac{a_i}{b_j} - c\right)^2$

A complete model can look like:

MINLP Model
\begin{align} \min & \sum_{i,j} \color{darkred}x_{i,j} \left( \frac{\color{darkblue}a_i}{\color{darkblue}b_j} - \color{darkred}c\right)^2 \\ & \sum_j \color{darkred} x_{i,j} = 1 &&\forall i\\ & \sum_i \color{darkred} x_{i,j} \le 1 &&\forall j\\ & \color{darkred}x_{i,j} \in \{0,1\}\\ & \color{darkred}c \text{ free} \end{align}

When we run this model, using an MINLP solver, we see:

----     33 VARIABLE x.L  assignment

j1          j3          j4          j5          j7          j8          j9         j10         j11         j12

i1                                                        1.000
i2                    1.000
i3                                1.000
i4                                                                                                        1.000
i5                                                                                                                    1.000
i6                                                                    1.000
i7                                                                                            1.000
i8        1.000
i9                                            1.000
i10                                                                               1.000

----     33 VARIABLE c.L                   =        0.695  common value
VARIABLE z.L                   =        0.201  objective


The assignment can be depicted as follows. Solution. Matrix values are a(i)/b(j).
In the above matrix the cell values are $\text{cell}_{i,j} = \frac{a_i}{b_j}$

We can reformulate this MINLP model into a quadratic model by introducing variables  $y_{i,j} = x_{i,j} c$ This non-linear constraint can be linearized by stating \begin{align} & x_{i,j} = 0 \Rightarrow y_{i,j} = 0\\ &x_{i,j} = 1 \Rightarrow y_{i,j} = c\end{align} With these $$y$$ variables we can form an objective: $\sum_{i,j} \left( x_{i,j} \frac{a_i}{b_j} - y_{i,j} \right)^2$ Unfortunately, this is a non-convex objective. There are not many non-convex MIQP solvers around (Cplex has one). This model gives the same solutions as the MINLP model above. I am not showing the results here, as they are identical to the ones in the previous section.

Non-convex MIQP Model
\begin{align} \min & \sum_{i,j} \left( \color{darkred}x_{i,j} \frac{\color{darkblue}a_i}{\color{darkblue}b_j} - \color{darkred}y_{i,j}\right)^2 \\ & \sum_j \color{darkred} x_{i,j} = 1 &&\forall i\\ & \sum_i \color{darkred} x_{i,j} \le 1 &&\forall j\\ & \color{darkred} x_{i,j} = 0 \Rightarrow \color{darkred}y_{i,j} = 0 &&\forall i,j\\ & \color{darkred} x_{i,j} = 1 \Rightarrow \color{darkred}y_{i,j} = c &&\forall i,j\\ & \color{darkred}x_{i,j} \in \{0,1\}\\ & \color{darkred}y_{i,j} \ge 0 \\ & \color{darkred}c \ge 0 \end{align}

We assumed here positive fractions $$a_i/b_j \gt 0$$.This allows us to say $$c\ge 0$$ and $$y_{i,j}\ge 0$$.

#### A linear model

We can approximate the quadratic weighting of the residuals by absolute values: $\sum_{i,j} \left| x_{i,j} \left( \frac{a_i}{b_j} - c \right) \right|$ The first thing to is to linearize the term $$x_{i,j} c$$. As in the quadratic model, we introduce variables $y_{i,j} = x_{i,j} c$ We can form the implications: \begin{align} & x_{i,j} = 0 \Rightarrow y_{i,j} = 0\\ &x_{i,j} = 1 \Rightarrow y_{i,j} = c\end{align} Most solvers support indicator constraints, which allows us to implement these implications as is. If we have a solver without this, we can formulate this as a bunch of big-$$M$$ constraints: \begin{align} & - M x_{i,j} \le y_{i,j} \le M x_{i,j} \\ & c-M(1-x_{i,j}) \le y_{i,j} \le c+M(1-x_{i,j})\end{align} From the data we can assume $$a_{i}\ge 0$$ and $$b_{j}\gt 0$$. We can exploit this: \begin{align} & 0 \le y_{i,j} \le M_1 x_{i,j} \\ & c-M_2(1-x_{i,j}) \le y_{i,j} \le c+M_2(1-x_{i,j})\end{align} We can think a bit about good values for $$M_1$$ and $$M_2$$. I suggest: $M_1 = M_2 = \max c = \max_{i,j} \frac{a_i}{b_j}$

The complete model can look like

MIP Model
\begin{align} \min & \sum_{i,j} \color{darkred}r_{i,j} \\ & \sum_j \color{darkred} x_{i,j} = 1 && \forall i\\ & \sum_i \color{darkred} x_{i,j} \le 1 && \forall j\\ & -\color{darkred}r_{i,j} \le \color{darkred} x_{i,j} \frac{\color{darkblue} a_i}{\color{darkblue} b_j} - \color{darkred} y_{i,j} \le \color{darkred}r_{i,j} &&\forall i,j\\ & \color{darkred} y_{i,j} \le \color{darkblue} M_1 \color{darkred} x_{i,j} &&\forall i,j\\ & \color{darkred} c - \color{darkblue} M_2 (1- \color{darkred} x_{i,j}) \le \color{darkred} y_{i,j} \le \color{darkred} c + \color{darkblue} M_2 (1- \color{darkred} x_{i,j})&&\forall i,j \\ & \color{darkred}x_{i,j} \in \{0,1\}\\ & \color{darkred}c \ge 0 \\ & \color{darkred}y_{i,j} \ge 0\\ & \color{darkred}r_{i,j} \ge 0\\ \end{align}

The results look like:

----     72 VARIABLE x.L  assignment

j1          j3          j4          j5          j7          j8          j9         j10         j11         j12

i1                                                        1.000
i2                    1.000
i3                                1.000
i4                                                                                                        1.000
i5                                                                                                                    1.000
i6                                                                    1.000
i7                                                                                            1.000
i8        1.000
i9                                            1.000
i10                                                                               1.000

----     72 VARIABLE c.L                   =        0.711  common value

----     72 VARIABLE y.L  c*x(i,j)

j1          j3          j4          j5          j7          j8          j9         j10         j11         j12

i1                                                        0.711
i2                    0.711
i3                                0.711
i4                                                                                                        0.711
i5                                                                                                                    0.711
i6                                                                    0.711
i7                                                                                            0.711
i8        0.711
i9                                            0.711
i10                                                                               0.711

----     72 VARIABLE r.L  abs(residuals)

j1          j3          j4          j5          j7          j8          j9         j10         j12

i1                                                        0.006
i2                    0.139
i3                                0.010
i5                                                                                                        0.009
i6                                                                    0.021
i7                                                                                      6.137042E-4
i8        0.147
i9                                            0.402
i10                                                                               0.002


The model results are very close. The assignments are the same. Just the $$c$$ value and resulting sum of squared or absolute residuals are somewhat different.

----     77 PARAMETER sumdev  sum of squared/absolute deviations

dev^2       |dev|           c

minlp model       0.201       0.784       0.695
mip model         0.204       0.737       0.711


#### A different approach

We can look at the problem in a different way. Instead of minimize the spread around a central value $$c$$, we can minimize the bandwidth directly. I.e. we can write: \begin{align} \min\> & \mathit{maxv} - \mathit{minv} \\ & \mathit{minv} \le \frac{a_i}{b_j} && \forall x_{i,j}=1\\& \mathit{maxv} \ge \frac{a_i}{b_j} && \forall x_{i,j}=1 \end{align} These constraints can be implemented as indicator constraints \begin{align} & x_{i,j}=1 \Rightarrow \mathit{minv} \le \frac{a_i}{b_j}\\ & x_{i,j}=1 \Rightarrow \mathit{maxv} \ge \frac{a_i}{b_j}\end{align} If we don't have indicator constraints available we need to resort to big-$$M$$ constraints:

Alternative MIP Model
\begin{align} \min\> & \color{darkred}{\mathit{maxv}} - \color{darkred}{\mathit{minv}} \\ & \sum_j \color{darkred} x_{i,j} = 1 && \forall i \\ & \sum_i \color{darkred} x_{i,j} \le 1 && \forall j \\ & \color{darkred}{\mathit{minv}} \le \frac{\color{darkblue}a_i}{\color{darkblue}b_j} + M (1-\color{darkred} x_{i,j}) && \forall i,j \\ & \color{darkred}{\mathit{maxv}} \ge \frac{\color{darkblue}a_i}{\color{darkblue}b_j} - M (1-\color{darkred} x_{i,j}) && \forall i,j \\ & \color{darkred}x_{i,j} \in \{0,1\} \\ & \color{darkred}{\mathit{maxv}},\color{darkred}{\mathit{minv}} \text{ free} \end{align}

The solution looks like:

----    103 VARIABLE x.L  assignment

j1          j2          j3          j4          j5          j6          j9         j11         j14         j15

i1                                                                                                        1.000
i2                                1.000
i3                                                                                1.000
i4                                                                                                                    1.000
i5                                                                    1.000
i6                                                                                            1.000
i7                    1.000
i8        1.000
i9                                                        1.000
i10                                           1.000

----    103 VARIABLE minv.L                =        0.308  minimum value
VARIABLE maxv.L                =        0.858  maximum value


A picture of this solution: Minimize bandwidth solution

The disadvantage of this method is that it does not care about assignments as long as they are inside our optimal bandwidth. We can see this if we recalculate an optimal central $$c$$ value afterwards. The results with this reconstructed $$c$$ value:

----    114 PARAMETER sumdev  sum of squared/absolute deviations

dev^2       |dev|           c

minlp model       0.201       0.784       0.695
mip model         0.204       0.737       0.711
mip model 2       0.313       1.548       0.617


Indeed, we see that the total sum of squared or absolute deviations is not as good as we saw before. The same thing can be seen by just calculating the standard deviation for the selected ratios $$a_i/b_j$$. This gives:

----    130 PARAMETER stdev

minlp model 0.149,    mip model   0.149,    mip model 2 0.186


Again we see this last approach leads to more variability. Distribution of selected a(i)/b(j)

In this above picture we see the results of this second MIP model not having an incentive to keep values close to each other, besides that they should be within $$[\mathit{minv},\mathit{maxv}]$$. This model is simpler than the earlier MIP model, but we pay a price: the solution is not as highly concentrated around the central point.

#### Non-unique $$b_j$$

If we allow the $$b_j$$ to be reused, only a small change in the models is needed. We just need to drop the constraint $\sum_i x_{i,j} \le 1$ When we do this, we may see a solution like: Solution. Allow a column to be selected multiple times.

#### Conclusion

This is turned out to be an interesting problem. A high-level MINLP problem is presented first. We can linearize the model into a MIP, but this becomes a bit messy. Modern MIP solvers offer tools (indicator constraints, built-in absolute value function) that can help here. Finally an alternative, simpler MIP model is proposed. However, this model produces solutions that are more dispersed.

## Thursday, June 20, 2019

### Assignment: Scipy vs Cplex

#### Scipy.optimize.linear_sum_assignment

I have never used this routine , so lets give this a try. This is a Python implementation of the Hungarian method . The call is exceedingly simple: just give it the cost matrix.

The algorithm allows unbalanced assignment problems by just specifying a rectangular cost matrix. Typically, unbalanced problems are transformed into balanced ones by adding source or destination nodes.

I believe the description is slightly inaccurate: Description from Scipy.org

This would allow zero assignments, i.e. $$x_{i,j}=0$$. It is not always easy to create a watertight English description.

Let's try a problem.

#### 1000x1000 dense assignment problem

When we feed a random problem with 1,000 sources and 1,000 destinations, we see the following timings:

We first read in the data (organized as lists). The second step is to convert this to a NumPy array (otherwise this is done automatically by linear_sum_assignment). Then we solve the problem. On my laptop this takes about 325 seconds.

We can formulate this problem as an LP:\begin{align}\min \> & \sum_{i,j} c_{i,j} x_{i,j}\\ & \sum_j x_{i,j} = 1 && \forall i \\ & \sum_i x_{i,j} =1 && \forall j \\ & x_{i,j} \ge 0 \end{align}

When we feed this into Cplex we see:

 Parallel mode: deterministic, using up to 4 threads for concurrent optimization. Tried aggregator 1 time. LP Presolve eliminated 2 rows and 1 columns. Reduced LP has 1999 rows, 1000000 columns, and 1999000 nonzeros. Presolve time = 5.95 sec. (770.92 ticks) Initializing dual steep norms . . . Iteration log . . . Iteration:     1   Dual objective     =             1.007485 Iteration:  1113   Dual objective     =             1.661162 Dual simplex solved model. LP status(1): optimal Cplex Time: 14.91sec (det. 2309.10 ticks)

Cplex tries several LP algorithms in parallel (concurrent optimization). It turns out Dual Simplex wins. The total solution time is about 15 seconds. So Cplex is 20 times as fast.

We can actually improve on this a bit. This is a network problem, so we can use the Network solver. In addition, we disable the presolver (it is somewhat expensive and does little for this problem).

 Extracted network with 2002 nodes and 1000001 arcs. Extraction time = 0.63 sec. (49.15 ticks) Iteration log . . . Iteration:     0   Infeasibility     =          2000.000000 (2000) Iteration: 10000   Infeasibility     =             0.000000 (7.62808) Iteration: 20000   Objective         =             5.108089 Iteration: 30000   Objective         =             3.842909 Iteration: 40000   Objective         =             3.147366 Iteration: 50000   Objective         =             2.622157 Iteration: 60000   Objective         =             2.275971 Iteration: 70000   Objective         =             1.990956 Iteration: 80000   Objective         =             1.790689 Elapsed time = 1.33 sec. (82.13 ticks) Iteration: 90000   Objective         =             1.684441 Network - Optimal:  Objective =    1.6701729950e+00 Network time = 1.67 sec. (119.82 ticks)  Iterations = 92769 (10000) LP status(1): optimal Cplex Time: 2.66sec (det. 236.01 ticks)

Now we need only less than 3 seconds (Cplex is 120 times as fast as Scipy).

We would normally expect a specialized assignment solver to do better than a general purpose LP solver. However, Cplex is a highly polished product and we run on the bare metal. Using Python and a simpler implementation just causes us to lose. Note that there are specialized solvers that are much faster.

Some solvers in Scipy are a little bit underwhelming. Examples are the linear programming simplex solver and, as we saw here, the assignment problem solver.

#### References

1. scipy.optimize.linear_sum_assignment, https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.linear_sum_assignment.html
2. Harold W. Kuhn. The Hungarian Method for the assignment problem. Naval Research Logistics Quarterly, 2:83-97, 1955.

## Wednesday, June 12, 2019

### Machine Scheduling

Here  is a problem with multiple machines:
• There are 3 machines and 4 jobs in this example
• Not all machines can do all jobs
• We have a few precedence constraints
• Given known processing times, find a schedule that minimizes the makespan (the time the last job is finished)
Assume the following data:

----     63 --------------- data ---

----     63 SET j  jobs

job1,    job2,    job3,    job4

----     63 SET m  machines

machine1,    machine2,    machine3

----     63 SET ok  allowed job/machine combinations

machine1    machine2    machine3

job1         YES         YES
job2         YES         YES
job3         YES                     YES
job4                                 YES

----     63 PARAMETER proctime  processing time

job1  4.000,    job2  2.000,    job3 10.000,    job4 12.000

----     63 SET prec  precedence constraints

job2        job4

job1         YES         YES


There are many ways to model this. Some approaches can be:

• Discrete time using time slots. This leads to binary variables $x_{j,m,t} = \begin{cases} 1 & \text{if job j executes on machine m at time t}\\ 0 & \text{otherwise}\end{cases}$
• Continuous time with a binary variable indicating if job $$j_1$$ is executed before job $$j_2$$ (on the same machine)
• Continuous time with a binary variable indicating if job $$j_1$$ immediately precedes job $$j_2$$ (on the same machine)
Let's try the second approach. First we need a planning horizon. A simplistic way to find a planning horizon $$T$$ is just to schedule each job after another: $T = \sum_j \mathit{proctime}_j$ For our small data set, this gives:

----     63 PARAMETER T                    =       28.000  max time


This $$T$$ is used as a big-$$M$$ constant, so we would like it to be as small as possible. For very large problems, we often use some heuristic to find an initial solution, and use this to set the planning horizon $$T$$. If that is not possible, we may want to use indicator constraints, so we don't need any $$T$$.

#### Decision Variables

We have two blocks of binary variables. One dealing with assigning jobs to machines and a second one about ordering of jobs on the same machine. The latter is handled by implementing no-overlap constraints. For this we need a binary variable indicating if job $$j_1$$ is before or after job $$j_2$$.

Binary variables
$\color{DarkRed}{\mathit{assign}}_{j,m} = \begin{cases} 1 & \text{if job j is placed on machine m}\\ 0 & \text{otherwise}\end{cases}$ $\color{DarkRed}{\mathit{after}}_{j_1,j_2} = \begin{cases} 1 & \text{if job j_1 is executed after job j_2 when placed on the same machine}\\ 0 & \text{if job j_1 is executed before job j_2 when placed on the same machine}\end{cases}$

Note that the variable $$\mathit{after}_{j_1,j_2}$$ will be used only for $$j_1 \lt j_2$$. This is to avoid double checking the same pair.
We also have a few continuous variables:

Continuous variables
\begin{align} & \color{DarkRed}{\mathit{start}}_{j} \ge 0 && \text{start time of job j}\\ & \color{DarkRed}{\mathit{finish}}_{j} \ge 0 && \text{finish time of job j}\\ & \color{DarkRed}{\mathit{makespan}} \ge 0 && \text{last finish time} \end{align}

#### Constraints

The model can look like:

Mixed Integer Programming Model
\begin{align} \min\> & \color{DarkRed}{\mathit{makespan}} \\ & \color{DarkRed}{\mathit{makespan}} \ge \color{DarkRed}{\mathit{finish}}_j && \forall j \\ & \color{DarkRed}{\mathit{finish}}_j = \color{DarkRed}{\mathit{start}}_j + \color{DarkBlue}{\mathit{proctime}}_j && \forall j\\ & \sum_m \color{DarkRed}{\mathit{assign}}_{j,m} = 1 && \forall j\\ & \color{DarkRed}{\mathit{start}}_{j_1} \ge \color{DarkRed}{\mathit{finish}}_{j_2} - \color{DarkBlue}T (1-\color{DarkRed}{\mathit{after}}_{j_1,j_2}) - \color{DarkBlue}T (1-\color{DarkRed}{\mathit{assign}}_{j_1,m})- \color{DarkBlue}T (1-\color{DarkRed}{\mathit{assign}}_{j_2,m}) && \forall m, j_1 \lt j_2 \\ & \color{DarkRed}{\mathit{start}}_{j_2} \ge \color{DarkRed}{\mathit{finish}}_{j_1} - \color{DarkBlue}T \color{DarkRed}{\mathit{after}}_{j_1,j_2} - \color{DarkBlue}T (1-\color{DarkRed}{\mathit{assign}}_{j_1,m})- \color{DarkBlue}T (1-\color{DarkRed}{\mathit{assign}}_{j_2,m}) && \forall m, j_1 \lt j_2 \\ & \color{DarkRed}{\mathit{start}}_{j_2} \ge \color{DarkRed}{\mathit{finish}}_{j_1} && \forall \color{DarkBlue}{\mathit{prec}}_{j_1,j_2} \\ & \color{DarkRed}{\mathit{assign}}_{j,m} = 0 && \forall \text{ not } \color{DarkBlue}{\mathit{ok}}_{j,m} \end{align}

The no-overlap constraints are the most complicated in this model. If the problem was a single machine scheduling problem, the no-overlap constraints would look like: $\mathit{start}_{j_1} \ge \mathit{finish}_{j_2} \text{ or } \mathit{start}_{j_2} \ge \mathit{finish}_{j_1} \>\> \forall j_1 \lt j_2$ Note that we only need to compare jobs $$j_1 \lt j_2$$ (each pair needs to be checked only once). The OR condition can be implemented using a binary variable, and big-$$M$$ constraints: \begin{align} & \mathit{start}_{j_1} \ge \mathit{finish}_{j_2} - T (1-{\mathit{after}}_{j_1,j_2}) && \forall j_1 \lt j_2 \\ & \mathit{start}_{j_2} \ge \mathit{finish}_{j_1} - T {\mathit{after}}_{j_1,j_2} && \forall j_1 \lt j_2 \end{align} i.e. job $$j_1$$ executes before or after job $$j_2$$, but not in parallel. When we have multiple machines, we only want to keep this constraint active when jobs $$j_1$$ and $$j_2$$ are assigned to the same machine. That is: jobs are allowed to execute in parallel when they are on different machines. In a sense, we have built some nested big-$$M$$ constraints.

#### Results

After running this model we get as results:

            --------------- solution -----

----     66 VARIABLE assign.L  assign job to machine

machine1    machine2    machine3

job1                   1.000
job2                   1.000
job3       1.000
job4                               1.000

----     66 VARIABLE start.L  job start time

job2 4.000,    job4 4.000

----     66 VARIABLE finish.L  job finish time

job1  4.000,    job2  6.000,    job3 10.000,    job4 16.000

----     66 VARIABLE makespan.L            =       16.000  time last job is finished

----     66 VARIABLE after.L  j1 is scheduled after j2 on machine m

job3

job1       1.000


How should we interpret the variable $$\mathit{after}$$? Only jobs 1 and 2 are on the same machine and for this combination we have $$\mathit{after}_{1,2}=0$$. This means job1 is not after job2, and thus: job2 must be after job1. This is indeed what is happening. The values for $$\mathit{after}$$ for jobs not on the same machine are basically just random: they are not used in active constraints. This means: the printed value $$\mathit{after}_{1,3}=1$$ is not relevant as jobs 1 and 3 are on different machines.

Graphically this looks like:

We see that this solution obeys the precedence constraints (jobs 2 and 4 after job 1). Furthermore, it only assigns jobs to allowed machines (e.g. job 4 can only run on machine 3).

#### Larger data set

The above data set is very small. Such a data set is great during development. A beginners mistake I often see is using a large data set during model development. It is much more convenient and efficient using a very small data set when things are in flux, and you do many runs just to get the model right.

Now, it is important to see how the model behaves with a large data set. I used random data with 8 machines and 50 jobs. For this problem we only needed a few seconds to solve to guaranteed optimality. The model had about 20k constraints, 1,700 variables (of which 1,400 binary variables). solution of 8 machine, 50 job problem with random data

The performance seems to be quite reasonable. The main problem is that for large models the number of constraints becomes big, and the big-$$M$$ constraints may not be as tight as we want. But for a first formulation, not so bad.