Tuesday, July 16, 2019

Quadratic Indicator Constraint?

I was pondering a constraint of the form:\[x_{i,k}=1 \Rightarrow d_{i,k} \ge \sum_j \left( \bar{x}_{k,j} - p_{i,j} \right)^2\] where \(x_{i,k}\in \{0,1\}\). Obviously, this could be implemented easily using indicator constraints, which are supported by all major solvers. These constraints have the form: \[\begin{align} &x_i = 0 \Rightarrow \text{constraint} \\ &\text{or} \\ & x_i = 1 \Rightarrow \text{constraint}\end{align}\] where \(x_i \in \{0,1\}\).

Unfortunately, indicator constraints are only supported for linear constraints. The Cplex docs say:

  • The constraint must be linear; a quadratic constraint is not allowed to have an indicator constraint.
  • A lazy constraint cannot have an indicator constraint.
  • A user-defined cut cannot have an indicator constraint.

Gurobi mentions:

An indicator constraint \(y=f \rightarrow a^Tx \le b\) states that if the binary variable \(y\) has the value \(f\in \{0,1\}\) in a given solution then the linear constraint \(a^Tx \le b\) has to be satisfied.

Xpress and SCIP say similar things: indicator constraints are only for linear constraints.

Somehow, I expected that I could use a (convex) quadratic constraint with an indicator. As none of the solvers has this implemented, I suspect there is a good reason why this is not possible.

Big-M and SOS1


So, as we cannot use indicators, we need to fall back on more traditional implementation techniques. The most obvious is a big-\(M\) formulation: \[  d_{i,k} \ge \sum_j \left( \bar{x}_{k,j} - p_{i,j} \right)^2 - M(1-x_{i,k})\] The disadvantage of this formulation is that we need to establish a good value for \(M\): this value cannot be too large, as large values of \(M\) tend to confuse MIP solvers.

If we have no good (and small) value for \(M\), we can use SOS1 sets. I.e. we write \[\begin{align} &d_{i,k} \ge \sum_j \left( \bar{x}_{k,j} - p_{i,j} \right)^2 - s_{i,k} \\ & s_{i,k}\ge 0\\ & \{s_{i,k},x_{i,k}\} \in SOS1\end{align}\] This says: either \(s_{i,k}\) or \(x_{i,k}\) can be non-zero, but not both. To be precise: both can be zero. Most MIP solvers support SOS1 sets.

Finally, as stated in the comments, an alternative formulation would be \[\begin{align} & d'_{i,k} \ge \sum_j \left( \bar{x}_{k,j} - p_{i,j} \right)^2 \\ & x_{i,k}=1 \Rightarrow d_{i,k} \ge  d'_{i,k} \end{align}\] This comes at the expense of extra variables and constraints. Basically we added one indirection. With this formulation, we have a linear indicator constraint and a separate convex quadratic inequality. I assume the presolver will not presolve this away.
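To make this concrete, below is a minimal gurobipy sketch of the big-\(M\) and SOS1 variants for a single \((i,k)\) pair. All names and data here are made up for illustration; in a real model you would pick one variant, not both.

import gurobipy as gp
from gurobipy import GRB

m = gp.Model("indicator_workaround")

p = [0.5, 1.5]                                 # data point p[i,:]
xbar = m.addVars(2, lb=0, ub=10, name="xbar")  # decision variables (e.g. a center)
x = m.addVar(vtype=GRB.BINARY, name="x")       # plays the role of x[i,k]
d = m.addVar(lb=0, name="d")                   # plays the role of d[i,k]
s = m.addVar(lb=0, name="s")                   # slack for the SOS1 variant

# convex quadratic expression: sum_j (xbar[j] - p[j])^2
sqdist = gp.quicksum((xbar[j] - p[j]) * (xbar[j] - p[j]) for j in range(2))

# big-M variant: M must bound the squared distance from above (2*10^2 suffices here)
M = 200
m.addConstr(d >= sqdist - M * (1 - x), name="bigM")

# SOS1 variant: either s or x can be nonzero, but not both
m.addConstr(d >= sqdist - s, name="slack")
m.addSOS(GRB.SOS_TYPE1, [s, x])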

Error Messages


In my opinion, well-formulated error messages are very important. As users, we are already confused that the system is not doing what we want, so a confusing error message just piles on. Let's see what happens when we try a quadratic indicator constraint:

  • GAMS has poor support for indicators in general, but is doing really poorly here. It does not give an error message, but just gives a wrong solution. This is really bad (I reported this, so it will be fixed).
  • AMPL gives an error message that seems to come from outer space.
    CPLEX 12.9.0.0: logical constraint _slogcon[1] is not an indicator constraint.
    Let's blame the slogcons (???). Obviously, I have no constraint (or any other symbol) called slogcon.
  • Cplex's interactive optimizer is doing a better job:
    CPLEX Error  1605: Line 5: Illegal quadratic term in a constraint.
  • Cplex's OPL has a difficult time understanding the (convex) quadratic indicator constraint. It says:
    *** FATAL[ENGINE_002]: Exception from IBM ILOG CPLEX: CPLEX Error  5002: 'q1' is not convex.
    Obviously the constraint is convex, so there is a problem in how OPL generates the constraint (i.e. this is a bug). The error message just does not make sense. The message is also bad: I don't have a `q1` in the model.

A better message would be something like:
Equation `e2`: quadratic terms in an indicator constraint are not supported. Please reformulate this constraint.

Basically you want an error message to describe the problem in a way that is understandable (even when the user is slightly panicked). And secondly, it is a good idea to tell the user what to do now. And, never mention the slogcons.

Of course, it would be even better if solvers would support quadratic indicator constraints. In design there is this concept of orthogonality: it is best to have as few exceptions as possible.

Sunday, July 7, 2019

Binary multiplication

We want to express \[\begin{align} & z = x \cdot y \\ & x,y,z \in \{0,1\}\end{align}\] in a linear fashion. There are two good reasons to do this. This linearization will allow us to use linear Mixed Integer Programming solvers instead of quadratic or MINLP solvers. In addition, the linearization can bypass any non-convexity in the original quadratic formulation. The linear formulation will buy us more accessible solvers, and possibly better performance.

As we are talking about binary variables, \(z = x \cdot y\) can be interpreted as \(z = x \textbf{ and } y\). We can resort to two standard formulations:

Formulation 1: \[\begin{align} & \color{darkred} z \le \color{darkred}x\\ & \color{darkred}z \le \color{darkred}y\\ & \color{darkred}z \ge \color{darkred}x+\color{darkred}y-1\\ & \color{darkred}x,\color{darkred}y \in \{0,1\} \\ & \color{darkred}z \in [0,1] \end{align}\]

Formulation 2: \[\begin{align} & \color{darkred} z \le \frac{\color{darkred}x+\color{darkred}y}{2}\\ & \color{darkred} z \ge \color{darkred}x+\color{darkred}y-1\\ & \color{darkred}x,\color{darkred}y,\color{darkred}z \in \{0,1\} \\ \end{align}\]

I always use the first form. But I quite often see the second formulation being used (e.g. [1]).

The second form can be interpreted as an aggregation of the first one by adding up \(z\le x\) and \(z \le y\).

There is a good reason to use the first form: it is tighter. For instance, the fractional value \((x,y) = (0, 1/2)\) yields \(z=0\) in the first formulation, but would allow \(z=0.25\) in the second formulation. The first formulation is so tight, we can relax \(z\in\{0,1\}\) to \(z\in[0,1]\).
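A quick check of this point in plain Python (the bounds follow directly from the constraints):

# upper bound on z at the fractional point (x, y) = (0, 1/2)
x, y = 0.0, 0.5
z_form1 = min(x, y)      # formulation 1: z <= x and z <= y
z_form2 = (x + y) / 2    # formulation 2: z <= (x + y)/2
print(z_form1, z_form2)  # 0.0 0.25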

My attempt to show graphically the difference:

Formulation 1
Formulation 2


Obviously, formulation 1 seems to be the winner. But when I try this out on a model [2] I see:

Results for Cplex, default settings, 1 thread

Size              Model 1 (seconds / nodes)    Model 2 (seconds / nodes)
\(n=20,k=10\)     2.9 / 3117                   1.8 / 3018
\(n=25,k=5\)      12 / 6245                    11 / 7043
\(n=30,k=4\)      15 / 5263                    13 / 4471

Well, this does not seem to prove my point. My preferred formulation is actually slightly underperforming.

Let's look at a different solver to see if that proves my point.


Results for CBC, 1 thread

Size              Model 1 (seconds / nodes)    Model 2 (seconds / nodes)
\(n=20,k=10\)     46 / 3082                    1623 / 279364
\(n=25,k=5\)      105 / 6860                   1823 / 87396
\(n=30,k=4\)      144 / 12008                  213 / 11336

These results look quite different. First, obviously CBC is slower than Cplex. No surprise there. But we also see that formulation 1 is much better than formulation 2. This is more like what I expected beforehand. I suspect the cuts produced by Cplex eliminated the advantage of formulation 1. We see that more often: some modeling tricks become less important as solvers get smarter.

Notes:

  1. The meaning of \(n\) and \(k\) for the size of the problem is explained in [2].
  2. The variables \(z\) were declared as binary in both models 1 and 2.


References



Sunday, June 30, 2019

Maximum Dispersion


Problem statement


Given \(n\) points with their distances  \(d_{i,j}\), select \(k\) points such that the sum of the distances between the selected points is maximized [1].

A simple MIQP (Mixed-Integer Quadratic Programming) model is:


Non-convex MIQP Model
\[\begin{align} \max & \sum_{i \lt j} \color{darkblue} d_{i,j} \color{darkred}x_{i} \color{darkred}x_{j} \\ & \sum_i \color{darkred} x_{i} = \color{darkblue} k \\ & \color{darkred}x_{i} \in \{0,1\} \end{align}\]

Notice we only need to look at distances \(d_{i,j}\) with \(i\lt j\) as we can assume symmetry. If not, just make \(d_{i,j}\) symmetric by: \[d_{i,j} = \frac{d_{i,j} + d_{j,i}}{2}\]
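In code, the symmetrization is a one-liner (numpy sketch; D here is a stand-in for the raw distance data):

import numpy as np

D = np.random.rand(5, 5)   # stand-in for raw (possibly asymmetric) distances
D = 0.5 * (D + D.T)        # now d[i,j] = (d[i,j] + d[j,i]) / 2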

As we shall see this is not such an easy problem to solve to optimality.

This problem is called the \(p\)-dispersion-sum problem.

Example data


I generated \(n=50\) random 2D points:


----     24 PARAMETER coord  random coordinates

              x           y

i1        1.717       8.433
i2        5.504       3.011
i3        2.922       2.241
i4        3.498       8.563
i5        0.671       5.002
i6        9.981       5.787
i7        9.911       7.623
i8        1.307       6.397
i9        1.595       2.501
i10       6.689       4.354
i11       3.597       3.514
i12       1.315       1.501
i13       5.891       8.309
i14       2.308       6.657
i15       7.759       3.037
i16       1.105       5.024
i17       1.602       8.725
i18       2.651       2.858
i19       5.940       7.227
i20       6.282       4.638
i21       4.133       1.177
i22       3.142       0.466
i23       3.386       1.821
i24       6.457       5.607
i25       7.700       2.978
i26       6.611       7.558
i27       6.274       2.839
i28       0.864       1.025
i29       6.413       5.453
i30       0.315       7.924
i31       0.728       1.757
i32       5.256       7.502
i33       1.781       0.341
i34       5.851       6.212
i35       3.894       3.587
i36       2.430       2.464
i37       1.305       9.334
i38       3.799       7.834
i39       3.000       1.255
i40       7.489       0.692
i41       2.020       0.051
i42       2.696       4.999
i43       1.513       1.742
i44       3.306       3.169
i45       3.221       9.640
i46       9.936       3.699
i47       3.729       7.720
i48       3.967       9.131
i49       1.196       7.355
i50       0.554       5.763


Let's try to find \(k=10\) points that are most dispersed. The distance matrix is formed by calculating Euclidean distances.

10 out of 50 most dispersed points


Solve non-convex problem


The above MIQP problem is non-convex. We can solve the non-convex MIQP problem in different ways:

  • Throw this into a global solver such as Baron, Couenne or Antigone
  • Use a solver like Cplex (option qtolin) or Gurobi (option preQlinearize) that can linearize the problem automatically for us. See further down how we can do this ourselves.
  • Instruct Cplex to use a QP formulation (option qtolin=0) and tell it to use the global QP solver (option optimalitytarget=3)

Convexification of the quadratic model


There is a trick to make this problem convex. For maximization, we require that the \(Q\) matrix is negative semi-definite. In our case the \(Q\) matrix is our distance matrix. The following algorithm will make the problem convex:

  1. Calculate the largest eigenvalue \(\lambda_{max}\) of the distance matrix \(0.5 D\). We multiply by 0.5 because we only use half of the matrix in the objective to prevent double counting.
  2. If \(\lambda_{max} \le 0 \): we are done (the matrix is negative semi-definite)
  3. Form the objective: \[\max \sum_{i \lt j}  d_{i,j} x_{i} x_{j} - \lambda_{max} \sum_i x_i^2 + \lambda_{max} \sum_i x_i\] 
This essentially imposes a possibly large diagonal perturbation on the \(Q\) matrix. We compensate with a linear term. This uses the fact that \(x_i = x_i^2\) when \(x_i \in \{0,1\}\).
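A small numpy sketch of this procedure (my own illustration; D is assumed to be the symmetric \(n \times n\) distance matrix with zero diagonal):

import numpy as np

def convexify(D):
    # Return (Qtilde, lam) such that, for binary x,
    #   max x'Qx  ==  max x'Qtilde x + lam * sum(x)
    # with Qtilde negative semi-definite.
    Q = 0.5 * D                             # the objective uses half of the matrix
    lam = np.linalg.eigvalsh(Q).max()       # largest eigenvalue (Q is symmetric)
    if lam <= 0:
        return Q, 0.0                       # already negative semi-definite
    return Q - lam * np.eye(len(D)), lam    # perturb diagonal, compensate linearly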

Linearization 


The quadratic model is easily linearized using the well-known trick for handling \(y_{i,j} = x_i x_j\): \[\begin{align} & y_{i,j} \le x_i \\ & y_{i,j} \le x_j \\ & y_{i,j} \ge x_i +x_j-1\\& y_{i,j}, x_i \in \{0,1\}\end{align}\] A complete model can look like:


Linearized Model 1
\[\begin{align} \max & \sum_{i \lt j} \color{darkblue} d_{i,j} \color{darkred}y_{i,j} \\ & \sum_i \color{darkred} x_{i} = \color{darkblue} k \\ & \color{darkred}y_{i,j} \le \color{darkred} x_i && \forall i\lt j \\ & \color{darkred}y_{i,j} \le \color{darkred} x_j && \forall i\lt j \\ & \color{darkred}x_{i} \in \{0,1\} \\ & \color{darkred}y_{i,j} \in [0,1] \end{align}\]

There are two things that may need some attention:

  • The inequality \(y_{i,j}\ge x_i +x_j -1\) was dropped. The objective will take care of this.
  • The variables \(y_{i,j}\) were relaxed to be continuous between 0 and 1. The \(y\) variables will be automatically integer (well, where it matters). Often models with fewer integer variables are easier to solve. Modern solvers, however, may reintroduce binary variables for this particular model. In Cplex you can see a message like shown below.


Tried aggregator 1 time.
MIP Presolve eliminated 1 rows and 1 columns.
Reduced MIP has 2451 rows, 1275 columns, and 4950 nonzeros.
Reduced MIP has 50 binaries, 0 generals, 0 SOSs, and 0 indicators.
Presolve time = 0.06 sec. (2.79 ticks)
Found incumbent of value 0.000000 after 0.06 sec. (4.54 ticks)
Probing time = 0.02 sec. (0.51 ticks)
Tried aggregator 1 time.
Reduced MIP has 2451 rows, 1275 columns, and 4950 nonzeros.
Reduced MIP has 1275 binaries, 0 generals, 0 SOSs, and 0 indicators.


A tighter linearization


In [2] a tighter formulation is proposed:


Linearized Model 2
\[\begin{align} \max & \sum_{i \lt j} \color{darkblue} d_{i,j} \color{darkred}y_{i,j} \\ & \sum_i \color{darkred} x_{i} = \color{darkblue} k \\ & \color{darkred}y_{i,j} \le \color{darkred} x_i && \forall i\lt j \\ & \color{darkred}y_{i,j} \le \color{darkred} x_j && \forall i\lt j \\ & \color{darkred}y_{i,j} \ge \color{darkred} x_i +\color{darkred} x_j -1 && \forall i\lt j \\ &  \sum_{i\lt j} \color{darkred}y_{i,j} + \sum_{i\gt j} \color{darkred}y_{j,i} = (\color{darkblue} k-1) \color{darkred}x_j && \forall j \\ & \color{darkred}x_{i} \in \{0,1\} \\ & \color{darkred}y_{i,j} \in \{0,1\} \end{align}\]

Essentially, we multiplied the constraint \(\sum_i x_{i} = k\) by \(x_j\), and added these as constraints. The derivation is as follows: \[\begin{align} & \left( \sum_i x_i\right) x_j = k x_j\\ \Rightarrow & \sum_{i\lt j} x_i x_j +  x_j^2 + \sum_{i\gt j} x_i x_j =  k x_j\\ \Rightarrow & \sum_{i\lt j} y_{i,j} + \sum_{i\gt j} y_{j,i} =  (k-1) x_j \end{align}\] We used here that \(x_j^2 = x_j\).
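A quick numerical sanity check of this identity (numpy; random binary \(x\) with \(\sum_i x_i = k\)):

import numpy as np

rng = np.random.default_rng(123)
n, k = 10, 4
x = np.zeros(n)
x[rng.choice(n, size=k, replace=False)] = 1  # pick k points
y = np.outer(x, x)                           # y[i,j] = x[i]*x[j]
for j in range(n):
    lhs = y[:j, j].sum() + y[j, j+1:].sum()  # sum(i<j) y[i,j] + sum(i>j) y[j,i]
    assert lhs == (k - 1) * x[j]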

My intuition is as follows. If \(x_j=1\) then exactly \(k-1\) other \(x_i\)'s should be 1. That means \(k-1\) \(y_{i,j}\)'s should be 1. As we only use the upper-triangular part, the picture becomes:

Picture of y layout
Given that \(x_j=1\), we need to have \(k-1\) ones in the orange region.


Aggregation


We can aggregate the previous cut, which leads to: \[\sum_{i\lt j} y_{i,j} = \frac{k(k-1)}{2}\] The advantage of this version is that we only need one extra constraint [4]:


Linearized Model 3
\[\begin{align} \max & \sum_{i \lt j} \color{darkblue} d_{i,j} \color{darkred}y_{i,j} \\ & \sum_i \color{darkred} x_{i} = \color{darkblue} k \\ & \color{darkred}y_{i,j} \le \color{darkred} x_i && \forall i\lt j \\ & \color{darkred}y_{i,j} \le \color{darkred} x_j && \forall i\lt j \\ & \color{darkred}y_{i,j} \ge \color{darkred} x_i +\color{darkred} x_j -1 && \forall i\lt j \\ &  \sum_{i\lt j} \color{darkred}y_{i,j} = \frac{ \color{darkblue}k (\color{darkblue} k-1)}{2} \\ & \color{darkred}x_{i} \in \{0,1\} \\ & \color{darkred}y_{i,j} \in \{0,1\} \end{align}\]

The intuition is easy: if we have \(k\) \(x_j\)'s equal to one, we need to have \(k(k-1)/2\) \(y_{i,j}\)'s equal to one.
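To make this concrete, here is a sketch of Linearized Model 3 in Python using PuLP (assuming PuLP with its bundled CBC solver is available; sizes and data are made up):

import itertools
import numpy as np
import pulp

n, k = 10, 4
rng = np.random.default_rng(0)
pts = 10 * rng.random((n, 2))
D = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)  # Euclidean distances

pairs = list(itertools.combinations(range(n), 2))  # all (i,j) with i < j
prob = pulp.LpProblem("p_dispersion_sum", pulp.LpMaximize)
x = pulp.LpVariable.dicts("x", range(n), cat="Binary")
y = pulp.LpVariable.dicts("y", pairs, cat="Binary")

prob += pulp.lpSum(D[i, j] * y[(i, j)] for i, j in pairs)  # objective
prob += pulp.lpSum(x[i] for i in range(n)) == k
for i, j in pairs:
    prob += y[(i, j)] <= x[i]
    prob += y[(i, j)] <= x[j]
    prob += y[(i, j)] >= x[i] + x[j] - 1
prob += pulp.lpSum(y[(i, j)] for i, j in pairs) == k * (k - 1) // 2  # the cut

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print([i for i in range(n) if x[i].value() > 0.5])  # selected points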

Numerical results


I tried these formulations using Cplex with a time limit of 1800 seconds (half an hour).


Model             Time (s)   Obj        Gap       Notes
Non-convex MIQP   1800       332.1393   87.64%    Solve as quadratic model; options qtolin=0, optimalitytarget=3
Non-convex MIQP   1800       332.9129   57.65%    Automatically converted to linear model; option qtolin=1
Convex MIQP       1800       332.9129   92.22%    Option qtolin=0
MIP 1             1800       332.9129   57.65%    Defaults
MIP 2             8          332.9129   optimal   Defaults (roughly the same performance with or without \(y_{i,j}\ge x_i+x_j-1\))
MIP 3             400        332.9129   optimal   Defaults; excludes \(y_{i,j}\ge x_i+x_j-1\)
MIP 3             11         332.9129   optimal   Defaults; includes \(y_{i,j}\ge x_i+x_j-1\)


Just one observation, but I think these results are representative for other (small) instances.

We dropped the constraint \(y_{i,j}\ge x_i+x_j-1\) from MIP model 1: the objective will push \(y\) upwards on its own. For models MIP 2 and 3 it is wise to reintroduce them.

The gaps are terrible for all models without extra cuts. However, some of these methods find the best solution very quickly. They are just not able to prove optimality in a reasonable time. Here is an example:


        Nodes                                         Cuts/
   Node  Left     Objective  IInf  Best Integer    Best Bound    ItCnt     Gap

      0     0     1235.6817    50                   1235.6817        7         
*     0+    0                          332.9129     1235.6817           271.17%
Found incumbent of value 332.912894 after 0.02 sec. (4.67 ticks)
      0     2     1235.6817    50      332.9129     1162.0829        7  249.07%
Elapsed time = 0.03 sec. (11.72 ticks, tree = 0.02 MB, solutions = 1)


Better dispersion


From the picture of the solution (earlier in this post), we see that this model does not prevent selected points from being close to each other. A better model may be to maximize the minimum distance between selected points. This can be modeled as:


Maximize Minimum Distance
\[\begin{align} \max\> & \color{darkred} {\Delta} \\ & \color{darkred} \Delta \le \color{darkblue} d_{i,j} + \color{darkblue} M (1- \color{darkred}x_{i} \color{darkred}x_{j})  && \forall i\lt j \\ & \sum_i \color{darkred} x_{i} = \color{darkblue} k \\ & \color{darkred}x_{i} \in \{0,1\} \\ \end{align}\]


Here \(M\) is a large enough constant, e.g. \[M = \max_{i\lt j} d_{i,j}\] The model can be easily linearized by formulating the distance constraint as  \[ \Delta \le  d_{i,j} +  M (1- x_{i}) + M(1- x_{j})  \] This model quickly solves our 10 out of 50 problem. It gives the following solution:

Maximization of minimum distance

This model has the disadvantage that it does not care about points being close to each other, as long as they are farther apart than the minimum distance. For this example, visual inspection seems to indicate the model does a good job.


Conclusion


The \(p\)-dispersion-sum problem is very difficult to solve to optimality, even for very small data sets. Extra cuts can help the linearized version of the model enormously. A main drawback is that the optimal solutions do not provide us with well-dispersed points (points can be very close to each other).

The maximin model is much easier to solve, and it gives better dispersion: selected points are never close to each other.

References


  1. How to select n objects from a set of N objects, maximizing the sum of pairwise distances between them, https://stackoverflow.com/questions/56761115/how-to-select-n-objects-from-a-set-of-n-objects-maximizing-the-sum-of-pairwise
  2. Warren P. Adams and Hanif D. Sherali, A Tight Linearization and an Algorithm for Zero-One Quadratic Programming Problems, Management Science, Vol. 32, No. 10 (Oct., 1986), pp. 1274-1290
  3. Michael J. Kuby, Programming Models for Facility Dispersion: The p‐Dispersion and Maxisum Dispersion Problems, Geographical Analysis, vol. 19, pp.315-329, 1987.
  4. Ed Klotz, Performance Tuning for Cplex’s Spatial Branch-and-Bound Solver for Global Nonconvex (Mixed Integer) Quadratic Programs, http://orwe-conference.mines.edu/files/IOS2018SpatialPerfTuning.pdf

Friday, June 21, 2019

Special assignment problem


Problem Description


This is adapted from [1].

Assume we have data \(a_i\) and \(b_j\):


----     11 PARAMETER a  

i1  0.172,    i2  0.843,    i3  0.550,    i4  0.301,    i5  0.292,    i6  0.224,    i7  0.350,    i8  0.856,    i9  0.067
i10 0.500


----     11 PARAMETER b  

j1  0.998,    j2  0.621,    j3  0.992,    j4  0.786,    j5  0.218,    j6  0.676,    j7  0.244,    j8  0.325,    j9  0.702
j10 0.492,    j11 0.424,    j12 0.416,    j13 0.218,    j14 0.235,    j15 0.630


Try to find an assignment of each \(a_i\) to a unique \(b_j\) such that the quantities \[ \frac{a_i}{b_j}\] are as close to each other as possible.

This is more or less an assignment problem. The objective of making all ratios \(a_i/b_j\) as equal as possible makes it interesting. I'll discuss some formulations for this problem.

A first model


We first introduce assignment variables: \[x_{i,j} = \begin{cases} 1 & \text{if $a_i$ is assigned to $b_j$} \\ 0 & \text{otherwise}\end{cases}\] The standard assignment constraints apply \[\begin{align}&\sum_i x_{i,j}\le 1&&\forall j \\&\sum_j x_{i,j} = 1&&\forall i  \end{align}\] This is an "unbalanced" assignment: we have more \(j\)'s than \(i\)'s.

To model "all about equal", we introduce a continuous variable \(c\) that represents the common value. I.e. \[\frac{a_i}{b_j} \approx c \>\>\text{for all  $(i,j)$ such that $x_{i,j}=1$}\]

The objective is \[\min \sum_{i,j}  \left[ x_{i,j} \left( \frac{a_i}{b_j} - c\right) \right]^2 \] or  \[\min \sum_{i,j} x_{i,j} \left( \frac{a_i}{b_j} - c\right)^2 \]

A complete model can look like:

MINLP Model
\[\begin{align} \min & \sum_{i,j} \color{darkred}x_{i,j} \left( \frac{\color{darkblue}a_i}{\color{darkblue}b_j} - \color{darkred}c\right)^2 \\ & \sum_j \color{darkred} x_{i,j} = 1 &&\forall i\\ & \sum_i \color{darkred} x_{i,j} \le 1 &&\forall j\\ & \color{darkred}x_{i,j} \in \{0,1\}\\ & \color{darkred}c \text{ free} \end{align}\]


When we run this model, using an MINLP solver, we see:


----     33 VARIABLE x.L  assignment

             j1          j3          j4          j5          j7          j8          j9         j10         j11         j12

i1                                                        1.000
i2                    1.000
i3                                1.000
i4                                                                                                        1.000
i5                                                                                                                    1.000
i6                                                                    1.000
i7                                                                                            1.000
i8        1.000
i9                                            1.000
i10                                                                               1.000


----     33 VARIABLE c.L                   =        0.695  common value
            VARIABLE z.L                   =        0.201  objective


The assignment can be depicted as follows.

Solution. Matrix values are a(i)/b(j).
In the above matrix the cell values are \[\text{cell}_{i,j} = \frac{a_i}{b_j}\]

A quadratic model


We can reformulate this MINLP model into a quadratic model by introducing variables  \[y_{i,j} = x_{i,j} c\] This non-linear constraint can be linearized by stating \[\begin{align} & x_{i,j} = 0 \Rightarrow y_{i,j} = 0\\ &x_{i,j} = 1 \Rightarrow y_{i,j} = c\end{align}\] With these \(y\) variables we can form an objective: \[\sum_{i,j}  \left( x_{i,j}  \frac{a_i}{b_j} - y_{i,j} \right)^2  \] Unfortunately, this is a non-convex objective. There are not many non-convex MIQP solvers around (Cplex has one). This model gives the same solutions as the MINLP model above. I am not showing the results here, as they are identical to the ones in the previous section.


Non-convex MIQP Model
\[\begin{align} \min & \sum_{i,j} \left( \color{darkred}x_{i,j} \frac{\color{darkblue}a_i}{\color{darkblue}b_j} - \color{darkred}y_{i,j}\right)^2 \\ & \sum_j \color{darkred} x_{i,j} = 1 &&\forall i\\ & \sum_i \color{darkred} x_{i,j} \le 1 &&\forall j\\ & \color{darkred} x_{i,j} = 0 \Rightarrow \color{darkred}y_{i,j} = 0 &&\forall i,j\\ & \color{darkred} x_{i,j} = 1 \Rightarrow \color{darkred}y_{i,j} = c &&\forall i,j\\ & \color{darkred}x_{i,j} \in \{0,1\}\\ & \color{darkred}y_{i,j} \ge 0 \\ & \color{darkred}c \ge 0 \end{align}\]

We assumed here positive fractions \(a_i/b_j \gt 0\). This allows us to say \(c\ge 0\) and \(y_{i,j}\ge 0\).

A linear model


We can approximate the quadratic weighting of the residuals by absolute values: \[\sum_{i,j}  \left| x_{i,j} \left( \frac{a_i}{b_j} - c \right) \right| \] The first thing to do is to linearize the term \(x_{i,j} c\). As in the quadratic model, we introduce variables \[y_{i,j} = x_{i,j} c\] We can form the implications: \[\begin{align} & x_{i,j} = 0 \Rightarrow y_{i,j} = 0\\ &x_{i,j} = 1 \Rightarrow y_{i,j} = c\end{align}\] Most solvers support indicator constraints, which allow us to implement these implications as is. If we have a solver without this, we can formulate them as a bunch of big-\(M\) constraints: \[\begin{align} & - M x_{i,j} \le y_{i,j} \le M x_{i,j} \\ & c-M(1-x_{i,j}) \le y_{i,j} \le c+M(1-x_{i,j})\end{align}\] From the data we can assume \(a_{i}\ge 0\) and \(b_{j}\gt 0\). We can exploit this: \[\begin{align} & 0 \le y_{i,j} \le M_1 x_{i,j} \\ & c-M_2(1-x_{i,j}) \le y_{i,j} \le c+M_2(1-x_{i,j})\end{align}\] We can think a bit about good values for \(M_1\) and \(M_2\). I suggest: \[M_1 = M_2 = \max c = \max_{i,j} \frac{a_i}{b_j}\]
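This value is easy to compute (numpy sketch; the arrays here are abbreviated to the first few values of the data shown above):

import numpy as np

a = np.array([0.172, 0.843, 0.550])         # first few a[i] values
b = np.array([0.998, 0.621, 0.992, 0.218])  # first few b[j] values
M1 = M2 = (a[:, None] / b[None, :]).max()   # max over all ratios a[i]/b[j]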

The complete model can look like:

MIP Model
\[\begin{align}
\min & \sum_{i,j} \color{darkred}r_{i,j} \\
        & \sum_j \color{darkred} x_{i,j} = 1 && \forall i\\
        & \sum_i \color{darkred} x_{i,j} \le 1 && \forall j\\
        &  -\color{darkred}r_{i,j} \le \color{darkred} x_{i,j} \frac{\color{darkblue} a_i}{\color{darkblue} b_j} - \color{darkred} y_{i,j} \le \color{darkred}r_{i,j} &&\forall i,j\\
       &  \color{darkred} y_{i,j} \le \color{darkblue} M_1 \color{darkred} x_{i,j}  &&\forall i,j\\ & \color{darkred} c - \color{darkblue} M_2 (1- \color{darkred} x_{i,j}) \le \color{darkred} y_{i,j} \le \color{darkred} c + \color{darkblue} M_2 (1- \color{darkred} x_{i,j})&&\forall i,j \\ & \color{darkred}x_{i,j} \in \{0,1\}\\ & \color{darkred}c \ge 0 \\ & \color{darkred}y_{i,j} \ge 0\\ & \color{darkred}r_{i,j} \ge 0\\ \end{align}\]


The results look like:


----     72 VARIABLE x.L  assignment

             j1          j3          j4          j5          j7          j8          j9         j10         j11         j12

i1                                                        1.000
i2                    1.000
i3                                1.000
i4                                                                                                        1.000
i5                                                                                                                    1.000
i6                                                                    1.000
i7                                                                                            1.000
i8        1.000
i9                                            1.000
i10                                                                               1.000


----     72 VARIABLE c.L                   =        0.711  common value

----     72 VARIABLE y.L  c*x(i,j)

             j1          j3          j4          j5          j7          j8          j9         j10         j11         j12

i1                                                        0.711
i2                    0.711
i3                                0.711
i4                                                                                                        0.711
i5                                                                                                                    0.711
i6                                                                    0.711
i7                                                                                            0.711
i8        0.711
i9                                            0.711
i10                                                                               0.711


----     72 VARIABLE r.L  abs(residuals)

             j1          j3          j4          j5          j7          j8          j9         j10         j12

i1                                                        0.006
i2                    0.139
i3                                0.010
i5                                                                                                        0.009
i6                                                                    0.021
i7                                                                                      6.137042E-4
i8        0.147
i9                                            0.402
i10                                                                               0.002

The model results are very close. The assignments are the same. Just the \(c\) value and resulting sum of squared or absolute residuals are somewhat different.


----     77 PARAMETER sumdev  sum of squared/absolute deviations

                  dev^2       |dev|           c

minlp model       0.201       0.784       0.695
mip model         0.204       0.737       0.711


A different approach


We can look at the problem in a different way. Instead of minimizing the spread around a central value \(c\), we can minimize the bandwidth directly. I.e. we can write: \[\begin{align} \min\> & \mathit{maxv} - \mathit{minv} \\ & \mathit{minv} \le \frac{a_i}{b_j} && \forall x_{i,j}=1\\& \mathit{maxv} \ge \frac{a_i}{b_j} && \forall x_{i,j}=1 \end{align}\] These constraints can be implemented as indicator constraints \[\begin{align} & x_{i,j}=1 \Rightarrow \mathit{minv} \le \frac{a_i}{b_j}\\ & x_{i,j}=1 \Rightarrow \mathit{maxv} \ge \frac{a_i}{b_j}\end{align}\] If we don't have indicator constraints available we need to resort to big-\(M\) constraints:

Alternative MIP Model
\[\begin{align} \min\> & \color{darkred}{\mathit{maxv}} - \color{darkred}{\mathit{minv}} \\ & \sum_j \color{darkred} x_{i,j} = 1 && \forall i \\ & \sum_i \color{darkred} x_{i,j} \le 1 && \forall j \\ & \color{darkred}{\mathit{minv}} \le \frac{\color{darkblue}a_i}{\color{darkblue}b_j} + M (1-\color{darkred} x_{i,j}) && \forall i,j \\ & \color{darkred}{\mathit{maxv}} \ge \frac{\color{darkblue}a_i}{\color{darkblue}b_j} - M (1-\color{darkred} x_{i,j}) && \forall i,j \\ & \color{darkred}x_{i,j} \in \{0,1\} \\ & \color{darkred}{\mathit{maxv}},\color{darkred}{\mathit{minv}} \text{ free} \end{align}\]


The solution looks like:


----    103 VARIABLE x.L  assignment

             j1          j2          j3          j4          j5          j6          j9         j11         j14         j15

i1                                                                                                        1.000
i2                                1.000
i3                                                                                1.000
i4                                                                                                                    1.000
i5                                                                    1.000
i6                                                                                            1.000
i7                    1.000
i8        1.000
i9                                                        1.000
i10                                           1.000


----    103 VARIABLE minv.L                =        0.308  minimum value
            VARIABLE maxv.L                =        0.858  maximum value


A picture of this solution:

Minimize bandwidth solution

The disadvantage of this method is that it does not care about assignments as long as they are inside our optimal bandwidth. We can see this if we recalculate an optimal central \(c\) value afterwards. The results with this reconstructed \(c\) value:



----    114 PARAMETER sumdev  sum of squared/absolute deviations

                  dev^2       |dev|           c

minlp model       0.201       0.784       0.695
mip model         0.204       0.737       0.711
mip model 2       0.313       1.548       0.617

Indeed, we see that the total sum of squared or absolute deviations is not as good as we saw before. The same thing can be seen by just calculating the standard deviation for the selected ratios \(a_i/b_j\). This gives:


----    130 PARAMETER stdev  

minlp model 0.149,    mip model   0.149,    mip model 2 0.186

Again we see this last approach leads to more variability.

Distribution of selected a(i)/b(j)

In the picture above, we see that this second MIP model has no incentive to keep values close to each other, apart from requiring them to be within \([\mathit{minv},\mathit{maxv}]\). This model is simpler than the earlier MIP model, but we pay a price: the solution is not as concentrated around a central point.

Non-unique \(b_j\)


If we allow the \(b_j\) to be reused, only a small change in the models is needed. We just need to drop the constraint \[\sum_i x_{i,j} \le 1 \] When we do this, we may see a solution like:

Solution. Allow a column to be selected multiple times.

Conclusion


This turned out to be an interesting problem. A high-level MINLP formulation was presented first. We can linearize the model into a MIP, but this becomes a bit messy. Modern MIP solvers offer tools (indicator constraints, a built-in absolute value function) that can help here. Finally, an alternative, simpler MIP model was proposed. However, this model produces solutions that are more dispersed.

References


Thursday, June 20, 2019

Assignment: Scipy vs Cplex


Scipy.optimize.linear_sum_assignment

I have never used this routine [1], so let's give it a try. This is a Python implementation of the Hungarian method [2]. The call is exceedingly simple: just give it the cost matrix.
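A minimal usage sketch (with random data, not the actual test instance used below):

import numpy as np
from scipy.optimize import linear_sum_assignment

C = np.random.rand(1000, 1000)         # cost matrix
rows, cols = linear_sum_assignment(C)  # rows[i] is assigned to cols[i]
total_cost = C[rows, cols].sum()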

The algorithm allows unbalanced assignment problems by just specifying a rectangular cost matrix. Typically, unbalanced problems are transformed into balanced ones by adding source or destination nodes.

I believe the description is slightly inaccurate:

Description from Scipy.org

This would allow zero assignments, i.e. \(x_{i,j}=0\). It is not always easy to create a watertight English description.

Let's try a problem.

1000x1000 dense assignment problem

When we feed a random problem with 1,000 sources and 1,000 destinations, we see the following timings:



We first read in the data (organized as lists). The second step is to convert this to a NumPy array (otherwise this is done automatically by linear_sum_assignment). Then we solve the problem. On my laptop this takes about 325 seconds.

We can formulate this problem as an LP:\[\begin{align}\min \> & \sum_{i,j} c_{i,j} x_{i,j}\\ & \sum_j x_{i,j} = 1 && \forall i \\ & \sum_i x_{i,j} =1 && \forall j \\ & x_{i,j} \ge 0 \end{align}\]

When we feed this into Cplex we see:

Parallel mode: deterministic, using up to 4 threads for concurrent optimization.
Tried aggregator 1 time.
LP Presolve eliminated 2 rows and 1 columns.
Reduced LP has 1999 rows, 1000000 columns, and 1999000 nonzeros.
Presolve time = 5.95 sec. (770.92 ticks)
Initializing dual steep norms . . .

Iteration log . . .
Iteration:     1   Dual objective     =             1.007485
Iteration:  1113   Dual objective     =             1.661162

Dual simplex solved model.

LP status(1): optimal
Cplex Time: 14.91sec (det. 2309.10 ticks)

Cplex tries several LP algorithms in parallel (concurrent optimization). It turns out Dual Simplex wins. The total solution time is about 15 seconds. So Cplex is 20 times as fast.

We can actually improve on this a bit. This is a network problem, so we can use the Network solver. In addition, we disable the presolver (it is somewhat expensive and does little for this problem).

Extracted network with 2002 nodes and 1000001 arcs.
Extraction time = 0.63 sec. (49.15 ticks)

Iteration log . . .
Iteration:     0   Infeasibility     =          2000.000000 (2000)
Iteration: 10000   Infeasibility     =             0.000000 (7.62808)
Iteration: 20000   Objective         =             5.108089
Iteration: 30000   Objective         =             3.842909
Iteration: 40000   Objective         =             3.147366
Iteration: 50000   Objective         =             2.622157
Iteration: 60000   Objective         =             2.275971
Iteration: 70000   Objective         =             1.990956
Iteration: 80000   Objective         =             1.790689
Elapsed time = 1.33 sec. (82.13 ticks)
Iteration: 90000   Objective         =             1.684441

Network - Optimal:  Objective =    1.6701729950e+00
Network time = 1.67 sec. (119.82 ticks)  Iterations = 92769 (10000)
LP status(1): optimal
Cplex Time: 2.66sec (det. 236.01 ticks)

Now we need less than 3 seconds (Cplex is 120 times as fast as Scipy).

We would normally expect a specialized assignment solver to do better than a general purpose LP solver. However, Cplex is a highly polished product and we run on the bare metal. Using Python and a simpler implementation just causes us to lose. Note that there are specialized solvers that are much faster.

Some solvers in Scipy are a little bit underwhelming. Examples are the linear programming simplex solver and, as we saw here, the assignment problem solver.

References


  1. scipy.optimize.linear_sum_assignment, https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.linear_sum_assignment.html
  2. Harold W. Kuhn. The Hungarian Method for the assignment problem. Naval Research Logistics Quarterly, 2:83-97, 1955.

Wednesday, June 12, 2019

Machine Scheduling

Here [1] is a problem with multiple machines:
  • There are 3 machines and 4 jobs in this example
  • Not all machines can do all jobs
  • We have a few precedence constraints
  • Given known processing times, find a schedule that minimizes the makespan (the time the last job is finished)
Assume the following data:


----     63 --------------- data ---

----     63 SET j  jobs

job1,    job2,    job3,    job4


----     63 SET m  machines

machine1,    machine2,    machine3


----     63 SET ok  allowed job/machine combinations

        machine1    machine2    machine3

job1         YES         YES
job2         YES         YES
job3         YES                     YES
job4                                 YES


----     63 PARAMETER proctime  processing time

job1  4.000,    job2  2.000,    job3 10.000,    job4 12.000


----     63 SET prec  precedence constraints

            job2        job4

job1         YES         YES

There are many ways to model this. Some approaches can be:

  • Discrete time using time slots. This leads to binary variables \[x_{j,m,t} = \begin{cases} 1 & \text{if job $j$ executes on machine $m$ at time $t$}\\ 0 & \text{otherwise}\end{cases}\]
  • Continuous time with a binary variable indicating if job \(j_1\) is executed before job \(j_2\) (on the same machine)
  • Continuous time with a binary variable indicating if job \(j_1\) immediately precedes job \(j_2\) (on the same machine)  
Let's try the second approach. First we need a planning horizon. A simplistic way to find a planning horizon \(T\) is just to schedule the jobs one after another: \[T = \sum_j \mathit{proctime}_j \] For our small data set, this gives:


----     63 PARAMETER T                    =       28.000  max time

This \(T\) is used as a big-\(M\) constant, so we would like it to be as small as possible. For very large problems, we often use some heuristic to find an initial solution, and use this to set the planning horizon \(T\). If that is not possible, we may want to use indicator constraints, so we don't need any \(T\).

Decision Variables


We have two blocks of binary variables. One dealing with assigning jobs to machines and a second one about ordering of jobs on the same machine. The latter is handled by implementing no-overlap constraints. For this we need a binary variable indicating if job \(j_1\) is before or after job \(j_2\).

Binary variables
\[\color{DarkRed}{\mathit{assign}}_{j,m} = \begin{cases} 1 & \text{if job $j$ is placed on machine $m$}\\ 0 & \text{otherwise}\end{cases}\] \[\color{DarkRed}{\mathit{after}}_{j_1,j_2} = \begin{cases} 1 & \text{if job $j_1$ is executed after job $j_2$ when placed on the same machine}\\ 0 & \text{if job $j_1$ is executed before job $j_2$ when placed on the same machine}\end{cases}\]

Note that the variable \(\mathit{after}_{j_1,j_2}\) will be used only for \(j_1 \lt j_2\). This is to avoid double checking the same pair.
We also have a few continuous variables:

Continuous variables
\[\begin{align} & \color{DarkRed}{\mathit{start}}_{j} \ge 0 && \text{start time of job $j$}\\ & \color{DarkRed}{\mathit{finish}}_{j} \ge 0 && \text{finish time of job $j$}\\ & \color{DarkRed}{\mathit{makespan}} \ge 0 && \text{last finish time} \end{align}\]

Constraints


The model can look like:

Mixed Integer Programming Model
\[\begin{align} \min\> & \color{DarkRed}{\mathit{makespan}} \\ & \color{DarkRed}{\mathit{makespan}} \ge \color{DarkRed}{\mathit{finish}}_j && \forall j \\ & \color{DarkRed}{\mathit{finish}}_j = \color{DarkRed}{\mathit{start}}_j + \color{DarkBlue}{\mathit{proctime}}_j && \forall j\\ & \sum_m \color{DarkRed}{\mathit{assign}}_{j,m} = 1 && \forall j\\ & \color{DarkRed}{\mathit{start}}_{j_1} \ge \color{DarkRed}{\mathit{finish}}_{j_2} - \color{DarkBlue}T (1-\color{DarkRed}{\mathit{after}}_{j_1,j_2}) - \color{DarkBlue}T (1-\color{DarkRed}{\mathit{assign}}_{j_1,m})- \color{DarkBlue}T (1-\color{DarkRed}{\mathit{assign}}_{j_2,m}) && \forall m, j_1 \lt j_2 \\ & \color{DarkRed}{\mathit{start}}_{j_2} \ge \color{DarkRed}{\mathit{finish}}_{j_1} - \color{DarkBlue}T \color{DarkRed}{\mathit{after}}_{j_1,j_2} - \color{DarkBlue}T (1-\color{DarkRed}{\mathit{assign}}_{j_1,m})- \color{DarkBlue}T (1-\color{DarkRed}{\mathit{assign}}_{j_2,m}) && \forall m, j_1 \lt j_2 \\ & \color{DarkRed}{\mathit{start}}_{j_2} \ge \color{DarkRed}{\mathit{finish}}_{j_1} && \forall \color{DarkBlue}{\mathit{prec}}_{j_1,j_2} \\ & \color{DarkRed}{\mathit{assign}}_{j,m} = 0 && \forall \text{ not } \color{DarkBlue}{\mathit{ok}}_{j,m} \end{align}\]

The no-overlap constraints are the most complicated in this model. If the problem was a single machine scheduling problem, the no-overlap constraints would look like: \[\mathit{start}_{j_1} \ge \mathit{finish}_{j_2} \text{ or } \mathit{start}_{j_2} \ge \mathit{finish}_{j_1} \>\> \forall j_1 \lt j_2\] Note that we only need to compare jobs \(j_1 \lt j_2\) (each pair needs to be checked only once). The OR condition can be implemented using a binary variable, and big-\(M\) constraints: \[\begin{align} & \mathit{start}_{j_1} \ge \mathit{finish}_{j_2} - T (1-{\mathit{after}}_{j_1,j_2}) && \forall j_1 \lt j_2 \\ & \mathit{start}_{j_2} \ge \mathit{finish}_{j_1} - T {\mathit{after}}_{j_1,j_2} && \forall j_1 \lt j_2 \end{align}\] i.e. job \(j_1\) executes before or after job \(j_2\), but not in parallel. When we have multiple machines, we only want to keep this constraint active when jobs \(j_1\) and \(j_2\) are assigned to the same machine. That is: jobs are allowed to execute in parallel when they are on different machines. In a sense, we have built some nested big-\(M\) constraints.
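To illustrate just the no-overlap machinery, here is a fragment in Python/PuLP (a hypothetical two-job, one-machine slice of the model, with \(\mathit{finish}_j = \mathit{start}_j + \mathit{proctime}_j\) substituted in; this is not the full implementation):

import pulp

T = 28                         # planning horizon, used as big-M
proc = {"job1": 4, "job2": 2}
jobs, machines = ["job1", "job2"], ["machine1"]

prob = pulp.LpProblem("no_overlap_demo", pulp.LpMinimize)
start = pulp.LpVariable.dicts("start", jobs, lowBound=0)
assign = pulp.LpVariable.dicts(
    "assign", [(j, m) for j in jobs for m in machines], cat="Binary"
)
after = pulp.LpVariable("after_job1_job2", cat="Binary")

j1, j2 = "job1", "job2"
for m in machines:
    # job1 after job2; only active when both jobs are assigned to machine m
    prob += (
        start[j1] >= start[j2] + proc[j2] - T * (1 - after)
        - T * (1 - assign[(j1, m)]) - T * (1 - assign[(j2, m)])
    )
    # job2 after job1; same guards
    prob += (
        start[j2] >= start[j1] + proc[j1] - T * after
        - T * (1 - assign[(j1, m)]) - T * (1 - assign[(j2, m)])
    )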

Results


After running this model, we get the following results:


            --------------- solution -----

----     66 VARIABLE assign.L  assign job to machine

        machine1    machine2    machine3

job1                   1.000
job2                   1.000
job3       1.000
job4                               1.000


----     66 VARIABLE start.L  job start time

job2 4.000,    job4 4.000


----     66 VARIABLE finish.L  job finish time

job1  4.000,    job2  6.000,    job3 10.000,    job4 16.000


----     66 VARIABLE makespan.L            =       16.000  time last job is finished

----     66 VARIABLE after.L  j1 is scheduled after j2 on machine m

            job3

job1       1.000


How should we interpret the variable \(\mathit{after}\)? Only jobs 1 and 2 are on the same machine and for this combination we have \(\mathit{after}_{1,2}=0\). This means job1 is not after job2, and thus: job2 must be after job1. This is indeed what is happening. The values for \(\mathit{after}\) for jobs not on the same machine are basically just random: they are not used in active constraints. This means: the printed value \(\mathit{after}_{1,3}=1\) is not relevant as jobs 1 and 3 are on different machines.


Graphically this looks like:


We see that this solution obeys the precedence constraints (jobs 2 and 4 after job 1). Furthermore, it only assigns jobs to allowed machines (e.g. job 4 can only run on machine 3).

Larger data set


The above data set is very small. Such a data set is great during development. A beginner's mistake I often see is using a large data set during model development. It is much more convenient and efficient to use a very small data set when things are in flux and you do many runs just to get the model right.

Now, it is important to see how the model behaves with a large data set. I used random data with 8 machines and 50 jobs. For this problem we only needed a few seconds to solve to guaranteed optimality. The model had about 20k constraints, 1,700 variables (of which 1,400 binary variables).

solution of 8 machine, 50 job problem with random data

The performance seems to be quite reasonable. The main problem is that for large models the number of constraints becomes big, and the big-\(M\) constraints may not be as tight as we want. But for a first formulation, not so bad.

References


Tuesday, May 28, 2019

Bad question?

In [1] a user asks:

I need to build a MILP (Mixed integer linear programming) constraint form this if-else statement: with beta is a constant.
if (a > b) then c = beta else c = 0
How can I build the statement to MILP constraint. Are there any techniques for solving this problem. Thank you.

In my opinion this is a difficult question to answer. To put it bluntly: this is a poor question because we lack lots of information.

  1. We are not sure which identifiers are decision variables and which ones are parameters. It is stated that beta is a constant, so probably we can deduce that all other identifiers refer to variables. In general, it is important to state these things explicitly.
  2. For the variables: we don't know the types (binary, integer, positive, free).
  3. We don't see the rest of the model. In its most general form, this if-construct is a somewhat difficult thing to handle. In practice however, we can almost always exploit knowledge from the model to simplify things considerably. I almost never encounter models where we have to handle this construct in its most general form. 
  4. Expanding on the previous point: one thing to look at is how the objective behaves. It may push variables in a certain direction, which we may exploit. Often this leads to substantial simplifications. In many cases we can drop an implication (e.g. drop the else part).
  5. We don't know what solver or modeling system is being used. Some tools have good support for implications (a.k.a. indicator constraints), which can make things much easier. In other cases, we may need to use big-M constraints.
  6. If  \(a\), \(b\) and \(c\) are (free) continuous variables, the condition \(\gt\) is a bit ambiguous. Do we really mean \(\gt\) or can we use \(\ge\)? In my models, I tend to use \(\ge\) and make \(a=b\) ambiguous: we can pick the best decision for this case (I don't want to skip a more profitable solution because of some epsilon thing). 
  7. Using assumptions in the previous point, we can write \[\begin{align} &\delta = 1 \Rightarrow a \ge b\\ & \delta=0 \Rightarrow a \le b\\ & c = \delta \cdot \beta\\ & \delta \in \{0,1\}\end{align}\] Note that the \(c\) constraint is linear: \(\beta\) is a parameter. (See the code sketch after this list.)
  8. This can be translated into  \[\begin{align} & a \ge b - M(1-\delta)\\ & a \le b + M\delta \\ & c = \delta \cdot \beta\\ & \delta \in \{0,1\}\end{align}\] To determine good values for big-\(M\) constants, again, we need to know more about the model.
  9. If you insist on \(a\gt b\), we can introduce some tolerance \(\varepsilon>0\) and write: \[\begin{align} &\delta = 1 \Rightarrow a \ge b + \varepsilon \\ & \delta=0 \Rightarrow a \le b\\ & c = \delta \cdot \beta\\ & \delta \in \{0,1\}\end{align}\] Here \(\varepsilon\) should be larger than the feasibility tolerance of the solver (scaling may make this not completely obvious). Note that we effectively create a "forbidden region". The variable \(a\) can not assume any value in the interval \((b,b+\varepsilon)\) (again, subject to feasibility tolerances).
  10. Of course when we have integer variables \(a \gt b\) is much more well-defined and we can interpret that as \(a \ge b+1\).
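As an illustration of point 7, here is a minimal gurobipy sketch (my own example; the variable names follow the question, while \(\beta=3\) and the bounds are made up):

import gurobipy as gp
from gurobipy import GRB

beta = 3.0
m = gp.Model("if_else")
a = m.addVar(lb=-10, ub=10, name="a")
b = m.addVar(lb=-10, ub=10, name="b")
c = m.addVar(lb=0, name="c")
delta = m.addVar(vtype=GRB.BINARY, name="delta")

m.addGenConstrIndicator(delta, True, a - b >= 0)   # delta=1 => a >= b
m.addGenConstrIndicator(delta, False, a - b <= 0)  # delta=0 => a <= b
m.addConstr(c == beta * delta)                     # c = delta * beta (linear)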

So the best answer is: I would need to look at the whole model to give a good answer.

These types of questions are quite common, and answering them is just very difficult. You cannot expect a reply that enumerates all possible answers for all possible cases.

References


  1. Build MILP constraint from if-else statements, https://stackoverflow.com/questions/55899166/build-milp-constraint-from-if-else-statements