Sunday, November 11, 2018

Selecting Chess Players



In [1] a simple problem was posted:
  • We have a population of \(N=24\) players, each with an ELO rating \(r_i\)
  • We need to select \(2 \times 6\) players for 2 teams (each team has \(n=6\) members).
  • We want to minimize the difference in average ELO ratings of the teams.
The poster asked for an algorithm. But, of course, this looks like a problem we can solve as a mathematical programming model. 

First I generated some random data:


----     11 PARAMETER r  ELO rating

player1  1275,    player2  1531,    player3  1585,    player4   668,    player5  1107,    player6  1011
player7  1242,    player8  1774,    player9  1096,    player10 1400,    player11 1036,    player12 1538
player13 1135,    player14 1206,    player15 2153,    player16 1112,    player17  880,    player18  850
player19 1528,    player20 1875,    player21  939,    player22 1684,    player23 1807,    player24 1110

I have no idea if these numbers are realistic or not.

It makes sense to look at this from an assignment-problem point of view. It is amazing how often this concept shows up in modeling. So we define:\[x_{i,j}=\begin{cases} 1 & \text{if player $i$ is assigned to team $j$}\\ 0 & \text{otherwise}\end{cases}\]

A high-level model can look like:


High-level Model
\[\begin{align}\min\>& | \color{DarkRed}{\mathit{avg}}_2 - \color{DarkRed}{\mathit{avg}}_1 | \\ & \sum_j \color{DarkRed} x_{i,j} \le 1 && \forall i\\ & \sum_i \color{DarkRed} x_{i,j} = \color{DarkBlue} n && \forall j \\ & \color{DarkRed}{\mathit{avg}}_j = \frac{\displaystyle \sum_i \color{DarkBlue} r_i \color{DarkRed} x_{i,j}}{\color{DarkBlue} n} && \forall j \\ & \color{DarkRed}x_{i,j} \in \{0,1\}\end{align} \]

As you can see, this model is largely an assignment problem plus some average calculations.

Notes:

  • The absolute value is easily linearized, e.g. \[\begin{align} \min\> & z \\ & -z \le \mathit{avg}_2-\mathit{avg}_1\le z\end{align}\]
  • We can use the sum instead of the average
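For a tiny made-up instance, this formulation can be cross-checked by brute force. Below is a plain-Python sketch (not the GAMS model used in the post): 9 players with ratings \(1,\dots,9\), two teams of \(n=4\), one player left unassigned.

```python
from itertools import combinations

# Tiny made-up instance: 9 players with ratings 1..9, two teams of
# n=4 (one player stays unassigned, matching the model's <= 1 row).
r = [1, 2, 3, 4, 5, 6, 7, 8, 9]
n = 4
players = range(len(r))

best = None
for team1 in combinations(players, n):
    rest = [i for i in players if i not in team1]
    for team2 in combinations(rest, n):
        # with equal team sizes, minimizing the difference in sums is
        # the same as minimizing the difference in averages
        diff = abs(sum(r[i] for i in team1) - sum(r[i] for i in team2))
        if best is None or diff < best[0]:
            best = (diff, team1, team2)

print(best[0])  # 0: a perfect match, e.g. {1,2,7,8} vs {3,4,5,6}
```

Even at this size, a perfect match exists, which foreshadows the experiment below.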

Surprisingly, the solution looks like:


----     43 VARIABLE x.L  assignment

               team1       team2

player1        1.000
player2                    1.000
player4        1.000
player5                    1.000
player6                    1.000
player7        1.000
player8        1.000
player9        1.000
player10                   1.000
player11                   1.000
player17       1.000
player18                   1.000


----     43 VARIABLE avg.L  average rating of team

team1 1155.833,    team2 1155.833


----     43 PARAMETER report  solution report

               team1       team2

player1     1275.000
player2                 1531.000
player4      668.000
player5                 1107.000
player6                 1011.000
player7     1242.000
player8     1774.000
player9     1096.000
player10                1400.000
player11                1036.000
player17     880.000
player18                 850.000
sum         6935.000    6935.000
avg         1155.833    1155.833

We achieved a perfect match!

The probability of this must be somewhat low. Well, no. If we try this 10 times with different random \(r_i\), we get a perfect match all 10 times.


----     50 PARAMETER trace  results for each solve

           avg1        avg2

k1     1155.833    1155.833
k2     1583.333    1583.333
k3     1385.333    1385.333
k4     1258.500    1258.500
k5     1423.167    1423.167
k6     1491.833    1491.833
k7     1262.167    1262.167
k8     1736.167    1736.167
k9     1514.167    1514.167
k10    1483.667    1483.667


Using a solve loop of length 100 gives the same result. It looks like we can almost always form two teams with equal average ELO rating. I suspect it will be very difficult to derive anything analytically about the probability of not being able to find a perfect match. I was surprised by these results.

Discussion


Often, people with a Computer Science/programming background immediately think about algorithms instead of mathematical models. They miss out on a paradigm that can be very powerful and efficient (in terms of the time needed to design, implement and maintain an algorithm or model). This model was thought out, implemented and tested in less than 20 minutes (including finding out what this ELO rating is). I am sure developing a special-purpose algorithm would take much more time. In addition, for this model the solver will find proven optimal solutions, while developing an algorithm yourself will likely result in some heuristic without any concept of optimality.

References


Saturday, November 10, 2018

Quadratic Programming with Binary Variables

Quadratic Programming models where the quadratic terms involve only binary variables are interesting from a modeling point view: we can apply different reformulations.

Let's have a look at the basic model:

0-1 Unconstrained Non-convex Quadratic Programming
\[\begin{align}\min\>& \color{DarkRed}x^{T} \color{DarkBlue}Q \color{DarkRed}x + \color{DarkBlue} c^{T}\color{DarkRed}x\\ & \color{DarkRed}x_i \in \{0,1\}\end{align} \]

Only if the matrix \(Q\) is positive (semi-)definite do we have a convex problem. So, in general, the above problem is non-convex. To keep things simple, there are no constraints and no additional continuous variables (adding those does not really change the story).

Test data


To play a bit a with this model, I generated random data:

  • Q is about 25% dense (i.e. about 75% of the entries \(q_{i,j}\) are zero). The nonzero entries are drawn from a uniform distribution between -100 and 100.
  • The linear coefficients are uniformly distributed \(c_i \sim U(-100,100)\).
  • The size of the model is \(n=75\) (i.e. 75 binary variables). This is relatively small, so the hope is we can solve this problem quickly. As we shall see, the results will be very mixed.

Local MINLP solvers


Many local MINLP solvers tolerate non-convex problems, but they are not guaranteed to produce a global optimum. So we see:

Solver    Objective     Time    Notes
SBB       -7558.6235    0.5     Local optimum
Knitro    -7714.5721    0.4     Id.
Bonmin    -7626.7975    1.3     Id.

All solvers used default settings and timings are in seconds. It is not surprising that these local solvers find different local optima. For all solvers, the relaxed solution was almost integer and just a few nodes were needed to produce an integer solution. This looks promising. Unfortunately, we need to contain our optimism.

Global MINLP Solvers


Global MINLP solvers are in theory well-equipped to solve this model. Unfortunately, they are usually quite slow. For this example, we see a very wide performance range:

Solver      Objective     Time     Notes
Baron       -7760.1771    82
Couenne     -7646.5987    >3600    Time limit, gap 25%
Antigone    -7760.1771    252

Couenne is struggling with this model. Baron and Antigone are doing quite well on it. We can further observe that the local solvers did not find the global optimal solution.

MIQP Solvers


If we just use an MIQP solver, we may get different results, depending on the solver. If the solver expects a convex model, it will refuse to solve this one. Other solvers may apply an automatic reformulation. Let's try a few:

Solver    Objective     Time     Notes
Mosek                            Q not positive definite
Cplex     -7760.1771    27       Automatically reformulated to a MIP
Gurobi    -7760.1760    >9999    Time limit, gap 37% (Gurobi 8.0)


MIQP solvers tend to have many options, including options that influence which automatic reformulations are applied. Here I just used default settings, assuming "the solver knows best what to do".

The global MINLP solvers Baron and Antigone did not do badly at all. Gurobi 8.1 has better MIQP performance [2] (hopefully it does much better than what we see here). We can also force Gurobi to linearize the MIQP model using the solver option preqlinearize 1; in that case it solves fast.

Perturb Diagonal


For borderline non-convex models, it is not unusual to see messages from a quadratic solver that the diagonal of \(Q\) has been perturbed to make the problem convex. Here we do the same thing in the extreme [1].

Background: a matrix \(Q\) is positive definite (positive semi-definite) if all eigenvalues \(\lambda_i \gt 0\) (\(\lambda_i\ge 0\)). If there are negative eigenvalues, we can conclude \(\min x^TQx\) is a non-convex problem. From this we see that the sign of the smallest eigenvalue \(\lambda_{min}\) plays an important role.
To calculate the smallest eigenvalue we first have to make \(Q\) symmetric (otherwise we would get complex eigenvalues). This can easily be done by replacing \(Q\) by \(0.5(Q^T+Q)\). This operation will not change the values of the quadratic form \(x^TQx\).

If after calculating the smallest eigenvalue \(\lambda_{min}\), we observe \(\lambda_{min} \lt 0\), we can form  \[\widetilde{Q} = Q - \lambda_{min} I \] Note that we actually add a positive number to the diagonal as  \(\lambda_{min}\lt 0\). To compensate we need to add to the objective a linear term of the form \[\sum_i  \lambda_{min} x_i^2 = \sum_i  \lambda_{min} x_i\] (for binary variables we have \(x_i^2=x_i\)). With this trick, we made the problem convex.

For our data set we have \(\lambda_{min} = -353.710\). To make sure the problem really becomes convex, I added a very generous tolerance and used \(\lambda_{min}-1\), i.e. \(\widetilde{Q} = Q - (\lambda_{min}-1) I \).

Convexified Model
\[\begin{align}\min\>& \color{DarkRed} x^T \left( \color{DarkBlue} Q - (\lambda_{min}-1) I \right) \color{DarkRed} x + \left(\color{DarkBlue} c + (\lambda_{min}-1)\mathbf{1} \right)^T \color{DarkRed} x \\ & \color{DarkRed}x_i \in \{0,1\}\end{align} \]
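The whole trick (symmetrize, shift the diagonal, compensate in the linear term) is easy to verify numerically. A numpy sketch on a small random \(Q\) with made-up data (n=8 here, not the post's n=75 instance):

```python
import numpy as np

# Sketch of the diagonal perturbation on a small random Q
# (made-up data, not the post's n=75 instance).
rng = np.random.default_rng(42)
n = 8
Q = rng.uniform(-100, 100, size=(n, n))
c = rng.uniform(-100, 100, size=n)

Qs = 0.5 * (Q + Q.T)                     # symmetrize: x'Qx is unchanged
lam_min = np.linalg.eigvalsh(Qs).min()   # smallest eigenvalue
shift = lam_min - 1                      # generous tolerance, as in the text
Qt = Qs - shift * np.eye(n)              # perturbed Q, now positive definite
ct = c + shift                           # compensate, using x_i^2 = x_i

# the convexified objective agrees with the original at any 0-1 point
x = rng.integers(0, 2, size=n).astype(float)
assert np.isclose(x @ Q @ x + c @ x, x @ Qt @ x + ct @ x)
assert np.linalg.eigvalsh(Qt).min() > 0  # convex now
print("convexified objective matches on binary x")
```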

With this reformulation we obtained a convex MIQP. This means for instance that a solver like Mosek is back in play, and that local solvers will produce global optimal solutions. Let's try:

SolverObjectiveTimeNotes
Mosek-7760.1771725
Knitro-7760.1771 2724Node limit, gap: 3%
Bonmin-7760.1771>3600Time limit, gap: 6%

These results are a little bit slower than I expected, especially when comparing to the performance of the global solvers Baron and Antigone. These results are also much slower than the first experiment with local solvers where we found integer feasible local solutions very fast.

Note. We could have started by removing all diagonal elements from \(Q\) and moving them into \(c\). This is again based on the fact that \(x_i^2 = x_i\).  I did not do this step in this experiment.

Linearization


We already saw that some solvers (such as Cplex) apply a linearization automatically. Of course we can do this ourselves.

The first thing we can do to help things along is to make \(Q\) a triangular matrix. We can do this by: \[\tilde{q}_{i,j} = \begin{cases} q_{i,j}+q_{j,i} & \text{if $i \lt j$} \\ q_{i,j} & \text{if $i=j$}\\ 0 & \text{if $i \gt j$}\end{cases}\]

The next thing to do is to introduce variables \(y_{i,j} = x_i x_j\). This binary multiplication can be linearized easily: \[\begin{align} & y_{i,j} \le x_i \\ & y_{i,j} \le x_j \\ & y_{i,j} \ge x_i + x_j -1 \\ & 0 \le y_{i,j} \le 1 \end{align}\] In the actual model, we can skip a few of these inequalities by observing in which directions the objective pushes variables \(y_{i,j}\) (see [1]).
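At 0-1 points these three inequalities pin \(y_{i,j}\) down to exactly \(x_i x_j\) (dropping some of them, as suggested, relies on the objective pushing \(y_{i,j}\) in the right direction). A quick exhaustive check in plain Python:

```python
from itertools import product

# y must satisfy: y <= xi, y <= xj, y >= xi + xj - 1, with 0 <= y <= 1.
# Enumerate candidate y values on a grid and verify that, at every
# 0-1 point (xi, xj), only y = xi*xj survives.
def feasible_y(xi, xj):
    grid = [k / 100 for k in range(101)]  # y in [0,1]
    return [y for y in grid if y <= xi and y <= xj and y >= xi + xj - 1]

for xi, xj in product([0, 1], repeat=2):
    assert feasible_y(xi, xj) == [float(xi * xj)]
print("y = xi*xj is forced at all four 0-1 points")
```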


Linearized Model
\[\begin{align}
\min\>& \sum_{i,j|i\lt j} \color{DarkBlue}{\tilde{q}}_{i,j} \color{DarkRed} y_{i,j} + \sum_i  \left( \color{DarkBlue} {\tilde{q}}_{i,i} + \color{DarkBlue} c_i \right) \color{DarkRed} x_i  \\
&  \color{DarkRed}y_{i,j} \le \color{DarkRed}x_i && \forall i\lt j, \color{DarkBlue} {\tilde{q}}_{i,j} \lt 0 \\
&  \color{DarkRed}y_{i,j} \le \color{DarkRed}x_j && \forall i\lt j, \color{DarkBlue} {\tilde{q}}_{i,j} \lt 0 \\
&  \color{DarkRed}y_{i,j} \ge \color{DarkRed}x_i +\color{DarkRed}x_j -1 && \forall i\lt j, \color{DarkBlue} {\tilde{q}}_{i,j} \gt 0 \\
&  0 \le \color{DarkRed}y_{i,j} \le 1 &&  \forall i\lt j, \color{DarkBlue} {\tilde{q}}_{i,j} \ne 0 \\
&  \color{DarkRed}x_i \in \{0,1\} \\  \end{align} \]

This model does not care whether the original problem is convex or not. Let's see how this works:

Solver    Objective     Time    Notes
Cplex     -7760.1771    41
CBC       -7760.1771    6488

It is known that this MIP is not so easy to solve. A commercial MIP solver may be required to get good solution times. Here we see that Cplex (commercial) is doing much better than CBC (open source).

Conclusion


The problem under consideration, an unconstrained MIQP with just \(n=75\) binary variables, is not that easy to solve. The overall winning strategy is to use a commercial MIP solver on a manually or automatically reformulated MIP model. Solving the MIQP directly is just very difficult for many solvers. The global solver Baron does a surprisingly good job. Note that if the data or the problem size changes, these performance figures may shift (a lot).

Update


An earlier version of this post reported a much slower performance for Cplex MIQP. When rerunning this, I could not reproduce it, so this must have been a note-taking error on my side (I suspect I was comparing with a result for \(n=100\)). Now, Cplex MIQP and Cplex MIP on the manually reformulated model perform comparably. My faith in Cplex's automatic reformulation is fully restored (and my faith in my note-taking skills further reduced). Apologies for this.

References

  1. Billionnet, A. and Elloumi, S., Using a mixed integer quadratic programming solver for the unconstrained quadratic 0-1 problem. Math. Program. 109 (2007) pp. 55–68
  2. http://yetanothermathprogrammingconsultant.blogspot.com/2018/10/gurobi-81.html

Wednesday, October 31, 2018

Strange objective

In [1], a question was posted about how to use the \(\mathit{sign}()\) function in the SCIP solver. The problem to solve is:

argmax(w) sum(sign(Aw) == sign(b)) 

This is a strange objective. Basically: find \(w\), with \(v=Aw\), such that we maximize the number of \(v_i\) having the same sign as \(b_i\). I have never seen such an objective before.

As \(A\) and \(b\) are constants, we can precompute \[ \beta_i = \mathrm{sign}(b_i)\] This simplifies the situation a little bit (but I will not need it below).

A different way to say "\(v_i\) and \(b_i\) have the same sign" is to state: \[ v_i b_i > 0 \] I assumed here \(b_i \ne 0\). Similarly, the constraint \( v_i b_i < 0\) means: "\(v_i\) and \(b_i\) have the opposite sign."

If we introduce binary variables: \[\delta_i = \begin{cases} 1 & \text{if $ v_i b_i > 0$}\\ 0 & \text{otherwise}\end{cases}\] a model can look like: \[\begin{align} \max & \sum_i \delta_i \\ &\delta_i =1 \Rightarrow \sum_j a_{i,j} b_i w_j > 0 \\ & \delta_i \in \{0,1\}\end{align}\] The implication can be implemented using indicator constraints, so we have now a linear MIP model.

Notes:

  • I replaced the \(\gt\) constraint by \(\sum_j a_{i,j} b_i w_j \ge 0.001\) 
  • If the \(b_i\) are very small or very large we can replace them by \(\beta_i\), i.e. \(\sum_j a_{i,j} \beta_i w_j \gt 0\)
  • The case where some \(b_i=0\) is somewhat ignored here. In this model, we assume \(\delta_i=0\) for this special case. 
  • We can add explicit support for  \(b_i=0\) by:  \[\begin{align} \max & \sum_i \delta_i \\ &\delta_i =1 \Rightarrow \sum_j a_{i,j} b_i w_j > 0 && \forall i | b_i\ne 0 \\ & \delta_i =1 \Rightarrow \sum_j a_{i,j} w_j = 0 && \forall i | b_i = 0  \\ & \delta_i \in \{0,1\}\end{align}\]
  • We could model this with binary variables or SOS1 variables. Binary variables require big-M values. It is not always easy to find good values for them. The advantage of indicator constraints is that they allow an intuitive formulation of the problem while not using big-M values. 
  • Many high-end solvers (Cplex, Gurobi, Xpress, SCIP) support indicator constraints. Modeling systems like AMPL also support them.
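For a given \(w\), the objective value is just a count of sign agreements between \(v=Aw\) and \(b\). A small plain-Python sketch of that evaluation, with made-up data (not the post's random instance):

```python
# Count how many rows of v = A w match the sign of b; this is the
# quantity the MIP maximizes over w. Data here is made up.
A = [[-0.65,  0.68, 0.10],
     [-0.55, -0.30, 0.71],
     [ 0.99,  0.15, 0.98]]
b = [0.19, 0.02, -0.91]
w = [-0.28, -0.71, -0.26]

def sign_matches(A, b, w):
    v = [sum(aij * wj for aij, wj in zip(row, w)) for row in A]
    # v_i and b_i have the same sign exactly when v_i * b_i > 0
    return sum(1 for vi, bi in zip(v, b) if vi * bi > 0)

print(sign_matches(A, b, w))  # 2: rows 2 and 3 agree in sign
```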

Test with small data set


Let's do a test with a small random data set.



----     17 PARAMETER a  random matrix

             j1          j2          j3          j4          j5

i1       -0.657       0.687       0.101      -0.398      -0.416
i2       -0.552      -0.300       0.713      -0.866 4.213380E-4
i3        0.996       0.157       0.982       0.525      -0.739
i4        0.279      -0.681      -0.500       0.338      -0.129
i5       -0.281      -0.297      -0.737      -0.700       0.178
i6        0.662      -0.538       0.331       0.552      -0.393
i7       -0.779       0.005      -0.680       0.745      -0.470
i8       -0.428       0.188       0.445       0.256      -0.072
i9       -0.173      -0.765      -0.372      -0.907      -0.323
i10      -0.636       0.291       0.121       0.540      -0.404
i11       0.322       0.512       0.255      -0.432      -0.827
i12      -0.795       0.283       0.091      -0.937       0.585
i13      -0.854      -0.649       0.051       0.500      -0.644
i14      -0.932       0.170       0.242      -0.221      -0.283
i15      -0.514      -0.507      -0.739       0.867      -0.240
i16       0.567      -0.400      -0.749       0.498      -0.862
i17      -0.596      -0.990      -0.461 -2.97050E-4      -0.697
i18      -0.652      -0.339      -0.366      -0.356       0.928
i19       0.987      -0.260      -0.254       0.544      -0.207
i20       0.826      -0.761       0.471      -0.889       0.153
i21      -0.897      -0.988      -0.198       0.040       0.258
i22      -0.549      -0.208      -0.448      -0.695       0.873
i23      -0.155      -0.731      -0.228      -0.251      -0.463
i24       0.897      -0.622      -0.405      -0.851      -0.197
i25      -0.797      -0.232      -0.352      -0.616      -0.775


----     17 PARAMETER b  random rhs

i1   0.193,    i2   0.023,    i3  -0.910,    i4   0.566,    i5   0.891,    i6   0.193,    i7   0.215,    i8  -0.275
i9   0.188,    i10  0.360,    i11  0.013,    i12 -0.681,    i13  0.314,    i14  0.048,    i15 -0.751,    i16  0.973
i17 -0.544,    i18  0.351,    i19  0.554,    i20  0.865,    i21 -0.598,    i22 -0.406,    i23 -0.606,    i24 -0.507
i25  0.293


----     17 PARAMETER beta  sign of b

i1   1.000,    i2   1.000,    i3  -1.000,    i4   1.000,    i5   1.000,    i6   1.000,    i7   1.000,    i8  -1.000
i9   1.000,    i10  1.000,    i11  1.000,    i12 -1.000,    i13  1.000,    i14  1.000,    i15 -1.000,    i16  1.000
i17 -1.000,    i18  1.000,    i19  1.000,    i20  1.000,    i21 -1.000,    i22 -1.000,    i23 -1.000,    i24 -1.000
i25  1.000


----     53 VARIABLE w.L  

j1 -0.285,    j2 -0.713,    j3 -0.261,    j4 -0.181,    j5 -0.630


----     53 PARAMETER v  sum(j, a(i,j)*w(j))

i1   0.005,    i2   0.342,    i3  -0.282,    i4   0.556,    i5   0.498,    i6   0.256,    i7   0.557,    i8  -0.129
i9   1.059,    i10  0.099,    i11  0.076,    i12 -0.197,    i13  1.008,    i14  0.299,    i15  0.695,    i16  0.771
i17  1.435,    i18  0.003,    i19  0.002,    i20  0.249,    i21  0.843,    i22 -0.002,    i23  0.962,    i24  0.571
i25  1.084


----     53 VARIABLE delta.L  

i1  1.000,    i2  1.000,    i3  1.000,    i4  1.000,    i5  1.000,    i6  1.000,    i7  1.000,    i8  1.000
i9  1.000,    i10 1.000,    i11 1.000,    i12 1.000,    i13 1.000,    i14 1.000,    i16 1.000,    i18 1.000
i19 1.000,    i20 1.000,    i22 1.000,    i25 1.000


----     53 VARIABLE z.L                   =       20.000  objective variable

This means that for this 25-row problem we can find \(w\) such that 20 rows yield the same sign as \(b_i\).


References


  1. SCIP What is the function for sign?, https://stackoverflow.com/questions/53030430/scip-what-is-the-function-for-sign

Thursday, October 25, 2018

Gurobi 8.1

Quite some improvements in MIQP performance. Of course the smaller improvements in other model types also help: over time these things add up to substantial performance gains.


Announcement


Gurobi is pleased to announce the release of Gurobi Version 8.1. This latest version improves the overall performance of Gurobi Optimizer, and adds enhancements to Gurobi Instant Cloud, including support for Microsoft Azure® and for the latest Amazon Web Services® machines, and more.

Version 8.1 demonstrates our commitment to delivering the new features our users request, and includes:

Performance Improvements

Gurobi Optimizer v8.1 continues to push the envelope of solver speed and performance. The overall v8.1 performance improvements versus v8.0 include:

MIQP 
  • More than a factor of 2.8x faster overall.
  • More than a factor of 6x faster on difficult models that take more than 100 seconds to solve.
MIQCP
  • 38% faster overall.
  • 92% faster on difficult models that take more than 100 seconds to solve.
LP
  • 2.9% faster overall in default settings.
  • 6.5% faster on difficult models that take more than 100 seconds to solve.
LP barrier
  • 4.4% faster overall.
  • 11% faster on difficult models that take more than 100 seconds to solve.
LP dual simplex
  • 4.2% faster overall.
  • 10.5% faster on difficult models that take more than 100 seconds to solve.

Enhancements
  • Gurobi Instant Cloud now supports Microsoft Azure®: Instant Cloud users can now use Microsoft Azure, in several regions.
  • Gurobi Instant Cloud adds faster and more powerful machines on Amazon EC2®: The new version supports c5, r5 and z1 instance types.
  • New Q matrix linearization for MIQP and MIQCP models: We added a new Q matrix linearization approach in presolve. This new option can be chosen by setting parameter PreQLinearize to the new value of 2.
  • Improved Mac Installation Package: Users no longer need to install Xcode to perform the installation.
  • Support for Python 3.7: We have added support for Python 3.7 on Windows, Linux and Mac platforms.
  • A callback function for multi-objective optimization: We now let users terminate optimization for individual objectives for multi-objective MIP models.
To learn more about all of the new features and improvements, visit What's New in Gurobi 8.1.

Wednesday, October 17, 2018

The Muffin Problem



We are asked to split \(m=5\) muffins between \(s=3\) students, such that each student gets in total \[\frac{m}{s} = \frac{5}{3}\] worth of muffins [1]. From [2] we see two possible ways of doing this:

Allocation 1 (from [2])

Allocation 2 (from [2])

(I believe Proc means Protocol). The problem is to find a way to divide the muffins such that the smallest piece is maximized.

There is a nice and simple MIP formulation for this. Let's define \(x_{i,j} \in [0,1]\) as the fraction of muffin \(i\) assigned to student \(j\). Also we need: \[\delta_{i,j} = \begin{cases} 1 & \text{if $x_{i,j} \gt 0$} \\ 0 & \text{if $x_{i,j}=0$}\end{cases}\] Then we can write:


Muffin Problem
\[\begin{align}\max\> & \color{DarkRed} z \\ & \sum_i \color{DarkRed} x_{i,j} = \color{DarkBlue} {\frac{m}{s}} && \forall j\\ & \sum_j \color{DarkRed} x_{i,j} = 1  && \forall i\\ & \color{DarkRed} \delta_{i,j} = 0 \Rightarrow \color{DarkRed} x_{i,j} = 0 \\ & \color{DarkRed} \delta_{i,j} = 1 \Rightarrow \color{DarkRed} z \le \color{DarkRed} x_{i,j} \\ & 0 \le \color{DarkRed} x_{i,j} \le 1\\ & \color{DarkRed} \delta_{i,j} \in \{0,1\}\end{align}\]

The objective takes care of \(x_{i,j}=0 \Rightarrow \delta_{i,j}=0\). Some MIP solvers allow us to use the implications directly (as so-called indicator constraints). For others we need to reformulate. It is not difficult to rewrite them as inequalities:

Implication and corresponding inequality:
\[\color{DarkRed} \delta_{i,j} = 0 \Rightarrow \color{DarkRed} x_{i,j} = 0 \quad\leadsto\quad \color{DarkRed} x_{i,j} \le \color{DarkRed} \delta_{i,j}\]
\[\color{DarkRed} \delta_{i,j} = 1 \Rightarrow \color{DarkRed} z \le \color{DarkRed} x_{i,j} \quad\leadsto\quad \color{DarkRed} z \le \color{DarkRed} x_{i,j}+ (1-\color{DarkRed} \delta_{i,j})\]


This is similar to what is proposed in [1] (they use \(y_{i,j} = 1-\delta_{i,j}\)) and somewhat simpler than the approach used in [3].

The results are:

----     26 VARIABLE x.L  fraction of muffin assigned to student

           student1    student2    student3

muffin1       0.500                   0.500
muffin2       0.583       0.417
muffin3                   0.417       0.583
muffin4       0.583       0.417
muffin5                   0.417       0.583


----     26 VARIABLE d.L  indicator for nonzero x

           student1    student2    student3

muffin1       1.000                   1.000
muffin2       1.000       1.000
muffin3                   1.000       1.000
muffin4       1.000       1.000
muffin5                   1.000       1.000


----     26 VARIABLE z.L                   =        0.417  smallest nonzero x
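The solution above can be checked mechanically against the model's constraints; 0.583 and 0.417 are \(7/12\) and \(5/12\), so exact fractions work nicely:

```python
from fractions import Fraction as F

# The m=5, s=3 solution reported above, as exact fractions
# (0.583 = 7/12, 0.417 = 5/12, 0.500 = 1/2).
x = [
    [F(1, 2),  F(0),     F(1, 2)],   # muffin1
    [F(7, 12), F(5, 12), F(0)],      # muffin2
    [F(0),     F(5, 12), F(7, 12)],  # muffin3
    [F(7, 12), F(5, 12), F(0)],      # muffin4
    [F(0),     F(5, 12), F(7, 12)],  # muffin5
]
m, s = 5, 3

# every muffin is fully divided
assert all(sum(row) == 1 for row in x)
# every student receives m/s worth of muffin
assert all(sum(row[j] for row in x) == F(m, s) for j in range(s))
# smallest nonzero piece
z = min(p for row in x for p in row if p > 0)
print(z)  # 5/12, i.e. 0.417
```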


If student1 arrives early and confiscates muffin1, we can fix \(x_{\text{muffin1},\text{student1}}=1\). With this we can reproduce the first solution:


----     28 VARIABLE x.L  fraction of muffin assigned to student

           student1    student2    student3

muffin1       1.000
muffin2                   1.000
muffin3       0.333       0.667
muffin4                               1.000
muffin5       0.333                   0.667


----     28 VARIABLE z.L                   =        0.333  smallest nonzero x

A solution for 7 muffins and 5 students can look like:


----     26 VARIABLE x.L  fraction of muffin assigned to student

           student1    student2    student3    student4    student5

muffin1       0.667       0.333
muffin2       0.333                   0.667
muffin3       0.400       0.600
muffin4                               0.333       0.333       0.333
muffin5                   0.467                               0.533
muffin6                                           0.467       0.533
muffin7                               0.400       0.600


----     26 VARIABLE z.L                   =        0.333  smallest nonzero x

Not all problems are super easy to solve to proven optimality. E.g. with 11 muffins and 9 students, I had to spend several minutes. As usual the solver quickly found the optimal solution, but proving optimality was not so quick. There is lots of symmetry in the model. That may be something to exploit.

References

Tuesday, October 9, 2018

A pure fixed charge transportation problem

The standard transportation problem is an example of an easy linear programming problem:

Transportation Problem
\[\begin{align}\min & \sum_{i,j} \color{DarkBlue} c_{i,j} \color{DarkRed} x_{i,j}\\ & \sum_j \color{DarkRed} x_{i,j} = \color{DarkBlue} {\mathit{supply}}_i\\ & \sum_i \color{DarkRed} x_{i,j} = \color{DarkBlue} {\mathit{demand}}_j \\ & \color{DarkRed} x_{i,j} \ge 0\end{align}\]

We can solve very large instances very quickly. The fixed charge transportation problem is much more difficult:

Fixed Charge Transportation Problem
\[\begin{align}\min & \sum_{i,j} \color{DarkBlue} c_{i,j} \color{DarkRed} x_{i,j} + \color{DarkBlue} d_{i,j} \color{DarkRed} y_{i,j}\\ & \sum_j \color{DarkRed} x_{i,j} = \color{DarkBlue} {\mathit{supply}}_i\\ & \sum_i \color{DarkRed} x_{i,j} = \color{DarkBlue}{\mathit{demand}}_j \\ & 0 \le \color{DarkRed} x_{i,j} \le \color{DarkBlue} M \color{DarkRed} y_{i,j}\\ & \color{DarkRed} y_{i,j} \in \{0,1\}\end{align}\]

Here we have an additional cost \(d_{i,j}\) if link \(x_{i,j}\) is used (i.e. if \(x_{i,j}\gt 0\)). The binary variable \(y_{i,j}\) indicates whether link \(i \rightarrow j\) is used, i.e. we want: \[y_{i,j} = 0 \Rightarrow x_{i,j}=0\] The constraint \(x_{i,j} \le M y_{i,j}\) is a big-M formulation for this. Notice that we do not need to worry about the reverse implication \(x_{i,j}=0 \Rightarrow y_{i,j}=0\): the objective takes care of that automatically.

In practice we can use \[\begin{align} & x_{i,j} \le M_{i,j} y_{i,j}\\ & M_{i,j} = \min \{ \mathit{supply}_i,  \mathit{demand}_j \}\end{align}\] This gives reasonable, tight values for \(M_{i,j}\).

A problem with only fixed-charge costs (i.e. \(c_{i,j}=0\)) is sometimes called a pure fixed charge transportation problem. If we just want to minimize the number of links being used, we can set \(c_{i,j}=0\) and \(d_{i,j}=1\). These models can be incredibly difficult to solve to proven optimality.
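Computing these tightened \(M_{i,j}\) values is a one-liner. A sketch using a few values in the style of the random data shown below:

```python
# Tight big-M values: no link can carry more than the smaller of its
# source's supply and its destination's demand. Data is made up,
# in the style of the random instance below.
supply = {"src1": 0.041, "src2": 0.203, "src3": 0.132}
demand = {"dest1": 0.178, "dest2": 0.103}

M = {(i, j): min(supply[i], demand[j])
     for i in supply for j in demand}

print(M[("src1", "dest1")])  # 0.041 (supply binds)
print(M[("src2", "dest2")])  # 0.103 (demand binds)
```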

Here we try to solve a very small problem with 10 source nodes and 10 destination nodes. I used random values for the supply and demand:


----     20 PARAMETER supply  

src1  0.041,    src2  0.203,    src3  0.132,    src4  0.072,    src5  0.070,    src6  0.054,    src7  0.084
src8  0.206,    src9  0.016,    src10 0.120


----     20 PARAMETER demand  

dest1  0.178,    dest2  0.103,    dest3  0.177,    dest4  0.136,    dest5  0.023,    dest6  0.114,    dest7  0.028
dest8  0.045,    dest9  0.119,    dest10 0.078


The objective is \(\min \sum_{i,j} y_{i,j}\), i.e. we minimize the number of open links. This means we have just 100 binary variables. After setting a time limit of 9999 seconds I see:

. . .
 5938307 5589691       15.7946    12       18.0000       15.0950 1.71e+08   16.14%
 5946634 5596825       15.1040    25       18.0000       15.0950 1.72e+08   16.14%
 5959435 5611506       15.1737    19       18.0000       15.0950 1.72e+08   16.14%
Elapsed time = 9977.59 sec. (3226694.74 ticks, tree = 38155.78 MB, solutions = 3)
Nodefile size = 36108.12 MB (23960.92 MB after compression)
 5965049 5616397       16.0000    19       18.0000       15.0950 1.72e+08   16.14%

Cover cuts applied:  264
Flow cuts applied:  5
Mixed integer rounding cuts applied:  331
Zero-half cuts applied:  4
Multi commodity flow cuts applied:  4

Root node processing (before b&c):
  Real time             =    0.88 sec. (41.53 ticks)
Parallel b&c, 8 threads:
  Real time             = 10021.94 sec. (3248441.83 ticks)
  Sync time (average)   = 5484.02 sec.
  Wait time (average)   =    0.11 sec.
                          ------------
Total (root+branch&cut) = 10022.81 sec. (3248483.36 ticks)
MIP status(107): time limit exceeded
Cplex Time: 10023.03sec (det. 3248483.36 ticks)

Ouch. If you had asked me beforehand, I certainly would have predicted somewhat better performance.

MIP bounds don't change anymore


The best objective value of 18 was found after 25 seconds.  We might as well have stopped at that point...

Most likely 18 is the optimal objective, but we have not proven that. The best solution looks like:

Best found solution

Saturday, October 6, 2018

Just your average Excel formula


Source:
Ira Iosebashvili, Wall Street Journal, The First Rule of Microsoft Excel -- Don't Tell Anyone You're Good at it, Oct. 5, 2018
https://www.wsj.com/articles/the-first-rule-of-microsoft-exceldont-tell-anyone-youre-good-at-it-1538754380

Wednesday, October 3, 2018

Scheduling and three sided football

Design a tournament schedule


In [1] a question is posed how to design a schedule for a sequence of games:

  • There are 9 teams,
  • we have 4 rounds of 3 matches,
  • each match consists of 3 teams,
  • each pair of teams plays in the same match exactly once.
This problem is related to the social golfer problem [2,3]. Not many games have three teams but there is something called three sided football [4].


Three sided football field [4]

Integer Quadratically Constrained Model


To find a feasible schedule we use the following sets: \[\begin{align} & t \in \{A,\dots,I\} && \text{9 teams} \\ & r \in \{\mathit{round}_1,\dots,\mathit{round}_4\} && \text{4 rounds}\\ & m \in \{\mathit{match}_1,\dots,\mathit{match}_3\} && \text{3 matches per round} \end{align}\] We introduce binary variables \[x_{r,m,t} = \begin{cases} 1 & \text{if team $t$ plays in match $m$ in round $r$} \\ 0 & \text{otherwise}\end{cases}\]
A high-level model can look like:

MIQCP feasibility model
\[\sum_m x_{r,m,t} = 1 \>\>\forall r,t\] Each team plays exactly once in each round.
\[\sum_t x_{r,m,t} = 3 \>\>\forall r,m\] Each match consists of three teams.
\[\sum_{r,m} x_{r,m,t} x_{r,m,t'} = 1 \>\>\forall t\lt t'\] Teams \(t\) and \(t'\) play exactly once in the same match. This is a non-convex, quadratic constraint.

There is no objective: we are only looking for a feasible solution. The last equation is the most complicated one. We have \(x_{r,m,t} x_{r,m,t'}=1\) if teams \((t,t')\)  are in the same match \((r,m)\). Because of symmetry in the product \(x_{r,m,t} x_{r,m,t'}\) , we only need to check teams \(t\lt t'\). This saves us a few quadratic constraints.

This model can be solved with a constraint solver or a global (non-convex) MIQCP (Mixed Integer Quadratically Constrained Programming) solver. Of course, with a MIQCP solver we need to add a dummy objective. Global solvers, like Baron, Couenne or Antigone, can solve this easily. The results can look like:

Solution
We have marked the cases where \(x_{r,m,t}=1\). We can check that:

  • Every row has three entries (a match has three teams)
  • Every column has four entries: one in each round (a team plays once in each round)
  • Two teams play in the same match exactly once (e.g. A and B meet in round 1 only)

It is noted that there are many feasible solutions.

Making the problem convex, linear


If we want to solve with a convex MIQCP solver (like Cplex or Gurobi) we can replace the quadratic constraint by: \[\sum_{r,m} x_{r,m,t} x_{r,m,t'} \le 1 \>\>\forall t\lt t'\]

Finally, if we want, we can linearize the quadratic constraint. That would allow us to solve this problem with a straight linear MIP solver. We can write: \[\begin{align} & y_{r,m,t,t'} \ge x_{r,m,t} + x_{r,m,t'} - 1 && \forall r,m,t\lt t'\\ & \sum_{r,m} y_{r,m,t,t'} \le 1 && \forall t\lt t'\\ &   y_{r,m,t,t'} \in [0, 1]\end{align}\] Again, we need to do this only for \(t\lt t'\). The MIP model performs better than the quadratic versions if the problems become a bit larger. It is not unusual that linarized models, solved as a MIP, perform better than using a quadratic solver against the original model, even if we have to add a significant number of variables and equations in the process.

There is lots of symmetry in this problem and there are quite a few papers around, mainly from a constraint programming view, that try to deal with this.

Fixing


For larger problems there are a few tricks we can apply. For example, we can fix the first two rounds, and let the model only worry about the subsequent rounds. In addition we can fix the first \(m\) teams in all rounds. The idea is to make the remaining problem smaller so that the solver has an easier job. Fixing also reduces symmetry. With this I could solve very quickly the following instances [5] (I used the MIP formulation):

5 rounds, 4 matches, 16 teams (first 2 rounds, first 4 teams fixed)

6 rounds, 5 matches, 25 teams (first 2 rounds, first 5 teams fixed)

The dark areas were fixed and the light area was solved by the MIP solver.

Conclusion


With two simple tricks:

  • linearization of the quadratic constraint 
  • fixing a significant part of the problem
we can solve reasonable sized schedule design problems with a good MIP solver quickly. 


References


  1. PHP - Evenly Distribute Teams Into Arrays So That No Combination Repeats, https://stackoverflow.com/questions/52600546/php-evenly-distribute-teams-into-arrays-so-that-no-combination-repeats
  2. Markus Triska and Nysret Musliu, An Improved SAT Formulation for the Social Golfer Problem, Annals of Operations Research Vol. 194(1) (2012), pp. 427-438
  3. Nicolas Barnier, Pascal Brisset. Solving the Kirkman’s schoolgirl problem in a few seconds. CP 2002, 8th International Conference on Principles and Practice of Constraint Programming, Sep 2002, Ithaca, United States. Springer, 2470, pp 33-41, 2002.
  4. Three Sided Football, https://en.wikipedia.org/wiki/Three_sided_football
  5. All Match Combinations With Either 4 or 5 Teams Per Match When There are 16 or 25 Teams, https://math.stackexchange.com/questions/2939746/all-match-combinations-with-either-4-or-5-teams-per-match-when-there-are-16-or-2

Thursday, September 27, 2018

Maximize a product

In [1] a small problem is stated:
  • Let \(I = \{1,\dots,n\}\)
  • We have two parameters \(a_i, b_i \ge 0\)
  • Find a subset \(S\subset I\) that maximizes the following expression \[\max \left(\sum_{i \in S} a_i\right) \left(\sum_{i \notin S} b_i\right) \]  
This is easily formulated as a Mixed Integer Quadratic Programming (MIQP) problem. We can write:

MIQP Model
\[\begin{align}\max &\left(\sum_i a_i x_i \right) \left(\sum_i  b_i (1-x_i) \right) \\ & x_i \in \{0,1\}\end{align}\]

Using a modeling system like GAMS we can directly implement this:

obj.. z =e= sum(i, a(i)*x(i)) * sum(i, b(i)*(1-x(i)));


It is interesting to see what different solvers do with this. The model can be reformulated by the solver in different ways, so we can potentially see very different behavior. I generated some random values (uniformly distributed) for the parameters: \[\begin{align} &n=100\\ & a_i \sim U(0,1)\\ & b_i \sim U(0,1)\end{align}\] This is already large enough to make it interesting. Complete enumeration is out of the question: \[2^{100} = 1267650600228229401496703205376\] The generated input data looks like:


----      7 PARAMETER a  

i1   0.172,    i2   0.843,    i3   0.550,    i4   0.301,    i5   0.292,    i6   0.224,    i7   0.350,    i8   0.856
i9   0.067,    i10  0.500,    i11  0.998,    i12  0.579,    i13  0.991,    i14  0.762,    i15  0.131,    i16  0.640
i17  0.160,    i18  0.250,    i19  0.669,    i20  0.435,    i21  0.360,    i22  0.351,    i23  0.131,    i24  0.150
i25  0.589,    i26  0.831,    i27  0.231,    i28  0.666,    i29  0.776,    i30  0.304,    i31  0.110,    i32  0.502
i33  0.160,    i34  0.872,    i35  0.265,    i36  0.286,    i37  0.594,    i38  0.723,    i39  0.628,    i40  0.464
i41  0.413,    i42  0.118,    i43  0.314,    i44  0.047,    i45  0.339,    i46  0.182,    i47  0.646,    i48  0.561
i49  0.770,    i50  0.298,    i51  0.661,    i52  0.756,    i53  0.627,    i54  0.284,    i55  0.086,    i56  0.103
i57  0.641,    i58  0.545,    i59  0.032,    i60  0.792,    i61  0.073,    i62  0.176,    i63  0.526,    i64  0.750
i65  0.178,    i66  0.034,    i67  0.585,    i68  0.621,    i69  0.389,    i70  0.359,    i71  0.243,    i72  0.246
i73  0.131,    i74  0.933,    i75  0.380,    i76  0.783,    i77  0.300,    i78  0.125,    i79  0.749,    i80  0.069
i81  0.202,    i82  0.005,    i83  0.270,    i84  0.500,    i85  0.151,    i86  0.174,    i87  0.331,    i88  0.317
i89  0.322,    i90  0.964,    i91  0.994,    i92  0.370,    i93  0.373,    i94  0.772,    i95  0.397,    i96  0.913
i97  0.120,    i98  0.735,    i99  0.055,    i100 0.576


----      7 PARAMETER b  

i1   0.051,    i2   0.006,    i3   0.401,    i4   0.520,    i5   0.629,    i6   0.226,    i7   0.396,    i8   0.276
i9   0.152,    i10  0.936,    i11  0.423,    i12  0.135,    i13  0.386,    i14  0.375,    i15  0.268,    i16  0.948
i17  0.189,    i18  0.298,    i19  0.075,    i20  0.401,    i21  0.102,    i22  0.384,    i23  0.324,    i24  0.192
i25  0.112,    i26  0.597,    i27  0.511,    i28  0.045,    i29  0.783,    i30  0.946,    i31  0.596,    i32  0.607
i33  0.363,    i34  0.594,    i35  0.680,    i36  0.507,    i37  0.159,    i38  0.657,    i39  0.524,    i40  0.124
i41  0.987,    i42  0.228,    i43  0.676,    i44  0.777,    i45  0.932,    i46  0.201,    i47  0.297,    i48  0.197
i49  0.246,    i50  0.646,    i51  0.735,    i52  0.085,    i53  0.150,    i54  0.434,    i55  0.187,    i56  0.693
i57  0.763,    i58  0.155,    i59  0.389,    i60  0.695,    i61  0.846,    i62  0.613,    i63  0.976,    i64  0.027
i65  0.187,    i66  0.087,    i67  0.540,    i68  0.127,    i69  0.734,    i70  0.113,    i71  0.488,    i72  0.796
i73  0.492,    i74  0.534,    i75  0.011,    i76  0.544,    i77  0.451,    i78  0.975,    i79  0.184,    i80  0.164
i81  0.025,    i82  0.178,    i83  0.061,    i84  0.017,    i85  0.836,    i86  0.602,    i87  0.027,    i88  0.196
i89  0.951,    i90  0.336,    i91  0.594,    i92  0.259,    i93  0.641,    i94  0.155,    i95  0.460,    i96  0.393
i97  0.805,    i98  0.541,    i99  0.391,    i100 0.558


MIQP Solvers: Cplex, Gurobi and Baron


Cplex solves this very fast, all in the root node:

        Nodes                                         Cuts/
   Node  Left     Objective  IInf  Best Integer    Best Bound    ItCnt     Gap

*     0+    0                            0.0000     1807.5989              ---
Found incumbent of value 0.000000 after 0.89 sec. (112.24 ticks)
*     0+    0                          874.9563     1807.5989           106.59%
Found incumbent of value 874.956296 after 0.89 sec. (112.41 ticks)
      0     0      903.7995   100      874.9563      903.7995     3357    3.30%
      0     0      896.6965   182      874.9563    Cuts: 1268     3752    0.00%
      0     0      889.7657   772      874.9563    Cuts: 1337     4088    0.00%
      0     0      882.7062  1112      874.9563 ZeroHalf: 1337     4432    0.00%
      0     0      877.9208  2106      874.9563 ZeroHalf: 1337     4827    0.00%
      0     0      875.4173   762      874.9563 ZeroHalf: 1098     5071    0.00%
      0     0        cutoff            874.9563                   5112     ---
Elapsed time = 10.70 sec. (3363.84 ticks, tree = 0.01 MB, solutions = 2)

The solution looks like:


----     21 VARIABLE x.L  selection (binary)

i1   1.000,    i2   1.000,    i3   1.000,    i8   1.000,    i11  1.000,    i12  1.000,    i13  1.000,    i14  1.000
i19  1.000,    i20  1.000,    i21  1.000,    i25  1.000,    i26  1.000,    i28  1.000,    i34  1.000,    i37  1.000
i38  1.000,    i39  1.000,    i40  1.000,    i47  1.000,    i48  1.000,    i49  1.000,    i52  1.000,    i53  1.000
i58  1.000,    i60  1.000,    i64  1.000,    i67  1.000,    i68  1.000,    i70  1.000,    i74  1.000,    i75  1.000
i76  1.000,    i79  1.000,    i81  1.000,    i83  1.000,    i84  1.000,    i87  1.000,    i88  1.000,    i90  1.000
i91  1.000,    i92  1.000,    i94  1.000,    i96  1.000,    i98  1.000,    i100 1.000


----     21 PARAMETER count                =       46.000  number of x(i)=1
            VARIABLE z.L                   =      874.956  objective variable


Gurobi has some real troubles with this model. After 10,000 seconds I stopped it:

    Nodes    |    Current Node    |     Objective Bounds      |     Work
 Expl Unexpl |  Obj  Depth IntInf | Incumbent    BestBd   Gap | It/Node Time

     0     0 1028.82147    0   88   -0.00000 1028.82147      -     -    0s
H    0     0                      94.1937601 1028.82147   992%     -    0s
H    0     0                     874.9562964 1028.82147  17.6%     -    0s
     0     2 1028.82147    0   88  874.95630 1028.82147  17.6%     -    0s
  5024  3355  981.36042   24   77  874.95630  989.02791  13.0%   2.4    5s
H13738 10488                     874.9564248  978.66510  11.9%   2.4    8s
 20782 16103  909.88622   55   44  874.95642  975.04173  11.4%   2.4   10s
 43494 34084  943.11968   43   57  874.95642  968.63227  10.7%   2.4   15s
 65078 50966  916.66669   49   50  874.95642  964.88315  10.3%   2.4   20s
 89405 69894  904.98150   56   44  874.95642  962.21433  10.0%   2.4   25s
 114800 89612  898.26415   51   50  874.95642  959.84933  9.70%   2.4   30s
 136964 106525  911.88894   51   49  874.95642  958.31653  9.53%   2.4   35s
 162025 125584  892.69915   52   48  874.95642  956.85312  9.36%   2.4   40s
 184525 142823  922.61714   44   57  874.95642  955.79186  9.24%   2.4   45s
 208972 161530  880.72495   54   46  874.95642  954.67328  9.11%   2.4   50s
 231067 178124  885.81556   57   44  874.95642  953.79519  9.01%   2.4   55s
 255408 196541  935.18869   46   54  874.95642  952.89911  8.91%   2.4   60s
 278680 214033  899.07410   51   47  874.95642  952.12450  8.82%   2.4   65s
 302859 232220  936.52130   37   64  874.95642  951.40827  8.74%   2.4   70s
 328028 251086  910.60834   50   50  874.95642  950.70076  8.66%   2.4   75s
 352051 268955  905.74964   54   45  874.95642  950.10317  8.59%   2.4   80s
 376120 286723  902.67637   57   43  874.95642  949.57421  8.53%   2.4   85s
 400406 304716  949.03348   26   77  874.95642  949.03348  8.47%   2.4   90s
 422676 321239  882.35938   52   47  874.95642  948.53308  8.41%   2.4   95s

. . .

 13091984 8119359  883.24548   54   44  874.95642  919.65956  5.11%   2.4 9990s
 13094401 8120718     cutoff   55       874.95642  919.65790  5.11%   2.4 9995s
 13095972 8121550  897.09522   56   41  874.95642  919.65687  5.11%   2.4 10000s
 13098502 8122984  908.84424   50   50  874.95642  919.65560  5.11%   2.4 10005s
 13100870 8124293  874.95653   56   44  874.95642  919.65388  5.11%   2.4 10010s
 13103160 8125658  883.98954   56   46  874.95642  919.65273  5.11%   2.4 10015s
 13105665 8127116  889.74823   60   40  874.95642  919.65125  5.11%   2.4 10020s
 13107878 8128361     cutoff   61       874.95642  919.64989  5.11%   2.4 10025s
 13110794 8129992  880.12914   53   48  874.95642  919.64814  5.11%   2.4 10030s
 13112818 8131096  891.49116   58   43  874.95642  919.64674  5.11%   2.4 10035s
Sending CtrlC signal

Explored 13114854 nodes (31784166 simplex iterations) in 10037.84 seconds
Thread count was 4 (of 8 available processors)

Solution count 4: 874.956 874.956 94.1938 -0

Solve interrupted
Best objective 8.749562757165e+02, best bound 9.196456216128e+02, gap 5.1076%
MIQP status(11): Optimization was terminated by the user.


Gurobi finds the optimal solution quickly but it is not able to prove optimality.

Interestingly, the global solver Baron does a very good jobs on this:

Preprocessing found feasible solution with value  0.00000000000    
 Doing local search
 Preprocessing found feasible solution with value  874.943601492    
 Solving bounding LP
 Starting multi-start local search
 Preprocessing found feasible solution with value  874.956296356    
 Done with local search
===========================================================================
  Iteration    Open nodes         Time (s)    Lower bound      Upper bound
          1             0             1.00     874.956          874.956   

 Cleaning up

                         *** Normal completion ***           

 Wall clock time:                     1.00
 Total CPU time used:                 0.75


Linearization


Let's see if we can make Gurobi look a little bit better. We can reformulate the model as a straight mixed-integer programming model:

MIP Model
\[\begin{align} \max & \sum_{i \ne j} a_i b_j y_{i,j} \\ & y_{i,j} \le x_i && \forall i \ne j \\ & y_{i,j} \le 1-x_j && \forall i \ne j \\ & 0 \le y_{i,j} \le 1 \\ & x_i \in \{0,1\} \end{align}\]

This model introduces \(100 \times 100 - 100 = 9,900\) extra continuous variables (and 19,800 additional constraints).

Derivation


We can rewrite \[\max \left(\sum_i a_i x_i \right) \left(\sum_i  b_i (1-x_i) \right) \] as \[\max \sum_{i,j} a_i b_j x_i (1-x_j)  \] We can linearize the product \[y_{i,j} = x_i (1-x_j)\] as \[\begin{align} &  y_{i,j} \le x_i \\ & y_{i,j} \le (1-x_j) \\ & y_{i,j} \ge x_i - x_j \\ & y_{i,j}, x_i \in \{0,1\}\end{align}\] You can verify the correctness of this by plugging in the possible values for \((x_i,x_j) = \{(0,0),(0,1),(1,0),(1,1)\}\). Finally we observe: 
  • \(y_{i,j}\) can be relaxed to be continuous between 0 and 1. It will be integer automatically.
  • We are really maximizing \(y_{i,j}\) so we can drop the last \(\ge\) constraints.
  • We can skip the case where \(i=j\): the product \(x_i (1-x_i)\) is always zero.

This formulation proves to be quite beneficial for Gurobi: 

    Nodes    |    Current Node    |     Objective Bounds      |     Work
 Expl Unexpl |  Obj  Depth IntInf | Incumbent    BestBd   Gap | It/Node Time

     0     0  903.79947    0  100   -0.00000  903.79947      -     -    2s
H    0     0                       2.2104609  903.79947      -     -    3s
H    0     0                     163.3432337  903.79947   453%     -    3s
H    0     0                     177.4409096  903.79947   409%     -    3s
H    0     0                     672.0256560  903.79947  34.5%     -    3s
H    0     0                     690.8350810  903.79947  30.8%     -    3s
     0     0  903.64085    0  100  690.83508  903.64085  30.8%     -    3s
H    0     0                     787.7021585  903.64085  14.7%     -    3s
     0     0  903.47324    0  100  787.70216  903.47324  14.7%     -    4s
     0     0  903.30963    0  100  787.70216  903.30963  14.7%     -    4s
     0     2  903.30963    0  100  787.70216  903.30963  14.7%     -    7s
*    4     4               2     874.9562964  888.63188  1.56%   737    9s

Cutting planes:
  Gomory: 9

Explored 7 nodes (16549 simplex iterations) in 9.53 seconds
Thread count was 4 (of 8 available processors)

Solution count 8: 874.956 787.702 690.835 ... -0

Optimal solution found (tolerance 0.00e+00)
Best objective 8.749562963561e+02, best bound 8.749562963561e+02, gap 0.0000%
MIP status(2): Model was solved to optimality (subject to tolerances).


Instead of doing this linearization by ourselves, we can also tell Gurobi to do this by using the option preqlinearize. (The behavior is not exactly the same, it solved in 12 seconds instead of 9). The automatic behavior is apparently not to apply the linearization. Too bad Gurobi does not realize it is in so much trouble and should reformulate the problem.

Cheating: a simple heuristic


We can predict the optimal solution for our specific random data set quite easily: 
  • if \(a_i\gt b_i\), set \(x_i=1\)
  • if \(a_i\lt b_i\), set \(x_i=0\)

This correctly shows the optimal solution:



----     18 heuristic solution

----     18 VARIABLE x.L  selection (binary)

i1   1.000,    i2   1.000,    i3   1.000,    i8   1.000,    i11  1.000,    i12  1.000,    i13  1.000,    i14  1.000
i19  1.000,    i20  1.000,    i21  1.000,    i25  1.000,    i26  1.000,    i28  1.000,    i34  1.000,    i37  1.000
i38  1.000,    i39  1.000,    i40  1.000,    i47  1.000,    i48  1.000,    i49  1.000,    i52  1.000,    i53  1.000
i58  1.000,    i60  1.000,    i64  1.000,    i67  1.000,    i68  1.000,    i70  1.000,    i74  1.000,    i75  1.000
i76  1.000,    i79  1.000,    i81  1.000,    i83  1.000,    i84  1.000,    i87  1.000,    i88  1.000,    i90  1.000
i91  1.000,    i92  1.000,    i94  1.000,    i96  1.000,    i98  1.000,    i100 1.000


----     18 PARAMETER count                =       46.000  number of x(i)=1
            EQUATION obj.L                 =      874.956  


Of course, this approach will only work for certain very "balanced" data sets. We exploited here that \(a_i\) and \(b_i\) are drawn from the same distribution. In addition, we have no cases where \(a_i=b_i\). I don't know about good heuristics for the more general case.

A related problem


A related problem (or better: a special case) only looks at parameter \(a\):\[\max \left(\sum_{i \in S} a_i\right) \left(\sum_{i \notin S} a_i\right) \] This problem, as noted in the comments below, has an interesting linearization. The MIQP formulation is again straightforward: \[\begin{align}\max & \left(\sum_i a_i x_i \right) \left(\sum_i  a_i (1-x_i) \right)\\ & x_i \in \{0,1\}\end{align} \] We can find the optimal solution by solving instead  \[\min \left|\sum_i a_i x_i - \sum_i  a_i (1-x_i) \right| \] I.e. make sure we divide into \(I\) into \(S\) and \(I-S\) as evenly as possible. The absolute value is easily linearized, yielding a MIP model.

Note that when we write \(A=\sum_i a_i\), we can state this as: \[\min \left| \sum_i a_i x_i - \frac{A}{2}\right|\]

Here are some performance figures for this problem:

SolverModelTimeObjGap
CplexMIQP>3600465.93140%
GurobiMIQP>3600465.93165%
BaronMIQP2465.931Optimal
CplexMIP4465.931Optimal
GurobiMIP1465.931Optimal


So we see:
  • Cplex and Gurobi do not like the MIQP model at all. They stopped on a time limit before optimality was proven. This is worse performance than on the original problem with both \(a\) and \(b\).
  • Baron does a really good job. Baron is a global NLP/MINLP solver, so it is somewhat surprising to me that it is doing so much better than more specialized MIQP solvers.
  • The MIP formulations are easy to solve.
  • The objective reported for the MIP models is not the MIP objective but rather \(\left(\sum_i a_i x_i \right) \left(\sum_i  a_i (1-x_i) \right)\) evaluated at the solution point.

The Baron log is just very convincing:


  Iteration    Open nodes         Time (s)    Lower bound      Upper bound
*         1             1             2.00     465.931          1863.72   
          1             0             2.00     465.931          465.931   


Conclusion


MIQP models, including rather smallish ones, remain very difficult to solve, even with expensive commercial solvers. The only solver that is consistently doing a good job is Baron. For most solvers it pays off to look for linearizations.   

References