Problem statement
Given \(n\) points with their distances \(d_{i,j}\), select \(k\) points such that the sum of the distances between the selected points is maximized [1].
A simple MIQP (Mixed-Integer Quadratic Programming) model is:
Non-convex MIQP Model |
---|
\[\begin{align} \max & \sum_{i \lt j} \color{darkblue} d_{i,j} \color{darkred}x_{i} \color{darkred}x_{j} \\ & \sum_i \color{darkred} x_{i} = \color{darkblue} k \\ & \color{darkred}x_{i} \in \{0,1\} \end{align}\] |
Notice we only need to look at distances \(d_{i,j}\) with \(i\lt j\) as we can assume symmetry. If not, just make \(d_{i,j}\) symmetric by: \[d_{i,j} = \frac{d_{i,j} + d_{j,i}}{2}\]
As we shall see this is not such an easy problem to solve to optimality.
This problem is called the \(p\)-dispersion-sum problem.
Example data
I generated \(n=50\) random \(2d\) points:
```
----     24 PARAMETER coord  random coordinates

           x       y
i1     1.717   8.433
i2     5.504   3.011
i3     2.922   2.241
i4     3.498   8.563
i5     0.671   5.002
i6     9.981   5.787
i7     9.911   7.623
i8     1.307   6.397
i9     1.595   2.501
i10    6.689   4.354
i11    3.597   3.514
i12    1.315   1.501
i13    5.891   8.309
i14    2.308   6.657
i15    7.759   3.037
i16    1.105   5.024
i17    1.602   8.725
i18    2.651   2.858
i19    5.940   7.227
i20    6.282   4.638
i21    4.133   1.177
i22    3.142   0.466
i23    3.386   1.821
i24    6.457   5.607
i25    7.700   2.978
i26    6.611   7.558
i27    6.274   2.839
i28    0.864   1.025
i29    6.413   5.453
i30    0.315   7.924
i31    0.728   1.757
i32    5.256   7.502
i33    1.781   0.341
i34    5.851   6.212
i35    3.894   3.587
i36    2.430   2.464
i37    1.305   9.334
i38    3.799   7.834
i39    3.000   1.255
i40    7.489   0.692
i41    2.020   0.051
i42    2.696   4.999
i43    1.513   1.742
i44    3.306   3.169
i45    3.221   9.640
i46    9.936   3.699
i47    3.729   7.720
i48    3.967   9.131
i49    1.196   7.355
i50    0.554   5.763
```
Let's try to find \(k=10\) points that are most dispersed. The distance matrix is formed by calculating Euclidean distances.
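For illustration, here is a minimal Python sketch of how such random points and the Euclidean distance matrix can be generated (the seed and names are my own; numpy's generator will not reproduce the GAMS data shown above):

```python
import numpy as np

# Illustrative only: this seed does not reproduce the GAMS data above.
rng = np.random.default_rng(123)
n, k = 50, 10
coord = rng.uniform(0.0, 10.0, size=(n, 2))   # random 2d points

# pairwise Euclidean distances: d is symmetric with a zero diagonal
d = np.linalg.norm(coord[:, None, :] - coord[None, :, :], axis=-1)

# if distances come from an asymmetric source, symmetrize first:
# d = 0.5 * (d + d.T)
```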
10 out of 50 most dispersed points
Solve non-convex problem
The above MIQP problem is non-convex. We can solve the non-convex MIQP problem in different ways:
- Throw this into a global solver such as Baron, Couenne or Antigone
- Use a solver like Cplex (option qtolin) or Gurobi (option preQlinearize) that can linearize the problem automatically for us. See further down how we can do this ourselves.
- Instruct Cplex to use a QP formulation (option qtolin=0) and tell it to use the global QP solver (option optimalitytarget=3)
Convexification of the quadratic model
There is a trick to make this problem convex. For maximization, we need the \(Q\) matrix to be negative semi-definite. In our case the \(Q\) matrix is our distance matrix. The following algorithm will make the problem convex:
- Calculate the largest eigenvalue \(\lambda_{max}\) of the matrix \(0.5 D\). The factor 0.5 appears because we only use half of the matrix in the objective, to prevent double counting.
- If \(\lambda_{max} \le 0\): we are done (the matrix is already negative semi-definite)
- Form the objective: \[\max \sum_{i \lt j} d_{i,j} x_{i} x_{j} - \lambda_{max} \sum_i x_i^2 + \lambda_{max} \sum_i x_i\]
This essentially imposes a possibly large diagonal perturbation of the \(Q\) matrix, which we compensate with a linear term. This uses the fact that \(x_i = x_i^2\) when \(x_i \in \{0,1\}\).
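A minimal sketch of this convexification step, assuming the numpy-based distance matrix `d` and dimension `n` from the earlier snippet:

```python
import numpy as np

Q = 0.5 * d                                    # Q matrix of the objective x'Qx
lam_max = float(np.linalg.eigvalsh(Q).max())   # largest eigenvalue

if lam_max > 0:
    # perturbed objective: x'(Q - lam_max*I)x + lam_max * sum(x);
    # the linear term compensates because x_i^2 = x_i for binary x
    Q_convex = Q - lam_max * np.eye(n)
    c_linear = lam_max * np.ones(n)
```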
Linearization
The quadratic model is easily linearized with the well-known trick for handling products of binary variables \(y_{i,j} = x_i x_j\): \[\begin{align} & y_{i,j} \le x_i \\ & y_{i,j} \le x_j \\ & y_{i,j} \ge x_i +x_j-1\\& y_{i,j}, x_i \in \{0,1\}\end{align}\] A complete model can look like:
Linearized Model 1 |
---|
\[\begin{align} \max & \sum_{i \lt j} \color{darkblue} d_{i,j} \color{darkred}y_{i,j} \\ & \sum_i \color{darkred} x_{i} = \color{darkblue} k \\ & \color{darkred}y_{i,j} \le \color{darkred} x_i && \forall i\lt j \\ & \color{darkred}y_{i,j} \le \color{darkred} x_j && \forall i\lt j \\ & \color{darkred}x_{i} \in \{0,1\} \\ & \color{darkred}y_{i,j} \in [0,1] \end{align}\] |
There are two things that may need some attention:
- The inequality \(y_{i,j}\ge x_i +x_j -1\) was dropped. The objective will take care of this.
- The variables \(y_{i,j}\) were relaxed to be continuous between 0 and 1. The \(y\) variables will automatically be integer (well, where it matters). Models with fewer integer variables are often easier to solve. Modern solvers, however, may reintroduce binary variables for this particular model. In Cplex you can see a message like the one shown below.
```
Tried aggregator 1 time.
MIP Presolve eliminated 1 rows and 1 columns.
Reduced MIP has 2451 rows, 1275 columns, and 4950 nonzeros.
Reduced MIP has 50 binaries, 0 generals, 0 SOSs, and 0 indicators.
Presolve time = 0.06 sec. (2.79 ticks)
Found incumbent of value 0.000000 after 0.06 sec. (4.54 ticks)
Probing time = 0.02 sec. (0.51 ticks)
Tried aggregator 1 time.
Reduced MIP has 2451 rows, 1275 columns, and 4950 nonzeros.
Reduced MIP has 1275 binaries, 0 generals, 0 SOSs, and 0 indicators.
```
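For illustration, here is a minimal sketch of Linearized Model 1 in Python/PuLP rather than the GAMS/Cplex setup used in this post (the names and the default CBC solver are my own choices; `n`, `k`, and `d` are assumed from the earlier snippets):

```python
from pulp import LpProblem, LpMaximize, LpVariable, lpSum, LpBinary

pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]

prob = LpProblem("p_dispersion_sum", LpMaximize)
x = [LpVariable(f"x{i}", cat=LpBinary) for i in range(n)]
y = {(i, j): LpVariable(f"y{i}_{j}", lowBound=0, upBound=1)
     for (i, j) in pairs}                 # relaxed to continuous

prob += lpSum(float(d[i, j]) * y[i, j] for (i, j) in pairs)  # objective
prob += lpSum(x) == k                     # select exactly k points
for (i, j) in pairs:
    prob += y[i, j] <= x[i]
    prob += y[i, j] <= x[j]
    # y >= x_i + x_j - 1 is dropped: the objective pushes y upward

prob.solve()
selected = [i for i in range(n) if x[i].value() > 0.5]
```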
A tighter linearization
In [2] a tighter formulation is proposed:
Linearized Model 2 |
---|
\[\begin{align} \max & \sum_{i \lt j} \color{darkblue} d_{i,j} \color{darkred}y_{i,j} \\ & \sum_i \color{darkred} x_{i} = \color{darkblue} k \\ & \color{darkred}y_{i,j} \le \color{darkred} x_i && \forall i\lt j \\ & \color{darkred}y_{i,j} \le \color{darkred} x_j && \forall i\lt j \\ & \color{darkred}y_{i,j} \ge \color{darkred} x_i +\color{darkred} x_j -1 && \forall i\lt j \\ & \sum_{i\lt j} \color{darkred}y_{i,j} + \sum_{i\gt j} \color{darkred}y_{j,i} = (\color{darkblue} k-1) \color{darkred}x_j && \forall j \\ & \color{darkred}x_{i} \in \{0,1\} \\ & \color{darkred}y_{i,j} \in \{0,1\} \end{align}\] |
Essentially, we multiplied the constraint \(\sum_i x_{i} = k\) by \(x_j\), and added these as constraints. The derivation is as follows: \[\begin{align} & \left( \sum_i x_i\right) x_j = k x_j\\ \Rightarrow & \sum_{i\lt j} x_i x_j + x_j^2 + \sum_{i\gt j} x_i x_j = k x_j\\ \Rightarrow & \sum_{i\lt j} y_{i,j} + \sum_{i\gt j} y_{j,i} = (k-1) x_j \end{align}\] We used here that \(x_j^2 = x_j\).
My intuition is as follows. If \(x_j=1\) then exactly \(k-1\) other \(x_i\)'s should be 1. That means \(k-1\) \(y_{i,j}\)'s should be 1. As we only use the upper-triangular part, the picture becomes:
Picture of the \(y\) layout
Given that \(x_j=1\), we need to have \(k-1\) ones in the orange region. It is always a good idea to check whether a constraint we formed by purely algebraic steps still has a meaning.
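In the Python/PuLP sketch from before, the extra cuts of Model 2 could be added like this (for this variant the \(y\) variables should be declared binary, and the \(y_{i,j}\ge x_i+x_j-1\) inequalities reintroduced, as the numerical results below suggest):

```python
# Model 2 on top of the Model 1 sketch above (assumes prob, x, y, pairs).
# Reintroduce the lower-bound inequalities:
for (i, j) in pairs:
    prob += y[i, j] >= x[i] + x[j] - 1
# Per-point cut: all y's involving point j must sum to (k-1)*x_j.
# Only the upper-triangular y's exist, so split the sum accordingly.
for j in range(n):
    prob += (lpSum(y[i, j] for i in range(j))
             + lpSum(y[j, i] for i in range(j + 1, n))
             == (k - 1) * x[j])
```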
Aggregation
We can aggregate the previous cut, which leads to: \[\sum_{i\lt j} y_{i,j} = \frac{k(k-1)}{2}\] The advantage of this version is that we only need one extra constraint [4]:
Linearized Model 3 |
---|
\[\begin{align} \max & \sum_{i \lt j} \color{darkblue} d_{i,j} \color{darkred}y_{i,j} \\ & \sum_i \color{darkred} x_{i} = \color{darkblue} k \\ & \color{darkred}y_{i,j} \le \color{darkred} x_i && \forall i\lt j \\ & \color{darkred}y_{i,j} \le \color{darkred} x_j && \forall i\lt j \\ & \color{darkred}y_{i,j} \ge \color{darkred} x_i +\color{darkred} x_j -1 && \forall i\lt j \\ & \sum_{i\lt j} \color{darkred}y_{i,j} = \frac{ \color{darkblue}k (\color{darkblue} k-1)}{2} \\ & \color{darkred}x_{i} \in \{0,1\} \\ & \color{darkred}y_{i,j} \in \{0,1\} \end{align}\] |
The intuition is easy: if we have \(k\) \(x_j\)'s equal to one, we need to have \(k(k-1)/2\) \(y_{i,j}\)'s equal to one.
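In the same sketch notation as above, Model 3 needs just this single extra constraint:

```python
# Model 3's aggregated cut: with k selected points, exactly k*(k-1)/2
# of the y variables must be 1 (assumes prob, y, pairs, k as above).
prob += lpSum(y[i, j] for (i, j) in pairs) == k * (k - 1) / 2
```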
Numerical results
I tried these formulations using Cplex with a time limit of 1800 seconds (half an hour).
Model | Time | Obj | Gap | Notes |
---|---|---|---|---|
Non-convex MIQP | 1800 | 332.1393 | 87.64% | Solve as quadratic model. options: qtolin=0, optimalitytarget=3 |
Non-convex MIQP | 1800 | 332.9129 | 57.65% | Automatically converted to linear model. options: qtolin=1 |
Convex MIQP | 1800 | 332.9129 | 92.22% | option: qtolin=0 |
MIP 1 | 1800 | 332.9129 | 57.65% | Defaults |
MIP 2 | 8 | 332.9129 | optimal | Defaults (roughly same performance whether or not including \(y_{i,j}\ge x_i+x_j-1\)) |
MIP 3 | 400 | 332.9129 | optimal | Defaults. Excludes \(y_{i,j}\ge x_i+x_j-1\). |
MIP 3 | 11 | 332.9129 | optimal | Defaults. Includes \(y_{i,j}\ge x_i+x_j-1\). |
Just one observation, but I think these results are representative for other (small) instances.
We dropped the constraint \(y_{i,j}\ge x_i+x_j-1\) from MIP model 1: the objective will push \(y\) upwards on its own. For models MIP 2 and 3 it is wise to reintroduce them.
The gaps are terrible for all models without the extra cuts. However, some of these methods find the best solution very quickly; they are just not able to prove optimality in a reasonable time. Here is an example:
```
        Nodes                                          Cuts/
   Node  Left     Objective  IInf  Best Integer    Best Bound    ItCnt     Gap

      0     0     1235.6817    50                   1235.6817        7
*     0+    0                        332.9129       1235.6817            271.17%
Found incumbent of value 332.912894 after 0.02 sec. (4.67 ticks)
      0     2     1235.6817    50      332.9129     1162.0829        7  249.07%
Elapsed time = 0.03 sec. (11.72 ticks, tree = 0.02 MB, solutions = 1)
```
Better dispersion
From the picture of the solution (earlier in this post), we see that this model does not prevent selected points from being close to each other. A better model may be to maximize the minimum distance between the selected points. This can be modeled as:
Maximize Minimum Distance |
---|
\[\begin{align} \max\> & \color{darkred} {\Delta} \\ & \color{darkred} \Delta \le \color{darkblue} d_{i,j} + \color{darkblue} M (1- \color{darkred}x_{i} \color{darkred}x_{j}) && \forall i\lt j \\ & \sum_i \color{darkred} x_{i} = \color{darkblue} k \\ & \color{darkred}x_{i} \in \{0,1\} \\ \end{align}\] |
Here \(M\) is a sufficiently large constant, e.g. \[M = \max_{i\lt j} d_{i,j}\] The model is easily linearized by reformulating the distance constraint as \[ \Delta \le d_{i,j} + M (1- x_{i}) + M(1- x_{j}) \] This model quickly solves our 10-out-of-50 problem. It gives the following solution:
Maximization of minimum distance
A disadvantage of this model is that it does not care how close the selected points are to each other, as long as they are farther apart than the minimum distance. For this example, visual inspection suggests the model does a good job.
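For completeness, a sketch of the linearized maximin model in the same Python/PuLP style (again my own illustrative code; `n`, `k`, `d`, and `pairs` are assumed from the earlier snippets):

```python
from pulp import LpProblem, LpMaximize, LpVariable, lpSum, LpBinary

M = float(max(d[i, j] for (i, j) in pairs))   # large enough constant

maximin = LpProblem("p_dispersion_maximin", LpMaximize)
x = [LpVariable(f"x{i}", cat=LpBinary) for i in range(n)]
delta = LpVariable("delta", lowBound=0)

maximin += delta                              # maximize the minimum distance
maximin += lpSum(x) == k
for (i, j) in pairs:
    # binding only when both endpoints are selected
    maximin += delta <= float(d[i, j]) + M * (1 - x[i]) + M * (1 - x[j])

maximin.solve()
selected = [i for i in range(n) if x[i].value() > 0.5]
```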
Conclusion
The \(p\)-dispersion-sum problem is very difficult to solve to optimality, even for very small data sets. Extra cuts can enormously help the linearized version of the model. A main drawback is that the optimal solutions do not provide us with well-dispersed points (points can be very close).
The maximin model solves much easier, and it gives better dispersion: selected points are never close to each other.
References
1. How to select n objects from a set of N objects, maximizing the sum of pairwise distances between them, https://stackoverflow.com/questions/56761115/how-to-select-n-objects-from-a-set-of-n-objects-maximizing-the-sum-of-pairwise
2. Warren P. Adams and Hanif D. Sherali, A Tight Linearization and an Algorithm for Zero-One Quadratic Programming Problems, Management Science, Vol. 32, No. 10 (Oct. 1986), pp. 1274-1290.
3. Michael J. Kuby, Programming Models for Facility Dispersion: The p-Dispersion and Maxisum Dispersion Problems, Geographical Analysis, Vol. 19, pp. 315-329, 1987.
4. Ed Klotz, Performance Tuning for Cplex's Spatial Branch-and-Bound Solver for Global Nonconvex (Mixed Integer) Quadratic Programs, http://orwe-conference.mines.edu/files/IOS2018SpatialPerfTuning.pdf
5. Ed Klotz, Specialized Strategies for Products of Binary Variables, https://www.gurobi.com/wp-content/uploads/2021/03/Models-with-Products-of-Binary-Variables.pdf. See also the webinar at https://www.gurobi.com/resource/models-with-products-of-binary-variables/