Problem statement
Given \(n\) points with their distances \(d_{i,j}\), select \(k\) points such that the sum of the distances between the selected points is maximized [1].
A simple MIQP (Mixed-Integer Quadratic Programming) model is:
| Non-convex MIQP Model | 
|---|
| \[\begin{align} \max & \sum_{i \lt j} \color{darkblue} d_{i,j} \color{darkred}x_{i} \color{darkred}x_{j} \\ & \sum_i \color{darkred} x_{i} = \color{darkblue} k \\ & \color{darkred}x_{i} \in \{0,1\} \end{align}\] | 
Notice we only need to look at distances \(d_{i,j}\) with \(i\lt j\) as we can assume symmetry. If not, just make \(d_{i,j}\) symmetric by: \[d_{i,j} = \frac{d_{i,j} + d_{j,i}}{2}\]
As we shall see this is not such an easy problem to solve to optimality.
This problem is called the \(p\)-dispersion-sum problem.
Example data
I generated random \(n=50\) \(2d\) points:
---- 24 PARAMETER coord random coordinates x y i1 1.717 8.433 i2 5.504 3.011 i3 2.922 2.241 i4 3.498 8.563 i5 0.671 5.002 i6 9.981 5.787 i7 9.911 7.623 i8 1.307 6.397 i9 1.595 2.501 i10 6.689 4.354 i11 3.597 3.514 i12 1.315 1.501 i13 5.891 8.309 i14 2.308 6.657 i15 7.759 3.037 i16 1.105 5.024 i17 1.602 8.725 i18 2.651 2.858 i19 5.940 7.227 i20 6.282 4.638 i21 4.133 1.177 i22 3.142 0.466 i23 3.386 1.821 i24 6.457 5.607 i25 7.700 2.978 i26 6.611 7.558 i27 6.274 2.839 i28 0.864 1.025 i29 6.413 5.453 i30 0.315 7.924 i31 0.728 1.757 i32 5.256 7.502 i33 1.781 0.341 i34 5.851 6.212 i35 3.894 3.587 i36 2.430 2.464 i37 1.305 9.334 i38 3.799 7.834 i39 3.000 1.255 i40 7.489 0.692 i41 2.020 0.051 i42 2.696 4.999 i43 1.513 1.742 i44 3.306 3.169 i45 3.221 9.640 i46 9.936 3.699 i47 3.729 7.720 i48 3.967 9.131 i49 1.196 7.355 i50 0.554 5.763
Let's try to find \(k=10\) points that are most dispersed. The distance matrix is formed by calculating Euclidean distances.
|  | 
| 10 out of 50 most dispersed points | 
Solve non-convex problem
The above MIQP problem is non-convex. We can solve the non-convex MIQP problem in different ways:
- Throw this into a global solver such as Baron, Couenne or Antigone
- Use a solver like Cplex (option qtolin) or Gurobi (option preQlinearize) that can linearize the problem automatically for us. See further down how we can do this ourselves.
- Instruct Cplex to use a QP formulation (option qtolin=0) and tell it to use the global QP solver (option optimalitytarget=3)
Convexification of the quadratic model
There is a trick to make this problem convex. For maximization, we require that the \(Q\) matrix is negative definite.  In our case the \(Q\) matrix is our distance matrix. The following algorithm will make the problem convex:
- Calculate the largest eigenvalue \(\lambda_{max}\) of the distance matrix \(0.5 D\). We multiply by 0.5 because we only use half of the matrix in the objective to prevent double counting.
- If \(\lambda_{max} \lt 0 \): we are done (matrix is negative-definite)
- Form the objective: \[\max \sum_{i \lt j} d_{i,j} x_{i} x_{j} - \lambda_{max} \sum_i x_i^2 + \lambda_{max} \sum_i x_i\]
This essentially imposes a possible large diagonal perturbation of the \(Q\) matrix. We compensate with a linear term. This uses the fact that \(x_i = x_i^2\) when \(x_i \in \{0,1\}\).  
Linearization
The quadratic model is easily linearized by observing that a well-known trick to handle \(y_{i,j} = x_i x_j\) is to write: \[\begin{align} & y_{i,j} \le x_i \\ & y_{i,j} \le x_j \\ & y_{i,j} \ge x_i +x_j-1\\& y_{i,j}, x_i \in \{0,1\}\end{align}\] A complete model can look like:
| Linearized Model 1 | 
|---|
| \[\begin{align} \max & \sum_{i \lt j} \color{darkblue} d_{i,j} \color{darkred}y_{i,j} \\ & \sum_i \color{darkred} x_{i} = \color{darkblue} k \\ & \color{darkred}y_{i,j} \le \color{darkred} x_i && \forall i\lt j \\ & \color{darkred}y_{i,j} \le \color{darkred} x_j && \forall i\lt j \\ & \color{darkred}x_{i} \in \{0,1\} \\ & \color{darkred}y_{i,j} \in [0,1] \end{align}\] | 
There are two things that may need some attention:
- The inequality \(y_{i,j}\ge x_i +x_j -1\) was dropped. The objective will take care of this.
- The variables \(y_{i,j}\) were relaxed to be continuous between 0 and 1. The \(y\) variables will be automatically integer (well, where it matters). Often models with fewer integer variables are easier to solve. Modern solvers, however, may reintroduce binary variables for this particular model. In Cplex you can see a message like shown below.
Tried aggregator 1 time. MIP Presolve eliminated 1 rows and 1 columns. Reduced MIP has 2451 rows, 1275 columns, and 4950 nonzeros. Reduced MIP has 50 binaries, 0 generals, 0 SOSs, and 0 indicators. Presolve time = 0.06 sec. (2.79 ticks) Found incumbent of value 0.000000 after 0.06 sec. (4.54 ticks) Probing time = 0.02 sec. (0.51 ticks) Tried aggregator 1 time. Reduced MIP has 2451 rows, 1275 columns, and 4950 nonzeros. Reduced MIP has 1275 binaries, 0 generals, 0 SOSs, and 0 indicators.
A tighter linearization
In [2] a tighter formulation is proposed:
| Linearized Model 2 | 
|---|
| \[\begin{align} \max & \sum_{i \lt j} \color{darkblue} d_{i,j} \color{darkred}y_{i,j} \\ & \sum_i \color{darkred} x_{i} = \color{darkblue} k \\ & \color{darkred}y_{i,j} \le \color{darkred} x_i && \forall i\lt j \\ & \color{darkred}y_{i,j} \le \color{darkred} x_j && \forall i\lt j \\ & \color{darkred}y_{i,j} \ge \color{darkred} x_i +\color{darkred} x_j -1 && \forall i\lt j \\ & \sum_{i\lt j} \color{darkred}y_{i,j} + \sum_{i\gt j} \color{darkred}y_{j,i} = (\color{darkblue} k-1) \color{darkred}x_j && \forall j \\ & \color{darkred}x_{i} \in \{0,1\} \\ & \color{darkred}y_{i,j} \in \{0,1\} \end{align}\] | 
Essentially, we multiplied the constraint \(\sum_i x_{i} = k\) by \(x_j\), and added these as constraints. The derivation is as follows: \[\begin{align} & \left( \sum_i x_i\right) x_j = k x_j\\ \Rightarrow & \sum_{i\lt j} x_i x_j + x_j^2 + \sum_{i\gt j} x_i x_j = k x_j\\ \Rightarrow & \sum_{i\lt j} y_{i,j} + \sum_{i\gt j} y_{j,i} = (k-1) x_j \end{align}\] We used here that \(x_j^2 = x_j\).
My intuition is as follows. If \(x_j=1\) then exactly \(k-1\) other \(x_i\)'s should be 1. That means \(k-1\) \(y_{i,j}\)'s should be 1. As we only use the upper-triangular part, the picture becomes:
|  | 
| Picture of y layout | 
Given that \(x_j=1\), we need to have \(k-1\) ones in the orange region. It is always a good idea to see if the constraint we formed by pure algebraic steps, has still a meaning. 
We can aggregate the previous cut, which leads to: \[\sum_{i\lt j} y_{i,j} = \frac{k(k-1)}{2}\] The advantage of this version is that we only need one extra constraint [4]:
The intuition is easy: if we have \(k\) \(x_j\)'s equal to one, we need to have \(k(k-1)/2\) \(y_{i,j}\)'s equal to one.
Just one observation, but I think these results are representative for other (small) instances.
We dropped the constraint \(y_{i,j}\ge x_i+x_j-1\) from MIP model 1: the objective will push \(y\) upwards on its own. For models MIP 2 and 3 it is wise to reintroduce them.
It is noted that the gaps are terrible for all models without extra cuts. However, some of these methods find the best solution very quickly. They are just not able to prove optimality in a reasonable time. Here is an example:
Here \(M\) is a large enough constant, e.g. \[M = \max_{i\le j} d_{i,j}\] The model can be easily linearized by formulating the distance constraint as \[ \Delta \le d_{i,j} + M (1- x_{i}) + M(1- x_{j}) \] This model quickly solves our 10 out of 50 problem. It gives as solution:
This model has as disadvantage that it does not care about points being closer to each other as long as they are further away than the minimum. For this example, visual inspection seems to indicate this model does good job.
The \(p\)-dispersion-sum problem is very difficult to solve to optimality, even for very small data sets. Extra cuts can enormously help the linearized version of the model. A main drawback is that the optimal solutions do not provide us with well-dispersed points (points can be very close).
Aggregation
We can aggregate the previous cut, which leads to: \[\sum_{i\lt j} y_{i,j} = \frac{k(k-1)}{2}\] The advantage of this version is that we only need one extra constraint [4]:
| Linearized Model 3 | 
|---|
| \[\begin{align} \max & \sum_{i \lt j} \color{darkblue} d_{i,j} \color{darkred}y_{i,j} \\ & \sum_i \color{darkred} x_{i} = \color{darkblue} k \\ & \color{darkred}y_{i,j} \le \color{darkred} x_i && \forall i\lt j \\ & \color{darkred}y_{i,j} \le \color{darkred} x_j && \forall i\lt j \\ & \color{darkred}y_{i,j} \ge \color{darkred} x_i +\color{darkred} x_j -1 && \forall i\lt j \\ & \sum_{i\lt j} \color{darkred}y_{i,j} = \frac{ \color{darkblue}k (\color{darkblue} k-1)}{2} \\ & \color{darkred}x_{i} \in \{0,1\} \\ & \color{darkred}y_{i,j} \in \{0,1\} \end{align}\] | 
The intuition is easy: if we have \(k\) \(x_j\)'s equal to one, we need to have \(k(k-1)/2\) \(y_{i,j}\)'s equal to one.
Numerical results
I tried these formulations using Cplex with a time limit of 1800 seconds (half an hour).
| Model | Time | Obj | Gap | Notes | 
|---|---|---|---|---|
| Non-convex MIQP | 1800 | 332.1393 | 87.64% | Solve as quadratic model. options: qtolin=0, optimalitytarget=3 | 
| Non-convex MIQP | 1800 | 332.9129 | 57.65% | Automatically converted to linear model. options: qtolin=1 | 
| Convex MIQP | 1800 | 332.9129 | 92.22% | option: qtolin=0 | 
| MIP 1 | 1800 | 332.9129 | 57.65% | Defaults | 
| MIP 2 | 8 | 332.9129 | optimal | Defaults (roughly same performance whether or not including \(y_{i,j}\ge x_i+x_j-1\)) | 
| MIP 3 | 400 | 332.9129 | optimal | Defaults. Excludes \(y_{i,j}\ge x_i+x_j-1\). | 
| MIP 3 | 11 | 332.9129 | optimal | Defaults. Includes \(y_{i,j}\ge x_i+x_j-1\). | 
Just one observation, but I think these results are representative for other (small) instances.
We dropped the constraint \(y_{i,j}\ge x_i+x_j-1\) from MIP model 1: the objective will push \(y\) upwards on its own. For models MIP 2 and 3 it is wise to reintroduce them.
It is noted that the gaps are terrible for all models without extra cuts. However, some of these methods find the best solution very quickly. They are just not able to prove optimality in a reasonable time. Here is an example:
Nodes Cuts/ Node Left Objective IInf Best Integer Best Bound ItCnt Gap 0 0 1235.6817 50 1235.6817 7 * 0+ 0 332.9129 1235.6817 271.17% Found incumbent of value 332.912894 after 0.02 sec. (4.67 ticks) 0 2 1235.6817 50 332.9129 1162.0829 7 249.07% Elapsed time = 0.03 sec. (11.72 ticks, tree = 0.02 MB, solutions = 1)
Better dispersion
From the picture of the solution (earlier in this post), we see that this model does not prevent selected points to be close to each other.  A better model may be to maximize the minimum distance between selected points. This can be modeled as:
| Maximize Minimum Distance | 
|---|
| \[\begin{align} \max\> & \color{darkred} {\Delta} \\ & \color{darkred} \Delta \le \color{darkblue} d_{i,j} + \color{darkblue} M (1- \color{darkred}x_{i} \color{darkred}x_{j}) && \forall i\lt j \\ & \sum_i \color{darkred} x_{i} = \color{darkblue} k \\ & \color{darkred}x_{i} \in \{0,1\} \\ \end{align}\] | 
Here \(M\) is a large enough constant, e.g. \[M = \max_{i\le j} d_{i,j}\] The model can be easily linearized by formulating the distance constraint as \[ \Delta \le d_{i,j} + M (1- x_{i}) + M(1- x_{j}) \] This model quickly solves our 10 out of 50 problem. It gives as solution:
|  | 
| Maximization of minimum distance | 
This model has as disadvantage that it does not care about points being closer to each other as long as they are further away than the minimum. For this example, visual inspection seems to indicate this model does good job.
Conclusion
The \(p\)-dispersion-sum problem is very difficult to solve to optimality, even for very small data sets. Extra cuts can enormously help the linearized version of the model. A main drawback is that the optimal solutions do not provide us with well-dispersed points (points can be very close).
The maximin model solves much easier, and it gives better dispersion: selected points are never close to each other.
References
- How to select n objects from a set of N objects, maximizing the sum of pairwise distances between them, https://stackoverflow.com/questions/56761115/how-to-select-n-objects-from-a-set-of-n-objects-maximizing-the-sum-of-pairwise
- Warren P. Adams and Hanif D. Sherali, A Tight Linearization and an Algorithm for Zero-One Quadratic Programming Problems, Management Science, Vol. 32, No. 10 (Oct., 1986), pp. 1274-1290
- Michael J. Kuby, Programming Models for Facility Dispersion: The p‐Dispersion and Maxisum Dispersion Problems, Geographical Analysis, vol. 19, pp.315-329, 1987.
- Ed Klotz, Performance Tuning for Cplex’s Spatial Branch-and-Bound Solver for Global Nonconvex (Mixed Integer) Quadratic Programs, http://orwe-conference.mines.edu/files/IOS2018SpatialPerfTuning.pdf
- Ed Klotz, Specialized Strategies for Products of Binary Variables
 https://www.gurobi.com/wp-content/uploads/2021/03/Models-with-Products-of-Binary-Variables.pdf See also the webinar at: https://www.gurobi.com/resource/models-with-products-of-binary-variables/
 
 
Beautiful blog. It has been a good resource of entertaining and defying problems to show in my classes. Thank you for your time and dedication to post these beautiful math optimization problems.
ReplyDeleteI wrote a blog post that uses this example to illustrate the new automated linearization functionality in SAS: https://blogs.sas.com/content/subconsciousmusings/2021/03/18/automated-linearization-in-sas-optimization/
ReplyDeleteNice! This simple model makes a good example.
Delete