Problem statement
Given \(n\) points with their distances \(d_{i,j}\), select \(k\) points such that the sum of the distances between the selected points is
maximized [1].
A simple MIQP (MixedInteger Quadratic Programming) model is:
Nonconvex MIQP Model 
\[\begin{align}
\max & \sum_{i \lt j} \color{darkblue} d_{i,j} \color{darkred}x_{i} \color{darkred}x_{j} \\
& \sum_i \color{darkred} x_{i} = \color{darkblue} k \\
& \color{darkred}x_{i} \in \{0,1\}
\end{align}\]

Notice we only need to look at distances \(d_{i,j}\) with \(i\lt j\) as we can assume symmetry. If not, just make \(d_{i,j}\) symmetric by: \[d_{i,j} = \frac{d_{i,j} + d_{j,i}}{2}\]
As we shall see this is not such an easy problem to solve to optimality.
This problem is called the \(p\)dispersionsum problem.
Example data
I generated random \(n=50\) \(2d\) points:
 24 PARAMETER coord random coordinates
x y
i1 1.717 8.433
i2 5.504 3.011
i3 2.922 2.241
i4 3.498 8.563
i5 0.671 5.002
i6 9.981 5.787
i7 9.911 7.623
i8 1.307 6.397
i9 1.595 2.501
i10 6.689 4.354
i11 3.597 3.514
i12 1.315 1.501
i13 5.891 8.309
i14 2.308 6.657
i15 7.759 3.037
i16 1.105 5.024
i17 1.602 8.725
i18 2.651 2.858
i19 5.940 7.227
i20 6.282 4.638
i21 4.133 1.177
i22 3.142 0.466
i23 3.386 1.821
i24 6.457 5.607
i25 7.700 2.978
i26 6.611 7.558
i27 6.274 2.839
i28 0.864 1.025
i29 6.413 5.453
i30 0.315 7.924
i31 0.728 1.757
i32 5.256 7.502
i33 1.781 0.341
i34 5.851 6.212
i35 3.894 3.587
i36 2.430 2.464
i37 1.305 9.334
i38 3.799 7.834
i39 3.000 1.255
i40 7.489 0.692
i41 2.020 0.051
i42 2.696 4.999
i43 1.513 1.742
i44 3.306 3.169
i45 3.221 9.640
i46 9.936 3.699
i47 3.729 7.720
i48 3.967 9.131
i49 1.196 7.355
i50 0.554 5.763
Let's try to find \(k=10\) points that are most dispersed. The distance matrix is formed by calculating Euclidean distances.

10 out of 50 most dispersed points 
Solve nonconvex problem
The above MIQP problem is nonconvex. We can solve the nonconvex MIQP problem in different ways:
 Throw this into a global solver such as Baron, Couenne or Antigone
 Use a solver like Cplex (option qtolin) or Gurobi (option preQlinearize) that can linearize the problem automatically for us. See further down how we can do this ourselves.
 Instruct Cplex to use a QP formulation (option qtolin=0) and tell it to use the global QP solver (option optimalitytarget=3)
Convexification of the quadratic model
There is a trick to make this problem convex. For maximization, we require that the \(Q\) matrix is negative definite. In our case the \(Q\) matrix is our distance matrix. The following algorithm will make the problem convex:
 Calculate the largest eigenvalue \(\lambda_{max}\) of the distance matrix \(0.5 D\). We multiply by 0.5 because we only use half of the matrix in the objective to prevent double counting.
 If \(\lambda_{max} \lt 0 \): we are done (matrix is negativedefinite)
 Form the objective: \[\max \sum_{i \lt j} d_{i,j} x_{i} x_{j}  \lambda_{max} \sum_i x_i^2 + \lambda_{max} \sum_i x_i\]
This essentially imposes a possible large diagonal perturbation of the \(Q\) matrix. We compensate with a linear term. This uses the fact that \(x_i = x_i^2\) when \(x_i \in \{0,1\}\).
Linearization
The quadratic model is easily linearized by observing that a wellknown trick to handle \(y_{i,j} = x_i x_j\) is to write: \[\begin{align} & y_{i,j} \le x_i \\ & y_{i,j} \le x_j \\ & y_{i,j} \ge x_i +x_j1\\& y_{i,j}, x_i \in \{0,1\}\end{align}\] A complete model can look like:
Linearized Model 1 
\[\begin{align}
\max & \sum_{i \lt j} \color{darkblue} d_{i,j} \color{darkred}y_{i,j} \\
& \sum_i \color{darkred} x_{i} = \color{darkblue} k \\
& \color{darkred}y_{i,j} \le \color{darkred} x_i && \forall i\lt j \\
& \color{darkred}y_{i,j} \le \color{darkred} x_j && \forall i\lt j \\
& \color{darkred}x_{i} \in \{0,1\} \\
& \color{darkred}y_{i,j} \in [0,1]
\end{align}\]

There are two things that may need some attention:
 The inequality \(y_{i,j}\ge x_i +x_j 1\) was dropped. The objective will take care of this.
 The variables \(y_{i,j}\) were relaxed to be continuous between 0 and 1. The \(y\) variables will be automatically integer (well, where it matters). Often models with fewer integer variables are easier to solve. Modern solvers, however, may reintroduce binary variables for this particular model. In Cplex you can see a message like shown below.
Tried aggregator 1 time.
MIP Presolve eliminated 1 rows and 1 columns.
Reduced MIP has 2451 rows, 1275 columns, and 4950 nonzeros.
Reduced MIP has 50 binaries, 0 generals, 0 SOSs, and 0 indicators.
Presolve time = 0.06 sec. (2.79 ticks)
Found incumbent of value 0.000000 after 0.06 sec. (4.54 ticks)
Probing time = 0.02 sec. (0.51 ticks)
Tried aggregator 1 time.
Reduced MIP has 2451 rows, 1275 columns, and 4950 nonzeros.
Reduced MIP has 1275 binaries, 0 generals, 0 SOSs, and 0 indicators.
A tighter linearization
In [2] a tighter formulation is proposed:
Linearized Model 2 
\[\begin{align}
\max & \sum_{i \lt j} \color{darkblue} d_{i,j} \color{darkred}y_{i,j} \\
& \sum_i \color{darkred} x_{i} = \color{darkblue} k \\
& \color{darkred}y_{i,j} \le \color{darkred} x_i && \forall i\lt j \\
& \color{darkred}y_{i,j} \le \color{darkred} x_j && \forall i\lt j \\
& \color{darkred}y_{i,j} \ge \color{darkred} x_i +\color{darkred} x_j 1 && \forall i\lt j \\
& \sum_{i\lt j} \color{darkred}y_{i,j} + \sum_{i\gt j} \color{darkred}y_{j,i} = (\color{darkblue} k1) \color{darkred}x_j && \forall j \\
& \color{darkred}x_{i} \in \{0,1\} \\
& \color{darkred}y_{i,j} \in \{0,1\}
\end{align}\] 
Essentially, we multiplied the constraint \(\sum_i x_{i} = k\) by \(x_j\), and added these as constraints. The derivation is as follows: \[\begin{align} & \left( \sum_i x_i\right) x_j = k x_j\\ \Rightarrow & \sum_{i\lt j} x_i x_j + x_j^2 + \sum_{i\gt j} x_i x_j = k x_j\\ \Rightarrow & \sum_{i\lt j} y_{i,j} + \sum_{i\gt j} y_{j,i} = (k1) x_j \end{align}\] We used here that \(x_j^2 = x_j\).
My intuition is as follows. If \(x_j=1\) then exactly \(k1\) other \(x_i\)'s should be 1. That means \(k1\) \(y_{i,j}\)'s should be 1. As we only use the uppertriangular part, the picture becomes:

Picture of y layout 
Given that \(x_j=1\), we need to have \(k1\) ones in the orange region.
Aggregation
We can aggregate the previous cut, which leads to: \[\sum_{i\lt j} y_{i,j} = \frac{k(k1)}{2}\] The advantage of this version is that we only need one extra constraint [4]:
Linearized Model 3 
\[\begin{align}
\max & \sum_{i \lt j} \color{darkblue} d_{i,j} \color{darkred}y_{i,j} \\
& \sum_i \color{darkred} x_{i} = \color{darkblue} k \\
& \color{darkred}y_{i,j} \le \color{darkred} x_i && \forall i\lt j \\
& \color{darkred}y_{i,j} \le \color{darkred} x_j && \forall i\lt j \\
& \color{darkred}y_{i,j} \ge \color{darkred} x_i +\color{darkred} x_j 1 && \forall i\lt j \\
& \sum_{i\lt j} \color{darkred}y_{i,j} = \frac{ \color{darkblue}k (\color{darkblue} k1)}{2} \\
& \color{darkred}x_{i} \in \{0,1\} \\
& \color{darkred}y_{i,j} \in \{0,1\}
\end{align}\] 
The intuition is easy: if we have \(k\) \(x_j\)'s equal to one, we need to have \(k(k1)/2\) \(y_{i,j}\)'s equal to one.
Numerical results
I tried these formulations using Cplex with a time limit of 1800 seconds (half an hour).
Model  Time  Obj  Gap  Notes 
Nonconvex MIQP  1800  332.1393  87.64%  Solve as quadratic model.
options: qtolin=0, optimalitytarget=3 
Nonconvex MIQP  1800  332.9129  57.65%  Automatically converted to linear model.
options: qtolin=1 
Convex MIQP  1800  332.9129  92.22%  option: qtolin=0 
MIP 1  1800  332.9129  57.65%  Defaults 
MIP 2  8  332.9129  optimal  Defaults (roughly same performance whether or not including \(y_{i,j}\ge x_i+x_j1\)) 
MIP 3  400  332.9129  optimal  Defaults. Excludes \(y_{i,j}\ge x_i+x_j1\). 
MIP 3  11  332.9129  optimal  Defaults. Includes \(y_{i,j}\ge x_i+x_j1\). 
Just one observation, but I think these results are representative for other (small) instances.
We dropped the constraint \(y_{i,j}\ge x_i+x_j1\) from MIP model 1: the objective will push \(y\) upwards on its own. For models MIP 2 and 3 it is wise to reintroduce them.
It is noted that although the gaps are terrible for all models without extra cuts. However, some of these methods find the best solution very quickly. They are just not able to prove optimality in a reasonable time. Here is an example:
Nodes Cuts/
Node Left Objective IInf Best Integer Best Bound ItCnt Gap
0 0 1235.6817 50 1235.6817 7
* 0+ 0 332.9129 1235.6817 271.17%
Found incumbent of value 332.912894 after 0.02 sec. (4.67 ticks)
0 2 1235.6817 50 332.9129 1162.0829 7 249.07%
Elapsed time = 0.03 sec. (11.72 ticks, tree = 0.02 MB, solutions = 1)
Better dispersion
From the picture of the solution (earlier in this post), we see that this model does not prevent selected points to be close to each other. A better model may be to maximize the minimum distance between selected points. This can be modeled as:
Maximize Minimum Distance 
\[\begin{align}
\max\> & \color{darkred} {\Delta} \\
& \color{darkred} \Delta \le \color{darkblue} d_{i,j} + \color{darkblue} M (1 \color{darkred}x_{i} \color{darkred}x_{j}) && \forall i\lt j \\
& \sum_i \color{darkred} x_{i} = \color{darkblue} k \\
& \color{darkred}x_{i} \in \{0,1\} \\
\end{align}\] 
Here \(M\) is a large enough constant, e.g. \[M = \max_{i\le j} d_{i,j}\] The model can be easily linearized by formulating the distance constraint as \[ \Delta \le d_{i,j} + M (1 x_{i}) + M(1 x_{j}) \] This model quickly solves our 10 out of 50 problem. It gives as solution:

Maximization of minimum distance 
This model has as disadvantage that it does not care about points being closer to each other as long as they are further away than the minimum. For this example, visual inspection seems to indicate this model does good job.
Conclusion
The \(p\)dispersionsum problem is very difficult to solve to optimality, even for very small data sets. Extra cuts can enormously help the linearized version of the model. A main drawback is that the optimal solutions do not provide us with welldispersed points (points can be very close).
The maximin model solves much easier, and it gives better dispersion: selected points are never close to each other.
References
 How to select n objects from a set of N objects, maximizing the sum of pairwise distances between them, https://stackoverflow.com/questions/56761115/howtoselectnobjectsfromasetofnobjectsmaximizingthesumofpairwise
 Warren P. Adams and Hanif D. Sherali, A Tight Linearization and an Algorithm for ZeroOne Quadratic Programming Problems, Management Science,
Vol. 32, No. 10 (Oct., 1986), pp. 12741290
 Michael J. Kuby, Programming Models for Facility Dispersion: The p‐Dispersion and Maxisum Dispersion Problems, Geographical Analysis, vol. 19, pp.315329, 1987.
 Ed Klotz, Performance Tuning for Cplex’s Spatial BranchandBound Solver for Global Nonconvex (Mixed Integer) Quadratic Programs, http://orweconference.mines.edu/files/IOS2018SpatialPerfTuning.pdf