Problem
The problem is easy to describe: given the coordinates of points \(p_i\), find the central point \(x\) that minimizes the sum of the distances from \(x\) to \(p_i\) [1]. This point is sometimes called the geometric median [2].
Data
We just use some random data on the unit square \([0,1]\times[0,1]\):
---- 13 PARAMETER p points (randomly generated)
x y
point1 0.17174713 0.84326671
point2 0.55037536 0.30113790
point3 0.29221212 0.22405287
point4 0.34983050 0.85627035
point5 0.06711372 0.50021067
point6 0.99811763 0.57873338
point7 0.99113304 0.76225047
point8 0.13069248 0.63971876
point9 0.15951786 0.25008053
point10 0.66892861 0.43535638
point11 0.35970027 0.35144137
point12 0.13149159 0.15010179
point13 0.58911365 0.83089281
point14 0.23081574 0.66573446
point15 0.77585761 0.30365848
point16 0.11049229 0.50238487
point17 0.16017276 0.87246231
point18 0.26511455 0.28581432
point19 0.59395592 0.72271907
point20 0.62824868 0.46379787
point21 0.41330699 0.11769536
point22 0.31421227 0.04655151
point23 0.33855027 0.18209959
point24 0.64572713 0.56074555
point25 0.76996172 0.29780586
point26 0.66110626 0.75582167
point27 0.62744750 0.28386420
point28 0.08642462 0.10251467
point29 0.64125115 0.54530950
point30 0.03152485 0.79236064
point31 0.07276700 0.17566105
point32 0.52563261 0.75020767
point33 0.17812371 0.03414099
point34 0.58513117 0.62122998
point35 0.38936190 0.35871415
point36 0.24303462 0.24642154
point37 0.13050280 0.93344972
point38 0.37993791 0.78340046
point39 0.30003426 0.12548322
point40 0.74887411 0.06923246
point41 0.20201556 0.00506586
point42 0.26961305 0.49985148
point43 0.15128587 0.17416945
point44 0.33063773 0.31690605
point45 0.32208696 0.96397664
point46 0.99360221 0.36990306
point47 0.37288857 0.77197833
point48 0.39668414 0.91309632
point49 0.11957773 0.73547889
point50 0.05541847 0.57629980
mean 0.39042709 0.47299042
The last entry, labeled mean, is just the average of the \(x\)- and \(y\)-coordinates.
NLP Model
A straightforward non-linear programming model can look like:
Unconstrained NLP Model
\[\min \sum_i \sqrt{\sum_c (\color{darkred}x_c-\color{darkblue}p_{i,c})^2 }\]
We use \(c = \{x,y\}\), i.e. we have \(x\) and \(y\)-coordinates. Note that we use \(x\) in two different contexts: element of set \(c\), being the \(x\)-coordinate, and the decision variable \(x_c\).
We can use the mean as a very good starting point to help the NLP solver, i.e. \[x_c := \frac{\displaystyle\sum_{i=1}^n p_{i,c}}{n}\]
The picture below shows why the mean is such a good starting point:
[Figure: the optimal center point is close to the mean point]
The numeric values are here:
---- 45 PARAMETER results x(center) vs mean
x y sumdist
mean 0.39042709 0.47299042 18.02414861
x 0.37048298 0.43857572 17.96891985
The sumdist column shows the objective values for these two points.
This is an easy NLP problem: most NLP solvers need just a few iterations. With a system like GAMS or AMPL we get exact gradients automatically. That is much preferable to finite differences, which seem to be the prevalent method in R or Python environments.
It was pointed out to me that the problem is not smooth: the gradient of \(\sqrt{x}\) is not defined at zero. This can only happen when our point \(x\) exactly coincides with a data point (i.e. all coordinates are exactly the same). We can protect against this by using \[\min \sum_i \sqrt{\varepsilon+\sum_c (x_c-p_{i,c})^2 }\] for a very small constant \(\varepsilon\gt 0\).
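To illustrate the smoothed model with exact gradients, here is a minimal sketch in Python using SciPy (an assumption on my part; the results above were produced with GAMS). The names f and grad and the random stand-in data are hypothetical:

import numpy as np
from scipy.optimize import minimize

EPS = 1e-9  # small smoothing constant epsilon

def f(x, p):
    # smoothed objective: sum_i sqrt(eps + ||x - p_i||^2)
    return np.sum(np.sqrt(EPS + np.sum((x - p)**2, axis=1)))

def grad(x, p):
    # exact gradient: sum_i (x - p_i) / sqrt(eps + ||x - p_i||^2)
    d = np.sqrt(EPS + np.sum((x - p)**2, axis=1))
    return np.sum((x - p) / d[:, None], axis=0)

p = np.random.rand(50, 2)  # stand-in for the 50 data points above
x0 = p.mean(axis=0)        # the mean as starting point
res = minimize(f, x0, args=(p,), jac=grad, method="BFGS")
print(res.x, res.fun)

Starting from the mean, a quasi-Newton method like BFGS typically converges in just a handful of iterations here.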
Cone programming I
The above problem can also be written as a cone programming problem. This will allow us to use a different class of solvers to work on this problem. Here we use CVXPY [3] to express the model. The Python code can look like:
import numpy as np
import cvxpy as cp

N = 50                    # number of data points
p = np.random.rand(N, 2)  # stand-in for the data table above
x = cp.Variable(2)        # center point
obj = cp.Minimize(cp.sum([cp.norm(x - p[i, :]) for i in range(N)]))
prob = cp.Problem(obj)
objval = prob.solve(solver=cp.SCS, verbose=True)
This is very much a straight translation of our unconstrained NLP model. Although we only declared two \(x\) variables, behind the scenes the model is blown up to a rather large one, as we can see from the log:
----------------------------------------------------------------------------
SCS v2.1.1 - Splitting Conic Solver
(c) Brendan O'Donoghue, Stanford University, 2012
----------------------------------------------------------------------------
Lin-sys: sparse-direct, nnz in A = 150
eps = 1.00e-04, alpha = 1.50, max_iters = 5000, normalize = 1, scale = 1.00
acceleration_lookback = 0, rho_x = 1.00e-03
Variables n = 52, constraints m = 150
Cones: soc vars: 150, soc blks: 50
The generated SOCP (second-order cone programming) model is larger, but also very sparse, and the solver solves it very quickly. With SOCP solvers we usually don't worry about an initial point as we did in the NLP model.
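We can also confirm these sizes programmatically rather than by reading the solver log. A small sketch using the prob object from above (get_problem_data is part of the CVXPY API; the exact dictionary keys are solver-dependent):

data, chain, inv = prob.get_problem_data(cp.SCS)
print(data["A"].shape)  # (constraints m, variables n) of the compiled conic model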
Cone programming II
If we don't use a modeling tool that can do these transformations automatically, we can use a DIY approach. Second-order cone constraints can be stated as: \[||A_i x+b_i||_2 \le c_i^Tx + d_i \>\>\forall i\] This would imply that we can write our model as:
SOCP Model Attempt
\[\begin{align} \min & \sum_i \color{darkred} d_i \\ & \color{darkred} d_i^2 \ge \sum_c (\color{darkred}x_c-\color{darkblue}p_{i,c})^2 && \forall i \\ &\color{darkred} d_i \ge 0, \color{darkred} x_c \text{ free}\end{align} \]
Unfortunately this will yield a message like:
CPLEX Error 5002: 'e(point1)' is not convex
or
Constraint 'e(point1)'(0) is not convex. Q should be positive semidefinite in a constraint with finite upper bound.
The reason is that solvers recognize a quadratic constraint as a cone only in a standard form with squared variables, not squared expressions; written out, our constraint has an indefinite quadratic form. We can repair this as follows:
Repaired SOCP Model
\[\begin{align} \min & \sum_i \color{darkred} d_i \\ & \color{darkred} d_i^2 \ge \sum_c \color{darkred}y_{i,c}^2 && \forall i \\ & \color{darkred} y_{i,c} = \color{darkred} x_c -\color{darkblue}p_{i,c} && \forall i,c \\ &\color{darkred} d_i \ge 0, \color{darkred} x_c \text{ free}, \color{darkred} y_{i,c} \text{ free}\end{align} \]
This now solves quickly. We can now understand that CVXPY did quite a few steps before passing the model to the solver. As argued in [4], it is much better if the modeling system takes care of these reformulations. Some of them are not immediately obvious, and hand-crafted reformulations can be error-prone.
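For comparison, here is a sketch of how the repaired model could be written out explicitly in CVXPY (assuming the same p and N as before; cp.SOC(t, z) states the cone constraint \(t \ge ||z||_2\)):

import numpy as np
import cvxpy as cp

N = 50
p = np.random.rand(N, 2)  # stand-in for the data table above

x = cp.Variable(2)               # center point
y = cp.Variable((N, 2))          # y[i,:] represents x - p[i,:]
d = cp.Variable(N, nonneg=True)  # distance to point i

cons = [y[i, :] == x - p[i, :] for i in range(N)]  # linking constraints
cons += [cp.SOC(d[i], y[i, :]) for i in range(N)]  # d_i >= ||y_i||_2
prob = cp.Problem(cp.Minimize(cp.sum(d)), cons)
prob.solve(solver=cp.SCS)

This mirrors the repaired SOCP model above: we trade extra variables and constraints for a form any SOCP solver accepts.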
The NLP model has just 2 variables \(x_c\), while this SOCP model has 152: with \(n=50\) data points we added \(3n\) variables (\(n\) variables \(d_i\) plus \(2n\) variables \(y_{i,c}\)). The constraint counts show a similar increase: the NLP model has no constraints, but this SOCP model has \(3n=150\) (\(n\) of them cones).
Conclusion
The min-sum-distance problem has a simple NLP formulation, which benefits from a good initial point. It can also be formulated as an SOCP problem. With high-level modeling tools this is not difficult; without automatic reformulations things become a bit less obvious.
References