Yet Another Math Programming Consultant: Euclidean Steiner tree problems as MISOCP

In [1] I looked at MIP models for the Minimum Spanning Tree problem. A related, but much more difficult problem is the Euclidean Steiner Tree Problem. Here we are allowed to add points to make the tree shorter. Here are two obvious examples [2]:

Here A, B, C, and D are the original or given points (also known as terminals), and S, S1, S2 are the additional Steiner points. For the first figure with three given points, we can reduce the total length from 2 to 1.732. In the second figure, adding two Steiner points leads to a tree with a length 2.732 (compared to 3).

Joseph Diaz Gergonne (1771–1859) first
proposed the Euclidean Steiner tree problem [6]

There are a few things we know about these Steiner points:

Steiner points have a degree (connections to other points) of 3.
The angles of the links from a Steiner are equal (and thus 120 degrees).
The maximum number of Steiner points is $\color{darkblue}n-2$ ($\color{darkblue}n$ is the number of given points).
Steiner points are located inside the convex hull of the given points.
A point in an optimal Minimum Spanning Tree has a degree of 1, 2 or 3.
If we add $\color{darkblue}n-2$ Steiner points, the total number of nodes becomes $2n-2$ and the total number of edges is $2n-3$. Furthermore, we have that the degree of each terminal is 1 (and the connection is to a Steiner point).

So our problem becomes: given the coordinates of $\color{darkblue}n$ points, where should we place the additional $\color{darkblue}n-2$ Steiner points such that the total length of minimum spanning tree is minimized?

In [3,4] a non-convex MINLP model is proposed. We order the $\color{darkblue}n$ given points and $\color{darkblue}n-2$ Steiner points as follows: \[\begin{align}&\color{darkblue}P = \{1,\dots,\color{darkblue}n\}\\ & \color{darkblue}S=\{\color{darkblue}n+1,\dots,2\color{darkblue}n-2\}\end{align}\] and all points are \[\color{darkblue}I =\{1,\dots,2\color{darkblue}n-2\}\] Finally we have a set of edges $\color{darkblue}E$. Only edges $(i,j)$ with $i\lt j$ exist. Now introduce a continuous variable $\color{darkred}x_{i,c} \in [\color{darkblue}L_c,\color{darkblue}U_c]$ where $\color{darkblue}L_c$ is smallest $x$ or $y$ coordinate of the given points and $\color{darkblue}U_c$ is the largest one. In addition we have \[\color{darkred}y_{i,j} = \begin{cases} 1 & \text{if edge $(i,j)$ is part of the tree} \\ 0 & \text{otherwise}\end{cases}\]With this we can write:

Non-convex MINLP Model
\[\begin{align} \min & \sum_{(i,j)\in\color{darkblue}E} \|\|\color{darkred}x_i-\color{darkred}x_j\|\| \cdot \color{darkred}y_{i,j} && &&(1) \\& \color{darkred}x_{p,c} = \color{darkblue}x^0_{p,c} && \forall p,c && (2)\\& \color{darkred}x_{s,c} \in [\color{darkblue}L_c,\color{darkblue}U_c] && \forall s,c && (3) \\ & \sum_s \color{darkred}y_{p,s} = 1 &&\forall p&& (4)\\ & \sum_{(p,s)\in\color{darkblue}E}\color{darkred}x_{p,s} + \sum_{(s',s)\in\color{darkblue}E}\color{darkred}x_{s',s} + \sum_{(s,s')\in\color{darkblue}E}\color{darkred}x_{s,s'} = 3 && \forall s && (5)\\ & \sum_{(s',s)\in\color{darkblue}E}\color{darkred}y_{s',s}=1 && \forall s \ne 1 && (6) \\ & \color{darkred}y_{i,j} \in \{0,1\} \end{align}\]

Non-convex MINLP Model

\[\begin{align} \min & \sum_{(i,j)\in\color{darkblue}E} ||\color{darkred}x_i-\color{darkred}x_j|| \cdot \color{darkred}y_{i,j} && &&(1) \\& \color{darkred}x_{p,c} = \color{darkblue}x^0_{p,c} && \forall p,c && (2)\\& \color{darkred}x_{s,c} \in [\color{darkblue}L_c,\color{darkblue}U_c] && \forall s,c && (3) \\ & \sum_s \color{darkred}y_{p,s} = 1 &&\forall p&& (4)\\ & \sum_{(p,s)\in\color{darkblue}E}\color{darkred}x_{p,s} + \sum_{(s',s)\in\color{darkblue}E}\color{darkred}x_{s',s} + \sum_{(s,s')\in\color{darkblue}E}\color{darkred}x_{s,s'} = 3 && \forall s && (5)\\ & \sum_{(s',s)\in\color{darkblue}E}\color{darkred}y_{s',s}=1 && \forall s \ne 1 && (6) \\ & \color{darkred}y_{i,j} \in \{0,1\} \end{align}\]

The notation $||x||$ indicates the 2-norm, i.e. $||x||=\sqrt{\sum_k x_k}$. This is a non-linear objective. We may want to use the expression $\sqrt{\sum_k x_k+\epsilon}$ to make sure we can evaluate the gradient.
The $\color{darkred}x$ variables for the given points are fixed.
The $\color{darkred}x$ variables for the Steiner points are inside a box formed by the smallest and largest given points (for each coordinate $\{x,y\}$).
Each terminal or given point $p$ is connected to exactly one Steiner point $s$. This is related to each terminal having degree 1.
This constraint says: the degree of each Steiner point is 3.
The purpose of this constraint is to prevent cycles.

It is noted this model adds $n-2$ Steiner points. It does not work with say just 1 Steiner point.

This model seems to work:

MINLP results

Not all cases are very interesting: the $n=5$ case is not that exciting.

Ideas for improvement

Looking at this model, I see three ways we could potentially improve this.

Instead of a non-convex MINLP that requires a global solver, we can reformulate this as a convex Mixed-Integer Conic (MISOCP) model. That will allow us to use different solvers, potentially getting better performance.
Reduce some symmetry. We can renumber the Steiner points: that will give the same solution. To prevent this symmetry we can order the Steiner points by say $x$-coordinate.
Exploit that the given points can be handled linearly.

Here is the convex version of the MINLP model:

Convex MISOCP Model
\[\begin{align} \min & \sum_{(i,j)\in\color{darkblue}E} \color{darkred}d_{i,j} \\ & \color{darkred}\delta_{i,j,c} = \color{darkred}x_{i,c}-\color{darkred}x_{j,c}&& \forall (i,j)\in \color{darkblue}E, \forall c \\ & \color{darkred}\Delta^2_{i,j} \ge \sum_c \color{darkred}\delta^2_{i,j,c} && \forall (i,j)\in \color{darkblue}E \\ & \color{darkred}d_{i,j} \ge \color{darkred}\Delta_{i,j} - \color{darkblue}M\cdot (1-\color{darkred}y_{i,j}) && \forall (i,j)\in \color{darkblue}E \\& \color{darkred}x_{p,c} = \color{darkblue}x^0_{p,c} && \forall p,c\\& \color{darkred}x_{s,c} \in [\color{darkblue}L_c,\color{darkblue}U_c] && \forall s,c \\ & \sum_s \color{darkred}y_{p,s} = 1 &&\forall p\\ & \sum_{(p,s)\in\color{darkblue}E}\color{darkred}x_{p,s} + \sum_{(s',s)\in\color{darkblue}E}\color{darkred}x_{s',s} + \sum_{(s,s')\in\color{darkblue}E}\color{darkred}x_{s,s'} = 3 && \forall s \\ & \sum_{(s',s)\in\color{darkblue}E}\color{darkred}y_{s',s}=1 && \forall s \ne 1 \\ & \color{darkred}y_{i,j} \in \{0,1\} \\ & \color{darkred}d_{i,j} \ge 0, \color{darkred} \Delta_{i,j} \ge 0\end{align}\]

Convex MISOCP Model

\[\begin{align} \min & \sum_{(i,j)\in\color{darkblue}E} \color{darkred}d_{i,j} \\ & \color{darkred}\delta_{i,j,c} = \color{darkred}x_{i,c}-\color{darkred}x_{j,c}&& \forall (i,j)\in \color{darkblue}E, \forall c \\ & \color{darkred}\Delta^2_{i,j} \ge \sum_c \color{darkred}\delta^2_{i,j,c} && \forall (i,j)\in \color{darkblue}E \\ & \color{darkred}d_{i,j} \ge \color{darkred}\Delta_{i,j} - \color{darkblue}M\cdot (1-\color{darkred}y_{i,j}) && \forall (i,j)\in \color{darkblue}E \\& \color{darkred}x_{p,c} = \color{darkblue}x^0_{p,c} && \forall p,c\\& \color{darkred}x_{s,c} \in [\color{darkblue}L_c,\color{darkblue}U_c] && \forall s,c \\ & \sum_s \color{darkred}y_{p,s} = 1 &&\forall p\\ & \sum_{(p,s)\in\color{darkblue}E}\color{darkred}x_{p,s} + \sum_{(s',s)\in\color{darkblue}E}\color{darkred}x_{s',s} + \sum_{(s,s')\in\color{darkblue}E}\color{darkred}x_{s,s'} = 3 && \forall s \\ & \sum_{(s',s)\in\color{darkblue}E}\color{darkred}y_{s',s}=1 && \forall s \ne 1 \\ & \color{darkred}y_{i,j} \in \{0,1\} \\ & \color{darkred}d_{i,j} \ge 0, \color{darkred} \Delta_{i,j} \ge 0\end{align}\]

This is a bit of a pain to setup: we need a bunch of extra variables and constraints. But as a result, we have a convex quadratic model that can be solved with solvers like Cplex and Gurobi.

To reduce symmetry, we can add the constraint: \[\color{darkred}x_{s,x} \le \color{darkred}x_{s+1,x}\] for Steiner points $s$. This constraint seems to help, especially for the larger intances.

Finally, it looks like we can exploit that for given points, we don't need all this quadratic machinery. So the idea is to split $\color{darkblue}E$ into $\color{darkblue}E_1$ with only points $p$ and $\color{darkblue}E_2$ for all other cases (edges with at least one Steiner point). This did not make a difference: good solvers have sophisticated presolvers that recognize this situation automatically. So we don't need to worry about this at the modeling level.

MISOCP results

Even after this, we can only solve small instances in a reasonable amount of time. It is noted that there are specialized algorithms and implementations around that can solve quite large instances [5].

Conclusion

This is a version of a model for the Euclidean Steiner tree problem that can be solved with high-performance solvers like Cplex and Gurobi. That sounds better than it is: we only can solve small models to optimality. Of course, this model is still interesting from a modeling point of view.

References

Minimum spanning trees in Math Programming models, https://yetanothermathprogrammingconsultant.blogspot.com/2021/03/minimum-spanning-trees-in-math.html.
Steiner tree problem, https://en.wikipedia.org/wiki/Steiner_tree_problem.
Nelson Maculan, Philippe Michelon and Adilson E. Xavier, The Euclidean Steiner tree problem in $R^n$: A mathematical programming formulation, Annals of Operations Research 96 (2000) 209–220
Marcia Fampa, Jon Lee and Nelson Maculana, An overview of exact algorithms for the Euclidean Steiner tree problem in $n$-space, Intl. Trans. in Op. Res. 23 (2016) 861–874
http://www.geosteiner.com/
Brazil, M., Graham, R.L., Thomas, D.A. et al. On the history of the Euclidean Steiner tree problem. Arch. Hist. Exact Sci. 68, 327–354 (2014). Interestingly: "the modern attribution of the problem to Jakob Steiner (1796–1863) is misleading, if not to say erroneous."

Appendix: GAMS model

$ontext

   Euclidean Steiner Tree Model
   erwin@amsterdamoptimization.com

   Based on model from:

     Nelson Maculan, Philippe Michelon and Adilson E. Xavier,
     The Euclidean Steiner tree problem in R^n: A mathematical
     programming formulation,
     Annals of Operations Research 96 (2000) 209–220

$offtext

* size of the problem
* np = number of given points
* ns = number of steiner points

$set   np 5
$eval ns (%np%-2)

set
i     'all points'     /p1*p%np%,s1*s%ns%/
p(i) 'given points'   /p1*p%np%/
s(i) 'Steiner points' /s1*s%ns%/
c     'coordinate'     /x,y/
;

scalars
   np 'number of given points' /%np%/
   ns 'number of Steiner points' /%ns%/
;

alias (i,j), (s,s1), (p,p1);

option seed = 1234;

*--------------------------------------------------
* data
*--------------------------------------------------

parameters
x0(p,c)   'coordinates given points'
box(c,*) 'min/max of x0'
;
x0(p,c) = uniform(0,100);
box(c,'lo') = smin(p,x0(p,c));
box(c,'up') = smax(p,x0(p,c));
box(c,'range') = box(c,'up')-box(c,'lo');
display x0,box;

set E(i,j) 'edges';
E(i,j) = ord(i)<ord(j);

scalar epsilon 'protect square root' /0.0001/;

*--------------------------------------------------
* Nonconvex MINLP model
*--------------------------------------------------

variables
   x(i,c)    'coordinates of points'
   y(i,j)    'tree'
   z         'objective'
;
binary variable y;
positive variable d;

x.fx(p,c) = x0(p,c);
x.lo(s,c) = box(c,'lo');
x.up(s,c) = box(c,'up');

equations
   objective 'min lengths'
   degree1   'terminal points p have degree 1'
   degree3   'Steiner points have degree 3'
   nocycle   'prevent cycles'
   order     'ordering constraint (symmetry breaker)'
   total     'optional constraint (set total edges)'
;

objective..
   z =e= sum(E(i,j),sqrt(sum(c,sqr(x(i,c)-x(j,c)))+epsilon)*y(E));

degree1(p)..
   sum(E(p,s), y(E)) =e= 1;

degree3(s)..
   sum(E(p,s), y(E)) + sum(E(s1,s), y(E)) + sum(E(s,s1), y(E)) =e= 3;

nocycle(s)$(ord(s)>1)..
   sum(E(s1,s),y(E)) =e= 1;

order(s+1)..
   x(s,'x') =l= x(s+1,'x');

total..
sum(E,y(E)) =e= card(i)-1;

model m1 /objective,degree1,degree3,nocycle,order,total/;
option minlp=baron, optcr=0, threads=8;
solve m1 minimizing z using minlp;

*--------------------------------------------------
* MISOCP model
*--------------------------------------------------

variables
d(i,j)    'distance (length)'
d2(i,j,c) 'auxiliary variable in convex distance calculation'
d3(i,j)   'auxiliary variable in convex distance calculation'
;
positive variable d3;

equations
   objective2    'min lengths'
   defd2(i,j,c) 'auxiliary constraint for socp constraint'
   socp(i,j)     'convex conic constraint'
   bound(i,j)    'auxiliary constraint for socp constraint'
;

scalar M;
M = sqrt(2)*(smax(c,box(c,'up'))-smin(c,box(c,'lo')));

objective2..       z =e= sum(E,d(E));

defd2(E(i,j),c).. d2(E,c) =e= x(i,c)-x(j,c);
socp(E)..          sqr(d3(E)) =g= sum(c,sqr(d2(E,c)));
bound(E)..         d(E) =g= d3(E) - M*(1-y(E));

model m2 /objective2,defd2,socp,bound,degree1,degree3,nocycle,order,total/;
option miqcp=cplex;
solve m2 minimizing z using miqcp;

*--------------------------------------------------
* prepare data for plotting in R
*--------------------------------------------------

alias(*,c2,point);

parameter
Points(i,point,c)
Tree(i,j,c2)
;
Points(p,'given',c) = x.l(p,c);
Points(s,'steiner',c) = x.l(s,c);
loop((i,j)$(y.l(i,j)>0.5),
Tree(i,j,'x1') = x.l(i,'x');
Tree(i,j,'y1') = x.l(i,'y');
Tree(i,j,'x2') = x.l(j,'x');
Tree(i,j,'y2') = x.l(j,'y');
);
display Points,Tree;

Yet Another Math Programming Consultant

Monday, March 15, 2021

Euclidean Steiner tree problems as MISOCP

Ideas for improvement

Conclusion

References

Appendix: GAMS model

No comments:

Post a Comment