In this post, let's do some experiments with the "hostile brothers problem." We have a plot of land. In my test models \([L,U]\times [L,U]\) with \(L=0\), \(U=100\)). We want to place \(n\) brothers on this plot. As they don't get along, we want to spread them out by maximizing the distance between neighbors. In modeling terms, we create a maximin model that maximizes the minimum distance between any two brothers. This yields some non-convex problems, so I'll make the problem small: we just have \(n=10\) brothers. This is a very small, but awe-inspiring problem.
Model 1: the simplest quadratic model
We start with the most obvious model:
| Model 1 |
| \[\begin{aligned}\max\>&{\color{darkred}z}\\ & {\color{darkred}z} \le \sum_c \left({\color{darkred}x}_{i,c}-{\color{darkred}x}_{j,c}\right)^2 && \forall i\lt j \\ & {\color{darkred}x}_{i,c} \in [{\color{darkblue}L},{\color{darkblue}U}] \\ & c \in \{x,y\} \\ & i,j \in \{1,\dots,n\} \end{aligned}\] |
Instead of working with the distance, we use the squared distance. That saves us a square root. The square root may cause problems: it is not differentiable at 0. So it is better to use a quadratic model.
The model is very small:
BLOCKS OF EQUATIONS 1 SINGLE EQUATIONS 45
BLOCKS OF VARIABLES 2 SINGLE VARIABLES 21
NON ZERO ELEMENTS 225 NON LINEAR N-Z 180
CODE LENGTH 451 CONSTANT POOL 16
However, this is not an easy model at all! Proving global optimality is actually very difficult. I gave the solver 1,000 seconds, and it came back with:
Elapsed time (sec): 999, estimated tree completion: 0.00015
Node BestSoln BestBound Sols Active Depth Gap GInf Time
2483289 1774.764542 4812.031822 11 2420538 24 63.12% 25 999
Many solvers now produce a progress measure indicating how much of the tree has been explored [1]. I usually am watching the gap. A disadvantage of the gap is that it moves quickly at the beginning and slowly at the end. (Not really a disadvantage: this is much better than the other way around.) That means, for me, I am always too optimistic about how much time the solver will need to finish when looking at how the bounds converge. The tree completion should progress more linearly with time. So it may be easier to make a rough estimate of how much longer the solver will need.
Here, we did not really put a dent in this problem in 1,000 seconds: tree completion is about zero, and the gap is very large. This is not an issue with this particular solver (Xpress): other solvers are struggling in the same way.
In all of the following experiments, I'll keep the time limit of 1,000 seconds. Let's see if we can get closer to proving optimality.
Model 2: less nonlinear, but not better
This is a small experiment, that is unlikely to yield great results. Let's try to make the problem less nonlinear as follows:
| Model 2 |
|---|
| \[\begin{aligned}\max\>&{\color{darkred}z}\\ & {\color{darkred}d}_{i,j,c} ={\color{darkred}x}_{i,c}-{\color{darkred}x}_{j,c} && \forall i\lt j, c \\ & {\color{darkred}z} \le \sum_c {\color{darkred}d}_{i,j,c}^2 && \forall i\lt j \\ & {\color{darkred}x}_{i,c} \in [{\color{darkblue}L},{\color{darkblue}U}] \\ & c \in \{x,y\} \\ & i,j \in \{1,\dots,n\} \end{aligned}\] |
The nonlinear nonzero count (a measure of how nonlinear the model is) goes down:
BLOCKS OF EQUATIONS 2 SINGLE EQUATIONS 135
BLOCKS OF VARIABLES 3 SINGLE VARIABLES 111
NON ZERO ELEMENTS 405 NON LINEAR N-Z 90
CODE LENGTH 361 CONSTANT POOL 16
I hoped this would help somewhat. This particular reformulation is especially useful in large NLP models. I often see models where we can significantly reduce the number of nonlinear variables and nonlinear nonzero elements. Often, this reformulation is quite beneficial for sparse NLP solvers (which typically try to exploit the linear parts of the model). In this case, however, it really didn't move things. We see:
Elapsed time (sec): 962, estimated tree completion: 0.01635
Node BestSoln BestBound Sols Active Depth Gap GInf Time
4902062 1774.764542 5413.400780 11 3749096 48 67.22% 9 998
More of the tree has been explored, but the gap actually deteriorated. Usually, these measures go in the same direction, but not always, as we see. We found the same best solution, as will be the case for all quadratic experiments. That is actually an indication (but not proof) that this solution is optimal or near-optimal. Further evidence of this is provided by the subsequent reformulations: we find the same optimal solution. Often in branch-and-bound type algorithms, when we stop early, the best possible bound is really way too optimistic, and the best found solution is much better than you think. Another observation: putzing around with a model makes you understand it much better.
Model 3: adding a symmetry breaker
Here, we add an ordering constraint to model 1. In model 1, we could just renumber (or rename) the points without changing the situation. Requiring that solutions are ordered by their \(x\)-coordinate can help.
| Model 3 |
|---|
| \[\begin{aligned}\max\>&{\color{darkred}z}\\ & {\color{darkred}z} \le \sum_c \left({\color{darkred}x}_{i,c}-{\color{darkred}x}_{j,c}\right)^2 && \forall i\lt j \\ &{\color{darkred}x}_{i,x} \le{\color{darkred}x}_{i+1,x} && \forall i\lt n\\ & {\color{darkred}x}_{i,c} \in [{\color{darkblue}L},{\color{darkblue}U}] \\ & c \in \{x,y\} \\ & i,j \in \{1,\dots,n\} \end{aligned}\] |
Results:
Elapsed time (sec): 984, estimated tree completion: 0.25586
Node BestSoln BestBound Sols Active Depth Gap GInf Time
3113456 1774.764542 2782.885513 10 1379914 34 36.23% 13 999
This is an improvement. And offers some hope.
In the solution, points with smaller indices \(i\) should be further to the left. This ordering constraint is also used in [2].
Model 4: refining the symmetry breaker
In the previous model, we considered only the \(x\)-coordinate in the ordering constraint. I try to refine that a bit, so that the \(y\)-coordinate also plays a role.
| Model 4 |
|---|
| \[\begin{aligned}\max\>&{\color{darkred}z}\\ & {\color{darkred}z} \le \sum_c \left({\color{darkred}x}_{i,c}-{\color{darkred}x}_{j,c}\right)^2 && \forall i\lt j \\ &{\color{darkred}x}_{i,x}+0.123{\color{darkred}x}_{i,y} \le{\color{darkred}x}_{i+1,x}+0.123 {\color{darkred}x}_{i+1,y} && \forall i\lt n\\ & {\color{darkred}x}_{i,c} \in [{\color{darkblue}L},{\color{darkblue}U}] \\ & c \in \{x,y\} \\ & i,j \in \{1,\dots,n\} \end{aligned}\] |
Why 0.123? No good reason. I suspect that the actual value is not very important. I am just trying to make things unique. Let's see:
Elapsed time (sec): 988, estimated tree completion: 0.55634
Node BestSoln BestBound Sols Active Depth Gap GInf Time
2462808 1774.764542 2484.977151 10 886081 32 28.58% 12 999
That looks definitely better. We have explored more than half of the tree. And the gap has been reduced further. The best-found solution is still the same. If we want to prove optimality, this would be a good candidate.
It is not clear if this symmetry breaker is consistently better or if we were just lucky.
Model 5: Manhattan distance
If we want to use a different metric for the distance between neighbors, e.g., the distance based on the L1 norm, we get a model like:
| Model 5 |
|---|
| \[\begin{aligned}\max\>&{\color{darkred}z}\\ & {\color{darkred}z} \le \sum_c \left|{\color{darkred}x}_{i,c}-{\color{darkred}x}_{j,c}\right| && \forall i\lt j \\ & {\color{darkred}x}_{i,c} \in [{\color{darkblue}L},{\color{darkblue}U}] \\ & c \in \{x,y\} \\ & i,j \in \{1,\dots,n\} \end{aligned}\] |
We can actually use the \(\mathbf{abs}(.)\) function directly in Xpress. That is very nice, as maximizing an L1 norm requires some attention. When I ran this as stated, I saw:
------ unbounded -------
Concurrent statistics:
Dual: 90 simplex iterations, 0.00s
Problem is unbounded if C212 enters the basis
*** Search completed ***
Problem is unbounded
I think Xpress should be able to handle this a bit better. If an LP relaxation is unbounded, the underlying MIP is not necessarily unbounded. Luckily, we can easily fix this by adding an upper bound on \({\color{darkred}z}\): \[{\color{darkred}z} \le 1\text{e}6\] Now, it actually reformulates the problem as a linear MIP model. I had high hopes for this, but unfortunately:
Elapsed time (sec): 999, estimated tree completion: 0.35231
Node BestSoln BestBound Sols Active Depth Gap GInf Time
26588858 57.142857 75.000000 17 9164865 31 23.81% 18 999
Obviously, the best solution is different. After all, we used a different objective. But still no proof of global optimality. Note that Gurobi does not accept this formulation; it requires something simpler when using an absolute value, such as \(y=|x|\). This is really somewhat primitive and prevents me from writing things as I would like. I am not completely sure if I should blame Gurobi or GAMS for this.
Model 6: Manhattan distance + symmetry breaker 1
| Model 6 |
|---|
| \[\begin{aligned}\max\>&{\color{darkred}z}\\ & {\color{darkred}z} \le \sum_c \left|{\color{darkred}x}_{i,c}-{\color{darkred}x}_{j,c}\right| && \forall i\lt j\\ &{\color{darkred}x}_{i,x} \le {\color{darkred}x}_{i+1,y} && \forall i\lt n\\ & {\color{darkred}x}_{i,c} \in [{\color{darkblue}L},{\color{darkblue}U}] \\ & c \in \{x,y\} \\ & i,j \in \{1,\dots,n\} \end{aligned}\] |
Elapsed time (sec): 0, estimated tree completion: 0.89442
8374 57.142857 72.727273 21 6 23 21.43% 11 0
*** Search completed ***
Wow, that is quite something. Proven optimality in 2 seconds. Here we see the impact of this symmetry breaker. From hitting a time limit of 1,000 seconds to optimality in 0 seconds. Note that the best solution is the same as in the previous model.
This looks great, but be aware that increasing the number of points to, say, 20 is already very taxing for the MIP solver.
Model 7: Manhattan distance + symmetry breaker 2
| Model 7 |
|---|
| \[\begin{aligned}\max\>&{\color{darkred}z}\\ & {\color{darkred}z} \le \sum_c \left|{\color{darkred}x}_{i,c}-{\color{darkred}x}_{j,c}\right| && \forall i\lt j\\ &{\color{darkred}x}_{i,x}+0.123{\color{darkred}x}_{i,y} \le{\color{darkred}x}_{i+1,x}+0.123 {\color{darkred}x}_{i+1,y} && \forall i\lt n\\ & {\color{darkred}x}_{i,c} \in [{\color{darkblue}L},{\color{darkblue}U}] \\ & c \in \{x,y\} \\ & i,j \in \{1,\dots,n\} \end{aligned}\] |
This shows:
Elapsed time (sec): 1, estimated tree completion: 0.98114
Node BestSoln BestBound Sols Active Depth Gap GInf Time
59659 57.142857 58.713838 18 42 27 2.68% 9 2
*** Search completed ***
Here, the second symmetry breaker requires more nodes.
Model 8: Manhattan distance + binary variables
Let's try to implement this ourselves using binary variables. Will I be able to beat what Xpress is doing automatically?
| Model 8 |
|---|
| \[\begin{aligned}\max\>&{\color{darkred}z}\\ & {\color{darkred}z} \le \sum_c \left( {\color{darkred}d}_{i,j,c}^+ +{\color{darkred}d}_{i,j,c}^- \right) && \forall i\lt j \\ &{\color{darkred}d}_{i,j,c}^+ -{\color{darkred}d}_{i,j,c}^- = {\color{darkred}x}_{i,c}-{\color{darkred}x}_{j,c} && \forall i \lt j \\& {\color{darkred}d}_{i,j,c}^+ \le {\color{darkblue}M} {\color{darkred}\delta}_{i,j,c} && \forall i \lt j, c \\ & {\color{darkred}d}_{i,j,c}^- \le {\color{darkblue}M} (1-{\color{darkred}\delta}_{i,j,c}) && \forall i \lt j, c \\ &{\color{darkred}x}_{i,x}+0.123{\color{darkred}x}_{i,y} \le{\color{darkred}x}_{i+1,x}+0.123 {\color{darkred}x}_{i+1,y} && \forall i\lt n\\ & {\color{darkred}x}_{i,c} \in [{\color{darkblue}L},{\color{darkblue}U}] \\ & {\color{darkred}d}_{i,j,c}^+,{\color{darkred}d}_{i,j,c}^- \ge 0 \\ & {\color{darkred}\delta}_{i,j,c} \in\{0,1\} \\& {\color{darkblue}M} = 2({\color{darkblue}U}-{\color{darkblue}L}) \\ & c \in \{x,y\} \\ & i,j \in \{1,\dots,n\} \end{aligned}\] |
This is more work than we needed when we used \(\mathbf{abs}(.)\) directly. We see:
Elapsed time (sec): 2, estimated tree completion: 0.89402
Node BestSoln BestBound Sols Active Depth Gap GInf Time
28397 57.142857 72.001382 16 57 20 20.64% 32 3
*** Search completed ***
This needed a bit more time. The number of nodes required seems significantly lower than that of the previous model. So it is not clear who the winner is. Of course, in the end, turnaround time is the most important measure for a user (far more important than the number of nodes used). So from that perspective, I did not win here.
Conclusion
A simple problem where we need to place 10 points on a square is very difficult. We find good solutions (probably optimal or near optimal) very quickly. But proving global optimality is another kettle of fish.
The non-convex quadratic models are extremely difficult. The linear models based on the L1 distance are very easy once we add the ordering constraint.
Instead of adding a symmetry breaker only to the \(x\)-coordinate, I suggest also using the \(y\)-coordinate. That seems to improve progress significantly for the quadratic. For the L1 model, the \(x\)-coordinate version seemed faster (fewer nodes). I guess we need more evidence before we can say which version is better.
There is a connection with a packing problem here. Pack \(n\) unit circles in an outer square of minimum enclosing square. Scale the solution back to our problem. A little attention is needed for the scaling step, as we only need the center points to lie within the outer square.
Model
GAMS Models
$ontext Hostile Brothers Problem $offtext * solver option qcp=xpress, dnlp=xpress, mip=xpress; * time limit option reslim=1000; sets i /point1*point10/ c /x,y/ pm /'+','-'/ ; alias (i,j); set lt(i,j) 'upper triangular part'; lt(i,j) = ord(i) < ord(j); scalars L /0/ U /100/ ; variables z 'objective value' x(i,c) 'location of point i in dimension c' d(i,j,c) 'distance between points i and j in dimension c' d2(i,j,c,pm) 'split variables for linearizing manhattan distance' ; x.lo(i,c) = L; x.up(i,c) = U; positive variable d2; binary variable delta(i,j,c); equation sqdistance 'squared distance' edelta 'definition of d' sqdistance2 'squared distance (alternative, using d)' order 'ordering of points by x-coordinate' order2 'ordering of points by x- and y-coordinate (alternative)' l1distance 'manhattan distance' l1distance2 'manhattan distance (linearized)' split1 'variable splitting constraint 1' split2 'variable splitting constraint 2' split3 'variable splitting constraint 3' ; sqdistance(lt(i,j)).. z =l= sum(c, sqr(x(i,c) - x(j,c))); edelta(lt(i,j),c).. d(i,j,c) =e= x(i,c) - x(j,c); sqdistance2(lt(i,j)).. z =l= sum(c, sqr(d(i,j,c))); order(i+1).. x(i,'x') =l= x(i+1,'x'); order2(i+1).. x(i,'x')+0.123*x(i,'y') =l= x(i+1,'x')+0.123*x(i+1,'y'); * manhattan distance l1distance(lt(i,j)).. z =l= sum(c, abs(x(i,c) - x(j,c))); * needed for xpress solver z.up = 1e6; scalar M 'big-M'; M = 2*(U-L); * manhattan distance manually linearized split1(lt(i,j),c).. d2(i,j,c,'+') - d2(i,j,c,'-') =e= x(i,c) - x(j,c); split2(lt(i,j),c).. d2(i,j,c,'+') =l= M*delta(i,j,c); split3(lt(i,j),c).. d2(i,j,c,'-') =l= M*(1-delta(i,j,c)); l1distance2(lt(i,j)).. z =l= sum((c,pm), d2(i,j,c,pm)); model brothers1 /sqdistance/; model brothers2 /edelta, sqdistance2/; model brothers3 /sqdistance, order/; model brothers4 /sqdistance, order2/; model brothers5 /l1distance/; model brothers6 /l1distance, order/; model brothers7 /l1distance, order2/; model brothers8 /l1distance2, split1,split2,split3, order2/; solve brothers1 maximizing z using qcp; solve brothers2 maximizing z using qcp; solve brothers3 maximizing z using qcp; solve brothers4 maximizing z using qcp; solve brothers5 maximizing z using dnlp; solve brothers6 maximizing z using dnlp; solve brothers7 maximizing z using dnlp; solve brothers8 maximizing z using mip; |
References
Very nice, I've played with similar models myself. There are a couple of ways to improve the formulation. One is to extend the symmetry breaking: you are taking care of the ordering of the points, but not the 8 possible ways the same arrangement can appear. You can add a constraint that the barycentre of the points is in a particular 8th of the square.
ReplyDeleteAnother thing is that with the symmetry ordering you can remove the distance constraint from the points that are far away. Care must be taken though and the final solution needs to be checked to make sure that all distances are accounted for. For the ordering symmetry breaking I prefer to use integer coefficients 2 and 3. Those should be generic enough when circles are concerned.
By the way, the problem is equivalent to placing n disjoint circles of radius r in the unit square (only the centres are in the square, not necessarily the whole circle), maximizing 2r. I know it doesn't help a lot for modelling, but it does help for visualizing the problem.
As to the lack of improvement when using the auxiliary variables d: Xpress creates those variables internally. I'll look into why we don't get the bound for the absolute value expression.
Great! Thanks.
Delete