#### Problem

Assume we have \(N\) points (with \(N\) even). Find \(N/2\) pairs of points, such that the sum of the lengths of the line segments based on these pairs, is minimized. This looks like an assignment problem where we do not know in advance the partition of the nodes in two equally sized sets.

A picture is probably better than my arduous description:

Optimal MIP solution: sum of the lengths is minimized |

Let's try a MIP model.

#### MIP Formulation

The first thing to do is to calculate a distance matrix between points \(i\) and \(j\). As this matrix is symmetric, we only need to store the upper-triangular part (with \(i\lt j\)). That means we only need to store a little bit less than half the number of entries: \[\mathit{ndist}=\frac{1}{2}N(N-1)\]

If we use as variables \[x_{i,j} = \begin{cases} 1 & \text{if nodes $i$ and $j$ are connected}\\ 0 & \text{otherwise}\end{cases}\] we need to think a bit about symmetry. If nodes 1 and 2 are connected, then we only want to see \(x_{1,2}=1\) while ignoring \(x_{2,1}=1\). So again, we only consider the variable \(x_{i,j}\) when \(i \lt j\). Again, this saves about half the number of variables we otherwise would use.

The model can look like:

MIP Model |
---|

\[\begin{align}\min& \sum_{i,j|i \lt j} \color{darkblue}d_{i,j} \color{darkred}x_{i,j}\\ & \sum_{j|j \gt i} \color{darkred}x_{i,j} + \sum_{j|j \lt i} \color{darkred}x_{j,i} = 1 && \forall i\\ & \color{darkred}x_{i,j} \in \{0,1\} \end{align} \] |

The above picture is the optimal solution from this MIP model.

The constraint looks a bit strange. It basically says: every node \(i\) is either a start- or an end-point of a single segment. Somewhat surprising, at first sight, is that this is enough to characterize our graph.

SidenoteWe can see in a picture how the constraint for \(i=4\) is formed: The blue zone must contain exactly one 1. If we would pick \(j=7\), i.e. \(x_{4,7}=1\), the picture becomes: |

It turns out this MIP model is fairly easy to solve. The example problem with \(N=50\) points solves in a matter of seconds. For \(N=500\) points, we get a large MIP with 124,750 binary variables and 500 constraints. Cplex solves this model quickly to optimality in 90 seconds on my laptop.

#### Greedy heuristic

An obvious greedy heuristic is as follows:

- Find the pair with the shortest distance \[\min_{i \lt j} d_{i,j}\] and record this segment.
- Remove the two nodes that belong to this segment from the problem.
- If there are still nodes left, go to step 1.

This heuristic is exceedingly simple to implement. Some results for our 50 point example:

MIP vs Greedy | ||
---|---|---|

Points | Optimal MIP Objective | Greedy Objective |

50 | 241.2667 | 339.236 |

The greedy algorithm is really astonishingly bad. The animation below shows why: with a simple greedy algorithm we are doing very good initially, but pay the price in the end.

Greedy algorithm in action |

We can distinguish three areas:

- For the first, very small segments, the MIP solution and the greedy algorithm pick the same pairs. No difference.
- In the middle, the greedy algorithm is doing a little bit better: it picks shorter segments than the MIP solver.
- But at the end, the greedy algorithm totally collapses: it has to choose very long segments.

#### References

- Finding minimum total length of line segments to connect 2N points, https://stackoverflow.com/questions/59684772/finding-minimum-total-length-of-line-segments-to-connect-2n-points