Monday, November 14, 2016

Taxicab distances: L1 norm minimization

From this post:

Suppose we are given \(n\) points \(A_1, \dots, A_n \in \mathbb{R}^2\). The task is to find a point \(x = (x_1,x_2) \in \mathbb{R}^2\) such that the sum of distances to the points \(A_1, \dots, A_n\) in the \(\ell_1\)-norm is minimized. Formulate this problem as a linear program.

There are at least two (primal) formulations. A variable splitting technique can look like:

  \min & \sum_{i,j} y^+_{i,j} + y^-_{i,j}\\
  &y^+_{i,j} - y^-_{i,j} = x_j-A_{i,j}\\
  &y^+_{i,j}\ge 0, y^-_{i,j}\ge 0

A different formulation is

  \min & \sum_{i,j} y_{i,j}\\
  &-y_{i,j} \le x_j-A_{i,j} \le y_{i,j}\\
  &y_{i,j}\ge 0

I don’t think this formulation has a name. We can drop \(y_{i,j}\ge 0\) if we want, as this is implied by \(-y_{i,j}\le y_{i,j}\).

If you let mathematicians loose on this, you may end up with formulation 2 stated as:

\[\boxed{\begin{array}{ll} \text{minimize} & 1_{2n}^{\top} \mathrm y\\ \text{subject to} & -\mathrm y \leq (1_n \otimes \mathrm I_2) \, \mathrm x - \mbox{vec} (\mathrm A) \leq \mathrm y\\ & \mathrm y \geq 0_{2n}\end{array}}\]

I don’t really like this type of notation: at least for me it is much less intuitive.

The first formulation has fewer equations but more variables. The second formulation has more constraints as we need to split \(-y_{i,j} \le x_j-A_{i,j} \le y_{i,j}\) into \(x_j-A_{i,j} \ge –y_{i,j}\) and \(x_j-A_{i,j} \le y_{i,j}\). I don’t think there is much reason to choose one formulation above the other. Depending largely on my mood I use either version 1 or 2.