## Thursday, February 25, 2021

### A strange regression problem: multiple-line fitting

Consider the following data:

```
----     14 PARAMETER x  data

i1  17.175,    i2  84.327,    i3  55.038,    i4  30.114,    i5  29.221,    i6  22.405,    i7  34.983,    i8  85.627
i9   6.711,    i10 50.021,    i11 99.812,    i12 57.873,    i13 99.113,    i14 76.225,    i15 13.069,    i16 63.972
i17 15.952,    i18 25.008,    i19 66.893,    i20 43.536,    i21 35.970,    i22 35.144,    i23 13.149,    i24 15.010
i25 58.911,    i26 83.089,    i27 23.082,    i28 66.573,    i29 77.586,    i30 30.366,    i31 11.049,    i32 50.238
i33 16.017,    i34 87.246,    i35 26.511,    i36 28.581,    i37 59.396,    i38 72.272,    i39 62.825,    i40 46.380
i41 41.331,    i42 11.770,    i43 31.421,    i44  4.655,    i45 33.855,    i46 18.210,    i47 64.573,    i48 56.075
i49 76.996,    i50 29.781

----     14 PARAMETER y  data

i1    32.320,    i2   238.299,    i3    87.081,    i4    17.825,    i5   -10.154,    i6   -35.813,    i7    14.506
i8   204.903,    i9  -103.014,    i10   79.521,    i11 -153.679,    i12  -47.971,    i13  277.483,    i14  160.084
i15   28.687,    i16  133.650,    i17  -63.820,    i18    4.028,    i19  145.434,    i20   48.326,    i21   49.295
i22   35.478,    i23  -58.602,    i24   34.463,    i25  -50.666,    i26 -115.522,    i27  -19.540,    i28  134.829
i29  198.074,    i30   -7.351,    i31   40.786,    i32   82.107,    i33   27.661,    i34  -82.669,    i35    5.081
i36   -1.278,    i37  118.345,    i38  172.905,    i39  -57.943,    i40   56.981,    i41  -45.248,    i42   34.517
i43   11.780,    i44  -98.506,    i45   -1.064,    i46   33.323,    i47  139.050,    i48  -35.595,    i49 -102.580
i50    3.912
```


When we plot this data, we get the clear impression that there are actually two different regression problems here. As we only have one data set, let's see if we can develop an optimization model that assigns each point $$(\color{darkblue}x_i,\color{darkblue}y_i)$$ to one of the regression lines, and finds the least-squares fit for both lines at the same time.

For this, we introduce a binary variable $$\color{darkred}\delta_i = \begin{cases} 1 & \text{if point $i$ belongs to regression line 1} \\ 0 & \text{if point $i$ belongs to regression line 2}\end{cases}$$

A first model to perform this combined task can look like this:

Non-convex MIQCP model A
\begin{align}\min& \sum_i \left(\color{darkred}r_{1,i}^2 + \color{darkred}r_{2,i}^2 \right)\\ & \color{darkred}r_{1,i} = \left(\color{darkblue}y_i - \color{darkred}\alpha_0 - \color{darkred}\alpha_1 \cdot \color{darkblue}x_i \right)\cdot \color{darkred}\delta_i \\& \color{darkred}r_{2,i} = \left(\color{darkblue}y_i - \color{darkred}\beta_0 - \color{darkred}\beta_1 \cdot \color{darkblue}x_i \right)\cdot(1-\color{darkred}\delta_i) \\ & \color{darkred}\delta_i \in \{0,1\}\end{align}

Here we set the residual to zero if the point belongs to the other regression line. For all models in this post, variables without explicit restrictions are assumed to be free (i.e., their bounds are minus infinity and plus infinity). Warning: many solvers make variables non-negative unless instructed otherwise.
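
To make this concrete, here is a minimal GAMS sketch of model A. The variable and equation names are mine (not from the appendix), and I assume the data x(i), y(i) is already loaded. The bilinear products make this a non-convex MIQCP, so depending on the solver its non-convex mode may have to be enabled.

```
* sketch of non-convex model A (names are mine)
free variables
   r1(i) 'residual w.r.t. line 1'
   r2(i) 'residual w.r.t. line 2'
   a0, a1 'coefficients of line 1'
   b0, b1 'coefficients of line 2'
   z      'objective'
;
binary variable delta(i) 'point assigned to line 1?';

equations obj, res1(i), res2(i);
obj..     z =e= sum(i, sqr(r1(i)) + sqr(r2(i)));
res1(i).. r1(i) =e= (y(i) - a0 - a1*x(i))*delta(i);
res2(i).. r2(i) =e= (y(i) - b0 - b1*x(i))*(1-delta(i));

model modelA /obj,res1,res2/;
solve modelA minimizing z using miqcp;
```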

A somewhat more compact formulation is:

Non-convex MIQCP model B
\begin{align}\min& \sum_i \color{darkred}r_i^2 \\ & \color{darkred}r_i = \color{darkblue}y_i - \left[\color{darkred}\alpha_0\cdot \color{darkred}\delta_i + \color{darkred}\beta_0\cdot(1-\color{darkred}\delta_i)\right] - \left[\color{darkred}\alpha_1\cdot \color{darkred}\delta_i+ \color{darkred}\beta_1\cdot(1-\color{darkred}\delta_i)\right] \cdot \color{darkblue}x_i \\ & \color{darkred}\delta_i \in \{0,1\}\end{align}

Although it has fewer variables and constraints, this model may be more difficult to solve.
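
In GAMS, model B would replace the two residual equations of model A by a single one, along these lines (again a sketch with my own names):

```
* sketch of model B's single residual equation (names are mine)
equation resB(i) 'residual with coefficients switched by delta';
resB(i).. r(i) =e= y(i) - (a0*delta(i) + b0*(1-delta(i)))
                        - (a1*delta(i) + b1*(1-delta(i)))*x(i);
```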

Instead of using the $$\color{darkred}\delta$$ variables in the constraints, we can also move them to the objective:

Non-convex MINLP model C
\begin{align}\min& \sum_i \left( \color{darkred}r_{1,i}^2\cdot\color{darkred}\delta_i +\color{darkred}r_{2,i}^2\cdot (1-\color{darkred}\delta_i) \right)\\ & \color{darkred}r_{1,i} = \color{darkblue}y_i - \color{darkred}\alpha_0 - \color{darkred}\alpha_1\cdot \color{darkblue}x_i\\ & \color{darkred}r_{2,i} = \color{darkblue}y_i - \color{darkred}\beta_0 - \color{darkred}\beta_1\cdot \color{darkblue}x_i \\ & \color{darkred}\delta_i \in \{0,1\}\end{align}

This MINLP model is no longer quadratic: the objective multiplies squared residuals by binary variables, which yields cubic terms.
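
A sketch of model C in GAMS (names are mine). Because of the products of sqr() terms and binaries, we need a general MINLP solver here rather than a MIQCP one:

```
* sketch of non-convex model C (names are mine)
equations objC, res1(i), res2(i);
objC..    z =e= sum(i, sqr(r1(i))*delta(i) + sqr(r2(i))*(1-delta(i)));
res1(i).. r1(i) =e= y(i) - a0 - a1*x(i);
res2(i).. r2(i) =e= y(i) - b0 - b1*x(i);

model modelC /objC,res1,res2/;
solve modelC minimizing z using minlp;
```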

We can easily make the model convex again by keeping all constraints linear, so that the only nonlinearity is the convex quadratic objective. If indicator constraints are available (this depends on the solver), we can write:

Convex MIQP model D
\begin{align}\min& \sum_i \left( \color{darkred}r_{1,i}^2 +\color{darkred}r_{2,i}^2 \right)\\ & \color{darkred}\delta_i=1 \Rightarrow\color{darkred}r_{1,i} = \color{darkblue}y_i - \color{darkred}\alpha_0 - \color{darkred}\alpha_1\cdot \color{darkblue}x_i\\ &\color{darkred}\delta_i=0 \Rightarrow\color{darkred}r_{2,i} = \color{darkblue}y_i - \color{darkred}\beta_0 - \color{darkred}\beta_1\cdot \color{darkblue}x_i \\ & \color{darkred}\delta_i \in \{0,1\}\end{align}
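
GAMS has no language construct for indicator constraints; with CPLEX they are passed through a solver option file. A sketch, assuming the two residual equations are called fit1(i) and fit2(i) and the model is called m (all names are mine):

```
* CPLEX option file: fit1(i) is enforced when delta(i)=1,
* fit2(i) when delta(i)=0 (equation names are hypothetical)
file fopt /cplex.opt/;
put fopt;
put 'indic fit1(i)$delta(i) 1' /;
put 'indic fit2(i)$delta(i) 0' /;
putclose;
m.optfile = 1;
solve m minimizing z using miqcp;
```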

We can make this model a little less nonlinear, using a single residual variable and thus fewer quadratic terms in the objective:

Convex MIQP model E
\begin{align}\min& \sum_i \color{darkred}r_i^2 \\ & \color{darkred}\delta_i=1 \Rightarrow\color{darkred}r_i = \color{darkblue}y_i - \color{darkred}\alpha_0 - \color{darkred}\alpha_1\cdot \color{darkblue}x_i\\ &\color{darkred}\delta_i=0 \Rightarrow\color{darkred}r_i = \color{darkblue}y_i - \color{darkred}\beta_0 - \color{darkred}\beta_1\cdot \color{darkblue}x_i \\ & \color{darkred}\delta_i \in \{0,1\}\end{align}

Let's do a big-M formulation:

Convex MIQP model F
\begin{align}\min& \sum_i \color{darkred}r_i^2 \\ &\color{darkred}r_i = \color{darkblue}y_i - \color{darkred}\alpha_0 - \color{darkred}\alpha_1\cdot \color{darkblue}x_i +\color{darkred}s_{1,i}\\ &\color{darkred}r_i = \color{darkblue}y_i - \color{darkred}\beta_0 - \color{darkred}\beta_1\cdot \color{darkblue}x_i +\color{darkred}s_{2,i} \\ & -\color{darkblue}M\cdot \color{darkred}\delta_i \le \color{darkred}s_{1,i} \le \color{darkblue}M\cdot \color{darkred}\delta_i\\ & -\color{darkblue}M\cdot (1-\color{darkred}\delta_i )\le \color{darkred}s_{2,i} \le \color{darkblue}M\cdot (1-\color{darkred}\delta_i)\\ & \color{darkred}\delta_i \in \{0,1\}\end{align}
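
The appendix implements exactly this big-M scheme (there for three lines, with a slack attached to each fit equation). For the two-line model F the switching inequalities would look like this in GAMS, assuming free slack variables s1(i) and s2(i) (names are mine):

```
* big-M switching constraints for model F (names are mine)
scalar M 'big-M' /1000/;
equations s1up(i), s1lo(i), s2up(i), s2lo(i);
s1up(i).. s1(i) =l=  M*delta(i);
s1lo(i).. s1(i) =g= -M*delta(i);
s2up(i).. s2(i) =l=  M*(1-delta(i));
s2lo(i).. s2(i) =g= -M*(1-delta(i));
```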

and finally a SOS1 formulation:

Convex MIQP model G
\begin{align}\min& \sum_i \color{darkred}r_i^2 \\ &\color{darkred}r_i = \color{darkblue}y_i - \color{darkred}\alpha_0 - \color{darkred}\alpha_1\cdot \color{darkblue}x_i +\color{darkred}s_{1,i}\\ &\color{darkred}r_i = \color{darkblue}y_i - \color{darkred}\beta_0 - \color{darkred}\beta_1\cdot \color{darkblue}x_i +\color{darkred}s_{2,i} \\ & \color{darkred}s_{1,i}, \color{darkred}s_{2,i} \in \textbf{SOS1}\end{align}

Here the SOS1 construct makes sure that at most one of the pair $$(\color{darkred}s_{1,i}, \color{darkred}s_{2,i})$$ will be non-zero. The advantage of this formulation is that we don't need good bounds, or indeed any bounds at all.
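
A sketch of how the SOS1 pairs could be declared in GAMS (names are mine). One caveat: GAMS gives SOS variables a default lower bound of zero, while the slacks here must be able to go negative, so the bounds have to be relaxed, and the solver must accept free SOS members.

```
* SOS1 slacks for model G (names are mine)
set p 'slack pair members' /s1,s2/;
sos1 variable s(i,p) 'per point i: at most one member nonzero';
* the slacks must be free, but SOS variables default to [0,+inf)
* (support for free SOS members depends on the solver)
s.lo(i,p) = -inf;
```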

There are a few formulations I left out, but the point is clear: even a small problem can lead to very many different formulations.

I have not done extensive computational testing, but the big-M formulation seems to be working quite well. Of course, this assumes we have some reasonable bounds ($$-1000$$ and $$+1000$$ in my example). Some of the other formulations may be more appropriate if you are not sure about bounds.
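
For model F we can even derive a safe value for $$\color{darkblue}M$$. When $$\color{darkred}\delta_i=0$$, combining the two residual equations gives $$\color{darkred}s_{1,i}=(\color{darkred}\alpha_0-\color{darkred}\beta_0)+(\color{darkred}\alpha_1-\color{darkred}\beta_1)\cdot\color{darkblue}x_i$$ (and symmetrically for $$\color{darkred}s_{2,i}$$ when $$\color{darkred}\delta_i=1$$). So if we are willing to assume bounds $$|\color{darkred}\alpha_j|,|\color{darkred}\beta_j|\le\color{darkblue}L$$ on the coefficients (an assumption on my part), then \begin{align}\color{darkblue}M = 2\color{darkblue}L\cdot\left(1+\max_i |\color{darkblue}x_i|\right)\end{align} is a safe choice.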

#### Results

After solving any of these models, we have the following result. In the plot, the color of each point indicates which regression line the point is assigned to.

This problem is called the multiple-line fitting problem. There are other approaches to this problem [2,3,4]. Some of these methods are heuristic in nature and may require tuning some parameters. This may be a statistical problem where MIP/MIQP tools have something to offer. Using a proper formulation we can solve a problem like this within a few seconds to proven optimality and without any tuning parameters.

#### Extensions: multiple lines

We can easily extend these models in different ways:
• Finding more lines. This is not difficult to add to the models. Note: the SOS1 formulation may be an exception: with $$K$$ lines we need "at least 1 of the $$K$$ slacks to be zero," while SOS1 can only express "at most 1 of $$K$$ can be nonzero."
• Using LAD or Chebyshev regression instead of least squares [5,6]. This can help with performance, as it reduces the problem to a linear MIP model. This is also useful if you don't have access to a MIQP solver.

A model for multiple lines and using an LAD regression [5] approach is as follows:

\begin{align} \min\>&\sum_i \color{darkred}r^{+}_i +\color{darkred}r^{-}_i\\ & \color{darkred}r^{+}_i-\color{darkred}r^{-}_i = \color{darkblue}y_i - \color{darkred}\alpha_{0,k} - \color{darkred}\alpha_{1,k}\cdot\color{darkblue}x_i +\color{darkred}s_{i,k} &&\forall i,k \\ & -\color{darkblue}M \cdot(1-\color{darkred}\delta_{i,k}) \le \color{darkred}s_{i,k} \le \color{darkblue}M \cdot(1-\color{darkred}\delta_{i,k}) && \forall i,k \\ & \sum_k\color{darkred}\delta_{i,k} = 1 && \forall i \\ & \color{darkred}\alpha_{0,k} \le \color{darkred}\alpha_{0,k+1} && \forall k \>\text{except last} \\ & \color{darkred}\delta_{i,k} \in \{0,1\} \\ & \color{darkred}r^{+}_i,\color{darkred}r^{-}_i \ge 0\end{align}

The picture below shows data that was generated from three different linear models (this is the data set used in the appendix). The raw points look almost random. Let's see what happens when we fit three lines with this model.

The result is not bad at all: the three lines are recovered quite well.

#### References

1. Fitting two lines to a set of 2d points, https://stackoverflow.com/questions/66349148/fitting-two-lines-to-a-set-of-2d-points
2. Yan Guo, A Multiple-Line Fitting Algorithm Without Initialization, https://cse.sc.edu/~songwang/CourseProj/proj2005/report/guo.pdf
3. Lara-Alvarez, C., Romero, L. & Gomez, C. Multiple straight-line fitting using a Bayes factor. Adv Data Anal Classif 11, 205–218 (2017)
4. Random sample consensus, https://en.wikipedia.org/wiki/Random_sample_consensus
6. Linear Programming and Chebyshev Regression, https://yetanothermathprogrammingconsultant.blogspot.com/2017/11/lp-and-chebyshev-regression.html

#### Appendix: Example GAMS model

```
$ontext

   Multiple-line regression

   Least squares and least absolute deviations
   can be used as objective.

   erwin@amsterdamoptimization.com

$offtext

*---------------------------------------------------------
* data
*---------------------------------------------------------

set
   i 'observations' /i1*i40/
   k 'line'         /line1*line3/
   f                /'const.term', slope/
;

table data(i,*)
              x           y
i1       20.202      85.162
i2        0.507       2.103
i3       26.961      55.969
i4       49.985      44.690
i5       15.129      86.515
i6       17.417      79.866
i7       33.064      56.328
i8       31.691      29.422
i9       32.209      64.021
i10      96.398      85.191
i11      99.360      68.235
i12      36.990      57.516
i13      37.289      25.884
i14      77.198      56.157
i15      39.668      58.398
i16      91.310      66.205
i17      11.958      93.742
i18      73.548      28.178
i19       5.542       5.788
i20      57.630      60.830
i21       5.141      53.988
i22       0.601      42.559
i23      40.123      61.928
i24      51.988      42.984
i25      62.888      58.308
i26      22.575       4.414
i27      39.612      67.282
i28      27.601      56.445
i29      15.237       0.218
i30      93.632      11.896
i31      42.266      60.515
i32      13.466      51.721
i33      38.606      65.392
i34      37.463      16.978
i35      26.848      74.588
i36      94.837      -0.803
i37      18.894      60.060
i38      29.751      14.005
i39       7.455      60.066
i40      40.135      62.898
;

parameter x(i), y(i);
x(i) = data(i,'x');
y(i) = data(i,'y');
display x, y;

*---------------------------------------------------------
* model 1: multiple line least squares regression
*---------------------------------------------------------

scalar M 'big-M' /1000/;

free variables
   r(i)     'residuals'
   s(i,k)   'slack (to turn off a fit equation)'
   z        'objective'
   a(k,f)   'coefficients to estimate'
;
binary variable delta(i,k) 'assignment of point to line';

equations
   obj
   fit(i,k)   'standard linear fit + slack'
   impl1      'big-M constraint: delta=1 => s<=0'
   impl2      'big-M constraint: delta=1 => s>=0'
   sum1       'assignment'
   order      'order by constant term'
;

obj..          z =e= sum(i, sqr(r(i)));
fit(i,k)..     r(i) =e= y(i) - a(k,'const.term') - a(k,'slope')*x(i) + s(i,k);
impl1(i,k)..   s(i,k) =l= M*(1-delta(i,k));
impl2(i,k)..   s(i,k) =g= -M*(1-delta(i,k));
sum1(i)..      sum(k, delta(i,k)) =e= 1;
order(k+1)..   a(k,'const.term') =g= a(k+1,'const.term');

option optcr=0, threads=8, mip=cplex, miqcp=cplex;

model m1 /all/;
solve m1 minimizing z using miqcp;

* reporting
parameter report1(*,*) 'least squares results';
report1(k,f) = a.l(k,f);
report1(k,'points')   = sum(i$(delta.l(i,k)>0.5), 1);
report1(k,'sum(r^2)') = sum(i$(delta.l(i,k)>0.5), sqr(r.l(i)));
report1(k,'sum|r|')   = sum(i$(delta.l(i,k)>0.5), abs(r.l(i)));
report1('total','sum(r^2)') = sum(i, sqr(r.l(i)));
report1('total','sum|r|')   = sum(i, abs(r.l(i)));
report1('total','points')   = card(i);
display report1;

*---------------------------------------------------------
* model 2: multiple line LAD regression
*---------------------------------------------------------

positive variables
   rplus(i) 'positive residuals'
   rmin(i)  'negative residuals'
;

equations
   fit2(i,k) 'LAD fit'
   obj2      'LAD obj'
;

obj2..       z =e= sum(i, rplus(i)+rmin(i));
fit2(i,k)..  rplus(i) - rmin(i) =e= y(i) - a(k,'const.term') - a(k,'slope')*x(i) + s(i,k);

model m2 /obj2,fit2,impl1,impl2,sum1,order/;
solve m2 minimizing z using mip;

* reporting
parameter report2(*,*) 'least absolute deviations results';
report2(k,f) = a.l(k,f);
report2(k,'points')   = sum(i$(delta.l(i,k)>0.5), 1);
report2(k,'sum(r^2)') = sum(i$(delta.l(i,k)>0.5), sqr(rplus.l(i)-rmin.l(i)));
report2(k,'sum|r|')   = sum(i$(delta.l(i,k)>0.5), rplus.l(i)+rmin.l(i));
report2('total','sum(r^2)') = sum(i, sqr(rplus.l(i)-rmin.l(i)));
report2('total','sum|r|')   = sum(i, rplus.l(i)+rmin.l(i));
report2('total','points')   = card(i);
display report2;
```

The output looks like:

```
----    116 PARAMETER report1  least squares results

           const.term       slope      points    sum(r^2)      sum|r|

line1         102.897      -1.033      15.000     225.890      48.501
line2          51.738       0.158      14.000     184.930      31.032
line3          -6.663       0.879      11.000     521.717      70.506
total                                  40.000     932.536     150.039


----    147 PARAMETER report2  least absolute deviations results

           const.term       slope      points    sum(r^2)      sum|r|

line1         104.206      -1.034      14.000     189.051      38.432
line2          51.680       0.159      15.000     245.838      38.770
line3          -6.425       0.950      11.000     553.187      65.961
total                                  40.000     988.076     143.163
```


Here is a plot that shows how the solution converges to the optimum:

As is usual with MIP/MIQP solvers, most of the action is in the beginning.