Consider the following data:
---- 14 PARAMETER x data
i1 17.175, i2 84.327, i3 55.038, i4 30.114, i5 29.221, i6 22.405, i7 34.983, i8 85.627
i9 6.711, i10 50.021, i11 99.812, i12 57.873, i13 99.113, i14 76.225, i15 13.069, i16 63.972
i17 15.952, i18 25.008, i19 66.893, i20 43.536, i21 35.970, i22 35.144, i23 13.149, i24 15.010
i25 58.911, i26 83.089, i27 23.082, i28 66.573, i29 77.586, i30 30.366, i31 11.049, i32 50.238
i33 16.017, i34 87.246, i35 26.511, i36 28.581, i37 59.396, i38 72.272, i39 62.825, i40 46.380
i41 41.331, i42 11.770, i43 31.421, i44 4.655, i45 33.855, i46 18.210, i47 64.573, i48 56.075
i49 76.996, i50 29.781
---- 14 PARAMETER y data
i1 32.320, i2 238.299, i3 87.081, i4 17.825, i5 -10.154, i6 -35.813, i7 14.506
i8 204.903, i9 -103.014, i10 79.521, i11 -153.679, i12 -47.971, i13 277.483, i14 160.084
i15 28.687, i16 133.650, i17 -63.820, i18 4.028, i19 145.434, i20 48.326, i21 49.295
i22 35.478, i23 -58.602, i24 34.463, i25 -50.666, i26 -115.522, i27 -19.540, i28 134.829
i29 198.074, i30 -7.351, i31 40.786, i32 82.107, i33 27.661, i34 -82.669, i35 5.081
i36 -1.278, i37 118.345, i38 172.905, i39 -57.943, i40 56.981, i41 -45.248, i42 34.517
i43 11.780, i44 -98.506, i45 -1.064, i46 33.323, i47 139.050, i48 -35.595, i49 -102.580
i50 3.912
When we plot this, we see:
This gives a clear impression that we actually have two different regression problems here. As the points come as a single, unlabeled data set, let's see if we can develop an optimization model that assigns each point \((\color{darkblue}x_i,\color{darkblue}y_i)\) to one of the regression lines and finds the least-squares fit for both lines at the same time.
For this, we introduce a binary variable \[\color{darkred}\delta_i = \begin{cases} 1 & \text{if point $i$ belongs to regression line 1} \\ 0 & \text{if point $i$ belongs to regression line 2}\end{cases}\]
A first model to perform this combined task can look like this:
Non-convex MIQCP model A |
---|
\[\begin{align}\min& \sum_i \left(\color{darkred}r_{1,i}^2 + \color{darkred}r_{2,i}^2 \right)\\ & \color{darkred}r_{1,i} = \left(\color{darkblue}y_i - \color{darkred}\alpha_0 - \color{darkred}\alpha_1 \cdot \color{darkblue}x_i \right)\cdot \color{darkred}\delta_i \\& \color{darkred}r_{2,i} = \left(\color{darkblue}y_i - \color{darkred}\beta_0 - \color{darkred}\beta_1 \cdot \color{darkblue}x_i \right)\cdot(1-\color{darkred}\delta_i) \\ & \color{darkred}\delta_i \in \{0,1\}\end{align} \] |
Here we force the residual to zero if the point belongs to the other regression line. For all models here, variables without explicit restrictions are meant to be free (i.e., their bounds are minus and plus infinity). Warning: many solvers and modeling tools make variables non-negative unless instructed otherwise.
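As a concrete illustration, a minimal GAMS sketch of model A for the two-line case could look like this (the names a0, a1, b0, b1, delta(i) and the choice of model type are my own; x(i) and y(i) are assumed to hold the data shown above):

free variables
   a0    'intercept line 1'
   a1    'slope line 1'
   b0    'intercept line 2'
   b1    'slope line 2'
   r1(i) 'residual w.r.t. line 1'
   r2(i) 'residual w.r.t. line 2'
   z     'objective'
;
binary variable delta(i) 'point i belongs to line 1';

equations objA, res1(i), res2(i);

objA..    z =e= sum(i, sqr(r1(i)) + sqr(r2(i)));
* multiplying by delta(i) forces the residual to zero when the point
* belongs to the other line; this product makes the model non-convex
res1(i).. r1(i) =e= (y(i) - a0 - a1*x(i))*delta(i);
res2(i).. r2(i) =e= (y(i) - b0 - b1*x(i))*(1-delta(i));

model modelA /objA, res1, res2/;
* a global MIQCP/MINLP solver is needed for a proven optimum
option optcr = 0;
solve modelA minimizing z using miqcp;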
A somewhat more compact formulation is:
Non-convex MIQCP model B |
---|
\[\begin{align}\min& \sum_i \color{darkred}r_i^2 \\ & \color{darkred}r_i = \color{darkblue}y_i - \left[\color{darkred}\alpha_0\cdot \color{darkred}\delta_i + \color{darkred}\beta_0\cdot(1-\color{darkred}\delta_i)\right] - \left[\color{darkred}\alpha_1\cdot \color{darkred}\delta_i+ \color{darkred}\beta_1\cdot(1-\color{darkred}\delta_i)\right] \cdot \color{darkblue}x_i \\ & \color{darkred}\delta_i \in \{0,1\}\end{align} \] |
Although this model has fewer variables and constraints, it may be more difficult to solve.
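In GAMS, model B boils down to a single fit equation per observation (again a sketch, reusing the hypothetical names a0, a1, b0, b1 and delta(i) from the fragment above):

free variable r(i) 'residual';
equation fitB(i) 'combined fit constraint';

* the products of delta(i) with the coefficient variables are what make this non-convex
fitB(i).. r(i) =e= y(i) - (a0*delta(i) + b0*(1-delta(i)))
                        - (a1*delta(i) + b1*(1-delta(i)))*x(i);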
Instead of using the \(\color{darkred}\delta\) variables in the constraints, we can also move them to the objective:
Non-convex MINLP model C |
---|
\[\begin{align}\min& \sum_i \left( \color{darkred}r_{1,i}^2\cdot\color{darkred}\delta_i +\color{darkred}r_{2,i}^2\cdot (1-\color{darkred}\delta_i) \right)\\ & \color{darkred}r_{1,i} = \color{darkblue}y_i - \color{darkred}\alpha_0 - \color{darkred}\alpha_1\cdot \color{darkblue}x_i\\ & \color{darkred}r_{2,i} = \color{darkblue}y_i - \color{darkred}\beta_0 - \color{darkred}\beta_1\cdot \color{darkblue}x_i \\ & \color{darkred}\delta_i \in \{0,1\}\end{align} \] |
This MINLP model is no longer quadratic: the objective multiplies squared residual variables by the binary variables, which gives cubic terms.
We can make the model convex (apart from the integrality restrictions) by keeping all constraints linear. If we can use indicator constraints (this depends on the solver), we can write:
Convex MIQP model D |
---|
\[\begin{align}\min& \sum_i \left( \color{darkred}r_{1,i}^2 +\color{darkred}r_{2,i}^2 \right)\\ & \color{darkred}\delta_i=1 \Rightarrow\color{darkred}r_{1,i} = \color{darkblue}y_i - \color{darkred}\alpha_0 - \color{darkred}\alpha_1\cdot \color{darkblue}x_i\\ &\color{darkred}\delta_i=0 \Rightarrow\color{darkred}r_{2,i} = \color{darkblue}y_i - \color{darkred}\beta_0 - \color{darkred}\beta_1\cdot \color{darkblue}x_i \\ & \color{darkred}\delta_i \in \{0,1\}\end{align} \] |
We can make this model a bit less nonlinear by using a single residual variable, which halves the number of quadratic terms in the objective:
Convex MIQP model E |
---|
\[\begin{align}\min& \sum_i \color{darkred}r_i^2 \\ & \color{darkred}\delta_i=1 \Rightarrow\color{darkred}r_i = \color{darkblue}y_i - \color{darkred}\alpha_0 - \color{darkred}\alpha_1\cdot \color{darkblue}x_i\\ &\color{darkred}\delta_i=0 \Rightarrow\color{darkred}r_i = \color{darkblue}y_i - \color{darkred}\beta_0 - \color{darkred}\beta_1\cdot \color{darkblue}x_i \\ & \color{darkred}\delta_i \in \{0,1\}\end{align} \] |
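With GAMS and Cplex, indicator constraints such as those in model E are not written in the model itself but in a Cplex option file. Below is a sketch (the equation names are mine, and the exact syntax of the indic option should be checked against the GAMS/Cplex documentation):

equations objE, fitE1(i) 'holds when delta(i)=1', fitE2(i) 'holds when delta(i)=0', dummy;

objE..     z =e= sum(i, sqr(r(i)));
fitE1(i).. r(i) =e= y(i) - a0 - a1*x(i);
fitE2(i).. r(i) =e= y(i) - b0 - b1*x(i);
* delta must appear somewhere in the model for the solver to see it
dummy..    sum(i, delta(i)) =g= 0;

model modelE /objE, fitE1, fitE2, dummy/;

* Cplex option file: fitE1(i) is enforced only when delta(i)=1,
* fitE2(i) only when delta(i)=0
$onecho > cplex.opt
indic fitE1(i)$delta(i) 1
indic fitE2(i)$delta(i) 0
$offecho

modelE.optfile = 1;
option miqcp = cplex, optcr = 0;
solve modelE minimizing z using miqcp;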
Let's do a big-M formulation:
Convex MIQP model F |
---|
\[\begin{align}\min& \sum_i \color{darkred}r_i^2 \\ &\color{darkred}r_i = \color{darkblue}y_i - \color{darkred}\alpha_0 - \color{darkred}\alpha_1\cdot \color{darkblue}x_i +\color{darkred}s_{1,i}\\ &\color{darkred}r_i = \color{darkblue}y_i - \color{darkred}\beta_0 - \color{darkred}\beta_1\cdot \color{darkblue}x_i +\color{darkred}s_{2,i} \\ & -\color{darkblue}M\cdot \color{darkred}\delta_i \le \color{darkred}s_{1,i} \le \color{darkblue}M\cdot \color{darkred}\delta_i\\ & -\color{darkblue}M\cdot (1-\color{darkred}\delta_i )\le \color{darkred}s_{2,i} \le \color{darkblue}M\cdot (1-\color{darkred}\delta_i)\\ & \color{darkred}\delta_i \in \{0,1\}\end{align} \] |
and finally a SOS1 formulation:
Convex MIQP model G |
---|
\[\begin{align}\min& \sum_i \color{darkred}r_i^2 \\ &\color{darkred}r_i = \color{darkblue}y_i - \color{darkred}\alpha_0 - \color{darkred}\alpha_1\cdot \color{darkblue}x_i +\color{darkred}s_{1,i}\\ &\color{darkred}r_i = \color{darkblue}y_i - \color{darkred}\beta_0 - \color{darkred}\beta_1\cdot \color{darkblue}x_i +\color{darkred}s_{2,i} \\ & \color{darkred}s_{1,i}, \color{darkred}s_{2,i} \in \textbf{SOS1}\end{align} \] |
Here the SOS1 construct ensures that at most one of the pair \((\color{darkred}s_{1,i}, \color{darkred}s_{2,i})\) can be nonzero. The advantage of this formulation is that we don't need good bounds, or indeed any bounds at all.
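In GAMS, model G could use SOS1 variables declared over an extra set whose elements form the members of each SOS1 set. A sketch (note that GAMS gives SOS variables a default lower bound of zero, so the slacks must be made free explicitly, and the solver must accept free SOS1 members):

set p 'members of each SOS1 pair' /line1, line2/;

sos1 variable s(i,p) 'slacks: for each i at most one member may be nonzero';
* slacks must be able to go negative
s.lo(i,p) = -inf;

equations fitG1(i), fitG2(i);

fitG1(i).. r(i) =e= y(i) - a0 - a1*x(i) + s(i,'line1');
fitG2(i).. r(i) =e= y(i) - b0 - b1*x(i) + s(i,'line2');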
There are a few formulations I left out, but the point is clear: even a small problem can lead to very many different formulations.
I have not done extensive computational testing, but the big-M formulation seems to be working quite well. Of course, this assumes we have some reasonable bounds (\(-1000\) and \(+1000\) in my example). Some of the other formulations may be more appropriate if you are not sure about bounds.
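If we are willing to impose bounds on the coefficients themselves, say \(|\color{darkred}\alpha_j|,|\color{darkred}\beta_j| \le \color{darkblue}B_j\), a safe value for \(\color{darkblue}M\) can be derived rather than guessed. For a point assigned to line 1 in model F, the slack of the second fit equation has to absorb

\[\color{darkred}s_{2,i} = (\color{darkred}\beta_0-\color{darkred}\alpha_0) + (\color{darkred}\beta_1-\color{darkred}\alpha_1)\cdot \color{darkblue}x_i\]

so

\[\color{darkblue}M \ge 2\color{darkblue}B_0 + 2\color{darkblue}B_1\cdot\max_i |\color{darkblue}x_i|\]

is large enough; the symmetric argument covers points assigned to line 2. This is just a back-of-the-envelope bound; tighter values are possible.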
Results
After solving any one of these models, we have the following result:
The color of a point indicates to which regression line it is assigned.
This problem is called the multiple-line fitting problem. There are other approaches to this problem [2,3,4]. Some of these methods are heuristic in nature and may require tuning some parameters. This may be a statistical problem where MIP/MIQP tools have something to offer. Using a proper formulation we can solve a problem like this within a few seconds to proven optimality and without any tuning parameters.
Extensions: multiple lines
We can easily extend these models in different ways:
- Finding more lines. This is not difficult to add to the models. Note: the SOS1 formulation may be an exception: we need "at least one out of K must be zero", while SOS1 only enforces "at most one out of K can be nonzero".
- Using LAD or Chebyshev regression instead of least squares [5,6]. This can help with performance, as it reduces the problem to a linear MIP model. It is also useful if you don't have access to a MIQP solver (a sketch of the Chebyshev variant follows below).
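For reference, a sketch of the Chebyshev (minimax) variant: instead of summing absolute residuals we minimize the largest one, reusing the same slacks \(\color{darkred}s_{i,k}\), big-M construction, and assignment constraints as in the LAD model below:

\[\begin{align}\min\>& \color{darkred}z\\ & -\color{darkred}z \le \color{darkblue}y_i - \color{darkred}\alpha_{0,k} - \color{darkred}\alpha_{1,k}\cdot\color{darkblue}x_i +\color{darkred}s_{i,k} \le \color{darkred}z && \forall i,k\end{align}\]

For lines a point is not assigned to, the slack can absorb the residual, so only the assigned residuals are limited by \(\color{darkred}z\).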
A model for multiple lines using an LAD regression [5] approach looks as follows:
Linear LAD Model |
---|
\[\begin{align} \min\>&\sum_i \color{darkred}r^{+}_i +\color{darkred}r^{-}_i\\ & \color{darkred}r^+_i-\color{darkred}r^-_i = \color{darkblue}y_i - \color{darkred}\alpha_{0,k} - \color{darkred}\alpha_{1,k}\cdot\color{darkblue}x_i +\color{darkred}s_{i,k} &&\forall i,k \\ & -\color{darkblue}M \cdot(1-\color{darkred}\delta_{i,k}) \le \color{darkred}s_{i,k} \le \color{darkblue}M \cdot(1-\color{darkred}\delta_{i,k}) && \forall i,k \\ & \sum_k\color{darkred}\delta_{i,k} = 1 && \forall i \\ & \color{darkred}\alpha_{0,k} \le \color{darkred}\alpha_{0,k+1} && \forall k \>\text{except last} \\ & \color{darkred}\delta_{i,k} \in \{0,1\} \\ & \color{darkred}r^{+}_i,\color{darkred}r^{-}_i \ge 0\end{align} \] |
The picture below shows data generated from three different linear models:
This looks almost random. Let's see what happens.
References
1. Fitting two lines to a set of 2d points, https://stackoverflow.com/questions/66349148/fitting-two-lines-to-a-set-of-2d-points
2. Yan Guo, A Multiple-Line Fitting Algorithm Without Initialization, https://cse.sc.edu/~songwang/CourseProj/proj2005/report/guo.pdf
3. Lara-Alvarez, C., Romero, L. & Gomez, C., Multiple straight-line fitting using a Bayes factor, Adv Data Anal Classif 11, 205–218 (2017)
4. Random sample consensus, https://en.wikipedia.org/wiki/Random_sample_consensus
5. Linear Programming and LAD regression, https://yetanothermathprogrammingconsultant.blogspot.com/2017/11/lp-and-lad-regression.html
6. Linear Programming and Chebyshev Regression, https://yetanothermathprogrammingconsultant.blogspot.com/2017/11/lp-and-chebyshev-regression.html
Appendix: Example GAMS model
$ontext

   Multiple-line regression

   Least squares and least absolute deviations can be used as objective.

   erwin@amsterdamoptimization.com

$offtext

*---------------------------------------------------------
* data
*---------------------------------------------------------

set
   i 'observations' /i1*i40/
   k 'line' /line1*line3/
   f /'const.term', slope/
;

table data(i,*)
              x          y
   i1    20.202     85.162
   i2     0.507      2.103
   i3    26.961     55.969
   i4    49.985     44.690
   i5    15.129     86.515
   i6    17.417     79.866
   i7    33.064     56.328
   i8    31.691     29.422
   i9    32.209     64.021
   i10   96.398     85.191
   i11   99.360     68.235
   i12   36.990     57.516
   i13   37.289     25.884
   i14   77.198     56.157
   i15   39.668     58.398
   i16   91.310     66.205
   i17   11.958     93.742
   i18   73.548     28.178
   i19    5.542      5.788
   i20   57.630     60.830
   i21    5.141     53.988
   i22    0.601     42.559
   i23   40.123     61.928
   i24   51.988     42.984
   i25   62.888     58.308
   i26   22.575      4.414
   i27   39.612     67.282
   i28   27.601     56.445
   i29   15.237      0.218
   i30   93.632     11.896
   i31   42.266     60.515
   i32   13.466     51.721
   i33   38.606     65.392
   i34   37.463     16.978
   i35   26.848     74.588
   i36   94.837     -0.803
   i37   18.894     60.060
   i38   29.751     14.005
   i39    7.455     60.066
   i40   40.135     62.898
;

parameter x(i), y(i);
x(i) = data(i,'x');
y(i) = data(i,'y');
display x,y;

*---------------------------------------------------------
* model 1: multiple line least squares regression
*---------------------------------------------------------

scalar M 'big-M' /1000/;

free variables
   r(i)   'residuals'
   s(i,k) 'slack (to turn off a fit equation)'
   z      'objective'
   a(k,f) 'coefficients to estimate'
;
binary variable delta(i,k) 'assignment of point to line';

equations
   obj         'objective'
   fit(i,k)    'standard linear fit + slack'
   impl1(i,k)  'big-M constraint: delta=1 => s<=0'
   impl2(i,k)  'big-M constraint: delta=1 => s>=0'
   sum1(i)     'assignment'
   order(k)    'order by constant term'
;

obj..        z =e= sum(i, sqr(r(i)));
fit(i,k)..   r(i) =e= y(i) - a(k,'const.term') - a(k,'slope')*x(i) + s(i,k);
impl1(i,k).. s(i,k) =l=  M*(1-delta(i,k));
impl2(i,k).. s(i,k) =g= -M*(1-delta(i,k));
sum1(i)..    sum(k, delta(i,k)) =e= 1;
order(k+1).. a(k,'const.term') =g= a(k+1,'const.term');

option optcr=0, threads=8, mip=cplex, miqcp=cplex;

model m1 /all/;
solve m1 minimizing z using miqcp;

* reporting
parameter report1(*,*) 'least squares results';
report1(k,f) = a.l(k,f);
report1(k,'points')   = sum(i$(delta.l(i,k)>0.5),1);
report1(k,'sum(r^2)') = sum(i$(delta.l(i,k)>0.5),sqr(r.l(i)));
report1(k,'sum|r|')   = sum(i$(delta.l(i,k)>0.5),abs(r.l(i)));
report1('total','sum(r^2)') = sum(i,sqr(r.l(i)));
report1('total','sum|r|')   = sum(i,abs(r.l(i)));
report1('total','points')   = card(i);
display report1;

*---------------------------------------------------------
* model 2: multiple line LAD regression
*---------------------------------------------------------

positive variables
   rplus(i) 'positive residuals'
   rmin(i)  'negative residuals'
;

equations
   fit2(i,k) 'LAD fit'
   obj2      'LAD obj'
;

obj2..       z =e= sum(i, rplus(i)+rmin(i));
fit2(i,k)..  rplus(i) - rmin(i) =e= y(i) - a(k,'const.term') - a(k,'slope')*x(i) + s(i,k);

model m2 /obj2,fit2,impl1,impl2,sum1,order/;
solve m2 minimizing z using mip;

* reporting
parameter report2(*,*) 'least absolute deviations results';
report2(k,f) = a.l(k,f);
report2(k,'points')   = sum(i$(delta.l(i,k)>0.5),1);
report2(k,'sum(r^2)') = sum(i$(delta.l(i,k)>0.5),sqr(rplus.l(i)-rmin.l(i)));
report2(k,'sum|r|')   = sum(i$(delta.l(i,k)>0.5),rplus.l(i)+rmin.l(i));
report2('total','sum(r^2)') = sum(i,sqr(rplus.l(i)-rmin.l(i)));
report2('total','sum|r|')   = sum(i,rplus.l(i)+rmin.l(i));
report2('total','points')   = card(i);
display report2;
The output looks like:
----    116 PARAMETER report1  least squares results

         const.term       slope      points    sum(r^2)      sum|r|

line1       102.897      -1.033      15.000     225.890      48.501
line2        51.738       0.158      14.000     184.930      31.032
line3        -6.663       0.879      11.000     521.717      70.506
total                                40.000     932.536     150.039

----    147 PARAMETER report2  least absolute deviations results

         const.term       slope      points    sum(r^2)      sum|r|

line1       104.206      -1.034      14.000     189.051      38.432
line2        51.680       0.159      15.000     245.838      38.770
line3        -6.425       0.950      11.000     553.187      65.961
total                                40.000     988.076     143.163
Here is a plot that shows how the solution converges to the optimum:
As is usual with MIP/MIQP solvers: most of the action is in the beginning.