**Total Least Squares** (TLS) is an alternative to OLS (**Ordinary Least Squares**). It is a form of **orthogonal regression** and addresses the EIV (**Errors-in-Variables**) problem.

The standard OLS model is \[\color{darkblue}y = \color{darkblue}X\color{darkred}\beta + \color{darkred}\varepsilon\] where we minimize the sum of squared residuals \[\min\> ||\color{darkred}\varepsilon||_2^2\] We can interpret \(\color{darkred}\varepsilon\) as the error in \(\color{darkblue}y\).
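For later comparison, here is a minimal R sketch of this OLS baseline, using the data frame `df` built in Appendix 2. `lm()` minimizes only the vertical residuals, and `- 1` drops the intercept to match the TLS model used below:

```r
# OLS baseline: minimize the sum of squared vertical residuals.
# df is the data frame from Appendix 2; "- 1" drops the intercept.
ols <- lm(Y ~ X1 + X2 - 1, data = df)
coef(ols)
```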

In TLS, we also allow for errors in \(\color{darkblue}X\). The model becomes \[\color{darkblue}y+\color{darkred}\varepsilon=(\color{darkblue}X+\color{darkred}E)\color{darkred}\beta\] Note that we made a sign change in \(\color{darkred}\varepsilon\). This is purely cosmetic: it makes the equation look more symmetric. The objective is specified as \[\min \> ||\left(\color{darkred}\varepsilon \> \color{darkred}E\right)||_F\] i.e. the Frobenius norm of the matrix formed by \(\color{darkred}\varepsilon\) and \(\color{darkred}E\). The Frobenius norm is just \[||A||_F=\sqrt{\sum_{i,j}a_{i,j}^2}\] We can drop the square root from the objective: the solution remains the same, and we get rid of a nonlinear function that is problematic near zero, where its gradient is not defined. The remaining problem is a non-convex quadratic problem, which can be solved with global MINLP solvers such as Baron or with a global quadratic solver like Gurobi.

One property of TLS is that it measures orthogonal distances, whereas OLS measures vertical ones:

*Orthogonal distances for TLS (picture from [2])*
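The connection between the model and orthogonal distances can be made explicit. For a fixed \(\color{darkred}\beta\), the inner minimization over \((\color{darkred}\varepsilon,\color{darkred}E)\) has a closed form, and the (squared) objective collapses to \[\frac{||\color{darkblue}X\color{darkred}\beta-\color{darkblue}y||_2^2}{1+||\color{darkred}\beta||_2^2}\] which is exactly the sum of squared orthogonal distances to the hyperplane \(y=x^T\beta\). A small R sketch (the function name `tls_obj` is mine; `df` is again the data frame from Appendix 2):

```r
# Concentrated TLS objective: for fixed beta, the minimal ||(eps E)||_F^2
# subject to y + eps = (X + E) beta equals
#   ||X beta - y||^2 / (1 + ||beta||^2),
# the sum of squared orthogonal distances to the hyperplane y = x'beta.
X <- as.matrix(df[, c("X1", "X2")])
y <- df$Y
tls_obj <- function(beta) sum((X %*% beta - y)^2) / (1 + sum(beta^2))
tls_obj(c(1.2829314, 0.8443277))  # objective at the reported TLS estimates
```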

Linear TLS is usually solved using a Singular Value Decomposition (SVD). But here I want to try the following non-convex model:

**Non-convex Quadratic Model**

\[\begin{align}\min&\sum_i \color{darkred}\varepsilon_i^2+ \sum_{i,j}\color{darkred}E_{i,j}^2\\ & \color{darkblue}y_i + \color{darkred}\varepsilon_i = \sum_j \left(\color{darkblue}X_{i,j}+\color{darkred}E_{i,j}\right) \color{darkred}\beta_j && \forall i\end{align}\]
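Because the concentrated objective `tls_obj` above is smooth in \(\color{darkred}\beta\), a local solver already produces a good candidate solution; proving global optimality is the hard part, as we will see below. A hedged sketch using base R's `optim` (Nelder-Mead, no guarantee of reaching the global minimum):

```r
# Local minimization of the concentrated TLS objective,
# starting from beta = 0 (as in the solver experiments in the comments).
optim(c(0, 0), tls_obj)$par
```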

#### Example

```
----  51 PARAMETER data

             y         x1         x2
i1    5.658390   0.497808   3.435376
i2   10.206082   1.545745   9.413754
i3   11.735484   1.653134   9.476257
i4   10.994595   2.886785   9.694459
i5    8.178234   2.353039  10.770926
i6   10.586022   2.768120   5.760277
i7    4.481738   2.063961   2.004087
i8    7.495135   3.542960   3.440644
i9   10.959411   2.174741   6.331947
i10   4.809324   2.802416   1.620228
i11  13.290220   3.406511  10.506181
i12   9.774514   3.560376   5.563656
i13  12.084065   3.403917   8.248807
i14   7.992731   4.481498   2.519736
i15  12.111285   3.996363   7.249193
i16  14.487092   3.970683   9.410644
i17  11.062400   3.734251   7.364956
i18   7.592865   4.753497   4.695553
i19  11.244580   3.445085   6.214768
i20   7.837413   6.782433   6.367412
i21  12.199275   4.144486   7.570009
i22  13.872630   5.454476  10.463741
i23  10.461876   5.057793   4.034925
i24  14.713012   5.672384   7.914547
i25  10.751043   4.185621   7.648005
```

I tried the non-convex quadratic optimization model using GAMS with different global MINLP and QCP solvers. Most solvers found the optimal solution quite quickly but had enormous problems proving optimality.

```
----  83 VARIABLE beta.L  parameter to estimate

x1 1.282931,    x2 0.844328
```

We can check this against the `tls` package in R:

```
> tls(Y ~ X1 + X2, data = df)
Error in tls(Y ~ X1 + X2, data = df) :
  Using total least squares requires model without intercept.
> tls(Y ~ X1 + X2 - 1, data = df)
$coefficient
       X1        X2
1.2829314 0.8443277

$confidence.interval
   2.5% lower bound 97.5% upper bound
X1        0.6508868          1.914976
X2        0.5248924          1.163763

$sd.est
       X1        X2
0.3224776 0.1629802
```

Good to see we agree on the solution.
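As a further cross-check, here is a minimal R sketch of the classical SVD route mentioned earlier: take the SVD of the augmented matrix \([X\>y]\) and read \(\beta\) off the right singular vector belonging to the smallest singular value. This assumes the generic case where the last component of that vector is nonzero:

```r
# Classical SVD solution of linear TLS (no intercept).
# X and y as defined earlier; V holds the right singular vectors of [X y].
C <- cbind(X, y)
V <- svd(C)$v
n <- ncol(X)
beta_svd <- -V[1:n, n + 1] / V[n + 1, n + 1]
beta_svd           # should match the solver and the tls package
svd(C)$d[n + 1]^2  # minimal objective: squared smallest singular value
```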

- What if we want a constant term? This is often modeled by a data column of ones. Obviously, there should be no measurement error in this column (a centering-based sketch follows after this list).
- In OLS we may add dummy variables to indicate a structural break. These are encoded by a data column with 0's and 1's. Like the constant term, there is no measurement error involved here.
- These situations are easy to handle in the optimization model: just fix the corresponding entries of \(\color{darkred}E\) to zero. How to handle this with an SVD-based TLS algorithm is described in [3].
- This is a good benchmark model for non-convex MINLP/QCP solvers: small but very difficult.
- The optimization model confirms my understanding of the underlying mathematical model. Such a model is often more precise and succinct than a verbal description.
- The TLS model measures orthogonal distances. These depend on the scale of the variables (columns). One could preprocess the data using a normalization step (also shown in the sketch below).
- Statistics is full of interesting optimization problems.
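A minimal sketch of the constant-term and normalization points, reusing `X`, `y`, and `n` from the SVD snippet above. It relies on the standard result that the orthogonal-regression (TLS) hyperplane passes through the centroid of the data, so an error-free intercept column can be handled by centering; the scaling part simply normalizes the columns to unit standard deviation before solving:

```r
# Intercept via centering: solve TLS through the origin on centered data,
# then recover the constant term from the means.
Xc <- scale(X, center = TRUE, scale = FALSE)
yc <- y - mean(y)
Vc <- svd(cbind(Xc, yc))$v
slope <- -Vc[1:n, n + 1] / Vc[n + 1, n + 1]
intercept <- mean(y) - sum(slope * colMeans(X))

# Normalization: scaling the columns changes the orthogonal distances,
# so the (back-transformed) TLS estimates will in general differ.
s  <- apply(X, 2, sd)
Vs <- svd(cbind(scale(X, center = FALSE, scale = s), y))$v
beta_scaled <- (-Vs[1:n, n + 1] / Vs[n + 1, n + 1]) / s  # original scale
```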

#### References

1. Total least squares, https://en.wikipedia.org/wiki/Total_least_squares
2. Ivan Markovsky, Sabine Van Huffel, *Overview of total least squares methods*, Signal Processing, Volume 87, Issue 10, October 2007, Pages 2283-2302, https://eprints.soton.ac.uk/263855/1/tls_overview.pdf
3. P. de Groen, *An Introduction to Total Least Squares*, https://arxiv.org/pdf/math/9805076.pdf

#### Appendix 1: GAMS model

```
$ontext

   Ivan Markovsky, Sabine Van Huffel,
   Overview of total least squares methods,
   Signal Processing, Volume 87, Issue 10, October 2007, Pages 2283-2302

   compare to R function tls (package tls)

$offtext

set
   i 'observations' /i1*i25/
   j0 'statistical variables' /y,x1*x2/
   j(j0) 'explanatory variables' /x1*x2/
;

table data(i,j0)
              Y          X1          X2
i1     5.658390   0.4978076    3.435376
i2    10.206082   1.5457447    9.413754
i3    11.735484   1.6531337    9.476257
i4    10.994595   2.8867848    9.694459
i5     8.178234   2.3530392   10.770926
i6    10.586022   2.7681198    5.760277
i7     4.481738   2.0639606    2.004087
i8     7.495135   3.5429598    3.440644
i9    10.959411   2.1747406    6.331947
i10    4.809324   2.8024155    1.620228
i11   13.290220   3.4065109   10.506181
i12    9.774514   3.5603761    5.563656
i13   12.084065   3.4039173    8.248807
i14    7.992731   4.4814979    2.519736
i15   12.111285   3.9963628    7.249193
i16   14.487092   3.9706833    9.410644
i17   11.062400   3.7342514    7.364956
i18    7.592865   4.7534969    4.695553
i19   11.244580   3.4450848    6.214768
i20    7.837413   6.7824328    6.367412
i21   12.199275   4.1444857    7.570009
i22   13.872630   5.4544764   10.463741
i23   10.461876   5.0577928    4.034925
i24   14.713012   5.6723841    7.914547
i25   10.751043   4.1856209    7.648005
;

option decimals=6;
display data;

parameter
   y(i)
   X(i,j)
;
y(i) = data(i,'y');
X(i,j) = data(i,j);

variables
   beta(j)    'parameter to estimate'
   epsilon(i) 'error in y'
   E(i,j)     'error in X'
   z          'objective variable'
;

equations
   obj    'objective'
   fit(i) 'equation to fit'
;

obj..    z =e= sum(i, sqr(epsilon(i))) + sum((i,j), sqr(E(i,j)));
fit(i).. y(i) + epsilon(i) =e= sum(j, (X(i,j)+E(i,j))*beta(j));

beta.lo(j) = -10;    beta.up(j) = 10;
epsilon.lo(i) = -10; epsilon.up(i) = 10;
E.lo(i,j) = -10;     E.up(i,j) = 10;

model tls /all/;
option optcr=0, threads=8, reslim=1000, minlp=baron;
solve tls using minlp minimizing z;
display beta.l;
```

#### Appendix 2: R code

```r
df <- read.table(text="
 id          Y        X1        X2
 i1   5.658390 0.4978076  3.435376
 i2  10.206082 1.5457447  9.413754
 i3  11.735484 1.6531337  9.476257
 i4  10.994595 2.8867848  9.694459
 i5   8.178234 2.3530392 10.770926
 i6  10.586022 2.7681198  5.760277
 i7   4.481738 2.0639606  2.004087
 i8   7.495135 3.5429598  3.440644
 i9  10.959411 2.1747406  6.331947
 i10  4.809324 2.8024155  1.620228
 i11 13.290220 3.4065109 10.506181
 i12  9.774514 3.5603761  5.563656
 i13 12.084065 3.4039173  8.248807
 i14  7.992731 4.4814979  2.519736
 i15 12.111285 3.9963628  7.249193
 i16 14.487092 3.9706833  9.410644
 i17 11.062400 3.7342514  7.364956
 i18  7.592865 4.7534969  4.695553
 i19 11.244580 3.4450848  6.214768
 i20  7.837413 6.7824328  6.367412
 i21 12.199275 4.1444857  7.570009
 i22 13.872630 5.4544764 10.463741
 i23 10.461876 5.0577928  4.034925
 i24 14.713012 5.6723841  7.914547
 i25 10.751043 4.1856209  7.648005
", header=T)

library(tls)
tls(Y ~ X1 + X2 - 1, data = df)
```

#### Comments

I tried the model with some solvers, with the following results:

- Excel Solver GRG nonlinear: 5 seconds to optimality
- Bonmin via OpenSolver: < 1 second to optimality
- Couenne via OpenSolver: finds the optimal solution in < 1 second, but then is very slow to converge towards proving optimality
- Octeract: finds the optimal solution in presolve (< 3.5 seconds), but then is very slow to converge towards proving optimality

In each case, I started with all variables set to zero.

Excel/GRG and Bonmin are local solvers, so "optimal" means locally optimal. Proving (global) optimality is surprisingly difficult for this small model.

That's why I tried Octeract, which claims "guaranteed global optimal solutions for any complex optimisation problem, turning insights into decisions in no time". After 10 hours, Octeract had bounds on the objective function of 18.95 and 27.84, with little change in the lower bound in the last hour. As you say, surprisingly difficult for such a small problem.
