
Friday, November 10, 2017

Linear Programming and Chebyshev Regression

The LAD (Least Absolute Deviation) or $\ell_1$ regression problem (minimize the sum of the absolute values of the residuals) is often discussed in Linear Programming textbooks: it has a few interesting LP formulations [1]. Chebyshev or $\ell_\infty$ regression is a little less well-known. Here we minimize the maximum (absolute) residual:

$$\begin{aligned} \min_\beta\; & \max_i\, |r_i| \\ & y - X\beta = r \end{aligned}$$

As in [1], $\beta$ are the coefficients to estimate and $X, y$ are data. $r$ are the residuals.
Some obvious and less obvious formulations are:
  • Variable splitting:
    $$\begin{aligned} \min\; & z \\ & z \ge r^+_i + r^-_i \\ & r^+_i - r^-_i = y_i - \sum_j X_{i,j}\beta_j \\ & r^+_i,\, r^-_i \ge 0 \end{aligned}$$
    With variable splitting we use two non-negative variables $r^+_i, r^-_i$ to describe a value $r_i$ that can be positive or negative. We need to make sure that one of them is zero in order for $r^+_i + r^-_i$ to be equal to the absolute value $|r_i|$. Here we have an interesting case, as we are only sure of this for the particular index $i$ that has the largest absolute deviation (i.e. where $|r_i|=z$). In cases where $|r_i|<z$ we do not actually enforce $r^+_i \cdot r^-_i = 0$. Indeed, when looking at the solution I see lots of cases where $r^+_i>0, r^-_i>0$. Effectively those $r^+_i, r^-_i$ have no clear physical interpretation. This is very different from the LAD regression formulation [1], where we require all $r^+_i \cdot r^-_i = 0$.
  • Bounding:
    $$\begin{aligned} \min\; & z \\ & -z \le y_i - \sum_j X_{i,j}\beta_j \le z \end{aligned}$$
    Here $z$ can be left free or you can make it a non-negative variable (it will be non-negative automatically). Note that there are actually two constraints here: $-z \le y_i - \sum_j X_{i,j}\beta_j$ and $y_i - \sum_j X_{i,j}\beta_j \le z$. This model contains the data twice, leading to a large number of nonzero elements in the LP matrix (a code sketch of this formulation follows after this list).
  • Sparse bounding: This model tries to remedy the disadvantage of the standard bounding model by introducing extra free variables $d_i$ and extra equations:
    $$\begin{aligned} \min\; & z \\ & -z \le d_i \le z \\ & d_i = y_i - \sum_j X_{i,j}\beta_j \end{aligned}$$
    Note again that $-z \le d_i \le z$ is actually two constraints: $-z \le d_i$ and $d_i \le z$.
  • Dual:
    $$\begin{aligned} \max\; & \sum_i y_i (d_i + e_i) \\ & \sum_i X_{i,j}(d_i + e_i) = 0 \;\perp\; \beta_j \\ & \sum_i (d_i - e_i) = 1 \\ & d_i \ge 0,\; e_i \le 0 \end{aligned}$$
    The estimates $\beta$ can be found as the duals of the equations $X^T(d+e)=0$ (a code sketch of this dual appears below, after the timing results).
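To make the bounding formulation concrete, here is a minimal sketch in Python using scipy.optimize.linprog with the HiGHS solver. This is a tooling assumption on my part: the post's timings below were obtained with Cplex, and the function name chebyshev_regression and the random test data are made up for illustration only.

import numpy as np
from scipy.optimize import linprog

def chebyshev_regression(X, y):
    # Minimize max_i |y_i - X_i beta| via the bounding formulation.
    # Decision vector: [beta_1, ..., beta_n, z].
    m, n = X.shape
    c = np.zeros(n + 1)
    c[-1] = 1.0                                 # objective: minimize z
    ones = np.ones((m, 1))
    #   y_i - X_i beta <= z   ->  -X_i beta - z <= -y_i
    # -(y_i - X_i beta) <= z  ->   X_i beta - z <=  y_i
    A_ub = np.vstack([np.hstack([-X, -ones]),
                      np.hstack([ X, -ones])])
    b_ub = np.concatenate([-y, y])
    bounds = [(None, None)] * n + [(0, None)]   # betas free, z >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[:n], res.x[-1]

# Usage on small random data (not the synthetic data set of [1]).
rng = np.random.default_rng(0)
m, n = 200, 5
X = np.column_stack([np.ones(m), rng.normal(size=(m, n - 1))])
y = X @ rng.normal(size=n) + rng.normal(scale=0.1, size=m)
beta, z = chebyshev_regression(X, y)
print("min max |residual| =", z)
print("achieved           =", np.max(np.abs(y - X @ beta)))

Note how the model has $2m$ inequality rows, each carrying a full copy of a row of $X$: exactly the density disadvantage mentioned above, which the sparse bounding variant avoids.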
We use the same synthetic data as in [1], with $m=5{,}000$ cases and $n=100$ coefficients. Some timings with Cplex (default LP method) yield the following results (times are in seconds):

[Image: table of Cplex solution times for each formulation]

As opposed to the LAD regression example in [1], the bounding formulation is very fast here. The dual formulation continues to perform very well.
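For completeness, here is a similar sketch of the dual formulation (again scipy/HiGHS rather than Cplex; the sizes and data are illustrative, not the $m=5{,}000$ problem from [1]). It also shows how $\beta$ can be read off the marginals (duals) of the $X^T(d+e)=0$ rows; since dual sign conventions differ between solvers, the sketch tries both signs and keeps the one that attains the dual bound.

import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
m, n = 200, 5                              # illustrative sizes only
X = np.column_stack([np.ones(m), rng.normal(size=(m, n - 1))])
y = X @ rng.normal(size=n) + rng.normal(scale=0.1, size=m)

# max sum_i y_i (d_i + e_i)  ->  min -y'(d + e); variables stacked as [d, e]
c = np.concatenate([-y, -y])
# sum_i X_ij (d_i + e_i) = 0 for every j, and sum_i (d_i - e_i) = 1
A_eq = np.vstack([np.hstack([X.T, X.T]),
                  np.concatenate([np.ones(m), -np.ones(m)])])
b_eq = np.append(np.zeros(n), 1.0)
bounds = [(0, None)] * m + [(None, 0)] * m   # d >= 0, e <= 0
res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=bounds, method="highs")

z_dual = -res.fun                            # optimal dual objective = min max-residual
marg = res.eqlin.marginals[:n]               # duals of the X'(d+e)=0 rows
# Dual sign conventions vary; keep whichever sign reproduces the dual bound.
beta = min((marg, -marg), key=lambda b: np.max(np.abs(y - X @ b)))
print("dual objective   :", z_dual)
print("max |y - X beta| :", np.max(np.abs(y - X @ beta)))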
Historical Note
The use of this minimax principle goes back to Euler (1749) [3,4].

Leonhard Euler (1707-1783)
The term Chebyshev regression is related to the Chebyshev distance (a different name for the $\ell_\infty$ metric).

Pafnuty Lvovich Chebyshev (1821-1894)


References
  1. Linear Programming and LAD Regression, http://yetanothermathprogrammingconsultant.blogspot.com/2017/11/lp-and-lad-regression.html
  2. A. Giloni, M. Padberg, Alternative Methods of Linear Regression, Mathematical and Computer Modelling 35 (2002), pp. 361-374.
  3. H.L. Harter, The method of least squares and some alternatives, Technical Report, Aerospace Research Laboratories, 1972.
  4. L. Euler, Pièce qui a Remporté le Prix de l'Academie Royale des Sciences en 1748, sur les Inegalités de Mouvement de Saturn et de Jupiter. Paris (1749).
