Yet Another Math Programming Consultant: Least squares as QP: convexity issues

This is a post about theory vs practice. Theory says: least squares problems are convex. In practice we can encounter cases where this is not the case. The background is a post [1] about a non-negative least squares problem being solved using Cplex's matlab interface (function cplexlsqnonneglin [2]). It fails. We will fix this by modeling: a somewhat non-intuitive reformulation will help us solving this problem.

Model 1

The model this function tries to solve is:
\[\begin{align}\min\>&||Cx-d||_2^2\\&x\ge 0\end{align}\]
Unfortunately we see:

Number of nonzeros in lower triangle of Q = 861
Using Approximate Minimum Degree ordering
Total time for automatic ordering = 0.00 sec. (0.02 ticks)
Summary statistics for factor of Q:
  Rows in Factor            = 42
  Integer space required    = 42
  Total non-zeros in factor = 903
  Total FP ops to factor    = 25585
CPLEX Error  5002: objective is not convex.
QP with an indefinite objective can be solved
to local optimality with optimality target 2,
or to global optimality with optimality target 3.
Presolve time = 0.03 sec. (0.06 ticks)
Barrier time = 0.03 sec. (0.06 ticks)

This looks like a numerical or tolerance issue: least-squares problems should be convex. More on this further down.

Model 2

I often propose a different model:
\[\begin{align}\min\>&||r||_2^2\\&r=Cx-d\\&x\ge 0, r\text{ free}\end{align}\]
This looks like a bad idea: we add variables \(r\) and extra constraints. However this model has a big advantage: the \(Q\) matrix is much more well-behaved. It is basically a (very well-scaled) diagonal matrix. Using this model we see:

Tried aggregator 1 time.
QP Presolve eliminated 1 rows and 1 columns.
Reduced QP has 401 rows, 443 columns, and 17201 nonzeros.
Reduced QP objective Q matrix has 401 nonzeros.
Presolve time = 0.02 sec. (1.21 ticks)
Parallel mode: using up to 8 threads for barrier.
Number of nonzeros in lower triangle of A*A' = 80200
Using Approximate Minimum Degree ordering
Total time for automatic ordering = 0.00 sec. (3.57 ticks)
Summary statistics for Cholesky factor:
  Threads                   = 8
  Rows in Factor            = 401
  Integer space required    = 401
  Total non-zeros in factor = 80601
  Total FP ops to factor    = 21574201
 Itn      Primal Obj        Dual Obj  Prim Inf Upper Inf  Dual Inf          
   0   3.3391791e-01  -3.3391791e-01  9.70e+03  0.00e+00  4.20e+04
   1   9.6533667e+02  -3.0509942e+03  1.21e-12  0.00e+00  1.71e-11
   2   6.4361775e+01  -3.6729243e+02  3.08e-13  0.00e+00  1.71e-11
   3   2.2399862e+01  -6.8231454e+01  1.14e-13  0.00e+00  3.75e-12
   4   6.8012056e+00  -2.0011575e+01  2.45e-13  0.00e+00  1.04e-12
   5   3.3548410e+00  -1.9547176e+00  1.18e-13  0.00e+00  3.55e-13
   6   1.9866256e+00   6.0981384e-01  5.55e-13  0.00e+00  1.86e-13
   7   1.4271894e+00   1.0119284e+00  2.82e-12  0.00e+00  1.15e-13
   8   1.1434804e+00   1.1081026e+00  6.93e-12  0.00e+00  1.09e-13
   9   1.1163905e+00   1.1149752e+00  5.89e-12  0.00e+00  1.14e-13
  10   1.1153877e+00   1.1153509e+00  2.52e-11  0.00e+00  9.71e-14
  11   1.1153611e+00   1.1153602e+00  2.10e-11  0.00e+00  8.69e-14
  12   1.1153604e+00   1.1153604e+00  1.10e-11  0.00e+00  8.96e-14
Barrier time = 0.17 sec. (38.31 ticks)

Total time on 8 threads = 0.17 sec. (38.31 ticks)
QP status(1): optimal
Cplex Time: 0.17sec (det. 38.31 ticks)

Optimal solution found.
Objective :           1.115360

This reformulation is not only useful for Cplex. Some other solvers show the same behavior: the first model is declared to be non-convex while the second model solves just fine. Some other solvers solve both models ok. These solvers add a very tiny value to the diagonal elements of the \(Q\) matrix to make it numerically positive semi-definite [3].

A little background in numerics

The \(Q\) matrix is formed by \(C^TC\). We can show that \(C^TC\) is always positive semi-definite. This follows from: \[x^T(C^TC)x=(Cx)^T(Cx)=y^Ty\ge 0\] However the inner product \(Q=C^TC\) can accumulate some small floating point errors, causing \(Q\) to be no longer positive semi-definite. When I calculate \(Q=C^TC\) and print its eigenvalues I see:

 [1]  6.651381e+03  4.436663e+02  3.117601e+02  2.344391e+02  1.377582e+02  4.842287e+01
 [7]  2.471843e+01  4.970353e+00  4.283370e+00  3.878515e+00  9.313360e-01  4.146773e-01
[13]  2.711321e-01  2.437365e-01  2.270969e-01  8.583981e-02  2.784610e-02  2.047456e-02
[19]  1.670033e-02  8.465492e-03  3.948085e-03  2.352994e-03  1.726044e-03  1.287873e-03
[25]  1.095779e-03  2.341448e-04  1.776231e-04  1.266256e-04  4.064795e-05  2.980187e-05
[31]  1.056841e-05  2.409130e-06  2.039620e-06  5.597181e-07  1.367130e-07  2.376693e-08
[37]  3.535393e-09  1.355988e-11  2.996032e-14 -1.468230e-13 -1.622616e-13 -2.217776e-13

The presence of negative eigenvalues indicate this calculated matrix \(Q\) is indeed not pos def. Another verification is to see if we can form a Cholesky factorization [4]. This will fail if the matrix is not positive definite:

This type of factorization on the \(Q\) matrix is typically used inside Quadratic Programming solvers. Finally, it is noted that it is also possible that \(Q\) is actually positive definite but both tests (eigenvalues and Cholesky decomposition) fail to see this due to floating point errors inside their computations. Of course, for all practical purposes that means the matrix is not numerically positive definite.

In Ordinary Least Squares (OLS) it is well known that solving the normal equations is not very numerically stable. I believe this is related (the same inner product is formed).