Yet Another Math Programming Consultant: CVXPY large memory allocation

In [1] a simple regression problem was stated and solved with CVXPY. The number of observations is very large (\(200,000\)), while the number of coefficients to estimate is moderate (\(100\)). The first formulation is simply:

Regression I
\[\begin{align}\min_{\color{darkred}w}\>& \|\|\color{darkblue}y-\color{darkblue}X\color{darkred}w\|\|^2_2 \end{align}\]

In Python code, this can look like:

import cvxpy as cp
import numpy as np

N = 200000
M = 100

X = np.random.normal(0, 1, size=(N, M))
y = np.random.normal(0, 1, size=N)

w = cp.Variable(M)
prob = cp.Problem(cp.Minimize(cp.sum_squares(y - X @ w)))
prob.solve()
print("status:",prob.status)
print("obj:",prob.value)

Unfortunately, this is giving the following runtime error:

[ec2-user@ip-172-30-0-79 etc]$ python3 ls0.py 
Traceback (most recent call last):
  File "ls0.py", line 12, in <module>
    prob.solve()
  File "/home/ec2-user/.local/lib/python3.7/site-packages/cvxpy/problems/problem.py", line 289, in solve
    return solve_func(self, *args, **kwargs)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/cvxpy/problems/problem.py", line 567, in _solve
    self._construct_chains(solver=solver, gp=gp)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/cvxpy/problems/problem.py", line 510, in _construct_chains
    raise e
  File "/home/ec2-user/.local/lib/python3.7/site-packages/cvxpy/problems/problem.py", line 501, in _construct_chains
    self._intermediate_chain.apply(self)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/cvxpy/reductions/chain.py", line 65, in apply
    problem, inv = r.apply(problem)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/cvxpy/reductions/qp2quad_form/qp2symbolic_qp.py", line 60, in apply
    return super(Qp2SymbolicQp, self).apply(problem)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/cvxpy/reductions/canonicalization.py", line 58, in apply
    problem.objective)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/cvxpy/reductions/canonicalization.py", line 96, in canonicalize_tree
    canon_arg, c = self.canonicalize_tree(arg)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/cvxpy/reductions/canonicalization.py", line 99, in canonicalize_tree
    canon_expr, c = self.canonicalize_expr(expr, canon_args)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/cvxpy/reductions/canonicalization.py", line 108, in canonicalize_expr
    return self.canon_methods[type(expr)](expr, args)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/cvxpy/reductions/qp2quad_form/atom_canonicalizers/quad_over_lin_canon.py", line 29, in quad_over_lin_canon
    return SymbolicQuadForm(t, eye(affine_expr.size)/y, expr), [affine_expr == t]
  File "/home/ec2-user/.local/lib/python3.7/site-packages/numpy/lib/twodim_base.py", line 201, in eye
    m = zeros((N, M), dtype=dtype, order=order)
MemoryError: Unable to allocate array with shape (200000, 200000) and data type float64
[ec2-user@ip-172-30-0-79 etc]$

This is interesting: CVXPY seems to allocate a \(200,000\times 200,000\) matrix here. Actually it is an identity matrix! This can be seen from the name eye. This seems to be related to the reformulation into a QP model.

To get this a bit more under control, lets make the size a bit smaller and use a memory profiler. At least we get some information about the memory usage.

[ec2-user@ip-172-30-0-79 etc]$ python3 -m memory_profiler ls1.py 
-----------------------------------------------------------------
           OSQP v0.6.0  -  Operator Splitting QP Solver
              (c) Bartolomeo Stellato,  Goran Banjac
        University of Oxford  -  Stanford University 2019
-----------------------------------------------------------------
problem:  variables n = 20100, constraints m = 20000
          nnz(P) + nnz(A) = 2040000
settings: linear system solver = qdldl,
          eps_abs = 1.0e-05, eps_rel = 1.0e-05,
          eps_prim_inf = 1.0e-04, eps_dual_inf = 1.0e-04,
          rho = 1.00e-01 (adaptive),
          sigma = 1.00e-06, alpha = 1.60, max_iter = 10000
          check_termination: on (interval 25),
          scaling: on, scaled_termination: off
          warm start: on, polish: on, time_limit: off

iter   objective    pri res    dua res    rho        time
   1   0.0000e+00   4.38e+00   2.98e+04   1.00e-01   1.03e+00s
  50   1.9804e+04   2.13e-09   4.01e-07   1.00e-01   1.65e+00s
plsh   1.9804e+04   4.01e-15   1.03e-12   --------   2.45e+00s

status:               solved
solution polish:      successful
number of iterations: 50
optimal objective:    19804.0226
run time:             2.45e+00s
optimal rho estimate: 1.35e-02

status: optimal
obj: 19804.022648294507
Filename: ls1.py

Line #    Mem usage    Increment   Line Contents
================================================
      4   50.188 MiB   50.188 MiB   @profile
     5                             def f():
     6                             #    N = 200000
     7   50.188 MiB    0.000 MiB       N = 20000
     8   50.188 MiB    0.000 MiB       M = 100
     9                             
    10   50.188 MiB    0.000 MiB       np.random.seed(123)
    11   65.477 MiB   15.289 MiB       X = np.random.normal(0, 1, size=(N, M))
    12   65.707 MiB    0.230 MiB       y = np.random.normal(0, 1, size=N)
    13                             
    14   65.707 MiB    0.000 MiB       w = cp.Variable(M)
    15   65.707 MiB    0.000 MiB       prob = cp.Problem(cp.Minimize(cp.sum_squares(y - X @ w)))
    16  412.086 MiB  346.379 MiB       prob.solve(verbose=True)
    17  412.086 MiB    0.000 MiB       print("status:",prob.status)
    18  412.086 MiB    0.000 MiB       print("obj:",prob.value)

We see indeed this is solved as QP model (it is solved by the QP solver OSQP), and in the solve statement we have this spike in memory usage.

The allocation is measured in MiBs or Mebibytes. A mebibyte is equal to \(2^{20}=1,048,576\) bytes.

It is probably a bad idea to allocate this identity matrix like this as a fully allocated matrix. The better approach would be not to use this matrix at all. A second best approach would be to make this identity matrix a sparse matrix.

Approach 1: use a conic programming solver

This is an easy fix. Just use a solver like ECOS:

[ec2-user@ip-172-30-0-79 etc]$ python3 -m memory_profiler ls1.py 

ECOS 2.0.7 - (C) embotech GmbH, Zurich Switzerland, 2012-15. Web: www.embotech.com/ECOS

It     pcost       dcost      gap   pres   dres    k/t    mu     step   sigma     IR    |   BT
 0  +0.000e+00  -2.601e-05  +2e+04  5e-01  7e-06  1e+00  1e+04    ---    ---    1  2  - |  -  - 
 1  -9.312e-01  +2.050e-02  +3e+02  2e-02  1e-07  1e+00  2e+02  0.9829  1e-04   1  2  2 |  0  0
 2  -5.192e+00  +2.947e+00  +1e+02  7e-03  5e-08  8e+00  7e+01  0.6186  7e-02   2  3  3 |  0  0
 3  +6.518e+02  +8.866e+02  +3e+02  6e-02  4e-07  2e+02  1e+02  0.1800  9e-01   2  2  2 |  0  0
 4  +9.646e+02  +9.849e+02  +4e+00  1e-03  8e-09  2e+01  2e+00  0.9826  1e-04   7  7  6 |  0  0
 5  +1.130e+03  +1.144e+03  +7e-01  2e-03  1e-09  1e+01  4e-01  0.8836  2e-02   2  3  2 |  0  0
 6  +1.493e+03  +1.521e+03  +3e-01  1e-02  4e-10  3e+01  2e-01  0.9890  1e-01   1  1  1 |  0  0
 7  +2.057e+03  +2.080e+03  +5e-02  8e-03  9e-11  2e+01  3e-02  0.9271  5e-02   1  2  1 |  0  0
 8  +2.134e+03  +2.152e+03  +4e-02  4e-03  4e-11  2e+01  2e-02  0.8136  2e-01   2  2  2 |  0  0
 9  +2.244e+03  +2.256e+03  +3e-02  4e-02  2e-11  1e+01  1e-02  0.7107  1e-01   1  1  1 |  0  0
10  +1.995e+03  +2.004e+03  +4e-03  6e-02  2e-11  9e+00  4e-03  0.0005  8e-01   1  1  1 |  0  0
11  +6.609e+02  +6.615e+02  +8e+00  8e-03  2e-12  6e-01  4e+00  0.9890  8e-01   0  0  0 |  0  0
12  +6.293e+02  +6.423e+02  +1e+01  5e-03  2e-12  1e+01  8e+00  0.5325  3e-01   1  1  1 |  0  0
13  +5.873e+02  +7.017e+02  +3e+01  2e-02  3e-12  1e+02  2e+01  0.4675  5e-01   2  3  3 |  0  0
14  +7.549e+02  +1.687e+03  +4e+00  7e-03  4e-12  9e+02  3e+00  0.9890  1e-01   1  1  1 |  0  0
15  +1.292e+03  +1.300e+03  +2e+00  2e-03  3e-12  8e+00  9e-01  0.8813  2e-01   3  2  2 |  0  0
16  +2.141e+03  +2.149e+03  +2e-01  2e-02  2e-12  8e+00  9e-02  0.9890  5e-03   1  1  1 |  0  0
17  +2.819e+03  +2.827e+03  +2e-02  4e-02  4e-12  8e+00  1e-02  0.9052  3e-02   1  1  1 |  0  0
18  +2.757e+03  +2.774e+03  +5e-02  2e-02  2e-12  2e+01  2e-02  0.9746  4e-01   1  2  2 |  0  0
19  +2.615e+03  +2.631e+03  +1e-02  8e-02  2e-12  2e+01  8e-03  0.1350  2e-01   1  1  1 |  0  0
20  +2.002e+03  +2.009e+03  +2e-03  6e-02  3e-13  7e+00  3e-03  0.0029  9e-01   1  1  1 |  0  0
21  +2.884e+03  +2.959e+03  +2e-03  8e-02  3e-12  8e+01  9e-03  0.7080  7e-01   0  0  0 |  0  0
22  +2.861e+03  +2.934e+03  +2e-03  9e-02  3e-12  7e+01  9e-03  0.0007  9e-01   1  1  1 |  0  0
23  +1.847e+03  +1.857e+03  +3e-02  5e-02  2e-13  1e+01  2e-02  0.0026  1e+00   0  0  0 |  0  0
24  +6.545e+02  +6.548e+02  +1e+01  9e-03  2e-12  3e-01  6e+00  0.9890  6e-01   0  0  0 |  0  0
25  +7.003e+02  +7.142e+02  +2e+01  4e-03  2e-12  1e+01  1e+01  0.6078  2e-01   1  1  0 |  0  0
26  +6.232e+02  +7.771e+02  +5e+01  2e-02  3e-12  2e+02  3e+01  0.5706  4e-01   2  3  3 |  0  0
27  +5.916e+02  +1.396e+03  +8e+00  3e-03  2e-12  8e+02  6e+00  0.9890  1e-01   1  1  1 |  0  0
28  +1.468e+03  +1.658e+03  +1e+00  2e-03  3e-12  2e+02  7e-01  0.9890  9e-02   1  1  1 |  0  0
29  +2.046e+03  +2.089e+03  +4e-01  1e-02  4e-13  4e+01  2e-01  0.9890  3e-02   1  1  1 |  0  0
30  +1.720e+03  +1.744e+03  +7e-02  5e-02  3e-14  2e+01  6e-02  0.1873  2e-01   1  1  1 |  0  0
31  +6.708e+02  +6.722e+02  +2e+01  7e-03  2e-12  1e+00  1e+01  0.9890  6e-01   0  0  0 |  0  0
32  +5.985e+02  +6.158e+02  +5e+01  3e-03  9e-13  2e+01  3e+01  0.9260  3e-01   1  1  1 |  0  0
33  +7.045e+02  +9.047e+02  +1e+02  2e-02  2e-12  2e+02  6e+01  0.6591  5e-01   1  2  3 |  0  0
34  +4.246e+02  +8.139e+02  +3e+01  2e-03  1e-13  4e+02  3e+01  0.9890  2e-01   1  1  1 |  0  0
35  +1.140e+03  +1.436e+03  +7e-01  3e-03  7e-13  3e+02  3e+00  0.9600  1e-02   1  1  1 |  0  0
36  +1.597e+03  +1.707e+03  +1e+00  4e-03  3e-14  1e+02  9e-01  0.9890  2e-01   2  2  2 |  0  0
37  +2.184e+03  +2.226e+03  +1e-01  5e-03  1e-12  4e+01  1e-01  0.9890  6e-02   1  2  2 |  0  0
38  +2.540e+03  +2.556e+03  +2e-01  4e-03  2e-12  2e+01  1e-01  0.9890  2e-01   1  1  2 |  0  0
39  +3.055e+03  +3.065e+03  +5e-02  6e-03  2e-12  1e+01  3e-02  0.9890  8e-02   1  2  2 |  0  0
40  +3.450e+03  +3.458e+03  +4e-02  5e-03  2e-12  8e+00  2e-02  0.9890  1e-01   1  1  2 |  0  0
41  +3.932e+03  +3.939e+03  +1e-02  7e-03  2e-12  6e+00  8e-03  0.9890  1e-01   1  1  2 |  0  0
42  +4.347e+03  +4.352e+03  +1e-02  6e-03  2e-12  6e+00  5e-03  0.9890  1e-01   1  1  2 |  0  0
43  +4.796e+03  +4.801e+03  +5e-03  8e-03  2e-12  5e+00  3e-03  0.9890  1e-01   1  1  2 |  0  0
44  +5.206e+03  +5.211e+03  +3e-03  7e-03  2e-12  4e+00  2e-03  0.9890  1e-01   1  1  2 |  0  0
45  +5.633e+03  +5.637e+03  +2e-03  1e-02  2e-12  4e+00  1e-03  0.9890  1e-01   1  1  2 |  0  0
46  +6.028e+03  +6.032e+03  +1e-03  9e-03  2e-12  4e+00  8e-04  0.9890  1e-01   1  1  2 |  0  0
47  +6.430e+03  +6.433e+03  +9e-04  1e-02  2e-12  3e+00  5e-04  0.9890  1e-01   1  1  2 |  0  0
48  +6.811e+03  +6.814e+03  +7e-04  1e-02  2e-12  3e+00  4e-04  0.9890  1e-01   1  1  2 |  0  0
49  +7.188e+03  +7.191e+03  +5e-04  1e-02  2e-12  3e+00  3e-04  0.9890  1e-01   1  1  2 |  0  0
50  +7.551e+03  +7.553e+03  +3e-04  1e-02  2e-12  2e+00  2e-04  0.9890  1e-01   1  1  2 |  0  0
51  +7.908e+03  +7.910e+03  +2e-04  2e-02  2e-12  2e+00  1e-04  0.9890  2e-01   1  1  2 |  0  0
52  +8.251e+03  +8.253e+03  +2e-04  2e-02  2e-12  2e+00  1e-04  0.9890  2e-01   1  1  2 |  0  0
53  +8.586e+03  +8.588e+03  +1e-04  2e-02  2e-12  2e+00  8e-05  0.9890  2e-01   1  1  2 |  0  0
54  +8.911e+03  +8.913e+03  +1e-04  2e-02  2e-12  2e+00  6e-05  0.9890  2e-01   1  1  2 |  0  0
55  +9.228e+03  +9.230e+03  +8e-05  2e-02  2e-12  2e+00  4e-05  0.9890  2e-01   1  1  2 |  0  0
56  +9.534e+03  +9.535e+03  +6e-05  2e-02  2e-12  1e+00  4e-05  0.9890  2e-01   1  1  2 |  0  0
57  +9.779e+03  +9.780e+03  +5e-05  1e-01  2e-12  1e+00  3e-05  0.8357  2e-01   1  1  1 |  0  0
58  +9.766e+03  +9.768e+03  +2e-05  3e-01  2e-12  1e+00  1e-05  0.0001  9e-01   1  1  1 |  0  0
59  +3.518e+03  +3.517e+03  -6e+00  7e+00  9e-13  -5e-01  -3e+00  0.0000  1e+00   1  1  1 |  0  0
Unreliable search direction detected, recovering best iterate (58) and stopping.

Close to PRIMAL INFEASIBLE (within feastol=5.7e-06).
Runtime: 20.262543 seconds.

status: infeasible_inaccurate
obj: None
Filename: ls1.py

Line #    Mem usage    Increment   Line Contents
================================================
     4   50.312 MiB   50.312 MiB   @profile
     5                             def f():
     6                             #    N = 200000
     7   50.312 MiB    0.000 MiB       N = 20000
     8   50.312 MiB    0.000 MiB       M = 100
     9                             
    10   50.312 MiB    0.000 MiB       np.random.seed(123)
    11   65.578 MiB   15.266 MiB       X = np.random.normal(0, 1, size=(N, M))
    12   65.836 MiB    0.258 MiB       y = np.random.normal(0, 1, size=N)
    13                             
    14   65.836 MiB    0.000 MiB       w = cp.Variable(M)
    15   65.836 MiB    0.000 MiB       prob = cp.Problem(cp.Minimize(cp.sum_squares(y - X @ w)))
    16   76.824 MiB   10.988 MiB       prob.solve(solver=cp.ECOS,verbose=True)
    17   76.824 MiB    0.000 MiB       print("status:",prob.status)
    18   76.824 MiB    0.000 MiB       print("obj:",prob.value)

The results are mixed. We certainly don't see this crazy memory allocation anymore. It is now a very small 10 MiB. However the solver is not numerically stable enough to solve this problem.

There is another solver that comes with CVXPY that we can try:

[ec2-user@ip-172-30-0-79 etc]$ python3 -m memory_profiler ls1.py 
----------------------------------------------------------------------------
SCS v2.1.1 - Splitting Conic Solver
(c) Brendan O'Donoghue, Stanford University, 2012
----------------------------------------------------------------------------
Lin-sys: sparse-direct, nnz in A = 2000002
eps = 1.00e-04, alpha = 1.50, max_iters = 5000, normalize = 1, scale = 1.00
acceleration_lookback = 0, rho_x = 1.00e-03
Variables n = 101, constraints m = 20002
Cones:soc vars: 20002, soc blks: 1
WARN: aa_init returned NULL, no acceleration applied.
Setup time: 7.44e-01s
----------------------------------------------------------------------------
 Iter | pri res | dua res | rel gap | pri obj | dua obj | kap/tau | time (s)
----------------------------------------------------------------------------
     0| 1.22e+18  7.87e+18  1.00e+00 -9.64e+18  4.27e+21  4.18e+21  1.92e-02 
   100| 8.68e+17  7.02e+17  5.09e-01  1.71e+21  5.25e+21  3.54e+21  6.67e-01 
   200| 8.04e+17  5.33e+17  4.23e-01  2.16e+21  5.33e+21  3.16e+21  1.31e+00 
   300| 7.37e+17  4.39e+17  3.72e-01  2.39e+21  5.22e+21  2.83e+21  1.96e+00 
   400| 6.73e+17  3.74e+17  3.35e-01  2.51e+21  5.04e+21  2.53e+21  2.59e+00 
   500| 6.14e+17  3.23e+17  3.05e-01  2.57e+21  4.83e+21  2.26e+21  3.22e+00 
   600| 5.60e+17  2.82e+17  2.80e-01  2.59e+21  4.60e+21  2.01e+21  3.85e+00 
   700| 5.11e+17  2.48e+17  2.57e-01  2.58e+21  4.37e+21  1.79e+21  4.50e+00 
   800| 4.67e+17  2.20e+17  2.36e-01  2.56e+21  4.14e+21  1.58e+21  5.15e+00 
   900| 4.27e+17  1.96e+17  2.17e-01  2.52e+21  3.92e+21  1.40e+21  5.79e+00 
  1000| 3.91e+17  1.74e+17  1.99e-01  2.47e+21  3.70e+21  1.23e+21  6.46e+00 
  1100| 3.58e+17  1.56e+17  1.81e-01  2.42e+21  3.49e+21  1.07e+21  7.09e+00 
  1200| 3.28e+17  1.40e+17  1.64e-01  2.36e+21  3.29e+21  9.29e+20  7.72e+00 
  1300| 3.01e+17  1.26e+17  1.48e-01  2.30e+21  3.10e+21  7.98e+20  8.36e+00 
  1400| 2.77e+17  1.14e+17  1.31e-01  2.24e+21  2.92e+21  6.78e+20  9.00e+00 
  1500| 2.55e+17  1.03e+17  1.15e-01  2.18e+21  2.75e+21  5.67e+20  9.65e+00 
  1600| 2.35e+17  9.31e+16  9.88e-02  2.12e+21  2.59e+21  4.65e+20  1.03e+01 
  1700| 2.17e+17  8.44e+16  8.25e-02  2.06e+21  2.43e+21  3.71e+20  1.09e+01 
  1800| 2.00e+17  7.67e+16  6.62e-02  2.01e+21  2.29e+21  2.84e+20  1.16e+01 
  1900| 1.85e+17  6.98e+16  4.97e-02  1.95e+21  2.15e+21  2.03e+20  1.23e+01 
  2000| 1.72e+17  6.37e+16  3.30e-02  1.89e+21  2.02e+21  1.29e+20  1.29e+01 
  2100| 1.59e+17  5.81e+16  1.60e-02  1.84e+21  1.90e+21  5.98e+19  1.36e+01 
  2200| 8.80e-03  1.43e-01  1.60e-05  1.20e+04  1.20e+04  1.94e-16  1.42e+01 
  2300| 6.45e-03  1.37e-01  1.62e-05  1.23e+04  1.23e+04  6.83e-17  1.49e+01 
  2400| 6.25e-03  1.34e-01  1.40e-05  1.26e+04  1.26e+04  2.15e-16  1.55e+01 
  2500| 6.06e-03  1.30e-01  1.21e-05  1.28e+04  1.28e+04  2.25e-16  1.62e+01 
  2600| 5.89e-03  1.27e-01  1.04e-05  1.30e+04  1.30e+04  7.78e-17  1.68e+01 
  2700| 5.71e-03  1.24e-01  8.98e-06  1.33e+04  1.33e+04  8.14e-17  1.75e+01 
  2800| 5.55e-03  1.21e-01  7.70e-06  1.35e+04  1.35e+04  8.46e-17  1.82e+01 
  2900| 5.39e-03  1.18e-01  6.56e-06  1.37e+04  1.37e+04  8.78e-17  1.88e+01 
  3000| 5.24e-03  1.15e-01  5.57e-06  1.39e+04  1.39e+04  9.04e-17  1.95e+01 
  3100| 5.09e-03  1.12e-01  4.68e-06  1.41e+04  1.41e+04  9.40e-17  2.01e+01 
  3200| 4.94e-03  1.09e-01  3.90e-06  1.42e+04  1.42e+04  2.91e-16  2.08e+01 
  3300| 4.80e-03  1.07e-01  3.21e-06  1.44e+04  1.44e+04  9.96e-17  2.14e+01 
  3400| 4.67e-03  1.04e-01  2.60e-06  1.46e+04  1.46e+04  1.03e-16  2.21e+01 
  3500| 4.54e-03  1.01e-01  2.05e-06  1.48e+04  1.48e+04  3.18e-16  2.27e+01 
  3600| 4.41e-03  9.87e-02  1.57e-06  1.49e+04  1.49e+04  1.09e-16  2.33e+01 
  3700| 4.29e-03  9.62e-02  1.14e-06  1.51e+04  1.51e+04  1.12e-16  2.40e+01 
  3800| 4.16e-03  9.37e-02  7.64e-07  1.52e+04  1.52e+04  1.14e-16  2.47e+01 
  3900| 4.05e-03  9.12e-02  4.28e-07  1.54e+04  1.54e+04  1.17e-16  2.53e+01 
  4000| 3.93e-03  8.88e-02  1.30e-07  1.55e+04  1.55e+04  1.20e-16  2.60e+01 
  4100| 3.82e-03  8.65e-02  1.34e-07  1.57e+04  1.57e+04  1.23e-16  2.66e+01 
  4200| 3.72e-03  8.42e-02  3.67e-07  1.58e+04  1.58e+04  1.25e-16  2.73e+01 
  4300| 3.61e-03  8.20e-02  5.72e-07  1.59e+04  1.59e+04  1.28e-16  2.80e+01 
  4400| 3.51e-03  7.98e-02  7.52e-07  1.60e+04  1.60e+04  3.92e-16  2.86e+01 
  4500| 3.41e-03  7.77e-02  9.11e-07  1.62e+04  1.62e+04  1.33e-16  2.93e+01 
  4600| 3.32e-03  7.56e-02  1.05e-06  1.63e+04  1.63e+04  1.36e-16  3.00e+01 
  4700| 3.22e-03  7.35e-02  1.17e-06  1.64e+04  1.64e+04  4.15e-16  3.07e+01 
  4800| 3.13e-03  7.15e-02  1.27e-06  1.65e+04  1.65e+04  1.41e-16  3.13e+01 
  4900| 3.04e-03  6.96e-02  1.36e-06  1.66e+04  1.66e+04  4.30e-16  3.20e+01 
  5000| 2.96e-03  6.77e-02  1.44e-06  1.67e+04  1.67e+04  4.37e-16  3.27e+01 
----------------------------------------------------------------------------
Status: Solved/Inaccurate
Hit max_iters, solution may be inaccurate
Timing: Solve time: 3.27e+01s
Lin-sys: nnz in L factor: 2025055, avg solve time: 5.82e-03s
Cones: avg projection time: 3.02e-05s
Acceleration: avg step time: 7.51e-07s
----------------------------------------------------------------------------
Error metrics:
dist(s, K) = 0.0000e+00, dist(y, K*) = 3.6380e-12, s'y/|s||y| = -6.2180e-16
primal res: |Ax + s - b|_2 / (1 + |b|_2) = 2.9568e-03
dual res:   |A'y + c|_2 / (1 + |c|_2) = 6.7689e-02
rel gap:    |c'x + b'y| / (1 + |c'x| + |b'y|) = 1.4408e-06
----------------------------------------------------------------------------
c'x = 16701.6612, -b'y = 16701.6131
============================================================================
status: optimal_inaccurate
obj: 16701.661202598487
Filename: ls1.py

Line #    Mem usage    Increment   Line Contents
================================================
     4   50.438 MiB   50.438 MiB   @profile
     5                             def f():
     6                             #    N = 200000
     7   50.438 MiB    0.000 MiB       N = 20000
     8   50.438 MiB    0.000 MiB       M = 100
     9                             
    10   50.438 MiB    0.000 MiB       np.random.seed(123)
    11   65.699 MiB   15.262 MiB       X = np.random.normal(0, 1, size=(N, M))
    12   65.957 MiB    0.258 MiB       y = np.random.normal(0, 1, size=N)
    13                             
    14   65.957 MiB    0.000 MiB       w = cp.Variable(M)
    15   65.957 MiB    0.000 MiB       prob = cp.Problem(cp.Minimize(cp.sum_squares(y - X @ w)))
    16   76.953 MiB   10.996 MiB       prob.solve(solver=cp.SCS,verbose=True)
    17   76.953 MiB    0.000 MiB       print("status:",prob.status)
    18   76.953 MiB    0.000 MiB       print("obj:",prob.value)

That solver also has problems. In this case we ran out of iterations. We can allow for more iterations by adding the argument max_iters=nnnn to the solve statement. SCS is using a first order algorithm. These algorithms typically use a very large number of iterations.

Approach 2: use a different formulation

We can use a different formulation:

Regression II
\[\begin{align}\min_{\color{darkred}w,\color{darkred}r}\>& \|\|\color{darkred}r\|\|^2_2 \\ & \color{darkred}r = \color{darkblue}y - \color{darkblue}X\color{darkred}w \end{align}\]

Here we add a bunch of (free) variables \(r\) for the residuals, and also a bunch of linear constraints. The payback is that we have a much simpler quadratic objective. In addition it is 100% positive (semi-) definite. A reasonable implementation can look like:

import cvxpy as cp
import numpy as np

# N = 200000
N = 20000
M = 100

X = np.random.normal(0, 1, size=(N,M))
y = np.random.normal(0, 1, size=(N,1))

w = cp.Variable((M,1))
r = cp.Variable((N,1))
prob = cp.Problem(cp.Minimize(r.T @ r), [r == y - X @ w])
prob.solve(solver=cp.OSQP,verbose=True)
print("status:",prob.status)
print("obj:",prob.value)

This should work, but it doesn't. We see:

[ec2-user@ip-172-30-0-79 etc]$ python3 ls2a.py 
Traceback (most recent call last):
  File "ls2a.py", line 14, in <module>
    prob.solve(solver=cp.OSQP,verbose=True)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/cvxpy/problems/problem.py", line 289, in solve
    return solve_func(self, *args, **kwargs)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/cvxpy/problems/problem.py", line 567, in _solve
    self._construct_chains(solver=solver, gp=gp)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/cvxpy/problems/problem.py", line 510, in _construct_chains
    raise e
  File "/home/ec2-user/.local/lib/python3.7/site-packages/cvxpy/problems/problem.py", line 499, in _construct_chains
    construct_intermediate_chain(self, candidate_solvers, gp=gp)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/cvxpy/reductions/solvers/intermediate_chain.py", line 70, in construct_intermediate_chain
    raise DCPError("Problem does not follow DCP rules. Specifically:\n" + append)
cvxpy.error.DCPError: Problem does not follow DCP rules. Specifically:
The objective is not DCP. Its following subexpressions are not:
var1.T * var1

Bummer. This is a perfectly convex problem!

Some alternative formulations to generate a QP do not help:

cp.sum_squares(r) yields a large allocation when forming a QP
sum([r[i]**2 for i in range(N)] is very slow and very memory hungry
quad_form with an identity matrix is not efficient

So I had no luck in being able to generate a QP problem for OSQP without very large memory allocations.

This did not work out so well. I am not really able to solve relatively large, but easy least squares problems using standard formulations.

Conclusion

CVXPY thinks \(r^Tr = \sum_i r_i^2\) is not convex.
We cannot always generate large problems for the OSQP QP solver due to large dense memory allocations in CVXPY. I probably would consider this a bug: it is never a good idea to physically create and allocate a large dense identity matrix in any somewhat serious code.
CVXPY can generate large instances for the conic solvers ECOS and SCS. But they have some troubles solving this problem.
A commercial solver like Cplex has no problem with this; 2 iterations, 0.2 seconds.
We can conclude that CVXPY (and its collection of standard solvers) is best suited for smaller and medium-sized problems. When tackling large problems you may hit some issues. Of course, many interesting and relevant problems are not that big and can be solved conveniently using CVXPY.

References

Least squares problem run out of memory, https://stackoverflow.com/questions/59315300/least-squares-problem-run-out-of-memory-in-cvxpy

2 comments:

AnonJanuary 9, 2020 at 2:10 PM
Interestingly, the problem gets solved if you call ECOS directly (ecos requires G and h such that h-Gx in K, dim_q is the dimension of the second order cone, and the first element of h-Gx is norm bounded by the other elements (typically it is the last element)):

import ecos
c = np.zeros((M+1,))
c[0] = 1
G = np.zeros((N+1,M+1))
G[0,0] = -1
G[1:, 1:] = X
G = sp.sparse.csc_matrix(G)
h = np.pad(y.flatten(), (1, 0), 'constant')
dims = {'q':[N+1]}
ecos.solve(c,G,h,dims)

Monday, December 16, 2019

CVXPY large memory allocation

Approach 1: use a conic programming solver

Approach 2: use a different formulation

Conclusion

References

2 comments: