## Monday, December 16, 2019

### CVXPY large memory allocation

In [1] a simple regression problem was stated and solved with CVXPY. The number of observations is very large ($$200,000$$), while the number of coefficients to estimate is moderate ($$100$$). The first formulation is simply:

Regression I
\begin{align}\min_{\color{darkred}w}\>& ||\color{darkblue}y-\color{darkblue}X\color{darkred}w||^2_2 \end{align}

In Python code, this can look like:

import cvxpy as cp
import numpy as np

N = 200000
M = 100

X = np.random.normal(0, 1, size=(N, M))
y = np.random.normal(0, 1, size=N)

w = cp.Variable(M)
prob = cp.Problem(cp.Minimize(cp.sum_squares(y - X @ w)))
prob.solve()
print("status:",prob.status)
print("obj:",prob.value)


Unfortunately, this is giving the following runtime error:

[ec2-user@ip-172-30-0-79 etc]$python3 ls0.py Traceback (most recent call last): File "ls0.py", line 12, in <module> prob.solve() File "/home/ec2-user/.local/lib/python3.7/site-packages/cvxpy/problems/problem.py", line 289, in solve return solve_func(self, *args, **kwargs) File "/home/ec2-user/.local/lib/python3.7/site-packages/cvxpy/problems/problem.py", line 567, in _solve self._construct_chains(solver=solver, gp=gp) File "/home/ec2-user/.local/lib/python3.7/site-packages/cvxpy/problems/problem.py", line 510, in _construct_chains raise e File "/home/ec2-user/.local/lib/python3.7/site-packages/cvxpy/problems/problem.py", line 501, in _construct_chains self._intermediate_chain.apply(self) File "/home/ec2-user/.local/lib/python3.7/site-packages/cvxpy/reductions/chain.py", line 65, in apply problem, inv = r.apply(problem) File "/home/ec2-user/.local/lib/python3.7/site-packages/cvxpy/reductions/qp2quad_form/qp2symbolic_qp.py", line 60, in apply return super(Qp2SymbolicQp, self).apply(problem) File "/home/ec2-user/.local/lib/python3.7/site-packages/cvxpy/reductions/canonicalization.py", line 58, in apply problem.objective) File "/home/ec2-user/.local/lib/python3.7/site-packages/cvxpy/reductions/canonicalization.py", line 96, in canonicalize_tree canon_arg, c = self.canonicalize_tree(arg) File "/home/ec2-user/.local/lib/python3.7/site-packages/cvxpy/reductions/canonicalization.py", line 99, in canonicalize_tree canon_expr, c = self.canonicalize_expr(expr, canon_args) File "/home/ec2-user/.local/lib/python3.7/site-packages/cvxpy/reductions/canonicalization.py", line 108, in canonicalize_expr return self.canon_methods[type(expr)](expr, args) File "/home/ec2-user/.local/lib/python3.7/site-packages/cvxpy/reductions/qp2quad_form/atom_canonicalizers/quad_over_lin_canon.py", line 29, in quad_over_lin_canon return SymbolicQuadForm(t, eye(affine_expr.size)/y, expr), [affine_expr == t] File "/home/ec2-user/.local/lib/python3.7/site-packages/numpy/lib/twodim_base.py", line 201, in eye m = zeros((N, M), dtype=dtype, order=order) MemoryError: Unable to allocate array with shape (200000, 200000) and data type float64 [ec2-user@ip-172-30-0-79 etc]$


This is interesting: CVXPY seems to allocate a $$200,000\times 200,000$$ matrix here. Actually it is an identity matrix! This can be seen from the name eye. This seems to be related to the reformulation into a QP model.

To get this a bit more under control, lets make the size a bit smaller and use a memory profiler. At least we get some information about the memory usage.

[ec2-user@ip-172-30-0-79 etc]$python3 -m memory_profiler ls1.py ----------------------------------------------------------------- OSQP v0.6.0 - Operator Splitting QP Solver (c) Bartolomeo Stellato, Goran Banjac University of Oxford - Stanford University 2019 ----------------------------------------------------------------- problem: variables n = 20100, constraints m = 20000 nnz(P) + nnz(A) = 2040000 settings: linear system solver = qdldl, eps_abs = 1.0e-05, eps_rel = 1.0e-05, eps_prim_inf = 1.0e-04, eps_dual_inf = 1.0e-04, rho = 1.00e-01 (adaptive), sigma = 1.00e-06, alpha = 1.60, max_iter = 10000 check_termination: on (interval 25), scaling: on, scaled_termination: off warm start: on, polish: on, time_limit: off iter objective pri res dua res rho time 1 0.0000e+00 4.38e+00 2.98e+04 1.00e-01 1.03e+00s 50 1.9804e+04 2.13e-09 4.01e-07 1.00e-01 1.65e+00s plsh 1.9804e+04 4.01e-15 1.03e-12 -------- 2.45e+00s status: solved solution polish: successful number of iterations: 50 optimal objective: 19804.0226 run time: 2.45e+00s optimal rho estimate: 1.35e-02 status: optimal obj: 19804.022648294507 Filename: ls1.py Line # Mem usage Increment Line Contents ================================================ 4 50.188 MiB 50.188 MiB @profile 5 def f(): 6 # N = 200000 7 50.188 MiB 0.000 MiB N = 20000 8 50.188 MiB 0.000 MiB M = 100 9 10 50.188 MiB 0.000 MiB np.random.seed(123) 11 65.477 MiB 15.289 MiB X = np.random.normal(0, 1, size=(N, M)) 12 65.707 MiB 0.230 MiB y = np.random.normal(0, 1, size=N) 13 14 65.707 MiB 0.000 MiB w = cp.Variable(M) 15 65.707 MiB 0.000 MiB prob = cp.Problem(cp.Minimize(cp.sum_squares(y - X @ w))) 16 412.086 MiB 346.379 MiB prob.solve(verbose=True) 17 412.086 MiB 0.000 MiB print("status:",prob.status) 18 412.086 MiB 0.000 MiB print("obj:",prob.value)  We see indeed this is solved as QP model (it is solved by the QP solver OSQP), and in the solve statement we have this spike in memory usage. The allocation is measured in MiBs or Mebibytes. A mebibyte is equal to $$2^{20}=1,048,576$$ bytes. It is probably a bad idea to allocate this identity matrix like this as a fully allocated matrix. The better approach would be not to use this matrix at all. A second best approach would be to make this identity matrix a sparse matrix. #### Approach 1: use a conic programming solver This is an easy fix. Just use a solver like ECOS: [ec2-user@ip-172-30-0-79 etc]$ python3 -m memory_profiler ls1.py

ECOS 2.0.7 - (C) embotech GmbH, Zurich Switzerland, 2012-15. Web: www.embotech.com/ECOS

It     pcost       dcost      gap   pres   dres    k/t    mu     step   sigma     IR    |   BT
0  +0.000e+00  -2.601e-05  +2e+04  5e-01  7e-06  1e+00  1e+04    ---    ---    1  2  - |  -  -
1  -9.312e-01  +2.050e-02  +3e+02  2e-02  1e-07  1e+00  2e+02  0.9829  1e-04   1  2  2 |  0  0
2  -5.192e+00  +2.947e+00  +1e+02  7e-03  5e-08  8e+00  7e+01  0.6186  7e-02   2  3  3 |  0  0
3  +6.518e+02  +8.866e+02  +3e+02  6e-02  4e-07  2e+02  1e+02  0.1800  9e-01   2  2  2 |  0  0
4  +9.646e+02  +9.849e+02  +4e+00  1e-03  8e-09  2e+01  2e+00  0.9826  1e-04   7  7  6 |  0  0
5  +1.130e+03  +1.144e+03  +7e-01  2e-03  1e-09  1e+01  4e-01  0.8836  2e-02   2  3  2 |  0  0
6  +1.493e+03  +1.521e+03  +3e-01  1e-02  4e-10  3e+01  2e-01  0.9890  1e-01   1  1  1 |  0  0
7  +2.057e+03  +2.080e+03  +5e-02  8e-03  9e-11  2e+01  3e-02  0.9271  5e-02   1  2  1 |  0  0
8  +2.134e+03  +2.152e+03  +4e-02  4e-03  4e-11  2e+01  2e-02  0.8136  2e-01   2  2  2 |  0  0
9  +2.244e+03  +2.256e+03  +3e-02  4e-02  2e-11  1e+01  1e-02  0.7107  1e-01   1  1  1 |  0  0
10  +1.995e+03  +2.004e+03  +4e-03  6e-02  2e-11  9e+00  4e-03  0.0005  8e-01   1  1  1 |  0  0
11  +6.609e+02  +6.615e+02  +8e+00  8e-03  2e-12  6e-01  4e+00  0.9890  8e-01   0  0  0 |  0  0
12  +6.293e+02  +6.423e+02  +1e+01  5e-03  2e-12  1e+01  8e+00  0.5325  3e-01   1  1  1 |  0  0
13  +5.873e+02  +7.017e+02  +3e+01  2e-02  3e-12  1e+02  2e+01  0.4675  5e-01   2  3  3 |  0  0
14  +7.549e+02  +1.687e+03  +4e+00  7e-03  4e-12  9e+02  3e+00  0.9890  1e-01   1  1  1 |  0  0
15  +1.292e+03  +1.300e+03  +2e+00  2e-03  3e-12  8e+00  9e-01  0.8813  2e-01   3  2  2 |  0  0
16  +2.141e+03  +2.149e+03  +2e-01  2e-02  2e-12  8e+00  9e-02  0.9890  5e-03   1  1  1 |  0  0
17  +2.819e+03  +2.827e+03  +2e-02  4e-02  4e-12  8e+00  1e-02  0.9052  3e-02   1  1  1 |  0  0
18  +2.757e+03  +2.774e+03  +5e-02  2e-02  2e-12  2e+01  2e-02  0.9746  4e-01   1  2  2 |  0  0
19  +2.615e+03  +2.631e+03  +1e-02  8e-02  2e-12  2e+01  8e-03  0.1350  2e-01   1  1  1 |  0  0
20  +2.002e+03  +2.009e+03  +2e-03  6e-02  3e-13  7e+00  3e-03  0.0029  9e-01   1  1  1 |  0  0
21  +2.884e+03  +2.959e+03  +2e-03  8e-02  3e-12  8e+01  9e-03  0.7080  7e-01   0  0  0 |  0  0
22  +2.861e+03  +2.934e+03  +2e-03  9e-02  3e-12  7e+01  9e-03  0.0007  9e-01   1  1  1 |  0  0
23  +1.847e+03  +1.857e+03  +3e-02  5e-02  2e-13  1e+01  2e-02  0.0026  1e+00   0  0  0 |  0  0
24  +6.545e+02  +6.548e+02  +1e+01  9e-03  2e-12  3e-01  6e+00  0.9890  6e-01   0  0  0 |  0  0
25  +7.003e+02  +7.142e+02  +2e+01  4e-03  2e-12  1e+01  1e+01  0.6078  2e-01   1  1  0 |  0  0
26  +6.232e+02  +7.771e+02  +5e+01  2e-02  3e-12  2e+02  3e+01  0.5706  4e-01   2  3  3 |  0  0
27  +5.916e+02  +1.396e+03  +8e+00  3e-03  2e-12  8e+02  6e+00  0.9890  1e-01   1  1  1 |  0  0
28  +1.468e+03  +1.658e+03  +1e+00  2e-03  3e-12  2e+02  7e-01  0.9890  9e-02   1  1  1 |  0  0
29  +2.046e+03  +2.089e+03  +4e-01  1e-02  4e-13  4e+01  2e-01  0.9890  3e-02   1  1  1 |  0  0
30  +1.720e+03  +1.744e+03  +7e-02  5e-02  3e-14  2e+01  6e-02  0.1873  2e-01   1  1  1 |  0  0
31  +6.708e+02  +6.722e+02  +2e+01  7e-03  2e-12  1e+00  1e+01  0.9890  6e-01   0  0  0 |  0  0
32  +5.985e+02  +6.158e+02  +5e+01  3e-03  9e-13  2e+01  3e+01  0.9260  3e-01   1  1  1 |  0  0
33  +7.045e+02  +9.047e+02  +1e+02  2e-02  2e-12  2e+02  6e+01  0.6591  5e-01   1  2  3 |  0  0
34  +4.246e+02  +8.139e+02  +3e+01  2e-03  1e-13  4e+02  3e+01  0.9890  2e-01   1  1  1 |  0  0
35  +1.140e+03  +1.436e+03  +7e-01  3e-03  7e-13  3e+02  3e+00  0.9600  1e-02   1  1  1 |  0  0
36  +1.597e+03  +1.707e+03  +1e+00  4e-03  3e-14  1e+02  9e-01  0.9890  2e-01   2  2  2 |  0  0
37  +2.184e+03  +2.226e+03  +1e-01  5e-03  1e-12  4e+01  1e-01  0.9890  6e-02   1  2  2 |  0  0
38  +2.540e+03  +2.556e+03  +2e-01  4e-03  2e-12  2e+01  1e-01  0.9890  2e-01   1  1  2 |  0  0
39  +3.055e+03  +3.065e+03  +5e-02  6e-03  2e-12  1e+01  3e-02  0.9890  8e-02   1  2  2 |  0  0
40  +3.450e+03  +3.458e+03  +4e-02  5e-03  2e-12  8e+00  2e-02  0.9890  1e-01   1  1  2 |  0  0
41  +3.932e+03  +3.939e+03  +1e-02  7e-03  2e-12  6e+00  8e-03  0.9890  1e-01   1  1  2 |  0  0
42  +4.347e+03  +4.352e+03  +1e-02  6e-03  2e-12  6e+00  5e-03  0.9890  1e-01   1  1  2 |  0  0
43  +4.796e+03  +4.801e+03  +5e-03  8e-03  2e-12  5e+00  3e-03  0.9890  1e-01   1  1  2 |  0  0
44  +5.206e+03  +5.211e+03  +3e-03  7e-03  2e-12  4e+00  2e-03  0.9890  1e-01   1  1  2 |  0  0
45  +5.633e+03  +5.637e+03  +2e-03  1e-02  2e-12  4e+00  1e-03  0.9890  1e-01   1  1  2 |  0  0
46  +6.028e+03  +6.032e+03  +1e-03  9e-03  2e-12  4e+00  8e-04  0.9890  1e-01   1  1  2 |  0  0
47  +6.430e+03  +6.433e+03  +9e-04  1e-02  2e-12  3e+00  5e-04  0.9890  1e-01   1  1  2 |  0  0
48  +6.811e+03  +6.814e+03  +7e-04  1e-02  2e-12  3e+00  4e-04  0.9890  1e-01   1  1  2 |  0  0
49  +7.188e+03  +7.191e+03  +5e-04  1e-02  2e-12  3e+00  3e-04  0.9890  1e-01   1  1  2 |  0  0
50  +7.551e+03  +7.553e+03  +3e-04  1e-02  2e-12  2e+00  2e-04  0.9890  1e-01   1  1  2 |  0  0
51  +7.908e+03  +7.910e+03  +2e-04  2e-02  2e-12  2e+00  1e-04  0.9890  2e-01   1  1  2 |  0  0
52  +8.251e+03  +8.253e+03  +2e-04  2e-02  2e-12  2e+00  1e-04  0.9890  2e-01   1  1  2 |  0  0
53  +8.586e+03  +8.588e+03  +1e-04  2e-02  2e-12  2e+00  8e-05  0.9890  2e-01   1  1  2 |  0  0
54  +8.911e+03  +8.913e+03  +1e-04  2e-02  2e-12  2e+00  6e-05  0.9890  2e-01   1  1  2 |  0  0
55  +9.228e+03  +9.230e+03  +8e-05  2e-02  2e-12  2e+00  4e-05  0.9890  2e-01   1  1  2 |  0  0
56  +9.534e+03  +9.535e+03  +6e-05  2e-02  2e-12  1e+00  4e-05  0.9890  2e-01   1  1  2 |  0  0
57  +9.779e+03  +9.780e+03  +5e-05  1e-01  2e-12  1e+00  3e-05  0.8357  2e-01   1  1  1 |  0  0
58  +9.766e+03  +9.768e+03  +2e-05  3e-01  2e-12  1e+00  1e-05  0.0001  9e-01   1  1  1 |  0  0
59  +3.518e+03  +3.517e+03  -6e+00  7e+00  9e-13  -5e-01  -3e+00  0.0000  1e+00   1  1  1 |  0  0
Unreliable search direction detected, recovering best iterate (58) and stopping.

Close to PRIMAL INFEASIBLE (within feastol=5.7e-06).
Runtime: 20.262543 seconds.

status: infeasible_inaccurate
obj: None
Filename: ls1.py

Line #    Mem usage    Increment   Line Contents
================================================
4   50.312 MiB   50.312 MiB   @profile
5                             def f():
6                             #    N = 200000
7   50.312 MiB    0.000 MiB       N = 20000
8   50.312 MiB    0.000 MiB       M = 100
9
10   50.312 MiB    0.000 MiB       np.random.seed(123)
11   65.578 MiB   15.266 MiB       X = np.random.normal(0, 1, size=(N, M))
12   65.836 MiB    0.258 MiB       y = np.random.normal(0, 1, size=N)
13
14   65.836 MiB    0.000 MiB       w = cp.Variable(M)
15   65.836 MiB    0.000 MiB       prob = cp.Problem(cp.Minimize(cp.sum_squares(y - X @ w)))
16   76.824 MiB   10.988 MiB       prob.solve(solver=cp.ECOS,verbose=True)
17   76.824 MiB    0.000 MiB       print("status:",prob.status)
18   76.824 MiB    0.000 MiB       print("obj:",prob.value)


The results are mixed. We certainly don't see this crazy memory allocation anymore. It is now a very small 10 MiB. However the solver is not numerically stable enough to solve this problem.

There is another solver that comes with CVXPY that we can try:

[ec2-user@ip-172-30-0-79 etc]python3 -m memory_profiler ls1.py ---------------------------------------------------------------------------- SCS v2.1.1 - Splitting Conic Solver (c) Brendan O'Donoghue, Stanford University, 2012 ---------------------------------------------------------------------------- Lin-sys: sparse-direct, nnz in A = 2000002 eps = 1.00e-04, alpha = 1.50, max_iters = 5000, normalize = 1, scale = 1.00 acceleration_lookback = 0, rho_x = 1.00e-03 Variables n = 101, constraints m = 20002 Cones:soc vars: 20002, soc blks: 1 WARN: aa_init returned NULL, no acceleration applied. Setup time: 7.44e-01s ---------------------------------------------------------------------------- Iter | pri res | dua res | rel gap | pri obj | dua obj | kap/tau | time (s) ---------------------------------------------------------------------------- 0| 1.22e+18 7.87e+18 1.00e+00 -9.64e+18 4.27e+21 4.18e+21 1.92e-02 100| 8.68e+17 7.02e+17 5.09e-01 1.71e+21 5.25e+21 3.54e+21 6.67e-01 200| 8.04e+17 5.33e+17 4.23e-01 2.16e+21 5.33e+21 3.16e+21 1.31e+00 300| 7.37e+17 4.39e+17 3.72e-01 2.39e+21 5.22e+21 2.83e+21 1.96e+00 400| 6.73e+17 3.74e+17 3.35e-01 2.51e+21 5.04e+21 2.53e+21 2.59e+00 500| 6.14e+17 3.23e+17 3.05e-01 2.57e+21 4.83e+21 2.26e+21 3.22e+00 600| 5.60e+17 2.82e+17 2.80e-01 2.59e+21 4.60e+21 2.01e+21 3.85e+00 700| 5.11e+17 2.48e+17 2.57e-01 2.58e+21 4.37e+21 1.79e+21 4.50e+00 800| 4.67e+17 2.20e+17 2.36e-01 2.56e+21 4.14e+21 1.58e+21 5.15e+00 900| 4.27e+17 1.96e+17 2.17e-01 2.52e+21 3.92e+21 1.40e+21 5.79e+00 1000| 3.91e+17 1.74e+17 1.99e-01 2.47e+21 3.70e+21 1.23e+21 6.46e+00 1100| 3.58e+17 1.56e+17 1.81e-01 2.42e+21 3.49e+21 1.07e+21 7.09e+00 1200| 3.28e+17 1.40e+17 1.64e-01 2.36e+21 3.29e+21 9.29e+20 7.72e+00 1300| 3.01e+17 1.26e+17 1.48e-01 2.30e+21 3.10e+21 7.98e+20 8.36e+00 1400| 2.77e+17 1.14e+17 1.31e-01 2.24e+21 2.92e+21 6.78e+20 9.00e+00 1500| 2.55e+17 1.03e+17 1.15e-01 2.18e+21 2.75e+21 5.67e+20 9.65e+00 1600| 2.35e+17 9.31e+16 9.88e-02 2.12e+21 2.59e+21 4.65e+20 1.03e+01 1700| 2.17e+17 8.44e+16 8.25e-02 2.06e+21 2.43e+21 3.71e+20 1.09e+01 1800| 2.00e+17 7.67e+16 6.62e-02 2.01e+21 2.29e+21 2.84e+20 1.16e+01 1900| 1.85e+17 6.98e+16 4.97e-02 1.95e+21 2.15e+21 2.03e+20 1.23e+01 2000| 1.72e+17 6.37e+16 3.30e-02 1.89e+21 2.02e+21 1.29e+20 1.29e+01 2100| 1.59e+17 5.81e+16 1.60e-02 1.84e+21 1.90e+21 5.98e+19 1.36e+01 2200| 8.80e-03 1.43e-01 1.60e-05 1.20e+04 1.20e+04 1.94e-16 1.42e+01 2300| 6.45e-03 1.37e-01 1.62e-05 1.23e+04 1.23e+04 6.83e-17 1.49e+01 2400| 6.25e-03 1.34e-01 1.40e-05 1.26e+04 1.26e+04 2.15e-16 1.55e+01 2500| 6.06e-03 1.30e-01 1.21e-05 1.28e+04 1.28e+04 2.25e-16 1.62e+01 2600| 5.89e-03 1.27e-01 1.04e-05 1.30e+04 1.30e+04 7.78e-17 1.68e+01 2700| 5.71e-03 1.24e-01 8.98e-06 1.33e+04 1.33e+04 8.14e-17 1.75e+01 2800| 5.55e-03 1.21e-01 7.70e-06 1.35e+04 1.35e+04 8.46e-17 1.82e+01 2900| 5.39e-03 1.18e-01 6.56e-06 1.37e+04 1.37e+04 8.78e-17 1.88e+01 3000| 5.24e-03 1.15e-01 5.57e-06 1.39e+04 1.39e+04 9.04e-17 1.95e+01 3100| 5.09e-03 1.12e-01 4.68e-06 1.41e+04 1.41e+04 9.40e-17 2.01e+01 3200| 4.94e-03 1.09e-01 3.90e-06 1.42e+04 1.42e+04 2.91e-16 2.08e+01 3300| 4.80e-03 1.07e-01 3.21e-06 1.44e+04 1.44e+04 9.96e-17 2.14e+01 3400| 4.67e-03 1.04e-01 2.60e-06 1.46e+04 1.46e+04 1.03e-16 2.21e+01 3500| 4.54e-03 1.01e-01 2.05e-06 1.48e+04 1.48e+04 3.18e-16 2.27e+01 3600| 4.41e-03 9.87e-02 1.57e-06 1.49e+04 1.49e+04 1.09e-16 2.33e+01 3700| 4.29e-03 9.62e-02 1.14e-06 1.51e+04 1.51e+04 1.12e-16 2.40e+01 3800| 4.16e-03 9.37e-02 7.64e-07 1.52e+04 1.52e+04 1.14e-16 2.47e+01 3900| 4.05e-03 9.12e-02 4.28e-07 1.54e+04 1.54e+04 1.17e-16 2.53e+01 4000| 3.93e-03 8.88e-02 1.30e-07 1.55e+04 1.55e+04 1.20e-16 2.60e+01 4100| 3.82e-03 8.65e-02 1.34e-07 1.57e+04 1.57e+04 1.23e-16 2.66e+01 4200| 3.72e-03 8.42e-02 3.67e-07 1.58e+04 1.58e+04 1.25e-16 2.73e+01 4300| 3.61e-03 8.20e-02 5.72e-07 1.59e+04 1.59e+04 1.28e-16 2.80e+01 4400| 3.51e-03 7.98e-02 7.52e-07 1.60e+04 1.60e+04 3.92e-16 2.86e+01 4500| 3.41e-03 7.77e-02 9.11e-07 1.62e+04 1.62e+04 1.33e-16 2.93e+01 4600| 3.32e-03 7.56e-02 1.05e-06 1.63e+04 1.63e+04 1.36e-16 3.00e+01 4700| 3.22e-03 7.35e-02 1.17e-06 1.64e+04 1.64e+04 4.15e-16 3.07e+01 4800| 3.13e-03 7.15e-02 1.27e-06 1.65e+04 1.65e+04 1.41e-16 3.13e+01 4900| 3.04e-03 6.96e-02 1.36e-06 1.66e+04 1.66e+04 4.30e-16 3.20e+01 5000| 2.96e-03 6.77e-02 1.44e-06 1.67e+04 1.67e+04 4.37e-16 3.27e+01 ---------------------------------------------------------------------------- Status: Solved/Inaccurate Hit max_iters, solution may be inaccurate Timing: Solve time: 3.27e+01s Lin-sys: nnz in L factor: 2025055, avg solve time: 5.82e-03s Cones: avg projection time: 3.02e-05s Acceleration: avg step time: 7.51e-07s ---------------------------------------------------------------------------- Error metrics: dist(s, K) = 0.0000e+00, dist(y, K*) = 3.6380e-12, s'y/|s||y| = -6.2180e-16 primal res: |Ax + s - b|_2 / (1 + |b|_2) = 2.9568e-03 dual res: |A'y + c|_2 / (1 + |c|_2) = 6.7689e-02 rel gap: |c'x + b'y| / (1 + |c'x| + |b'y|) = 1.4408e-06 ---------------------------------------------------------------------------- c'x = 16701.6612, -b'y = 16701.6131 ============================================================================ status: optimal_inaccurate obj: 16701.661202598487 Filename: ls1.py Line # Mem usage Increment Line Contents ================================================ 4 50.438 MiB 50.438 MiB @profile 5 def f(): 6 # N = 200000 7 50.438 MiB 0.000 MiB N = 20000 8 50.438 MiB 0.000 MiB M = 100 9 10 50.438 MiB 0.000 MiB np.random.seed(123) 11 65.699 MiB 15.262 MiB X = np.random.normal(0, 1, size=(N, M)) 12 65.957 MiB 0.258 MiB y = np.random.normal(0, 1, size=N) 13 14 65.957 MiB 0.000 MiB w = cp.Variable(M) 15 65.957 MiB 0.000 MiB prob = cp.Problem(cp.Minimize(cp.sum_squares(y - X @ w))) 16 76.953 MiB 10.996 MiB prob.solve(solver=cp.SCS,verbose=True) 17 76.953 MiB 0.000 MiB print("status:",prob.status) 18 76.953 MiB 0.000 MiB print("obj:",prob.value)  That solver also has problems. In this case we ran out of iterations. We can allow for more iterations by adding the argument max_iters=nnnn to the solve statement. SCS is using a first order algorithm. These algorithms typically use a very large number of iterations. #### Approach 2: use a different formulation We can use a different formulation: Regression II \begin{align}\min_{\color{darkred}w,\color{darkred}r}\>& ||\color{darkred}r||^2_2 \\ & \color{darkred}r = \color{darkblue}y - \color{darkblue}X\color{darkred}w \end{align} Here we add a bunch of (free) variables $$r$$ for the residuals, and also a bunch of linear constraints. The payback is that we have a much simpler quadratic objective. In addition it is 100% positive (semi-) definite. A reasonable implementation can look like: import cvxpy as cp import numpy as np # N = 200000 N = 20000 M = 100 X = np.random.normal(0, 1, size=(N,M)) y = np.random.normal(0, 1, size=(N,1)) w = cp.Variable((M,1)) r = cp.Variable((N,1)) prob = cp.Problem(cp.Minimize(r.T @ r), [r == y - X @ w]) prob.solve(solver=cp.OSQP,verbose=True) print("status:",prob.status) print("obj:",prob.value)  This should work, but it doesn't. We see: [ec2-user@ip-172-30-0-79 etc] python3 ls2a.py
Traceback (most recent call last):
File "ls2a.py", line 14, in <module>
prob.solve(solver=cp.OSQP,verbose=True)
File "/home/ec2-user/.local/lib/python3.7/site-packages/cvxpy/problems/problem.py", line 289, in solve
return solve_func(self, *args, **kwargs)
File "/home/ec2-user/.local/lib/python3.7/site-packages/cvxpy/problems/problem.py", line 567, in _solve
self._construct_chains(solver=solver, gp=gp)
File "/home/ec2-user/.local/lib/python3.7/site-packages/cvxpy/problems/problem.py", line 510, in _construct_chains
raise e
File "/home/ec2-user/.local/lib/python3.7/site-packages/cvxpy/problems/problem.py", line 499, in _construct_chains
construct_intermediate_chain(self, candidate_solvers, gp=gp)
File "/home/ec2-user/.local/lib/python3.7/site-packages/cvxpy/reductions/solvers/intermediate_chain.py", line 70, in construct_intermediate_chain
raise DCPError("Problem does not follow DCP rules. Specifically:\n" + append)
cvxpy.error.DCPError: Problem does not follow DCP rules. Specifically:
The objective is not DCP. Its following subexpressions are not:
var1.T * var1


Bummer. This is a perfectly convex problem!

Some alternative formulations to generate a QP do not help:

• cp.sum_squares(r) yields a large allocation when forming a QP
• sum([r[i]**2 for i in range(N)] is very slow and very memory hungry
• quad_form with an identity matrix is not efficient

So I had no luck in being able to generate a QP problem for OSQP without very large memory allocations.

This did not work out so well. I am not really able to solve relatively large, but easy least squares problems using standard formulations.

#### Conclusion

• CVXPY thinks $$r^Tr = \sum_i r_i^2$$ is not convex.
• We cannot always generate large problems for the OSQP QP solver due to large dense memory allocations in CVXPY. I probably would consider this a bug: it is never a good idea to physically create and allocate a large dense identity matrix in any somewhat serious code.
• CVXPY can generate large instances for the conic solvers ECOS and SCS. But they have some troubles solving this problem.
• A commercial solver like Cplex has no problem with this; 2 iterations, 0.2 seconds.
• We can conclude that CVXPY (and its collection of standard solvers) is best suited for smaller and medium-sized problems. When tackling large problems you may hit some issues. Of course, many interesting and relevant problems are not that big and can be solved conveniently using CVXPY.

#### References

1. Least squares problem run out of memory, https://stackoverflow.com/questions/59315300/least-squares-problem-run-out-of-memory-in-cvxpy

1. Interestingly, the problem gets solved if you call ECOS directly (ecos requires G and h such that h-Gx in K, dim_q is the dimension of the second order cone, and the first element of h-Gx is norm bounded by the other elements (typically it is the last element)):

import ecos
c = np.zeros((M+1,))
c[0] = 1
G = np.zeros((N+1,M+1))
G[0,0] = -1
G[1:, 1:] = X
G = sp.sparse.csc_matrix(G)
h = np.pad(y.flatten(), (1, 0), 'constant')
dims = {'q':[N+1]}
ecos.solve(c,G,h,dims)

1. Thanks. That is an interesting experiment.