Thursday, November 7, 2019

CVXPY matrix style modeling limits

CVXPY[1,6] is a popular Python based modeling tool for convex models. It rigorously checks the model is convex which is very convenient: many convex solvers are thoroughly confused when passing on a non-convex model. This is a bit different from say passing on a non-convex model to a local NLP solver. In that case the solver will accept it and try to find local solutions.

In addition CVXPY provides many high level functions (e.g. for different norms, etc.). and a very compact matrix based modeling paradigm. Although matrix notation can be very powerful, there are limits. When overdoing things, the notation becomes less intuitive.

Below I will try to formulate four models in CVXPY and see how the matrix notation will work for these examples:

  1. Transportation Model. Very simple but nevertheless interesting to look at.
  2. A linearized non-convex quadratic model with binary variables. The matrix notation becomes a bit more cumbersome.
  3. A sparse network problem. This is not very easily expressed in CVXPY. CVXPY does not support sparse variables (only sparse data).
  4. A matrix balancing problem. This model is very difficult to deal with in CVXPY.  
All these models are candidates for CVXPY: the models are linear or convex and they only use vectors and matrices.

Presentation on CVXPY


Presentation by Steve Diamond [5]. It discusses some of underlying ideas of CVXPY[6] and Disciplined Convex Programming [7].

Example: Transportation model


In this section I'll discuss some modeling issues when implementing a simple transportation model in CVXPY, and compare this to a standard GAMS implementation.

As an example consider the standard transportation model.

Transportation Model
\[\begin{align}\min&\sum_{i,j}\color{darkblue}c_{i,j}\color{darkred}x_{i,j}\\ & \sum_i\color{darkred}x_{i,j} \ge \color{darkblue} d_j && \forall j\\ &\sum_j\color{darkred}x_{i,j}\le \color{darkblue} s_i && \forall i \\ & \color{darkred}x_{i,j} \ge 0 \end{align}\]\[\begin{align}\min\>\>&\mathbf{tr}(\color{darkblue}C^T \color{darkred}X) \\ &\color{darkred}X^T \color{darkblue}e \ge \color{darkblue} d \\ &\color{darkred}X \color{darkblue}e \le \color{darkblue}s\\ & \color{darkred}X\ge 0\end{align}\]

Here \(e\) indicates a column vector of ones of appropriate size. tr indicates the trace of a matrix: the sum of the diagonal elements. The model in matrix notation is identical to the equation based model on the left. The matrix based model is compacter, but arguably a bit more difficult to read for most readers (me included).

One important way to help matrix models to be more readable, is to add a summation function. As a matrix model does not have indices, we need other ways to indicate what to sum over. This is expressed as the (optional) axis argument. This means the model above can be expressed in CVXPY as


import numpy as np
import cvxpy as cp

#----- data -------
capacity = np.array([350, 600])
demand = np.array([325, 300, 275])
distance = np.array([[2.5, 1.7, 1.8],
            [2.5, 1.8, 1.4]])
freight = 90
cost = freight*distance/1000

#------ set up LP data --------
C = cost
d = demand
s = capacity

#---- matrix formulation ----

ei = np.ones(s.shape)
ej = np.ones(d.shape)

X = cp.Variable(C.shape,"X")
prob = cp.Problem(
    cp.Minimize(cp.trace(C.T@X)),
    [X.T@ei >= d, 
     X@ej <= s, 
     X>=0])
prob.solve(verbose=True)
print("status:",prob.status)
print("objective:",prob.value)
print("levels:",X.value)

#---- summations ----

prob2 = cp.Problem(
    cp.Minimize(cp.sum(cp.multiply(C,X))),
    [cp.sum(X,axis=0) >= d,
    cp.sum(X,axis=1) <= s,
    X >= 0])
prob2.solve(verbose=True)
print("status:",prob2.status)
print("objective:",prob2.value)
print("levels:",X.value)


The matrix model follows the mathematical model closely. The second model with the summations requires some explanation. In Python * is for elementwise multiplication and @ for matrix multiplication. In CVXPY however, @, * and matmul are used both for matrix multiplication while multiply is for elementwise multiplication. A sum without axis argument sums over all elements, while axis=0 sums over the first index, and axis=1 sums over the second index.

The data for this model is from [2]. The lack of explicit indexing has another consequence. All data is determined by its position. I.e. we need to remember that position 0 means a canning plant in Seattle or a demand market in New York,

I am not so sure about the readability of these two CVXPY models. I think I still prefer the indexed model.

CVXPY uses OSQP [3] as default LP and QP solver. Here is the log:


-----------------------------------------------------------------
           OSQP v0.6.0  -  Operator Splitting QP Solver
              (c) Bartolomeo Stellato,  Goran Banjac
        University of Oxford  -  Stanford University 2019
-----------------------------------------------------------------
problem:  variables n = 6, constraints m = 11
          nnz(P) + nnz(A) = 18
settings: linear system solver = qdldl,
          eps_abs = 1.0e-05, eps_rel = 1.0e-05,
          eps_prim_inf = 1.0e-04, eps_dual_inf = 1.0e-04,
          rho = 1.00e-01 (adaptive),
          sigma = 1.00e-06, alpha = 1.60, max_iter = 10000
          check_termination: on (interval 25),
          scaling: on, scaled_termination: off
          warm start: on, polish: on, time_limit: off

iter   objective    pri res    dua res    rho        time
   1  -2.4113e+00   3.32e+02   7.31e+00   1.00e-01   5.47e-05s
 200   1.5368e+02   2.77e-03   2.44e-07   1.23e-03   4.30e-02s

status:               solved
solution polish:      unsuccessful
number of iterations: 200
optimal objective:    153.6751
run time:             4.31e-02s
optimal rho estimate: 2.60e-03

status: optimal
objective: 153.67514323621972
levels: [[ 3.17321312e+01  3.00004719e+02  2.71849569e-03]
 [ 2.93267895e+02 -2.77023874e-03  2.74995427e+02]]

The levels indicate OSQP is pretty sloppy here: we see some negative values of the order -1e-3. The real optimal objective is 153.675000 (so even though the solution is slightly infeasible, this did not help in achieving a better objective). Sometimes OSQP does better: if the solution polishing step works. This polishing is a bit like a poor man's crossover: it tries to guess the active constraints. In this case polishing did not work.

The number of iterations is large. This is normal: this solver uses a first order algorithm. They typically require a lot of iterations.


For completeness, the original GAMS model looks like:


Set
   i 'canning plants' / seattle,  san-diego /
   j 'markets'        / new-york, chicago, topeka /;

Parameter
   a(i) 'capacity of plant i in cases'
        / seattle    350
          san-diego  600 /

   b(j) 'demand at market j in cases'
        / new-york   325
          chicago    300
          topeka     275 /;

Table d(i,j) 'distance in thousands of miles'
              new-york  chicago  topeka
   seattle         2.5      1.7     1.8
   san-diego       2.5      1.8     1.4;

Scalar f 'freight in dollars per case per thousand miles' / 90 /;

Parameter c(i,j) 'transport cost in thousands of dollars per case';
c(i,j) = f*d(i,j)/1000;

Variable
   x(i,j) 'shipment quantities in cases'
   z      'total transportation costs in thousands of dollars';

Positive Variable x;

Equation
   cost      'define objective function'
   supply(i) 'observe supply limit at plant i'
   demand(j) 'satisfy demand at market j';

cost..      z =e= sum((i,j), c(i,j)*x(i,j));

supply(i).. sum(j, x(i,j)) =l= a(i);

demand(j).. sum(i, x(i,j)) =g= b(j);

Model transport / all /;

solve transport using lp minimizing z;

display x.l, x.m;


The main differences with the CVXPY model are:

  • GAMS indexes by names (set elements), CVXPY uses positions in a matrix or vector
  • GAMS is equation based while CVXPY uses matrices
  • For this model, the CVXPY representation is very compact but depending on the formulation it requires more than casual familiarity with matrix notation. 
  • Arguably, the most important feature of a modeling tool is readability. I have a preference to the GAMS notation here: it is closer to the original mathematical model in a notation I am used to.
  • When printing the results, GAMS is a bit more intuitive:  


GAMS:

----     66 VARIABLE x.L  shipment quantities in cases

             new-york     chicago      topeka

seattle        50.000     300.000
san-diego     275.000                 275.000


Python:
[[ 3.17321312e+01  3.00004719e+02  2.71849569e-03]
 [ 2.93267895e+02 -2.77023874e-03  2.74995427e+02]]

Example: non-convex binary quadratic optimization


Consider the binary quadratic programming problem (in indexed format and in matrix notation):

Binary Quadratic Model
\[\begin{align}\min&\sum_{i,j} \color{darkred}x_i \color{darkblue}q_{i,j}\color{darkred}x_j\\ & \color{darkred}x_i \in \{0,1\} \end{align}\]\[\begin{align}\min\>\>& \color{darkred}x^T\color{darkblue}Q\color{darkred}x \\ &\color{darkred}x \in \{0,1\} \end{align}\]


We don't assume the Q matrix is positive (semi) definite or even symmetric. This makes the problem non-convex. Interestingly, when we feed this problem into solvers like Cplex or Gurobi, they have no problem in finding the optimal solution. The reason is that they apply a trick to make this problem linear.

We can linearize the binary product \(y_{i,j} = x_i x_j\) by \[\begin{align} & y_{i,j} \le x_i \\& y_{i,j} \le x_j \\& y_{i,j} \ge x_i + x_j -1 \\ & x_i, y_{i,j} \in \{0,1\}\end{align}\] If we want we can relax \(y\) to be continuous between 0 and 1.

After applying this linearization, we have:

Linearized Binary Quadratic Model
\[\begin{align}\min&\sum_{i,j} \color{darkblue}q_{i,j}\color{darkred}y_{i,j}\\ & \color{darkred}y_{i,j} \le \color{darkred}x_i\\ & \color{darkred}y_{i,j} \le \color{darkred}x_j\\ & \color{darkred}y_{i,j} \ge \color{darkred}x_i+ \color{darkred}x_j - 1\\ & \color{darkred}x_i,\color{darkred}y_{i,j} \in \{0,1\} \end{align}\]\[\begin{align}\min\>\>& \mathbf{tr}(\color{darkblue}Q^T\color{darkred}Y) \\ & \color{darkred}Y \le \color{darkred}x \cdot\color{darkblue}e^T \\ & \color{darkred}Y \le \color{darkblue}e \cdot\color{darkred}x^T \\& \color{darkred}Y \ge \color{darkred}x \cdot\color{darkblue}e^T + \color{darkblue}e \cdot\color{darkred}x^T - \color{darkblue}e \cdot \color{darkblue}e^T\\ &\color{darkred}x, \color{darkred}Y \in \{0,1\} \end{align}\]

The matrix form of the objective is similar to the one we saw in the section on the transportation problem. This part can be written with a summation. The constraints are a little bit more complicated due to the outer products. I don't think there is a much easier and more intuitive way to express these outer products than with matrix multiplications. If you are not used to these matrix expressions, this may not be so readable.

Although a solver like Cplex and Gurobi can solve the quadratic formulation directly, CVXPY will complain with the message:


cvxpy.error.DCPError: Problem does not follow DCP rules. Specifically:
The objective is not DCP. Its following subexpressions are not:x * [[-6.56505736e+00  6.86533416e+00  1.00750712e+00 -3.97724192e+00
  -4.15575766e+00 -5.51894266e+00 -3.00338992e+00  7.12540694e+00
  -8.65772554e+00  4.21338000e-03]
 [ 9.96235254e+00  1.57466756e+00  9.82266078e+00  5.24500934e+00
  -7.38615034e+00  2.79437518e+00 -6.80964272e+00 -4.99838934e+00
   3.37857218e+00 -1.29287238e+00]
 [-2.80599468e+00 -2.97117264e+00 -7.37016820e+00 -6.99796424e+00
   1.78227300e+00  6.61785624e+00 -5.38368524e+00  3.31468920e+00
   5.51715212e+00 -3.92683046e+00]
 [-7.79015418e+00  4.76973200e-02 -6.79654476e+00  7.44924622e+00
  -4.69770910e+00 -4.28371356e+00  1.87911844e+00  4.45438142e+00
   2.56497354e+00 -7.24042700e-01]
 [-1.73386012e+00 -7.64609286e+00 -3.71575466e+00 -9.06896972e+00
  -3.22899456e+00 -6.35800814e+00  2.91454254e+00  1.21491094e+00
   5.39923440e+00 -4.04388272e+00]
 [ 3.22212522e+00  5.11643348e+00  2.54894998e+00 -4.32271604e+00
  -8.27150752e+00 -7.94970662e+00  2.82502302e+00  9.06189960e-01
  -9.36950296e+00  5.84721284e+00]
 [-8.54466004e+00 -6.48677902e+00  5.12652260e-01  5.00415338e+00
  -6.43752572e+00 -9.31718028e+00  1.70262346e+00  2.42459968e+00
  -2.21276200e+00 -2.82571694e+00]
 [-5.13930766e+00 -5.07156922e+00 -7.38994394e+00  8.66899440e+00
  -2.40124188e+00  5.66800922e+00 -3.99931484e+00 -7.49033556e+00
   4.97748210e+00 -8.61535074e+00]
 [-5.95968886e+00 -9.89868284e+00 -4.60773896e+00 -2.97050000e-03
  -6.97428262e+00 -6.51661090e+00 -3.38724532e+00 -3.66187892e+00
  -3.55826090e+00  9.27953282e+00]
 [ 9.87204410e+00 -2.60193890e+00 -2.54222866e+00  5.43956660e+00
  -2.06631716e+00  8.26192650e+00 -7.60844540e+00  4.70957778e+00
  -8.89163050e+00  1.52599610e+00]] * x


CVXPY is doing rigorous checks to make sure the problem is convex, and refuses models that are found to be non-convex.

The linearized formulation can look like:


import numpy as np
import cvxpy as cp


# --------  data ---------


Q = np.array([
 [-6.56505736,  6.86533416,  1.00750712, -3.97724192, -4.15575766, -5.51894266, -3.00338992,  7.12540694, -8.65772554,  0.00421338],
 [ 9.96235254,  1.57466756,  9.82266078,  5.24500934, -7.38615034,  2.79437518, -6.80964272, -4.99838934,  3.37857218, -1.29287238],
 [-2.80599468, -2.97117264, -7.3701682 , -6.99796424,  1.782273  ,  6.61785624, -5.38368524,  3.3146892 ,  5.51715212, -3.92683046],
 [-7.79015418,  0.04769732, -6.79654476,  7.44924622, -4.6977091 , -4.28371356,  1.87911844,  4.45438142,  2.56497354, -0.7240427 ],
 [-1.73386012, -7.64609286, -3.71575466, -9.06896972, -3.22899456, -6.35800814,  2.91454254,  1.21491094,  5.3992344 , -4.04388272],
 [ 3.22212522,  5.11643348,  2.54894998, -4.32271604, -8.27150752, -7.94970662,  2.82502302,  0.90618996, -9.36950296,  5.84721284],
 [-8.54466004, -6.48677902,  0.51265226,  5.00415338, -6.43752572, -9.31718028,  1.70262346,  2.42459968, -2.212762  , -2.82571694],
 [-5.13930766, -5.07156922, -7.38994394,  8.6689944 , -2.40124188,  5.66800922, -3.99931484, -7.49033556,  4.9774821 , -8.61535074],
 [-5.95968886, -9.89868284, -4.60773896, -0.0029705 , -6.97428262, -6.5166109 , -3.38724532, -3.66187892, -3.5582609 ,  9.27953282],
 [ 9.8720441 , -2.6019389 , -2.54222866,  5.4395666 , -2.06631716,  8.2619265 , -7.6084454 ,  4.70957778, -8.8916305 ,  1.5259961 ]])


n = Q.shape[0]


# ---- linearized model, matrix format -----

x = cp.Variable((n,1),"x",boolean=True)
Y = cp.Variable((n,n),"Y")
e = np.ones((n,1))


prob = cp.Problem(cp.Minimize(cp.trace(Q.T@Y)),
  [Y <= x@e.T,
   Y <= e@x.T,
   Y >= x@e.T + e@x.T - e@e.T,
   Y >= 0,
   Y <= 1])
prob.solve(solver=cp.GLPK_MI,verbose=True)
print("status:",prob.status)
print("objective:",prob.value)
print("levels:",x.value)


Notes:

  • The objective can be replaced by cp.sum(cp.multiply(Q,Y)),
  • I relaxed the Y variables,
  • We use glpk as the MIP solver,
  • CVXPY comes with an integer solver called ECOS_BB. This solver seems to choke on this problem, 
  • I have declared \(x\) as an explicit \((n \times 1)\)  column vector instead of just a vector of length \(n\). Sometimes CVXPY is picky about this when doing things like matrix multiplication.
  • I was not able to enter this model directly: I has to use a piece of paper first to get from the indexed version to the matrix version. There was an extra manual translation step needed. This step, of course, can easily lead to errors. 

An alternative approach is to convexify the problem. To do this we add a diagonal matrix to our Q matrix (such that it becomes positive definite) and compensate by subtracting the same amount using a linear part of the objective. We use here that for binary \(x_i\) we have \(x_i^2 = x_i\). The steps are:


  1. Make \(Q\) symmetric by: \[Q := 0.5(Q+Q^T)\] This will give the same solution but now we can calculate real valued eigenvalues.
  2. Calculate the smallest (or most negative) eigenvalue \[\lambda_1 := \min_i \lambda_i\]
  3. If \(\lambda_1 < 0\) replace \(Q\) by \[Q  := Q + (|\lambda_1|+\mathit{tol}) I\] and also introduce a linear part in the objective: \[\min\>x^TQx - (|\lambda_1|+\mathit{tol}) \sum_i x_i\]
  4. If \(\lambda_1 \ge 0\) solve the original problem. We may use a tolerance to make sure the smallest eigenvalue is actually \(\lambda_1\ge \mathit{tol}\). 

The Python/CVXPY code can look like:


# ------- convexified model -------------

Q = 0.5*(Q+Q.T)           # make Q symmetric
w, v = np.linalg.eig(Q)   # eigen decomposition
print("Eigenvalues\n",w)
w1 = min(w)               # first eigen value
print("Smallest eigenvalue",w1)
tol = 1.0e-5              # tolerance 
f = 0                     # factor for diagonal perturbation
if w1<tol:                 
  f = -w1 + tol
Q2 = Q + f*np.eye(n)

prob = cp.Problem(cp.Minimize(cp.quad_form(x,Q2)-f*sum(x)),[])
prob.solve(solver=cp.ECOS_BB,verbose=False)
print("##### MIQP formulation")
print("status:",prob.status)
print("objective:",prob.value)
print("levels:\n",x.value)


This model is solved now as a convex MIQP.

The original GAMS model for this problem was as follows:


set i /i1*i10/;
alias(i,j);

parameter q(i,j);
q(i,j) = uniform(-10,10);

binary variable x(i);
variable z;

equation obj;

obj.. z =e= sum((i,j), x(i)*q(i,j)*x(j));

model m /obj/;
option miqcp=cplex,optcr=0;
solve m minimizing z using miqcp;
display z.l,x.l;


Cplex will automatically linearize this model.

CVXPY Sparse Variables


CVXPY has some severe limitations on how variables can look like.

First we cannot use three (or more) dimensional variables. So something like x[i,j,k] is not supported. A declaration like:

X = cp.Variable((n,n,n),"X")

gives:

     ValueError: Expressions of dimension greater than 2 are not supported.

This is a rather severe restriction. Many practical model have variables exceeding 2 dimensions. Of course matrix notation becomes impractical for symbols with more than 2 dimensions. Which is probably the reason why CVXPY ony wants to handle scalars, vectors and matrices.

Furthermore sparse variables are not supported either. Everything is fully allocated. As an example consider the toy model:



n = 100
X = cp.Variable((n,n),"X")
prob = cp.Problem(
    cp.Minimize(0),
    [X[0,0]==1]) 


Solver Log:
problem:  variables n = 10000, constraints m = 1


My hope was that I could just declare a large variable matrix and that CVXPY would only export the used variables to the solver. Instead of the solver seeing a model with one variable, it receives a model with 10,000 variables.


Example: a Sparse Network Model


A linear programming formulation for a max-flow network problem can look like:

Max-flow Sparse Network Model
\[\begin{align}\max\>\>&\color{darkred}f\\ & \sum_{j|\color{darkblue}{\mathit{arc}}(j,i)}\color{darkred}x_{j,i} = \sum_{j|\color{darkblue}{\mathit{arc}}(i,j)}\color{darkred}x_{i,j} + \color{darkred}f\cdot \color{darkblue}b_i && \forall i \\ & 0 \le \color{darkred}x_{i,j} \le \color{darkblue}{\mathit{capacity}_i} && \forall i,j|\color{darkblue}{\mathit{arc}}(i,j) \end{align}\]

Here \(arc(i,j)\) indicates whether link \(i \rightarrow j\) exists. The sparse data vector \(b_i\) is defined by \[b_i = \begin{cases} -1 & \text{if node $i$ is the source node}\\ +1 & \text{if node $i$ is the sink node}\\ 0 & \text{otherwise} \end{cases}\]

This mathematical model translates directly into a GAMS model:


$ontext

   max flow network example

   Data from example in
     Mitsuo Gen, Runwei Cheng, Lin Lin
     Network Models and Optimization: Multiobjective Genetic Algorithm Approach
     Springer, 2008

   Erwin Kalvelagen, Amsterdam Optimization, May 2008

$offtext


sets
   i 'nodes' /node1*node11/
   source(i)   /node1/
   sink(i)    /node11/
;

alias(i,j);

parameter capacity(i,j) /
   node1.node2   60
   node1.node3   60
   node1.node4   60
   node2.node3   30
   node2.node5   40
   node2.node6   30
   node3.node4   30
   node3.node6   50
   node3.node7   30
   node4.node7   40
   node5.node8   60
   node6.node5   20
   node6.node8   30
   node6.node9   40
   node6.node10  30
   node7.node6   20
   node7.node10  40
   node8.node9   30
   node8.node11  60
   node9.node10  30
   node9.node11  50
   node10.node11 50
/;



set arcs(i,j);
arcs(i,j)$capacity(i,j) = yes;
display arcs;

parameter rhs(i);
rhs(source) = -1;
rhs(sink) = 1;

variables
   x(i,j) 'flow along arcs'
   f      'total flow'
;

positive variables x;
x.up(i,j) = capacity(i,j);

equations
   flowbal(i)  'flow balance'
;

flowbal(i)..   sum(arcs(j,i), x(j,i)) - sum(arcs(i,j), x(i,j)) =e= f*rhs(i);

model m /flowbal/;

solve m maximizing f using lp;


The GAMS model exploits that GAMS stores data sparsely. The variables x(i,j) are only allocated when they are used inside the equations. This usage is restricted to cases where arcs(i,j) exist. I.e. the number of variables x(i,j) is 22 instead of  \(11 \times 11\).

As we discussed in the previous section, CVXPY does not support sparse variables like GAMS. So instead of variables x(i,j) we'll use x[k] where k indicates the arc number. CVXPY supports sparse data matrices through scipy.sparse. In the code below we set up a sparse matrix A with entries as follows:

  • A[i,k] = -1 if arc \(k\) represents an outgoing link \(i \rightarrow j\)
  • A[i,k] = +1 if arc \(k\) represents an incoming link \(j \rightarrow i\)

With this we can formulate the model:

Matrix based max-flow model
\[\begin{align}\max\>\>&\color{darkred}f\\ & \color{darkblue}A \cdot \color{darkred}x = \color{darkred}f \cdot \color{darkblue} b \\ & 0 \le \color{darkred}x \le \color{darkblue}{\mathit{capacity}} \end{align}\]

The network logic is hidden in the sparse matrix \(A\). From a modeling point of view, this model is at a too high abstraction level: we have lost all structure we want to be visible and reason about. The CVXPY implementation is as follows:


import numpy as np
import scipy.sparse as sparse
import cvxpy as cp


# ------ data --------
data = {
  'nodes':['A','B','C','D','E','F','G','H','I','J','K'],
  'from':['A','A','A','B','B','B','C','C','C','D','E',
          'F','F','F','F','G','G','H','H','I','I','J'],
  'to':  ['B','C','D','C','E','F','D','F','G','G','H',  
          'E','H','I','J','F','J','I','K','J','K','K'],
  'capacity': [60,60,60,30,40,30,30,50,30,40,60,20,30,40,30,20,40,30,60,30,50,50], 
  'source' : 'A',
  'sink' : 'K'
}

numnodes = len(data['nodes'])
numarcs = len(data['capacity'])

print("Number of nodes: {}".format(numnodes))
print("Number of arcs: {}".format(numarcs))

# ------ lp data --------

# map node name to index
map = dict(zip(data['nodes'],range(numnodes)))

# coefficients
irow = np.zeros(2*numarcs,int)
jcol = np.zeros(2*numarcs,int)
val = np.zeros(2*numarcs)

# arc k: i->j has coefficient -1 in row i, column k
#                             +1        j
for k in range(numarcs):
   i = map[data['from'][k]]
   j = map[data['to'][k]]
   kk = 2*k
   irow[kk] = i
   jcol[kk] = k
   val[kk] = -1
   kk = 2*k+1
   irow[kk] = j
   jcol[kk] = k
   val[kk] = 1

A = sparse.csc_matrix((val,(irow,jcol)))

b = np.zeros(numnodes)
b[map[data['source']]] = -1
b[map[data['sink']]] = 1           
        
cap = data['capacity']
        
# ------ lp model --------
        
x = cp.Variable(numarcs,"x")
f = cp.Variable(1,"f")

prob = cp.Problem(cp.Maximize(f),
     [A@x == f*b, x >= 0, x <= cap])
prob.solve(verbose=True)
print(prob)


With this experiment, we have confirmed that sparse data matrices work just fine with CVXPY.  However, this makes the model not that straightforward anymore. This is more complicated than the corresponding GAMS model. I am able to write down the GAMS immediately, but this CVXPY implementation required some real work (and some thinking).

See [4] for an alternative approach.

Example: Matrix Balancing


In this example we want to estimate the inner part of a matrix subject to row- and column-total constraints. This problem is frequently encountered in economic modeling. An additional constraint is that we want to maintain the sparsity pattern of the matrix. Basically the model is:

Matrix Balancing Model
\[\begin{align}\min\>\>&\mathbf{dist}(\color{darkred}A,\color{darkblue}A^0)\\ & \sum_i\color{darkred}a_{i,j} = \color{darkblue} v_j && \forall j\\ &\sum_j\color{darkred}a_{i,j} = \color{darkblue} u_i && \forall i \\ & \color{darkblue}a^0_{i,j}=0 \Rightarrow\color{darkred}a_{i,j} = 0 \end{align}\]

There are different possibilities for the distance function. E.g. it can be a quadratic function \[\mathbf{dist}(A,A^0) = \sum_{i,j} (a_{i,j}-a^0_{i,j})^2\] or in this case an entropy function \[\mathbf{dist}(A,A^0) =\sum_{i,j} a_{i,j}\log\left(\frac{a_{i,j}}{a^0_{i,j}}\right)\]
Often the implication is enforced by just ignoring or skipping all elements \(a_{i,j}\) where \(a^0_{i,j}=0\). This leads again to a sparse representation of the variables. In GAMS this is quite easy.


$ontext

   Example from

   Using PROC IML to do Matrix Balancing
   Carol Alderman, University of Kansas
   Institute for Public Policy and Business Research
   MidWest SAS Users Group MWSUG 1992

$offtext


sets
   p 'products' /pA*pI/
   s 'salesmen' /s1*s10/
;

table A0(*,*) 'estimated matrix, known totals'

            s1   s2   s3   s4   s5   s6   s7   s8   s9  s10   rowTotal
      pA   230  375  375  100       685  215        50            2029
      pB   330  405  419  175   90  504  515       240  105       2798
      pC   268  225  242        30  790  301   44  100            1998
      pD   595  380  638  275   30  685  605   88  100  160       3566
      pE   340  360  440  200   30  755  475   44  150            2794
      pF   132  190  200            432  130                      1071
      pG   309  330  350  125       612  474        50   50       2305
      pH   365  400  330  150   50  575  600   44  150  110       2747
      pI   210  250  308  125       720  256       100   50       2015

colTotal  2772 2910 3300 1150  240 5760 3526  220  950  495
;

alias (p,i);
alias (s,j);

variables
  A(i,j) 'new values'
  z      'objective (minimized)'
;

equations
   objective
   rowsum(i)
   colsum(j)
;

objective.. z =e= sum((i,j)$A0(i,j), A(i,j)*log(A(i,j)/A0(i,j)));
rowsum(i).. sum(j$A0(i,j), A(i,j)) =e= A0(i,'rowTotal');
colsum(j).. sum(i$A0(i,j), A(i,j)) =e= A0('colTotal',j);

A.L(i,j) = A0(i,j);
A.lo(i,j)$A0(i,j) = 0.0001;

model m /all/;
solve m minimizing z using nlp;

display A.L,z.l;

When solved with the general purpose NLP solver CONOPT, we get the following results:


----     58 VARIABLE A.L  new values

            s1          s2          s3          s4          s5          s6          s7          s8          s9

pA     229.652     374.869     375.099     100.028                 686.238     212.542                  50.572
pB     330.587     406.192     420.492     175.626      94.300     506.575     510.790                 243.545
pC     267.313     224.685     241.809                  31.297     790.595     297.246      44.017     101.037
pD     595.262     380.610     639.416     275.614      31.391     687.580     599.252      88.299     101.341
pE     339.659     360.058     440.341     200.158      31.346     756.750     469.809      44.086     151.793
pF     130.330     187.815     197.821                             427.953     127.080
pG     309.478     330.895     351.165     125.418                 614.984     470.016                  50.727
pH     360.594     395.631     326.596     148.455      51.665     569.947     586.867      43.598     150.111
pI     209.123     249.246     307.260     124.701                 719.378     252.398                 100.874

 +         s10

pB     109.894
pD     167.233
pG      52.318
pH     113.535
pI      52.019


----     58 VARIABLE z.L                   =      -15.769  objective (minimized)


CVXPY does not allow to skip elements like GAMS does. Well, unless we build the whole model in scalar mode: element by element. That is not very attractive so let's try another way. It will be a bit of a struggle. Let's define a (data) matrix D as \[d_{i,j} = \begin{cases} 1 & \text{if $a^0_{i,j} \ne 0$}\\ 0 & \text{if $a^0_{i,j} = 0$}\end{cases}\] This matrix can be used to skip linear terms that are not needed. The objective is another story. CVXPY has the function entr() which is defined by: \(-x\log(x)\). We expand the objective as: \[\sum_{i,j} a_{i,j}\log\left(\frac{a_{i,j}}{a^0_{i,j}}\right) = \sum_{i,j} - \mathbf{entr}(a_{i,j})  - a_{i,j}\log(a^0_{i,j})\] Finally we insert \(d\) to ignore the zero's: \[ \min \sum_{i,j} - d_{i,j} \mathbf{entr}(a_{i,j}) - d_{i,j} a_{i,j}\log(a^0_{i,j}+1-d_{i,j})\] This is quite some gymnastics to shoehorn our model into an acceptable CVXPY format.



import numpy as np
import cvxpy as cp

# --------  data ----------
A0 =  [[ 230 , 375 , 375 , 100 ,   0 , 685 , 215 ,   0 ,  50 ,   0 ],
       [ 330 , 405 , 419 , 175 ,  90 , 504 , 515 ,   0 , 240 , 105 ],
       [ 268 , 225 , 242 ,   0 ,  30 , 790 , 301 ,  44 , 100 ,   0 ],
       [ 595 , 380 , 638 , 275 ,  30 , 685 , 605 ,  88 , 100 , 160 ],
       [ 340 , 360 , 440 , 200 ,  30 , 755 , 475 ,  44 , 150 ,   0 ],
       [ 132 , 190 , 200 ,   0 ,   0 , 432 , 130 ,   0 ,   0 ,   0 ],
       [ 309 , 330 , 350 , 125 ,   0 , 612 , 474 ,   0 ,  50 ,  50 ],
       [ 365 , 400 , 330 , 150 ,  50 , 575 , 600 ,  44 , 150 , 110 ],
       [ 210 , 250 , 308 , 125 ,   0 , 720 , 256 ,   0 , 100 ,  50 ]]

u = [2029,2798,1998,3566,2794,1071,2305,2747,2015]
v = [2772,2910,3300,1150,240,5760,3526,220,950,495]

m = len(u)
n = len(v)

# --------  model ----------

D = np.sign(A0)

Dloga0 = D * np.log(A0+np.ones_like(D)-D)

A = cp.Variable((m,n),"A")

obj = cp.Minimize(cp.sum(cp.multiply(D,-cp.entr(A)) - cp.multiply(A,Dloga0)))
cons = [cp.sum(cp.multiply(D,A),axis=1)==u,
        cp.sum(cp.multiply(D,A),axis=0)==v,
        A >= 0]
prob = cp.Problem(obj,cons)
prob.solve(solver=cp.SCS,verbose=True,max_iters=200000)


Solving this tiny model is very difficult. It is using exponential cones and that is not very robustly implemented in the solvers. With SCS and a lot of iterations, we finally see:



101400| 7.74e-05  9.08e-05  7.16e-04 -1.56e+01 -1.56e+01  3.30e-11  1.14e+02
101500| 7.72e-05  9.10e-05  5.74e-04 -1.54e+01 -1.54e+01  3.30e-11  1.14e+02
101600| 7.69e-05  9.13e-05  5.30e-04 -1.51e+01 -1.51e+01  7.77e-11  1.14e+02
101700| 7.66e-05  9.17e-05  4.02e-04 -1.49e+01 -1.49e+01  5.64e-11  1.14e+02
101800| 7.62e-05  9.22e-05  3.96e-04 -1.46e+01 -1.46e+01  3.30e-11  1.14e+02
101900| 7.59e-05  9.28e-05  2.10e-04 -1.44e+01 -1.44e+01  3.30e-11  1.14e+02
101980| 7.56e-05  9.33e-05  3.35e-05 -1.42e+01 -1.42e+01  1.17e-11  1.14e+02
----------------------------------------------------------------------------
Status: Solved
Timing: Solve time: 1.14e+02s
    Lin-sys: nnz in L factor: 1089, avg solve time: 1.87e-05s
    Cones: avg projection time: 1.06e-03s
    Acceleration: avg step time: 2.62e-07s
----------------------------------------------------------------------------
Error metrics:
dist(s, K) = 4.8795e-06, dist(y, K*) = 0.0000e+00, s'y/|s||y| = -5.3311e-12
primal res: |Ax + s - b|_2 / (1 + |b|_2) = 7.5554e-05
dual res:   |A'y + c|_2 / (1 + |c|_2) = 9.3272e-05
rel gap:    |c'x + b'y| / (1 + |c'x| + |b'y|) = 3.3548e-05
----------------------------------------------------------------------------
c'x = -14.2093, -b'y = -14.2103
============================================================================

The objective is in the neighborhood of what CONOPT found for the GAMS model (CONOPT required just a few iterations).

This model is not exactly a showcase for CVXPY.

Conclusion


CVXPY can be very convenient to model certain classes of models: convex, structured and easy too write in matrix notation. For other models it is in all likelihood not very suited. We saw some examples where things become really hairy when using CVXPY while the underlying mathematical model is quite simple. CVXPY should really be considered as a special purpose modeling tool and you may want to consider other tools when the model does not really fits CVXPY's matrix philosophy. Also this shows how an equation based modeling tool like AMPL or GAMS is more versatile: we can express many more models in a natural way without much trickery. Furthermore, we saw that pushing matrix notation too far can lead to models that are less readable.

References


  1. CVXPY website, https://www.cvxpy.org/
  2. GAMS transportation model example, https://www.gams.com/products/simple-example/
  3. OQSP first-order QP solver, https://osqp.org/
  4. CVXPY network example, https://www.cvxpy.org/examples/applications/OOCO.html
  5. Convex Optimization in Python with CVXPY | SciPy 2018 | Steven Diamond, https://www.youtube.com/watch?v=kXqu-TqEl7Q
  6. Steven Diamond and Stephen Boyd, CVXPY: A Python-Embedded Modeling Language for Convex Optimization, Journal of Machine Learning Research, 2016, vol 17, no 83, pp 1-5.
  7. Grant M., Boyd S., Ye Y. (2006) Disciplined Convex Programming. In: Liberti L., Maculan N. (eds) Global Optimization. Nonconvex Optimization and Its Applications, vol 84. Springer, Boston, MA, https://web.stanford.edu/~boyd/papers/pdf/disc_cvx_prog.pdf
  8. CVXPY: Efficiently writing constraints for pairwise sums, https://stackoverflow.com/questions/59672262/cvxpy-efficiently-writing-constraints-for-pairwise-sums. Example of difficulty in matrix formulations. This is similar to the binary QP example above.

2 comments:

  1. Thank you for the great tutorial!
    Just wanted to point out that in the newest version of cvxpy (v1.6) the two limitations that you mentioned in this article have been added.
    N-dimensional variables: https://www.cvxpy.org/tutorial/advanced/index.html#n-dimensional-expressions
    Sparsity variables: https://www.cvxpy.org/tutorial/constraints/index.html#sparsity-attribute

    ReplyDelete