In addition CVXPY provides many high level functions (e.g. for different norms, etc.). and a very compact matrix based modeling paradigm. Although matrix notation can be very powerful, there are limits. When overdoing things, the notation becomes less intuitive.
Below I will try to formulate four models in CVXPY and see how the matrix notation will work for these examples:
- Transportation Model. Very simple but nevertheless interesting to look at.
- A linearized non-convex quadratic model with binary variables. The matrix notation becomes a bit more cumbersome.
- A sparse network problem. This is not very easily expressed in CVXPY. CVXPY does not support sparse variables (only sparse data).
- A matrix balancing problem. This model is very difficult to deal with in CVXPY.
Presentation on CVXPY
Presentation by Steve Diamond [5]. It discusses some of underlying ideas of CVXPY[6] and Disciplined Convex Programming [7].
Example: Transportation model
In this section I'll discuss some modeling issues when implementing a simple transportation model in CVXPY, and compare this to a standard GAMS implementation.
As an example consider the standard transportation model.
Transportation Model | |
---|---|
\[\begin{align}\min&\sum_{i,j}\color{darkblue}c_{i,j}\color{darkred}x_{i,j}\\ & \sum_i\color{darkred}x_{i,j} \ge \color{darkblue} d_j && \forall j\\ &\sum_j\color{darkred}x_{i,j}\le \color{darkblue} s_i && \forall i \\ & \color{darkred}x_{i,j} \ge 0 \end{align}\] | \[\begin{align}\min\>\>&\mathbf{tr}(\color{darkblue}C^T \color{darkred}X) \\ &\color{darkred}X^T \color{darkblue}e \ge \color{darkblue} d \\ &\color{darkred}X \color{darkblue}e \le \color{darkblue}s\\ & \color{darkred}X\ge 0\end{align}\] |
Here \(e\) indicates a column vector of ones of appropriate size. tr indicates the trace of a matrix: the sum of the diagonal elements. The model in matrix notation is identical to the equation based model on the left. The matrix based model is compacter, but arguably a bit more difficult to read for most readers (me included).
One important way to help matrix models to be more readable, is to add a summation function. As a matrix model does not have indices, we need other ways to indicate what to sum over. This is expressed as the (optional) axis argument. This means the model above can be expressed in CVXPY as
import numpy as np import cvxpy as cp #----- data ------- capacity = np.array([350, 600]) demand = np.array([325, 300, 275]) distance = np.array([[2.5, 1.7, 1.8], [2.5, 1.8, 1.4]]) freight = 90 cost = freight*distance/1000 #------ set up LP data -------- C = cost d = demand s = capacity #---- matrix formulation ---- ei = np.ones(s.shape) ej = np.ones(d.shape) X = cp.Variable(C.shape,"X") prob = cp.Problem( cp.Minimize(cp.trace(C.T@X)), [X.T@ei >= d, X@ej <= s, X>=0]) prob.solve(verbose=True) print("status:",prob.status) print("objective:",prob.value) print("levels:",X.value) #---- summations ---- prob2 = cp.Problem( cp.Minimize(cp.sum(cp.multiply(C,X))), [cp.sum(X,axis=0) >= d, cp.sum(X,axis=1) <= s, X >= 0]) prob2.solve(verbose=True) print("status:",prob2.status) print("objective:",prob2.value) print("levels:",X.value)
The matrix model follows the mathematical model closely. The second model with the summations requires some explanation. In Python * is for elementwise multiplication and @ for matrix multiplication. In CVXPY however, @, * and matmul are used both for matrix multiplication while multiply is for elementwise multiplication. A sum without axis argument sums over all elements, while axis=0 sums over the first index, and axis=1 sums over the second index.
The data for this model is from [2]. The lack of explicit indexing has another consequence. All data is determined by its position. I.e. we need to remember that position 0 means a canning plant in Seattle or a demand market in New York,
I am not so sure about the readability of these two CVXPY models. I think I still prefer the indexed model.
CVXPY uses OSQP [3] as default LP and QP solver. Here is the log:
----------------------------------------------------------------- OSQP v0.6.0 - Operator Splitting QP Solver (c) Bartolomeo Stellato, Goran Banjac University of Oxford - Stanford University 2019 ----------------------------------------------------------------- problem: variables n = 6, constraints m = 11 nnz(P) + nnz(A) = 18 settings: linear system solver = qdldl, eps_abs = 1.0e-05, eps_rel = 1.0e-05, eps_prim_inf = 1.0e-04, eps_dual_inf = 1.0e-04, rho = 1.00e-01 (adaptive), sigma = 1.00e-06, alpha = 1.60, max_iter = 10000 check_termination: on (interval 25), scaling: on, scaled_termination: off warm start: on, polish: on, time_limit: off iter objective pri res dua res rho time 1 -2.4113e+00 3.32e+02 7.31e+00 1.00e-01 5.47e-05s 200 1.5368e+02 2.77e-03 2.44e-07 1.23e-03 4.30e-02s status: solved solution polish: unsuccessful number of iterations: 200 optimal objective: 153.6751 run time: 4.31e-02s optimal rho estimate: 2.60e-03 status: optimal objective: 153.67514323621972 levels: [[ 3.17321312e+01 3.00004719e+02 2.71849569e-03] [ 2.93267895e+02 -2.77023874e-03 2.74995427e+02]]
The levels indicate OSQP is pretty sloppy here: we see some negative values of the order -1e-3. The real optimal objective is 153.675000 (so even though the solution is slightly infeasible, this did not help in achieving a better objective). Sometimes OSQP does better: if the solution polishing step works. This polishing is a bit like a poor man's crossover: it tries to guess the active constraints. In this case polishing did not work.
The number of iterations is large. This is normal: this solver uses a first order algorithm. They typically require a lot of iterations.
For completeness, the original GAMS model looks like:
Set i 'canning plants' / seattle, san-diego / j 'markets' / new-york, chicago, topeka /; Parameter a(i) 'capacity of plant i in cases' / seattle 350 san-diego 600 / b(j) 'demand at market j in cases' / new-york 325 chicago 300 topeka 275 /; Table d(i,j) 'distance in thousands of miles' new-york chicago topeka seattle 2.5 1.7 1.8 san-diego 2.5 1.8 1.4; Scalar f 'freight in dollars per case per thousand miles' / 90 /; Parameter c(i,j) 'transport cost in thousands of dollars per case'; c(i,j) = f*d(i,j)/1000; Variable x(i,j) 'shipment quantities in cases' z 'total transportation costs in thousands of dollars'; Positive Variable x; Equation cost 'define objective function' supply(i) 'observe supply limit at plant i' demand(j) 'satisfy demand at market j'; cost.. z =e= sum((i,j), c(i,j)*x(i,j)); supply(i).. sum(j, x(i,j)) =l= a(i); demand(j).. sum(i, x(i,j)) =g= b(j); Model transport / all /; solve transport using lp minimizing z; display x.l, x.m;
The main differences with the CVXPY model are:
- GAMS indexes by names (set elements), CVXPY uses positions in a matrix or vector
- GAMS is equation based while CVXPY uses matrices
- For this model, the CVXPY representation is very compact but depending on the formulation it requires more than casual familiarity with matrix notation.
- Arguably, the most important feature of a modeling tool is readability. I have a preference to the GAMS notation here: it is closer to the original mathematical model in a notation I am used to.
- When printing the results, GAMS is a bit more intuitive:
GAMS: ---- 66 VARIABLE x.L shipment quantities in cases new-york chicago topeka seattle 50.000 300.000 san-diego 275.000 275.000 Python: [[ 3.17321312e+01 3.00004719e+02 2.71849569e-03] [ 2.93267895e+02 -2.77023874e-03 2.74995427e+02]]
Example: non-convex binary quadratic optimization
Consider the binary quadratic programming problem (in indexed format and in matrix notation):
Binary Quadratic Model | |
---|---|
\[\begin{align}\min&\sum_{i,j} \color{darkred}x_i \color{darkblue}q_{i,j}\color{darkred}x_j\\ & \color{darkred}x_i \in \{0,1\} \end{align}\] | \[\begin{align}\min\>\>& \color{darkred}x^T\color{darkblue}Q\color{darkred}x \\ &\color{darkred}x \in \{0,1\} \end{align}\] |
We don't assume the Q matrix is positive (semi) definite or even symmetric. This makes the problem non-convex. Interestingly, when we feed this problem into solvers like Cplex or Gurobi, they have no problem in finding the optimal solution. The reason is that they apply a trick to make this problem linear.
We can linearize the binary product \(y_{i,j} = x_i x_j\) by \[\begin{align} & y_{i,j} \le x_i \\& y_{i,j} \le x_j \\& y_{i,j} \ge x_i + x_j -1 \\ & x_i, y_{i,j} \in \{0,1\}\end{align}\] If we want we can relax \(y\) to be continuous between 0 and 1.
After applying this linearization, we have:
Linearized Binary Quadratic Model | |
---|---|
\[\begin{align}\min&\sum_{i,j} \color{darkblue}q_{i,j}\color{darkred}y_{i,j}\\ & \color{darkred}y_{i,j} \le \color{darkred}x_i\\ & \color{darkred}y_{i,j} \le \color{darkred}x_j\\ & \color{darkred}y_{i,j} \ge \color{darkred}x_i+ \color{darkred}x_j - 1\\ & \color{darkred}x_i,\color{darkred}y_{i,j} \in \{0,1\} \end{align}\] | \[\begin{align}\min\>\>& \mathbf{tr}(\color{darkblue}Q^T\color{darkred}Y) \\ & \color{darkred}Y \le \color{darkred}x \cdot\color{darkblue}e^T \\ & \color{darkred}Y \le \color{darkblue}e \cdot\color{darkred}x^T \\& \color{darkred}Y \ge \color{darkred}x \cdot\color{darkblue}e^T + \color{darkblue}e \cdot\color{darkred}x^T - \color{darkblue}e \cdot \color{darkblue}e^T\\ &\color{darkred}x, \color{darkred}Y \in \{0,1\} \end{align}\] |
The matrix form of the objective is similar to the one we saw in the section on the transportation problem. This part can be written with a summation. The constraints are a little bit more complicated due to the outer products. I don't think there is a much easier and more intuitive way to express these outer products than with matrix multiplications. If you are not used to these matrix expressions, this may not be so readable.
Although a solver like Cplex and Gurobi can solve the quadratic formulation directly, CVXPY will complain with the message:
cvxpy.error.DCPError: Problem does not follow DCP rules. Specifically: The objective is not DCP. Its following subexpressions are not:x * [[-6.56505736e+00 6.86533416e+00 1.00750712e+00 -3.97724192e+00 -4.15575766e+00 -5.51894266e+00 -3.00338992e+00 7.12540694e+00 -8.65772554e+00 4.21338000e-03] [ 9.96235254e+00 1.57466756e+00 9.82266078e+00 5.24500934e+00 -7.38615034e+00 2.79437518e+00 -6.80964272e+00 -4.99838934e+00 3.37857218e+00 -1.29287238e+00] [-2.80599468e+00 -2.97117264e+00 -7.37016820e+00 -6.99796424e+00 1.78227300e+00 6.61785624e+00 -5.38368524e+00 3.31468920e+00 5.51715212e+00 -3.92683046e+00] [-7.79015418e+00 4.76973200e-02 -6.79654476e+00 7.44924622e+00 -4.69770910e+00 -4.28371356e+00 1.87911844e+00 4.45438142e+00 2.56497354e+00 -7.24042700e-01] [-1.73386012e+00 -7.64609286e+00 -3.71575466e+00 -9.06896972e+00 -3.22899456e+00 -6.35800814e+00 2.91454254e+00 1.21491094e+00 5.39923440e+00 -4.04388272e+00] [ 3.22212522e+00 5.11643348e+00 2.54894998e+00 -4.32271604e+00 -8.27150752e+00 -7.94970662e+00 2.82502302e+00 9.06189960e-01 -9.36950296e+00 5.84721284e+00] [-8.54466004e+00 -6.48677902e+00 5.12652260e-01 5.00415338e+00 -6.43752572e+00 -9.31718028e+00 1.70262346e+00 2.42459968e+00 -2.21276200e+00 -2.82571694e+00] [-5.13930766e+00 -5.07156922e+00 -7.38994394e+00 8.66899440e+00 -2.40124188e+00 5.66800922e+00 -3.99931484e+00 -7.49033556e+00 4.97748210e+00 -8.61535074e+00] [-5.95968886e+00 -9.89868284e+00 -4.60773896e+00 -2.97050000e-03 -6.97428262e+00 -6.51661090e+00 -3.38724532e+00 -3.66187892e+00 -3.55826090e+00 9.27953282e+00] [ 9.87204410e+00 -2.60193890e+00 -2.54222866e+00 5.43956660e+00 -2.06631716e+00 8.26192650e+00 -7.60844540e+00 4.70957778e+00 -8.89163050e+00 1.52599610e+00]] * x
CVXPY is doing rigorous checks to make sure the problem is convex, and refuses models that are found to be non-convex.
The linearized formulation can look like:
import numpy as np import cvxpy as cp # -------- data --------- Q = np.array([ [-6.56505736, 6.86533416, 1.00750712, -3.97724192, -4.15575766, -5.51894266, -3.00338992, 7.12540694, -8.65772554, 0.00421338], [ 9.96235254, 1.57466756, 9.82266078, 5.24500934, -7.38615034, 2.79437518, -6.80964272, -4.99838934, 3.37857218, -1.29287238], [-2.80599468, -2.97117264, -7.3701682 , -6.99796424, 1.782273 , 6.61785624, -5.38368524, 3.3146892 , 5.51715212, -3.92683046], [-7.79015418, 0.04769732, -6.79654476, 7.44924622, -4.6977091 , -4.28371356, 1.87911844, 4.45438142, 2.56497354, -0.7240427 ], [-1.73386012, -7.64609286, -3.71575466, -9.06896972, -3.22899456, -6.35800814, 2.91454254, 1.21491094, 5.3992344 , -4.04388272], [ 3.22212522, 5.11643348, 2.54894998, -4.32271604, -8.27150752, -7.94970662, 2.82502302, 0.90618996, -9.36950296, 5.84721284], [-8.54466004, -6.48677902, 0.51265226, 5.00415338, -6.43752572, -9.31718028, 1.70262346, 2.42459968, -2.212762 , -2.82571694], [-5.13930766, -5.07156922, -7.38994394, 8.6689944 , -2.40124188, 5.66800922, -3.99931484, -7.49033556, 4.9774821 , -8.61535074], [-5.95968886, -9.89868284, -4.60773896, -0.0029705 , -6.97428262, -6.5166109 , -3.38724532, -3.66187892, -3.5582609 , 9.27953282], [ 9.8720441 , -2.6019389 , -2.54222866, 5.4395666 , -2.06631716, 8.2619265 , -7.6084454 , 4.70957778, -8.8916305 , 1.5259961 ]]) n = Q.shape[0] # ---- linearized model, matrix format ----- x = cp.Variable((n,1),"x",boolean=True) Y = cp.Variable((n,n),"Y") e = np.ones((n,1)) prob = cp.Problem(cp.Minimize(cp.trace(Q.T@Y)), [Y <= x@e.T, Y <= e@x.T, Y >= x@e.T + e@x.T - e@e.T, Y >= 0, Y <= 1]) prob.solve(solver=cp.GLPK_MI,verbose=True) print("status:",prob.status) print("objective:",prob.value) print("levels:",x.value)
Notes:
- The objective can be replaced by cp.sum(cp.multiply(Q,Y)),
- I relaxed the Y variables,
- We use glpk as the MIP solver,
- CVXPY comes with an integer solver called ECOS_BB. This solver seems to choke on this problem,
- I have declared \(x\) as an explicit \((n \times 1)\) column vector instead of just a vector of length \(n\). Sometimes CVXPY is picky about this when doing things like matrix multiplication.
- I was not able to enter this model directly: I has to use a piece of paper first to get from the indexed version to the matrix version. There was an extra manual translation step needed. This step, of course, can easily lead to errors.
An alternative approach is to convexify the problem. To do this we add a diagonal matrix to our Q matrix (such that it becomes positive definite) and compensate by subtracting the same amount using a linear part of the objective. We use here that for binary \(x_i\) we have \(x_i^2 = x_i\). The steps are:
- Make \(Q\) symmetric by: \[Q := 0.5(Q+Q^T)\] This will give the same solution but now we can calculate real valued eigenvalues.
- Calculate the smallest (or most negative) eigenvalue \[\lambda_1 := \min_i \lambda_i\]
- If \(\lambda_1 < 0\) replace \(Q\) by \[Q := Q + (|\lambda_1|+\mathit{tol}) I\] and also introduce a linear part in the objective: \[\min\>x^TQx - (|\lambda_1|+\mathit{tol}) \sum_i x_i\]
- If \(\lambda_1 \ge 0\) solve the original problem. We may use a tolerance to make sure the smallest eigenvalue is actually \(\lambda_1\ge \mathit{tol}\).
The Python/CVXPY code can look like:
# ------- convexified model ------------- Q = 0.5*(Q+Q.T) # make Q symmetric w, v = np.linalg.eig(Q) # eigen decomposition print("Eigenvalues\n",w) w1 = min(w) # first eigen value print("Smallest eigenvalue",w1) tol = 1.0e-5 # tolerance f = 0 # factor for diagonal perturbation if w1<tol: f = -w1 + tol Q2 = Q + f*np.eye(n) prob = cp.Problem(cp.Minimize(cp.quad_form(x,Q2)-f*sum(x)),[]) prob.solve(solver=cp.ECOS_BB,verbose=False) print("##### MIQP formulation") print("status:",prob.status) print("objective:",prob.value) print("levels:\n",x.value)
This model is solved now as a convex MIQP.
The original GAMS model for this problem was as follows:
set i /i1*i10/; alias(i,j); parameter q(i,j); q(i,j) = uniform(-10,10); binary variable x(i); variable z; equation obj; obj.. z =e= sum((i,j), x(i)*q(i,j)*x(j)); model m /obj/; option miqcp=cplex,optcr=0; solve m minimizing z using miqcp; display z.l,x.l;
Cplex will automatically linearize this model.
CVXPY Sparse Variables
CVXPY has some severe limitations on how variables can look like.
First we cannot use three (or more) dimensional variables. So something like x[i,j,k] is not supported. A declaration like:
X = cp.Variable((n,n,n),"X")
gives:
ValueError: Expressions of dimension greater than 2 are not supported.
This is a rather severe restriction. Many practical model have variables exceeding 2 dimensions. Of course matrix notation becomes impractical for symbols with more than 2 dimensions. Which is probably the reason why CVXPY ony wants to handle scalars, vectors and matrices.
Furthermore sparse variables are not supported either. Everything is fully allocated. As an example consider the toy model:
n = 100 X = cp.Variable((n,n),"X") prob = cp.Problem( cp.Minimize(0), [X[0,0]==1]) Solver Log: problem: variables n = 10000, constraints m = 1
My hope was that I could just declare a large variable matrix and that CVXPY would only export the used variables to the solver. Instead of the solver seeing a model with one variable, it receives a model with 10,000 variables.
Example: a Sparse Network Model
A linear programming formulation for a max-flow network problem can look like:
Max-flow Sparse Network Model |
---|
\[\begin{align}\max\>\>&\color{darkred}f\\ & \sum_{j|\color{darkblue}{\mathit{arc}}(j,i)}\color{darkred}x_{j,i} = \sum_{j|\color{darkblue}{\mathit{arc}}(i,j)}\color{darkred}x_{i,j} + \color{darkred}f\cdot \color{darkblue}b_i && \forall i \\ & 0 \le \color{darkred}x_{i,j} \le \color{darkblue}{\mathit{capacity}_i} && \forall i,j|\color{darkblue}{\mathit{arc}}(i,j) \end{align}\] |
Here \(arc(i,j)\) indicates whether link \(i \rightarrow j\) exists. The sparse data vector \(b_i\) is defined by \[b_i = \begin{cases} -1 & \text{if node $i$ is the source node}\\ +1 & \text{if node $i$ is the sink node}\\ 0 & \text{otherwise} \end{cases}\]
This mathematical model translates directly into a GAMS model:
$ontext max flow network example Data from example in Mitsuo Gen, Runwei Cheng, Lin Lin Network Models and Optimization: Multiobjective Genetic Algorithm Approach Springer, 2008 Erwin Kalvelagen, Amsterdam Optimization, May 2008 $offtext sets i 'nodes' /node1*node11/ source(i) /node1/ sink(i) /node11/ ; alias(i,j); parameter capacity(i,j) / node1.node2 60 node1.node3 60 node1.node4 60 node2.node3 30 node2.node5 40 node2.node6 30 node3.node4 30 node3.node6 50 node3.node7 30 node4.node7 40 node5.node8 60 node6.node5 20 node6.node8 30 node6.node9 40 node6.node10 30 node7.node6 20 node7.node10 40 node8.node9 30 node8.node11 60 node9.node10 30 node9.node11 50 node10.node11 50 /; set arcs(i,j); arcs(i,j)$capacity(i,j) = yes; display arcs; parameter rhs(i); rhs(source) = -1; rhs(sink) = 1; variables x(i,j) 'flow along arcs' f 'total flow' ; positive variables x; x.up(i,j) = capacity(i,j); equations flowbal(i) 'flow balance' ; flowbal(i).. sum(arcs(j,i), x(j,i)) - sum(arcs(i,j), x(i,j)) =e= f*rhs(i); model m /flowbal/; solve m maximizing f using lp;
The GAMS model exploits that GAMS stores data sparsely. The variables x(i,j) are only allocated when they are used inside the equations. This usage is restricted to cases where arcs(i,j) exist. I.e. the number of variables x(i,j) is 22 instead of \(11 \times 11\).
As we discussed in the previous section, CVXPY does not support sparse variables like GAMS. So instead of variables x(i,j) we'll use x[k] where k indicates the arc number. CVXPY supports sparse data matrices through scipy.sparse. In the code below we set up a sparse matrix A with entries as follows:
- A[i,k] = -1 if arc \(k\) represents an outgoing link \(i \rightarrow j\)
- A[i,k] = +1 if arc \(k\) represents an incoming link \(j \rightarrow i\)
With this we can formulate the model:
Matrix based max-flow model |
---|
\[\begin{align}\max\>\>&\color{darkred}f\\ & \color{darkblue}A \cdot \color{darkred}x = \color{darkred}f \cdot \color{darkblue} b \\ & 0 \le \color{darkred}x \le \color{darkblue}{\mathit{capacity}} \end{align}\] |
The network logic is hidden in the sparse matrix \(A\). From a modeling point of view, this model is at a too high abstraction level: we have lost all structure we want to be visible and reason about. The CVXPY implementation is as follows:
import numpy as np import scipy.sparse as sparse import cvxpy as cp # ------ data -------- data = { 'nodes':['A','B','C','D','E','F','G','H','I','J','K'], 'from':['A','A','A','B','B','B','C','C','C','D','E', 'F','F','F','F','G','G','H','H','I','I','J'], 'to': ['B','C','D','C','E','F','D','F','G','G','H', 'E','H','I','J','F','J','I','K','J','K','K'], 'capacity': [60,60,60,30,40,30,30,50,30,40,60,20,30,40,30,20,40,30,60,30,50,50], 'source' : 'A', 'sink' : 'K' } numnodes = len(data['nodes']) numarcs = len(data['capacity']) print("Number of nodes: {}".format(numnodes)) print("Number of arcs: {}".format(numarcs)) # ------ lp data -------- # map node name to index map = dict(zip(data['nodes'],range(numnodes))) # coefficients irow = np.zeros(2*numarcs,int) jcol = np.zeros(2*numarcs,int) val = np.zeros(2*numarcs) # arc k: i->j has coefficient -1 in row i, column k # +1 j for k in range(numarcs): i = map[data['from'][k]] j = map[data['to'][k]] kk = 2*k irow[kk] = i jcol[kk] = k val[kk] = -1 kk = 2*k+1 irow[kk] = j jcol[kk] = k val[kk] = 1 A = sparse.csc_matrix((val,(irow,jcol))) b = np.zeros(numnodes) b[map[data['source']]] = -1 b[map[data['sink']]] = 1 cap = data['capacity'] # ------ lp model -------- x = cp.Variable(numarcs,"x") f = cp.Variable(1,"f") prob = cp.Problem(cp.Maximize(f), [A@x == f*b, x >= 0, x <= cap]) prob.solve(verbose=True) print(prob)
With this experiment, we have confirmed that sparse data matrices work just fine with CVXPY. However, this makes the model not that straightforward anymore. This is more complicated than the corresponding GAMS model. I am able to write down the GAMS immediately, but this CVXPY implementation required some real work (and some thinking).
See [4] for an alternative approach.
Example: Matrix Balancing
In this example we want to estimate the inner part of a matrix subject to row- and column-total constraints. This problem is frequently encountered in economic modeling. An additional constraint is that we want to maintain the sparsity pattern of the matrix. Basically the model is:
Matrix Balancing Model |
---|
\[\begin{align}\min\>\>&\mathbf{dist}(\color{darkred}A,\color{darkblue}A^0)\\ & \sum_i\color{darkred}a_{i,j} = \color{darkblue} v_j && \forall j\\ &\sum_j\color{darkred}a_{i,j} = \color{darkblue} u_i && \forall i \\ & \color{darkblue}a^0_{i,j}=0 \Rightarrow\color{darkred}a_{i,j} = 0 \end{align}\] |
There are different possibilities for the distance function. E.g. it can be a quadratic function \[\mathbf{dist}(A,A^0) = \sum_{i,j} (a_{i,j}-a^0_{i,j})^2\] or in this case an entropy function \[\mathbf{dist}(A,A^0) =\sum_{i,j} a_{i,j}\log\left(\frac{a_{i,j}}{a^0_{i,j}}\right)\]
Often the implication is enforced by just ignoring or skipping all elements \(a_{i,j}\) where \(a^0_{i,j}=0\). This leads again to a sparse representation of the variables. In GAMS this is quite easy.
$ontext Example from Using PROC IML to do Matrix Balancing Carol Alderman, University of Kansas Institute for Public Policy and Business Research MidWest SAS Users Group MWSUG 1992 $offtext sets p 'products' /pA*pI/ s 'salesmen' /s1*s10/ ; table A0(*,*) 'estimated matrix, known totals' s1 s2 s3 s4 s5 s6 s7 s8 s9 s10 rowTotal pA 230 375 375 100 685 215 50 2029 pB 330 405 419 175 90 504 515 240 105 2798 pC 268 225 242 30 790 301 44 100 1998 pD 595 380 638 275 30 685 605 88 100 160 3566 pE 340 360 440 200 30 755 475 44 150 2794 pF 132 190 200 432 130 1071 pG 309 330 350 125 612 474 50 50 2305 pH 365 400 330 150 50 575 600 44 150 110 2747 pI 210 250 308 125 720 256 100 50 2015 colTotal 2772 2910 3300 1150 240 5760 3526 220 950 495 ; alias (p,i); alias (s,j); variables A(i,j) 'new values' z 'objective (minimized)' ; equations objective rowsum(i) colsum(j) ; objective.. z =e= sum((i,j)$A0(i,j), A(i,j)*log(A(i,j)/A0(i,j))); rowsum(i).. sum(j$A0(i,j), A(i,j)) =e= A0(i,'rowTotal'); colsum(j).. sum(i$A0(i,j), A(i,j)) =e= A0('colTotal',j); A.L(i,j) = A0(i,j); A.lo(i,j)$A0(i,j) = 0.0001; model m /all/; solve m minimizing z using nlp; display A.L,z.l;
When solved with the general purpose NLP solver CONOPT, we get the following results:
---- 58 VARIABLE A.L new values s1 s2 s3 s4 s5 s6 s7 s8 s9 pA 229.652 374.869 375.099 100.028 686.238 212.542 50.572 pB 330.587 406.192 420.492 175.626 94.300 506.575 510.790 243.545 pC 267.313 224.685 241.809 31.297 790.595 297.246 44.017 101.037 pD 595.262 380.610 639.416 275.614 31.391 687.580 599.252 88.299 101.341 pE 339.659 360.058 440.341 200.158 31.346 756.750 469.809 44.086 151.793 pF 130.330 187.815 197.821 427.953 127.080 pG 309.478 330.895 351.165 125.418 614.984 470.016 50.727 pH 360.594 395.631 326.596 148.455 51.665 569.947 586.867 43.598 150.111 pI 209.123 249.246 307.260 124.701 719.378 252.398 100.874 + s10 pB 109.894 pD 167.233 pG 52.318 pH 113.535 pI 52.019 ---- 58 VARIABLE z.L = -15.769 objective (minimized)
CVXPY does not allow to skip elements like GAMS does. Well, unless we build the whole model in scalar mode: element by element. That is not very attractive so let's try another way. It will be a bit of a struggle. Let's define a (data) matrix D as \[d_{i,j} = \begin{cases} 1 & \text{if $a^0_{i,j} \ne 0$}\\ 0 & \text{if $a^0_{i,j} = 0$}\end{cases}\] This matrix can be used to skip linear terms that are not needed. The objective is another story. CVXPY has the function entr() which is defined by: \(-x\log(x)\). We expand the objective as: \[\sum_{i,j} a_{i,j}\log\left(\frac{a_{i,j}}{a^0_{i,j}}\right) = \sum_{i,j} - \mathbf{entr}(a_{i,j}) - a_{i,j}\log(a^0_{i,j})\] Finally we insert \(d\) to ignore the zero's: \[ \min \sum_{i,j} - d_{i,j} \mathbf{entr}(a_{i,j}) - d_{i,j} a_{i,j}\log(a^0_{i,j}+1-d_{i,j})\] This is quite some gymnastics to shoehorn our model into an acceptable CVXPY format.
import numpy as np import cvxpy as cp # -------- data ---------- A0 = [[ 230 , 375 , 375 , 100 , 0 , 685 , 215 , 0 , 50 , 0 ], [ 330 , 405 , 419 , 175 , 90 , 504 , 515 , 0 , 240 , 105 ], [ 268 , 225 , 242 , 0 , 30 , 790 , 301 , 44 , 100 , 0 ], [ 595 , 380 , 638 , 275 , 30 , 685 , 605 , 88 , 100 , 160 ], [ 340 , 360 , 440 , 200 , 30 , 755 , 475 , 44 , 150 , 0 ], [ 132 , 190 , 200 , 0 , 0 , 432 , 130 , 0 , 0 , 0 ], [ 309 , 330 , 350 , 125 , 0 , 612 , 474 , 0 , 50 , 50 ], [ 365 , 400 , 330 , 150 , 50 , 575 , 600 , 44 , 150 , 110 ], [ 210 , 250 , 308 , 125 , 0 , 720 , 256 , 0 , 100 , 50 ]] u = [2029,2798,1998,3566,2794,1071,2305,2747,2015] v = [2772,2910,3300,1150,240,5760,3526,220,950,495] m = len(u) n = len(v) # -------- model ---------- D = np.sign(A0) Dloga0 = D * np.log(A0+np.ones_like(D)-D) A = cp.Variable((m,n),"A") obj = cp.Minimize(cp.sum(cp.multiply(D,-cp.entr(A)) - cp.multiply(A,Dloga0))) cons = [cp.sum(cp.multiply(D,A),axis=1)==u, cp.sum(cp.multiply(D,A),axis=0)==v, A >= 0] prob = cp.Problem(obj,cons) prob.solve(solver=cp.SCS,verbose=True,max_iters=200000)
Solving this tiny model is very difficult. It is using exponential cones and that is not very robustly implemented in the solvers. With SCS and a lot of iterations, we finally see:
101400| 7.74e-05 9.08e-05 7.16e-04 -1.56e+01 -1.56e+01 3.30e-11 1.14e+02 101500| 7.72e-05 9.10e-05 5.74e-04 -1.54e+01 -1.54e+01 3.30e-11 1.14e+02 101600| 7.69e-05 9.13e-05 5.30e-04 -1.51e+01 -1.51e+01 7.77e-11 1.14e+02 101700| 7.66e-05 9.17e-05 4.02e-04 -1.49e+01 -1.49e+01 5.64e-11 1.14e+02 101800| 7.62e-05 9.22e-05 3.96e-04 -1.46e+01 -1.46e+01 3.30e-11 1.14e+02 101900| 7.59e-05 9.28e-05 2.10e-04 -1.44e+01 -1.44e+01 3.30e-11 1.14e+02 101980| 7.56e-05 9.33e-05 3.35e-05 -1.42e+01 -1.42e+01 1.17e-11 1.14e+02 ---------------------------------------------------------------------------- Status: Solved Timing: Solve time: 1.14e+02s Lin-sys: nnz in L factor: 1089, avg solve time: 1.87e-05s Cones: avg projection time: 1.06e-03s Acceleration: avg step time: 2.62e-07s ---------------------------------------------------------------------------- Error metrics: dist(s, K) = 4.8795e-06, dist(y, K*) = 0.0000e+00, s'y/|s||y| = -5.3311e-12 primal res: |Ax + s - b|_2 / (1 + |b|_2) = 7.5554e-05 dual res: |A'y + c|_2 / (1 + |c|_2) = 9.3272e-05 rel gap: |c'x + b'y| / (1 + |c'x| + |b'y|) = 3.3548e-05 ---------------------------------------------------------------------------- c'x = -14.2093, -b'y = -14.2103 ============================================================================
The objective is in the neighborhood of what CONOPT found for the GAMS model (CONOPT required just a few iterations).
This model is not exactly a showcase for CVXPY.
Conclusion
CVXPY can be very convenient to model certain classes of models: convex, structured and easy too write in matrix notation. For other models it is in all likelihood not very suited. We saw some examples where things become really hairy when using CVXPY while the underlying mathematical model is quite simple. CVXPY should really be considered as a special purpose modeling tool and you may want to consider other tools when the model does not really fits CVXPY's matrix philosophy. Also this shows how an equation based modeling tool like AMPL or GAMS is more versatile: we can express many more models in a natural way without much trickery. Furthermore, we saw that pushing matrix notation too far can lead to models that are less readable.
References
- CVXPY website, https://www.cvxpy.org/
- GAMS transportation model example, https://www.gams.com/products/simple-example/
- OQSP first-order QP solver, https://osqp.org/
- CVXPY network example, https://www.cvxpy.org/examples/applications/OOCO.html
- Convex Optimization in Python with CVXPY | SciPy 2018 | Steven Diamond, https://www.youtube.com/watch?v=kXqu-TqEl7Q
- Steven Diamond and Stephen Boyd, CVXPY: A Python-Embedded Modeling Language for Convex Optimization, Journal of Machine Learning Research, 2016, vol 17, no 83, pp 1-5.
- Grant M., Boyd S., Ye Y. (2006) Disciplined Convex Programming. In: Liberti L., Maculan N. (eds) Global Optimization. Nonconvex Optimization and Its Applications, vol 84. Springer, Boston, MA, https://web.stanford.edu/~boyd/papers/pdf/disc_cvx_prog.pdf
- CVXPY: Efficiently writing constraints for pairwise sums, https://stackoverflow.com/questions/59672262/cvxpy-efficiently-writing-constraints-for-pairwise-sums. Example of difficulty in matrix formulations. This is similar to the binary QP example above.
Nice Tutorial Thank you.
ReplyDeleteThank you for the great tutorial!
ReplyDeleteJust wanted to point out that in the newest version of cvxpy (v1.6) the two limitations that you mentioned in this article have been added.
N-dimensional variables: https://www.cvxpy.org/tutorial/advanced/index.html#n-dimensional-expressions
Sparsity variables: https://www.cvxpy.org/tutorial/constraints/index.html#sparsity-attribute