Tuesday, December 10, 2019

Nonlinear variant of a knapsack problem

In [1] a problem is posed:

Original Problem
\[\begin{align}\max & \sum_i \log_{100}(\color{darkblue}w_i) \cdot \color{darkred}x_i \\ & \frac{\sum_i \color{darkblue}w_i \color{darkred}x_i}{\sqrt{\color{darkred}k}} \le 10,000\\ & \color{darkred} k = \sum_i \color{darkred}x_i \\ & \color{darkred}x_i \in \{0,1\} \end{align}\]

The data vector \(w_i\) is assumed to be integer valued with \(w_i\ge 1\), so the logarithm can be evaluated without a problem. We can also assume that the number of items is around 1,000.

I don't think I have ever seen \(\log_{100}()\) being used in a model. Most programming (and modeling) languages only support natural logarithms \(\ln()\) and maybe \(\log_{10}()\). We can convert things by: \[\log_{100}(x) = \frac{\ln(x)}{\ln(100)}\] This means the objective can be written as \[\max \frac{1}{\ln(100)} \sum_i \ln(w_i) x_i\] Essentially, the \(\log_{100}()\) function just adds a scaling factor. We can simplify the objective to \[\max \sum_i \ln(w_i)x_i\] (The objective value will be different, but the optimal solution will be the same). Note that this is a linear objective.
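
As a quick sanity check on this base-change argument, here is a tiny snippet (my addition, with made-up weights):

import numpy as np

w = np.array([3, 17, 250, 9999])        # a few made-up weights
a = np.log(w) / np.log(100)             # log base 100 via natural logs
b = np.log10(w) / 2.0                   # log base 100 via log10 (log10(100) = 2)
print(np.allclose(a, b))                # True
# Multiplying a maximization objective by the positive constant 1/ln(100)
# changes the optimal objective value, but not the optimal x.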

In this post I'll discuss a few different formulations (MINLP, MIQCP and MIP), and report my findings.

MINLP Formulation


The constraint can be rewritten as: \[\sum_i w_i x_i \le 10,000 \sqrt{k}\] If we ignore the all zero solution, we can assume \(k\ge 1\). This bound will make sure the square root function is always differentiable. With this, we have a standard MINLP (Mixed Integer Nonlinear Programming) model.

MINLP Model
\[\begin{align}\max & \sum_i \ln(\color{darkblue}w_i) \color{darkred}x_i \\ & \sum_i \color{darkblue}w_i \color{darkred}x_i \le 10,000\sqrt{\color{darkred}k}\\ & \color{darkred} k = \sum_i \color{darkred}x_i \\ & \color{darkred}x_i \in \{0,1\} \\ & \color{darkred} k \ge 1 \end{align}\]

This model has only a single, well behaved, non-linearity. Literally, there is only one nonlinear nonzero element.  In addition, the model is convex. So we don't expect any problems. For \(w_i\),  I generated 1000 random integer values from the interval \([1,10000]\).
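
For illustration, here is a minimal Pyomo sketch of this MINLP (my reconstruction, not necessarily the model used for the results below; it assumes a MINLP solver such as Bonmin is installed, and generates data along the lines described above):

import numpy as np
import pyomo.environ as pyo

rng = np.random.default_rng(123)
w = rng.integers(1, 10001, size=1000)            # random integer weights in [1,10000]

m = pyo.ConcreteModel()
m.I = pyo.RangeSet(len(w))
m.x = pyo.Var(m.I, domain=pyo.Binary)
m.k = pyo.Var(bounds=(1, len(w)))                # k >= 1 keeps sqrt(k) differentiable

m.defk = pyo.Constraint(expr=m.k == sum(m.x[i] for i in m.I))
m.cap = pyo.Constraint(expr=sum(int(w[i-1]) * m.x[i] for i in m.I)
                            <= 10000 * pyo.sqrt(m.k))
m.obj = pyo.Objective(expr=sum(float(np.log(w[i-1])) * m.x[i] for i in m.I),
                      sense=pyo.maximize)

# pyo.SolverFactory("bonmin").solve(m, tee=True)  # uncomment if Bonmin is available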

MINLP Results
Solver    Obj        Time   Notes
Dicopt    1057.1355   0.6   3 NLP, 2 MIP subproblems
SBB       1056.9361    95   Node limit exceeded
Bonmin    1056.9472   600   Time limit exceeded
Bonmin    1057.1355    16   Option: bonmin.algorithm B-OA

The outer-approximation based algorithms (Dicopt, Bonmin with B-OA option) do much better than the branch & bound algorithms (SBB, default Bonmin). Even the global solvers do better:

Global Solver Results
Solver     Obj        Time
Baron      1057.1355   2
Antigone   1057.1355   1
Couenne    1057.1355   9

MIQCP Formulation


The problem can also be formulated as a convex MIQCP (Mixed Integer Quadratically Constrained Programming) model:


MIQCP Model
\[\begin{align}\max & \sum_i \ln(\color{darkblue}w_i) \color{darkred}x_i \\ & \color{darkred}y^2 \le \color{darkred}k\\ & \color{darkred} k = \sum_i \color{darkred}x_i \\ & \color{darkred}y = \frac{\sum_i \color{darkblue}w_i \color{darkred}x_i}{10,000} \\ & \color{darkred}x_i \in \{0,1\} \\ & \color{darkred} k \ge 1, \color{darkred}y \ge 0 \end{align}\]


Cplex has two ways to form relaxations:

  • MIQCPstrat 1: SOCP-based branch & bound
  • MIQCPstrat 2: OA-based branch & cut
The results are shown in the next section. We see again that the Outer Approximation approach works better.
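
For reference, this MIQCP is also easy to state in CVXPY, since \(y^2 \le k\) (convex quadratic on the left, affine on the right) is a valid DCP constraint. A minimal sketch (my addition; solving it requires a mixed-integer quadratic/conic capable solver, which is an assumption about your installation):

import numpy as np
import cvxpy as cp

rng = np.random.default_rng(123)
w = rng.integers(1, 10001, size=1000)       # same type of data as before

x = cp.Variable(len(w), boolean=True)
k = cp.sum(x)
y = (w @ x) / 10000.0

prob = cp.Problem(cp.Maximize(np.log(w) @ x),
                  [cp.square(y) <= k,       # the convex quadratic constraint
                   k >= 1])
# prob.solve(solver=cp.ECOS_BB)             # or any MIQCP/MISOCP capable solver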


MIP Formulation


Finally, we can also linearize this model by observing that \(k\in \{1,\dots,1000\}\). So we don't really have a continuous function \(f(k)=\sqrt{k}\), but rather only need function values at the integer points. We can exploit this by making this explicit. We can write: \[\begin{align} & k = \sum_i i\cdot \delta_i \\ & \sqrt{k} = \sum_i \sqrt{i}\cdot \delta_i \\ & \sum_i \delta_i = 1 \\ & \delta_i \in \{0,1\}\end{align}\] This is essentially a SOS1 (Special Ordered Set of Type 1) structure implementing a table lookup. The MIP model looks like:


MIP Model
\[\begin{align}\max & \sum_i \ln(\color{darkblue}w_i) \color{darkred}x_i \\ & \color{darkred} k = \sum_i \color{darkred}x_i \\ & \sum_i \color{darkblue}w_i \color{darkred} x_i \le 10,000 \color{darkred}q \\ & \color{darkred} k = \sum_i i \cdot \color{darkred}\delta_i \\ & \color{darkred} q = \sum_i \sqrt{i} \cdot \color{darkred}\delta_i \\ & \sum_i \color{darkred} \delta_i = 1 \\ & \color{darkred}x_i, \color{darkred}\delta_i \in \{0,1\} \\ & \color{darkred} k, \color{darkred} q \ge 1 \end{align}\]
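
A sketch of this MIP in CVXPY, using boolean variables for the \(\delta_i\) table lookup (my addition; any MIP solver should be able to handle it):

import numpy as np
import cvxpy as cp

rng = np.random.default_rng(123)
w = rng.integers(1, 10001, size=1000)
n = len(w)
iv = np.arange(1, n + 1)                    # the possible values of k

x = cp.Variable(n, boolean=True)
delta = cp.Variable(n, boolean=True)        # delta_i = 1  <=>  k = i
k = iv @ delta
q = np.sqrt(iv) @ delta                     # q = sqrt(k) via the table lookup

prob = cp.Problem(cp.Maximize(np.log(w) @ x),
                  [cp.sum(x) == k,
                   cp.sum(delta) == 1,
                   w @ x <= 10000 * q])
# prob.solve(solver=cp.GLPK_MI)             # or CBC, CPLEX, ...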

When we solve this problem we see:

MIQCP and MIP Results
Model   Solver   Obj        Time   Notes
MIQCP   Cplex    1057.1355   600   SOCP-based branch & bound, time limit
MIQCP   Cplex    1057.1355   0.3   OA-based branch & cut
MIP     Cplex    1057.1355   0.6

Conclusion: the question in the original post was: how do we solve this problem? Here we proposed three different models: an MINLP, an MIQCP and a MIP model. All of these models can be solved quickly. Note that the quadratic and linear models are not approximations: they give the same solution as the original MINLP model. Pure nonlinear branch & bound methods have a bit of a problem with the MINLP model, but Outer Approximation works very well.

References


  1. How do we solve a variant of the knapsack problem in which the capacity of the knapsack keeps increasing as we add more items into the knapsack?, https://stackoverflow.com/questions/59242370/how-do-we-solve-a-variant-of-the-knapsack-problem-in-which-the-capacity-of-the-k
  2. Pierre Bonami, Andrea Tramontani, Recent improvement to MISOCP in CPLEX, Informs 2015, http://www.optimizationdirect.com/data/informs2015_bonami.pdf

Appendix: automatic reformulation with CVXPY


CVXPY accepts the MINLP model, and can automatically reformulate this into a MISOCP model:


import numpy as np
import cvxpy as cp

# w is the vector of weights and n = len(w), generated as before
x = cp.Variable(n, integer=True)
obj = cp.Maximize(np.log(w) * x)                    # in CVXPY, * is matrix multiplication
con = w * x <= 10000 * cp.sqrt(cp.sum(x))
prob = cp.Problem(obj, [con, x >= 0, x <= 1])
prob.solve(solver=cp.ECOS_BB, max_iters=10000)
print("status:", prob.status)
print("Obj:", prob.value)

Unfortunately, the MISOCP solver that comes with CVXPY is not the most reliable one. It shows:


status: optimal_inaccurate
Obj: 1057.1085712222387
X: [-1.52640229e-10 -4.44776117e-11 -6.08671797e-11 -9.71628737e-11
 -9.95050649e-11 -1.23087763e-10 -8.63183870e-11 -4.39955343e-11
  1.00000000e+00 -6.54372650e-11 -3.95051292e-11 -5.86051991e-11
...

The reported objective is close to what we have found earlier.

Saturday, December 7, 2019

Elementwise vs matrix multiplication


Introduction


In this post I want to demonstrate some interesting issues in how CVXR and CVXPY deal with elementwise vs matrix multiplication and compare that to the host languages (R and Python, respectively). CVXR and CVXPY deviate a bit from what their host languages do. When implementing optimization models with these tools, it is important to be aware of these details.

Elementwise and matrix multiplication 


There are two often used methods to perform the multiplication of matrices (and vectors). The first is simply elementwise multiplication: \[c_{i,j} = a_{i,j} \cdot b_{i,j}\] In mathematics, this is sometimes referred to as the Hadamard product. The notation is typically: \[C = A \circ B\] but sometimes we see: \[C = A \odot B\] This product only works when \(A\) and \(B\) have the same shape. I.e. \[\begin{matrix} C&=&A&\circ&B\\ (m \times n) &&(m \times n)&&(m \times n)\end{matrix}\]

The standard matrix multiplication \(C=A\cdot B\) or \[c_{i,j} = \sum_k a_{i,k} b_{k,j}\] has a different rule for conformance:  \[\begin{matrix} C&=&A&\cdot&B\\ (m \times n) &&(m \times k)&&(k \times n)\end{matrix}\] Most of the time, the dot operator is dropped and we write \(C = A B\).

In optimization modeling both forms are used a lot.

R, CVXR


R has two operators for multiplication:
  • * for elementwise multiplication
  • %*% for matrix multiplication

A vector is considered a column vector (i.e. an \(n\)-vector behaves like an \(n \times 1\) matrix). There is one special thing in R, as shown here:


> m <- 2
> n <- 3
> A <- matrix(c(1,2,3,4,5,6),m,n)
> A
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
> 
> x <- c(1,2,3)
>
 
#
# matrix multiplication
# 
> A %*% x
     [,1]
[1,]   22
[2,]   28
> 

#
# elementwise multiplication
#
> A * A
     [,1] [,2] [,3]
[1,]    1    9   25
[2,]    4   16   36

#
# but this also works
#
> A * x
     [,1] [,2] [,3]
[1,]    1    9   10
[2,]    4    4   18
>


The last multiplication is surprising as \(A\) is a \(2 \times 3\) matrix and \(x\) is different in shape. Well, R may extend and recycle vectors to make them as large as needed. In this case \(x\) is duplicated and then considered as a  \(2 \times 3\) matrix. More or less like:


> x
[1] 1 2 3
> matrix(x,2,3)
     [,1] [,2] [,3]
[1,]    1    3    2
[2,]    2    1    3


The modeling tool CVXR follows R notation and implements both * and %*% (for elementwise and matrix multiplication). However, CVXR does not implement the extending and recycling of vectors that are too short.

Concept: recycling


Just to emphasize the concept of recycling. If an operation requires two vectors of the same length, R may make the shorter vector longer by recycling (duplicating). Here is an example:


> a <- 1:10
> b <- 1:2
> c <- a + b
> c
 [1]  2  4  4  6  6  8  8 10 10 12


In this example the vector a has elements 1 through 10. The vector b is too short, so it is recycled. When added to a, b is functionally equal to rep(c(1,2),5). When the length of a is not an exact multiple of the length of b, the recycling is fractional and R issues a warning. E.g. with b <- 1:3 we get:
Warning message:
In a + b : longer object length is not a multiple of shorter object length

This recycling trick is somewhat unique to R (I don't know about other languages doing this).

Example


In [2] the product: \[b_{i,j} = a_{i,j} \cdot x_i \] is implemented in R with the elementwise product:


> m <- 2
> n <- 3
> A <- matrix(c(1,2,3,4,5,6),m,n)
> A
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
> x <- c(1,2)
> B <- A*x
> B
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    4    8   12


This again uses recycling of \(x\). CVXR does not do this automatically, as we can see here:


> library(CVXR)
> x <- Variable(m)
> B <- Variable(m,n)
> e <- rep(1,n)
> problem <- Problem(Minimize(0),
+                    list(x == c(1,2), 
+                         B == A * x ))
Error in sum_shapes(lapply(object@args, function(arg) { : 
  Incompatible dimensions


Here \(x\) is now a CVXR variable. As the vector \(x\) is not recycled, we end up with two different shapes and elementwise multiplication is refused. So how do we do something like this in CVXR?

The recycling operation:


> x
[1] 1 2
> matrix(x,m,n)
     [,1] [,2] [,3]
[1,]    1    1    1
[2,]    2    2    2


can be expressed in matrix notation as: \[x \cdot e^T\] where \(e\) is a (column) vector of ones. This is sometimes called an outer product. I.e. we can write our assignment as \[B = A \circ (x \cdot e^T)\] In a CVXR model this can look like:


> library(CVXR)
> x <- Variable(m)
> B <- Variable(m,n)
> e <- rep(1,n)
> problem <- Problem(Minimize(0),
+                    list(x == c(1,2), 
+                         B == A * (x %*% t(e))))
> sol <- solve(problem)
> sol$status
[1] "optimal"
> sol$getValue(x)
     [,1]
[1,]    1
[2,]    2
> sol$getValue(B)
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    4    8   12


Note: the constraint B == A * (x %*% t(e)) contains both a matrix multiplication and an elementwise multiplication. This is rather funky.

Conclusion: if your matrix operations rely on recycling, you will need to rework things a bit to have this work correctly in CVXR. CVXR does not do recycling.

Python, CVXPY


In the previous section we saw that there are subtle differences between R's and CVXR's elementwise multiplication semantics.

Let's now look at Python and CVXPY.

Since Python 3.5 we have two multiplication operators:

  • * for elementwise multiplication 
  • @ for matrix multiplication 

CVXPY has different rules:

  • *, @  and matmul for matrix multiplication
  • multiply for elementwise multiplication

Example



import numpy as np
import cvxpy as cp

#
# In Python/numpy * indicates elementwise multiplication
#
A = np.array([[1,2],[3,4]])
B = np.array([[1,1],[2,2]])
C = A*B
print(A)
# output:
# [[ 1 2]
#  [ 3 4]]
print(C)
# output:
# [[ 1 2]
#  [ 6 8]]

#
# In CVXPY * indicates matrix multiplication
#
A = cp.Variable((2,2))
C = cp.Variable((2,2))
prob = cp.Problem(cp.Minimize(0),
                  [A == [[1,2],[3,4]],
                   C == A*B])
prob.solve(verbose=True)
print(A.value)
# output:
# [[1. 3.]
#  [2. 4.]]
print(C.value)
# [[ 7.  7.]
#  [10. 10.]]

Here we see some differences between Python/Numpy and CVXPY. First, the interpretation of nested Python lists (the values for A) is different. Second, the semantics of * differ. This may cause some confusion.
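
For completeness: to get the elementwise (Hadamard) product in CVXPY we need cp.multiply. A small sketch (my addition):

import numpy as np
import cvxpy as cp

B = np.array([[1,1],[2,2]])
A = cp.Variable((2,2))
C = cp.Variable((2,2))
prob = cp.Problem(cp.Minimize(0),
                  [A == np.array([[1,2],[3,4]]),   # pass a numpy array to avoid the list pitfall
                   C == cp.multiply(A, B)])        # elementwise multiplication
prob.solve()
print(C.value)
# expected:
# [[1. 2.]
#  [6. 8.]]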

References


  1. CVXR, Convex Optimization in R, https://cvxr.rbind.io/
  2. CVXR Elementwise multiplication of matrix with vector, https://stackoverflow.com/questions/59224555/cvxr-elementwise-multiplication-of-matrix-with-vector

Tuesday, December 3, 2019

Opt Art



New book by TSP and Domino art creator Robert Bosch [1].

Content:

  1. Optimization and the Visual Arts?
  2. Truchet Tiles
  3. Linear Optimization and the Lego Problem
  4. The Linear Assignment problem and Cartoon Mosaics
  5. Domino Mosaics
  6. From the TSP to Continuous Line Drawings
  7. TSP Art with Side Constraints
  8. Knight's Tours
  9. Labyrinth Design with Tiling and Pattern Matching
  10. Mosaics with Side Constraints
  11. Game-of-life Mosaics
Yes, it contains color pictures.

TSP Art


An example of a TSP drawing (from [2]):

25,000-city TSPortrait of George Dantzig [2,3]
Original photograph


I guess more cities are needed to prevent his teeth from disappearing.


Domino Art


You can submit your picture and create a domino mosaic through NEOS [4].  I tried this with the Dantzig picture. With a "high quality" output image this creates a large model:

Reduced MIP has 13915 rows, 1511565 columns, and 4534695 nonzeros.
Reduced MIP has 1511565 binaries, 0 generals, 0 SOSs, and 0 indicators.

This is not small at all. Although the model is large, it is easy to solve. It only took 11 nodes (most of the work was in the root node). The model is an assignment problem with side constraints [5]. Basically the costs are the difference in gray scale value between the picture and the domino piece for each cell. The constraints are: exactly one domino in each cell and place all dominos. NEOS solves this by processing the picture in Matlab and then solving the model with GAMS/CPLEX. The output PNG file is also notably large: 2.8 MB. Too large to include in this blog. A screen grab looks like:
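
In broad strokes such a model looks like the following (my sketch, not necessarily the exact NEOS formulation). With dominoes \(d\), candidate placements \(p\) (a pair of adjacent cells plus an orientation), and \(P(c)\) the placements covering cell \(c\): \[\begin{align}\min\>& \sum_{d,p} \color{darkblue}c_{d,p}\, \color{darkred}x_{d,p}\\ & \sum_p \color{darkred}x_{d,p} = 1 && \forall d\\ & \sum_d \sum_{p \in \color{darkblue}P(c)} \color{darkred}x_{d,p} = 1 && \forall c\\ & \color{darkred}x_{d,p} \in \{0,1\}\end{align}\] where \(c_{d,p}\) measures the gray scale mismatch between the picture and domino \(d\) placed at \(p\).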

Screen capture of hi-res image

After zooming in we see indeed the dominos:

Zooming in

With some effort we can recognize part of the frame of the eye glasses here.

References

Wednesday, November 27, 2019

Gurobi v9.0.


Now including elevator music!


 Major new features:

  • Non-convex quadratic solver
    • Supports non-convexities both in objective and constraints
    • Not quite sure if MIQCP is supported (I assume it is, but I think this was not mentioned explicitly)
    • Cplex already had support for (some) non-convex quadratic models, so Gurobi is catching up here.
  • Performance improvements
    • MIP: 18% faster overall and 26% on difficult models (more than 100 seconds)
    • MIQP: 24% faster
    (these numbers are from the email I received -- not from the movie). Quite impressive numbers. Performance does not seem to plateau (yet). Of course we see this also for Cplex: it keeps improving. There is some healthy competition here.

References


Friday, November 22, 2019

SDP Model: imputing a covariance matrix

Missing values


Missing values are a well-known problem in statistics. The simplest approach is just to delete all data cases that have missing values. Another approach is to repair things by filling in reasonable values. This is called imputation. Imputation strategies can be very sophisticated (and complex).

Statistical tools often have direct support for representing missing values. E.g. the R language has NA (not available). The modeling language GAMS also has NA. Python, however, has no explicit support for missing values [2]. By convention, the special floating point value NaN (Not a Number) is used to indicate missing values for floating point numbers. One real disadvantage is that NaN is not available for data types other than floating point numbers. Furthermore, the object None can be used to indicate missing values, but as None is an object, that approach often carries a performance penalty. It is noted that the numpy library has some facilities to deal with missing data through masks (this is really different from NA in R and GAMS).
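
For example (a small illustration, my addition):

import numpy as np

x = np.array([1.0, np.nan, 3.0])
print(x.mean())                    # nan: NaN propagates through ordinary operations
print(np.nanmean(x))               # 2.0: the NaN-aware variant skips the missing value

m = np.ma.masked_invalid(x)        # numpy's masked-array facility
print(m.mean())                    # 2.0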

Imputing missing values in a covariance matrix


In [1] a semi-definite programming (SDP) model is proposed to deal with a covariance matrix with some missing values by imputation. The constraint to be added is that the covariance matrix should remain positive semi-definite (PSD). A covariance matrix should in theory be PSD, but in practice it may not be. The resulting model is stated as:

SDP Model: Imputation of Covariance Matrix 
\[\begin{align} \text{minimize}\>& 0\\ \text{subject to}\>&\color{darkred}{\Sigma}_{i,j} = \widetilde{\color{darkblue}{\Sigma}}_{i,j} && (i,j)\notin \color{darkblue}M\\ & \color{darkred}{\Sigma} \succeq 0 \end{align} \]

The notation is taken from [1].  Here \(\widetilde{\Sigma}\) is the (original) covariance matrix with missing data in locations \((i,j)\in M\). The variable \(\Sigma\) is the new covariance matrix with missing data filled in such that \(\Sigma\) is positive semi-definite. This last condition is denoted by \(\Sigma \succeq 0\). In this model there is no objective, as indicated by minimizing zero.

CVXPY implementation


There is no code provided for this model in [1]. So let me give it a try. CVXPY does not have good support for things like \(\forall (i,j) \notin M\).  I can see two approaches:

  • Expand the constraint into scalar form. Essentially, a DIY approach.
  • Use a binary data matrix \(M_{i,j} \in \{0,1\}\) indicating the missing values and write \[(e\cdot e^T-M) \circ \Sigma = \widetilde{\Sigma}_0\] where \(\circ\) is elementwise multiplication (a.k.a. Hadamard product), \(e\) is a column vector of ones of appropriate size, and \(\widetilde{\Sigma}_0\) is \(\widetilde{\Sigma}\) but with NaN's replaced by zeros.

In addition, let's add a regularizing objective: minimize sum of squares of \(\Sigma_{i,j}\).

The Python code for these two models is:


import numpy as np
import pandas as pd 
import cvxpy as cp

#------------ data ----------------

cov = np.array([
 [ 0.300457, -0.158889,  0.080241, -0.143750,  0.072844, -0.032968,  0.077836,  0.049272],
 [-0.158889,  0.399624,  np.nan,    0.109056,  0.082858, -0.045462, -0.124045, -0.132096],
 [ 0.080241,  np.nan,    np.nan,   -0.031902, -0.081455,  0.098212,  0.243131,  0.120404],
 [-0.143750,  0.109056, -0.031902,  0.386109, -0.058051,  0.060246,  0.082420,  0.125786],
 [ 0.072844,  0.082858, -0.081455, -0.058051,  np.nan,    np.nan,   -0.119530, -0.054881],
 [-0.032968, -0.045462,  0.098212,  0.060246,  np.nan,    0.400641,  0.051103,  0.007308],
 [ 0.077836, -0.124045,  0.243131,  0.082420, -0.119530,  0.051103,  0.543407,  0.121709],
 [ 0.049272, -0.132096,  0.120404,  0.125786, -0.054881,  0.007308,  0.121709,  0.481395]     
])
print("Covariance data with NaNs")
print(pd.DataFrame(cov))

M = 1*np.isnan(cov)
print("M (indicator for missing values)")
print(pd.DataFrame(M))

dim = np.shape(cov)
n = dim[0]

#----------- model 1 -----------------

Sigma = cp.Variable(dim, symmetric=True)

prob = cp.Problem(
    cp.Minimize(cp.sum_squares(Sigma)),
    [ Sigma[i,j] == cov[i,j] for i in range(n) for j in range(n) if M[i,j]==0 ] +
     [ Sigma >> 0 ]
    )
prob.solve(solver=cp.SCS,verbose=True)

print("Status:",prob.status)
print("Objective:",prob.value)
print(pd.DataFrame(Sigma.value))

#----------- model 2 -----------------

e = np.ones((n,1))
cov0 =  np.nan_to_num(cov,copy=True)

prob2 = cp.Problem(
#    cp.Minimize(cp.trace(Sigma.T@Sigma)),  <--- not recognized as convex
    cp.Minimize(cp.norm(Sigma,"fro")**2),
    [ cp.multiply(e@e.T - M,Sigma) == cov0,
      Sigma >> 0 ]
    )
prob2.solve(solver=cp.SCS,verbose=True)

print("Status:",prob2.status)
print("Objective:",prob2.value)
print(pd.DataFrame(Sigma.value))


Notes:

  • Model 1 has a (long) list of scalar constraints. The objective is \[\min\>\sum_{i,j} \Sigma_{i,j}^2\] Sorry for the possible confusion between the symbols for summation and covariance.
  • CVXPY uses the notation Sigma >> 0 to indicate \(\Sigma \succeq 0\) (i.e. \(\Sigma\) should be positive semi-definite).
  • We added the condition that \(\Sigma\) should be symmetric in the variable statement. This seems to be needed: without it, the solver may return a non-symmetric matrix. I suspect that in that case, the matrix \(0.5(\Sigma+\Sigma^T)\), rather than \(\Sigma\) itself, is required to be positive semi-definite.
  • Model 2 is an attempt to use matrix notation. The objective can be stated as \[\min\>\mathbf{tr}(\Sigma^T\Sigma)\] but that is not recognized as being convex. As alternative I used the Frobenius norm: \[||A||_F =\sqrt{ \sum_{i,j} a_{i,j}^2}\]
  • The function np.nan_to_num converts NaN values to zeros.
  • The function cp.multiply performs elementwise multiplication (as opposed to matrix multiplication).
  • I don't think we can easily pass only the upper triangular part of the covariance matrix to the solver. For large problems this would save some effort (CPU time and memory).
  • In a traditional optimization model we would have just \(|M|\) decision variables (corresponding to the missing values). Here, in the scalar model, we have \(n^2\) variables and \(n^2-|M|\) constraints.


The results are:


Covariance data with NaNs
          0         1         2         3         4         5         6  \
0  0.300457 -0.158889  0.080241 -0.143750  0.072844 -0.032968  0.077836   
1 -0.158889  0.399624       NaN  0.109056  0.082858 -0.045462 -0.124045   
2  0.080241       NaN       NaN -0.031902 -0.081455  0.098212  0.243131   
3 -0.143750  0.109056 -0.031902  0.386109 -0.058051  0.060246  0.082420   
4  0.072844  0.082858 -0.081455 -0.058051       NaN       NaN -0.119530   
5 -0.032968 -0.045462  0.098212  0.060246       NaN  0.400641  0.051103   
6  0.077836 -0.124045  0.243131  0.082420 -0.119530  0.051103  0.543407   
7  0.049272 -0.132096  0.120404  0.125786 -0.054881  0.007308  0.121709   

          7  
0  0.049272  
1 -0.132096  
2  0.120404  
3  0.125786  
4 -0.054881  
5  0.007308  
6  0.121709  
7  0.481395  
M (indicator for missing values)
   0  1  2  3  4  5  6  7
0  0  0  0  0  0  0  0  0
1  0  0  1  0  0  0  0  0
2  0  1  1  0  0  0  0  0
3  0  0  0  0  0  0  0  0
4  0  0  0  0  1  1  0  0
5  0  0  0  0  1  0  0  0
6  0  0  0  0  0  0  0  0
7  0  0  0  0  0  0  0  0
----------------------------------------------------------------------------
 SCS v2.1.1 - Splitting Conic Solver
 (c) Brendan O'Donoghue, Stanford University, 2012
----------------------------------------------------------------------------
Lin-sys: sparse-direct, nnz in A = 160
eps = 1.00e-04, alpha = 1.50, max_iters = 5000, normalize = 1, scale = 1.00
acceleration_lookback = 10, rho_x = 1.00e-03
Variables n = 37, constraints m = 160
Cones: primal zero / dual free vars: 58
 soc vars: 66, soc blks: 1
 sd vars: 36, sd blks: 1
Setup time: 9.69e-03s
----------------------------------------------------------------------------
 Iter | pri res | dua res | rel gap | pri obj | dua obj | kap/tau | time (s)
----------------------------------------------------------------------------
     0| 4.05e+19  7.57e+19  1.00e+00 -3.12e+19  1.92e+20  1.21e+20  1.53e-02 
    40| 2.74e-10  1.01e-09  4.51e-11  1.71e+00  1.71e+00  1.96e-17  1.88e-02 
----------------------------------------------------------------------------
Status: Solved
Timing: Solve time: 1.89e-02s
 Lin-sys: nnz in L factor: 357, avg solve time: 1.54e-06s
 Cones: avg projection time: 1.52e-04s
 Acceleration: avg step time: 1.66e-05s
----------------------------------------------------------------------------
Error metrics:
dist(s, K) = 4.7898e-17, dist(y, K*) = 1.5753e-09, s'y/|s||y| = 3.7338e-12
primal res: |Ax + s - b|_2 / (1 + |b|_2) = 2.7439e-10
dual res:   |A'y + c|_2 / (1 + |c|_2) = 1.0103e-09
rel gap:    |c'x + b'y| / (1 + |c'x| + |b'y|) = 4.5078e-11
----------------------------------------------------------------------------
c'x = 1.7145, -b'y = 1.7145
============================================================================
Status: optimal
Objective: 1.714544257213233
          0         1         2         3         4         5         6  \
0  0.300457 -0.158889  0.080241 -0.143750  0.072844 -0.032968  0.077836   
1 -0.158889  0.399624 -0.084196  0.109056  0.082858 -0.045462 -0.124045   
2  0.080241 -0.084196  0.198446 -0.031902 -0.081455  0.098212  0.243131   
3 -0.143750  0.109056 -0.031902  0.386109 -0.058051  0.060246  0.082420   
4  0.072844  0.082858 -0.081455 -0.058051  0.135981 -0.041927 -0.119530   
5 -0.032968 -0.045462  0.098212  0.060246 -0.041927  0.400641  0.051103   
6  0.077836 -0.124045  0.243131  0.082420 -0.119530  0.051103  0.543407   
7  0.049272 -0.132096  0.120404  0.125786 -0.054881  0.007308  0.121709   

          7  
0  0.049272  
1 -0.132096  
2  0.120404  
3  0.125786  
4 -0.054881  
5  0.007308  
6  0.121709  
7  0.481395  
----------------------------------------------------------------------------
 SCS v2.1.1 - Splitting Conic Solver
 (c) Brendan O'Donoghue, Stanford University, 2012
----------------------------------------------------------------------------
Lin-sys: sparse-direct, nnz in A = 162
eps = 1.00e-04, alpha = 1.50, max_iters = 5000, normalize = 1, scale = 1.00
acceleration_lookback = 10, rho_x = 1.00e-03
Variables n = 38, constraints m = 168
Cones: primal zero / dual free vars: 64
 soc vars: 68, soc blks: 2
 sd vars: 36, sd blks: 1
Setup time: 1.02e-02s
----------------------------------------------------------------------------
 Iter | pri res | dua res | rel gap | pri obj | dua obj | kap/tau | time (s)
----------------------------------------------------------------------------
     0| 3.67e+19  5.42e+19  1.00e+00 -2.45e+19  1.28e+20  1.04e+20  9.84e-03 
    40| 5.85e-10  1.47e-09  7.31e-10  1.71e+00  1.71e+00  8.09e-17  1.29e-02 
----------------------------------------------------------------------------
Status: Solved
Timing: Solve time: 1.31e-02s
 Lin-sys: nnz in L factor: 368, avg solve time: 2.56e-06s
 Cones: avg projection time: 3.03e-05s
 Acceleration: avg step time: 2.45e-05s
----------------------------------------------------------------------------
Error metrics:
dist(s, K) = 4.4409e-16, dist(y, K*) = 1.5216e-09, s'y/|s||y| = 4.2866e-12
primal res: |Ax + s - b|_2 / (1 + |b|_2) = 5.8496e-10
dual res:   |A'y + c|_2 / (1 + |c|_2) = 1.4729e-09
rel gap:    |c'x + b'y| / (1 + |c'x| + |b'y|) = 7.3074e-10
----------------------------------------------------------------------------
c'x = 1.7145, -b'y = 1.7145
============================================================================
Status: optimal
Objective: 1.714544261472336
          0         1         2         3         4         5         6  \
0  0.300457 -0.158889  0.080241 -0.143750  0.072844 -0.032968  0.077836   
1 -0.158889  0.399624 -0.084196  0.109056  0.082858 -0.045462 -0.124045   
2  0.080241 -0.084196  0.198446 -0.031902 -0.081455  0.098212  0.243131   
3 -0.143750  0.109056 -0.031902  0.386109 -0.058051  0.060246  0.082420   
4  0.072844  0.082858 -0.081455 -0.058051  0.135981 -0.041927 -0.119530   
5 -0.032968 -0.045462  0.098212  0.060246 -0.041927  0.400641  0.051103   
6  0.077836 -0.124045  0.243131  0.082420 -0.119530  0.051103  0.543407   
7  0.049272 -0.132096  0.120404  0.125786 -0.054881  0.007308  0.121709   

          7  
0  0.049272  
1 -0.132096  
2  0.120404  
3  0.125786  
4 -0.054881  
5  0.007308  
6  0.121709  
7  0.481395  

As a sanity check we can confirm that the eigenvalues of the solution matrix are non-negative:


w,v = np.linalg.eig(Sigma.value)
print(w)

[9.46355900e-01 6.34465779e-01 2.35993549e-10 5.30366506e-02
 1.69999646e-01 2.29670882e-01 4.36623248e-01 3.75907704e-01]


Practice: propagation of NAs and pairwise covariances


I don't think this is a practical way of dealing with missing values. First of all, missing values in the original data propagate into the covariance matrix: a single NA in the data leads to lots of NAs in the covariance matrix.


----     28 PARAMETER cov  effect of a single NA in the data

            j1          j2          j3          j4          j5          j6          j7          j8

j1    0.300457   -0.158889          NA   -0.143750    0.072844   -0.032968    0.077836    0.049272
j2   -0.158889    0.399624          NA    0.109056    0.082858   -0.045462   -0.124045   -0.132096
j3          NA          NA          NA          NA          NA          NA          NA          NA
j4   -0.143750    0.109056          NA    0.386109   -0.058051    0.060246    0.082420    0.125786
j5    0.072844    0.082858          NA   -0.058051    0.354627   -0.129507   -0.119530   -0.054881
j6   -0.032968   -0.045462          NA    0.060246   -0.129507    0.400641    0.051103    0.007308
j7    0.077836   -0.124045          NA    0.082420   -0.119530    0.051103    0.543407    0.121709
j8    0.049272   -0.132096          NA    0.125786   -0.054881    0.007308    0.121709    0.481395


This propagation is the result of the fact that any operation on NA yields NA, i.e. \[ \mathbf{NA}+ x=\mathbf{NA}\cdot x = \mathbf{NA}\] When applying the standard formula for the covariance: \[cov_{j,k} = \frac{1}{N-1} \sum_i (x_{i,j}-\mu_j)(x_{i,k}-\mu_k) \] we see this propagation in action. The above results were obtained with GAMS, using exactly this formula. This problem is of course difficult to fix after the fact in the covariance matrix: just too much damage has been done.
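
The same propagation happens in numpy, for instance (a small check, my addition):

import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(50, 8))
data[10, 2] = np.nan                       # a single missing observation in column 2
C = np.cov(data, rowvar=False)             # same covariance formula as above
print(np.isnan(C).sum())                   # 15: all of row 2 and column 2 of C are NaN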

A second problem with our SDP model is that we are not staying close to reasonable values for missing correlations. The model only looks at the PSD constraint.

Basically, the answer is that we need to look more carefully at the original data.

A simple remedy is just to throw away the records with NAs. If you have lots of data and relatively few NAs, this is a reasonable approach. However, there is a trick we can use. Instead of throwing away a whole row of observations when we encounter an NA, we inspect pairs of columns \((j,k)\) individually. For the two columns \(j\) and \(k\), throw away the observations with NAs in these columns and then calculate the covariance \(cov_{j,k}\). Repeat for all combinations \((j,k)\) with \(j \lt k\). This means the number of usable observations varies for different combinations \((j,k)\). R has this pairwise covariance calculation built-in.


> cov(a)
              [,1]         [,2]         [,3]        [,4]          [,5]          [,6]         [,7]          [,8]
[1,]  0.3269524261 -0.022335922  0.024062915 0.026460677 -0.0003735916  0.0021383397 0.0544727640 -0.0008417817
[2,] -0.0223359222  0.313444259  0.036135413 0.027115454 -0.0045955942  0.0286659334 0.0558610843  0.0222590384
[3,]  0.0240629148  0.036135413  0.308443182 0.003663338  0.0014232064 -0.0158431246 0.0308769925 -0.0177244600
[4,]  0.0264606771  0.027115454  0.003663338 0.322801448  0.0057221934  0.0175051722 0.0152804438  0.0034349411
[5,] -0.0003735916 -0.004595594  0.001423206 0.005722193  0.2920368646 -0.0069213567 0.0227153919  0.0163823701
[6,]  0.0021383397  0.028665933 -0.015843125 0.017505172 -0.0069213567  0.3095935603 0.0009359271  0.0506571760
[7,]  0.0544727640  0.055861084  0.030876993 0.015280444  0.0227153919  0.0009359271 0.3635080311  0.0322080200
[8,] -0.0008417817  0.022259038 -0.017724460 0.003434941  0.0163823701  0.0506571760 0.0322080200  0.2992700098
> a[2,3]=NA
> cov(a)
              [,1]         [,2] [,3]        [,4]          [,5]          [,6]         [,7]          [,8]
[1,]  0.3269524261 -0.022335922   NA 0.026460677 -0.0003735916  0.0021383397 0.0544727640 -0.0008417817
[2,] -0.0223359222  0.313444259   NA 0.027115454 -0.0045955942  0.0286659334 0.0558610843  0.0222590384
[3,]            NA           NA   NA          NA            NA            NA           NA            NA
[4,]  0.0264606771  0.027115454   NA 0.322801448  0.0057221934  0.0175051722 0.0152804438  0.0034349411
[5,] -0.0003735916 -0.004595594   NA 0.005722193  0.2920368646 -0.0069213567 0.0227153919  0.0163823701
[6,]  0.0021383397  0.028665933   NA 0.017505172 -0.0069213567  0.3095935603 0.0009359271  0.0506571760
[7,]  0.0544727640  0.055861084   NA 0.015280444  0.0227153919  0.0009359271 0.3635080311  0.0322080200
[8,] -0.0008417817  0.022259038   NA 0.003434941  0.0163823701  0.0506571760 0.0322080200  0.2992700098
> cov(a,use="pairwise")
              [,1]         [,2]         [,3]        [,4]          [,5]          [,6]         [,7]          [,8]
[1,]  0.3269524261 -0.022335922  0.024077969 0.026460677 -0.0003735916  0.0021383397 0.0544727640 -0.0008417817
[2,] -0.0223359222  0.313444259  0.036895996 0.027115454 -0.0045955942  0.0286659334 0.0558610843  0.0222590384
[3,]  0.0240779693  0.036895996  0.311573754 0.003377392  0.0013694087 -0.0162231609 0.0310202082 -0.0180617800
[4,]  0.0264606771  0.027115454  0.003377392 0.322801448  0.0057221934  0.0175051722 0.0152804438  0.0034349411
[5,] -0.0003735916 -0.004595594  0.001369409 0.005722193  0.2920368646 -0.0069213567 0.0227153919  0.0163823701
[6,]  0.0021383397  0.028665933 -0.016223161 0.017505172 -0.0069213567  0.3095935603 0.0009359271  0.0506571760
[7,]  0.0544727640  0.055861084  0.031020208 0.015280444  0.0227153919  0.0009359271 0.3635080311  0.0322080200
[8,] -0.0008417817  0.022259038 -0.018061780 0.003434941  0.0163823701  0.0506571760 0.0322080200  0.2992700098
> 
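
Pandas behaves the same way: DataFrame.cov() computes pairwise covariances, excluding NA values per column pair. A quick check (my addition):

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(100, 8)))
df.iloc[1, 2] = np.nan             # a single NA, as before
print(df.cov())                    # pairwise-complete observations: no NaNs in the result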


The disadvantage of pairwise covariances is that it is possible (even in theory) that the resulting covariance matrix is not positive semi-definite. We can repair this with R's nearPD function. The next section discusses some strategies we can use to achieve the same as this function.

Finding a nearby positive definite matrix


An algorithm to find a positive definite matrix \(A\) that is close to the original matrix \(A_0\) is as follows:


  1. Form an eigen decomposition of \(A_0\) using something like \[(\lambda,V) := \mathbf{eig}(A_0)\]
  2. Replace negative eigenvalues: \[\lambda_i := \max(\lambda_i, 10^{-6})\]
  3. Reassemble the matrix: \[A := V \cdot \mathbf{diag}(\lambda) \cdot V^T\]

We can also formulate this as a Semi-Definite Program:

Closest Positive Semi-Definite Matrix
\[\begin{align} \min\>& ||\color{darkred}A - \color{darkblue}A_0|| \\ & \color{darkred}A \succeq 0 \end{align} \]

The eigen decomposition approach will not find the optimal closest matrix (in terms of squared deviations). Let's compare the results of these two approaches on a small problem. Here is the Python code:


import numpy as np
import pandas as pd
import cvxpy as cp

A0 = [[ 0.130457, -0.158889,  0.080241, -0.143750,  0.072844, -0.032968,  0.077836,  0.049272],
      [-0.158889,  0.229624, -0.219293,  0.109056,  0.082858, -0.045462, -0.124045, -0.132096],
      [ 0.080241, -0.219293,  0.308231, -0.031902, -0.081455,  0.098212,  0.243131,  0.120404],
      [-0.143750,  0.109056, -0.031902,  0.216109, -0.058051,  0.060246,  0.082420,  0.125786],
      [ 0.072844,  0.082858, -0.081455, -0.058051,  0.184627, -0.129507, -0.119530, -0.054881],
      [-0.032968, -0.045462,  0.098212,  0.060246, -0.129507,  0.230641,  0.051103,  0.007308],
      [ 0.077836, -0.124045,  0.243131,  0.082420, -0.119530,  0.051103,  0.373407,  0.121709],
      [ 0.049272, -0.132096,  0.120404,  0.125786, -0.054881,  0.007308,  0.121709,  0.311395]]


#--------------- eigenvalue approach ------------------

# eigen decomposition
u,v= np.linalg.eig(A0)
print("Eigenvalues:")
print(np.sort(u))

# replace negative eigenvalues
u = np.maximum(1.0e-6,u)

# reassemble matrix
A = v @ np.diag(u) @ v.T
print("*** Sum of squared differences:",np.sum((A-A0)**2))

#--------------- SDP model ------------------

dim = np.shape(A0)
A = cp.Variable(dim,symmetric=True)

prob = cp.Problem(cp.Minimize(cp.norm(A-A0,"fro")**2),
                  [A >> 0])
SSE = prob.solve(solver=cp.SCS, verbose=True)
print("*** Sum of squared differences:",SSE)


I used as objective \[\min\> ||A-A_0||_F^2\] which is just the sum of squares. Obviously, I could have used the cp.sum_squares function instead. The difference between the two methods, in terms of the sum of squares, is very small:


Eigenvalues:
[-0.06620331 -0.022672    0.06166647  0.0944498   0.21259723  0.30736467
  0.49401996  0.90326818]
*** Sum of squared differences: 0.004897075316543839
----------------------------------------------------------------------------
 SCS v2.1.1 - Splitting Conic Solver
 (c) Brendan O'Donoghue, Stanford University, 2012
----------------------------------------------------------------------------
Lin-sys: sparse-direct, nnz in A = 104
eps = 1.00e-04, alpha = 1.50, max_iters = 5000, normalize = 1, scale = 1.00
acceleration_lookback = 10, rho_x = 1.00e-03
Variables n = 38, constraints m = 104
Cones: soc vars: 68, soc blks: 2
 sd vars: 36, sd blks: 1
Setup time: 8.19e-03s
----------------------------------------------------------------------------
 Iter | pri res | dua res | rel gap | pri obj | dua obj | kap/tau | time (s)
----------------------------------------------------------------------------
     0| 4.97e+19  4.58e+19  1.00e+00 -3.53e+19  2.93e+19  6.25e+19  1.27e-02 
    20| 1.92e-10  3.09e-10  6.99e-10  4.90e-03  4.90e-03  6.36e-17  1.41e-02 
----------------------------------------------------------------------------
Status: Solved
Timing: Solve time: 1.42e-02s
 Lin-sys: nnz in L factor: 246, avg solve time: 1.96e-06s
 Cones: avg projection time: 4.11e-05s
 Acceleration: avg step time: 2.01e-05s
----------------------------------------------------------------------------
Error metrics:
dist(s, K) = 9.6655e-10, dist(y, K*) = 1.9088e-09, s'y/|s||y| = -1.7574e-12
primal res: |Ax + s - b|_2 / (1 + |b|_2) = 1.9164e-10
dual res:   |A'y + c|_2 / (1 + |c|_2) = 3.0911e-10
rel gap:    |c'x + b'y| / (1 + |c'x| + |b'y|) = 6.9872e-10
----------------------------------------------------------------------------
c'x = 0.0049, -b'y = 0.0049
============================================================================
*** Sum of squared differences: 0.004896897751827667


Conclusion


The model presented in [1] is interesting: it is not quite obvious how to implement it in matrix notation for use in CVXPY (and the code shown below the example in [1] is not directly related). However, it should be mentioned that better methods are available to address the underlying problem: how to handle missing values in a covariance matrix. Pairwise construction of the covariance matrix is an easy fix. In case a non-PSD covariance matrix is created, we can repair it in different ways.

References


  1. Semidefinite program, https://www.cvxpy.org/examples/basic/sdp.html
  2. Missing Data Functionality in NumPy, https://docs.scipy.org/doc/numpy-1.10.1/neps/missing-data.html
  3. Covariance matrix not positive definite in portfolio models, https://yetanothermathprogrammingconsultant.blogspot.com/2018/04/covariance-matrix-not-positive-definite.html
  4. Nicholas J. Higham, Computing the Nearest Correlation Matrix - A Problem from Finance, IMA Journal of Numerical Analysis, 22 (2002), 329–343