Wednesday, November 27, 2019

Gurobi v9.0.


Now including elevator music!


 Major new features:

  • Non-convex quadratic solver
    • Supports non-convexities both in objective and constraints
    • It is not quite clear whether MIQCP is supported (I assume it is, but I don't think it was mentioned explicitly)
    • CPLEX already had support for (some) non-convex quadratic models, so Gurobi is catching up here.
  • Performance improvements
    • MIP: 18% faster overall and 26% faster on difficult models (those taking more than 100 seconds)
    • MIQP: 24% faster
    (These numbers are from the email I received, not from the video.) Quite impressive numbers. Performance does not seem to plateau (yet). Of course, we see the same with CPLEX: it keeps improving. There is some healthy competition here.
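
A minimal gurobipy sketch of the new capability. The NonConvex parameter is the documented switch in Gurobi 9.0; the tiny bilinear model itself is made up for illustration:

import gurobipy as gp
from gurobipy import GRB

m = gp.Model()
x = m.addVar(lb=0, ub=4, name="x")
y = m.addVar(lb=0, ub=4, name="y")
m.addConstr(x * y >= 2)              # bilinear, hence non-convex
m.setObjective(x + y, GRB.MINIMIZE)
m.Params.NonConvex = 2               # accept non-convex quadratic constraints
m.optimize()
print(x.X, y.X)                      # expect x = y = sqrt(2)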


Friday, November 22, 2019

SDP Model: imputing a covariance matrix

Missing values


Missing values are a well-known problem in statistics. The simplest approach is just to delete all data cases that have missing values. Another approach is to repair things by filling in reasonable values. This is called imputation. Imputation strategies can be very sophisticated (and complex).

Statistical tools often have direct support for representing missing values. E.g. the R language has NA (not available), and the modeling language GAMS also has NA. Python, however, has no explicit support for missing values [2]. By convention, the special floating-point value NaN (Not a Number) is used to indicate missing floating-point values. One real disadvantage is that NaN is not available for data types other than floating-point numbers. Alternatively, the object None can be used to indicate missing values; as None is an object, that approach often carries a performance penalty. It is noted that the numpy library has some facilities to deal with missing data through masks (this is really different from NA in R and GAMS).
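
A small illustration of these Python conventions (plain numpy, nothing specific to optimization):

import numpy as np

x = np.array([1.0, np.nan, 3.0])
print(np.isnan(x))             # [False  True False]
print(np.nansum(x))            # 4.0: NaN-aware reduction

# masked arrays: numpy's alternative way to mark missing entries
m = np.ma.masked_invalid(x)
print(m.mean())                # 2.0, ignoring the masked entry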

Imputing missing values in a covariance matrix


In [1] a semi-definite programming (SDP) model is proposed to deal with a covariance matrix with some missing values by imputation. The constraint to be added is that the covariance matrix should remain positive semi-definite (PSD). In theory a covariance matrix is always PSD, but in practice it can happen that it is not. The resulting model is stated as:

SDP Model: Imputation of Covariance Matrix 
\[\begin{align} \text{minimize}\>& 0\\ \text{subject to}\>&\color{darkred}{\Sigma}_{i,j} = \widetilde{\color{darkblue}{\Sigma}}_{i,j} && (i,j)\notin \color{darkblue}M\\ & \color{darkred}{\Sigma} \succeq 0 \end{align} \]

The notation is taken from [1].  Here \(\widetilde{\Sigma}\) is the (original) covariance matrix with missing data in locations \((i,j)\in M\). The variable \(\Sigma\) is the new covariance matrix with missing data filled in such that \(\Sigma\) is positive semi-definite. This last condition is denoted by \(\Sigma \succeq 0\). In this model there is no objective, as indicated by minimizing zero.

CVXPY implementation


There is no code provided for this model in [1], so let me give it a try. CVXPY does not have good support for constructs like \(\forall (i,j) \notin M\). I can see two approaches:

  • Expand the constraint into scalar form. Essentially, a DIY approach.
  • Use a binary data matrix \(M_{i,j} \in \{0,1\}\) indicating the missing values and write \[(e\cdot e^T-M) \circ \Sigma = \widetilde{\Sigma}_0\] where \(\circ\) is elementwise multiplication (a.k.a. Hadamard product), \(e\) is a column vector of ones of appropriate size, and \(\widetilde{\Sigma}_0\) is \(\widetilde{\Sigma}\) but with NaN's replaced by zeros.

In addition, let's add a regularizing objective: minimize sum of squares of \(\Sigma_{i,j}\).

The Python code for these two models is:


import numpy as np
import pandas as pd 
import cvxpy as cp

#------------ data ----------------

cov = np.array([
 [ 0.300457, -0.158889,  0.080241, -0.143750,  0.072844, -0.032968,  0.077836,  0.049272],
 [-0.158889,  0.399624,  np.nan,    0.109056,  0.082858, -0.045462, -0.124045, -0.132096],
 [ 0.080241,  np.nan,    np.nan,   -0.031902, -0.081455,  0.098212,  0.243131,  0.120404],
 [-0.143750,  0.109056, -0.031902,  0.386109, -0.058051,  0.060246,  0.082420,  0.125786],
 [ 0.072844,  0.082858, -0.081455, -0.058051,  np.nan,    np.nan,   -0.119530, -0.054881],
 [-0.032968, -0.045462,  0.098212,  0.060246,  np.nan,    0.400641,  0.051103,  0.007308],
 [ 0.077836, -0.124045,  0.243131,  0.082420, -0.119530,  0.051103,  0.543407,  0.121709],
 [ 0.049272, -0.132096,  0.120404,  0.125786, -0.054881,  0.007308,  0.121709,  0.481395]     
])
print("Covariance data with NaNs")
print(pd.DataFrame(cov))

M = 1*np.isnan(cov)
print("M (indicator for missing values)")
print(pd.DataFrame(M))

dim = np.shape(cov)
n = dim[0]

#----------- model 1 -----------------

Sigma = cp.Variable(dim, symmetric=True)

prob = cp.Problem(
    cp.Minimize(cp.sum_squares(Sigma)),
    [ Sigma[i,j] == cov[i,j] for i in range(n) for j in range(n) if M[i,j]==0 ] +
     [ Sigma >> 0 ]
    )
prob.solve(solver=cp.SCS,verbose=True)

print("Status:",prob.status)
print("Objective:",prob.value)
print(pd.DataFrame(Sigma.value))

#----------- model 2 -----------------

e = np.ones((n,1))
cov0 =  np.nan_to_num(cov,copy=True)

prob2 = cp.Problem(
#    cp.Minimize(cp.trace(Sigma.T@Sigma)),  <--- not recognized as convex
    cp.Minimize(cp.norm(Sigma,"fro")**2),
    [ cp.multiply(e@e.T - M,Sigma) == cov0,
      Sigma >> 0 ]
    )
prob2.solve(solver=cp.SCS,verbose=True)

print("Status:",prob2.status)
print("Objective:",prob2.value)
print(pd.DataFrame(Sigma.value))


Notes:

  • Model 1 has a (long) list of scalar constraints. The objective is \[\min\>\sum_{i,j} \Sigma_{i,j}^2\] Sorry for the possible confusion between the symbols for summation and covariance.
  • CVXPY uses the notation Sigma >> 0 to indicate \(\Sigma \succeq 0\) (i.e. \(\Sigma\) should be positive semi-definite).
  • We added the condition that \(\Sigma\) should be symmetric in the variable statement. This seems to be needed: without it, the solver may return a non-symmetric matrix. I suspect that in that case the matrix \(0.5(\Sigma+\Sigma^T)\), rather than \(\Sigma\) itself, is required to be positive semi-definite.
  • Model 2 is an attempt to use matrix notation. The objective can be stated as \[\min\>\mathbf{tr}(\Sigma^T\Sigma)\] but that is not recognized as being convex. As an alternative, I used the Frobenius norm: \[||A||_F =\sqrt{ \sum_{i,j} a_{i,j}^2}\]
  • The function np.nan_to_num converts NaN values to zeros.
  • The function cp.multiply performs elementwise multiplication (as opposed to matrix multiplication).
  • I don't think we can easily pass only the upper-triangular part of the covariance matrix to the solver. For large problems, that would save some effort (CPU time and memory).
  • In a traditional optimization model we would have just \(|M|\) decision variables (corresponding to the missing values). Here, in the scalar model, we have \(n^2\) variables and \(n^2-|M|\) constraints.


The results are:


Covariance data with NaNs
          0         1         2         3         4         5         6  \
0  0.300457 -0.158889  0.080241 -0.143750  0.072844 -0.032968  0.077836   
1 -0.158889  0.399624       NaN  0.109056  0.082858 -0.045462 -0.124045   
2  0.080241       NaN       NaN -0.031902 -0.081455  0.098212  0.243131   
3 -0.143750  0.109056 -0.031902  0.386109 -0.058051  0.060246  0.082420   
4  0.072844  0.082858 -0.081455 -0.058051       NaN       NaN -0.119530   
5 -0.032968 -0.045462  0.098212  0.060246       NaN  0.400641  0.051103   
6  0.077836 -0.124045  0.243131  0.082420 -0.119530  0.051103  0.543407   
7  0.049272 -0.132096  0.120404  0.125786 -0.054881  0.007308  0.121709   

          7  
0  0.049272  
1 -0.132096  
2  0.120404  
3  0.125786  
4 -0.054881  
5  0.007308  
6  0.121709  
7  0.481395  
M (indicator for missing values)
   0  1  2  3  4  5  6  7
0  0  0  0  0  0  0  0  0
1  0  0  1  0  0  0  0  0
2  0  1  1  0  0  0  0  0
3  0  0  0  0  0  0  0  0
4  0  0  0  0  1  1  0  0
5  0  0  0  0  1  0  0  0
6  0  0  0  0  0  0  0  0
7  0  0  0  0  0  0  0  0
----------------------------------------------------------------------------
 SCS v2.1.1 - Splitting Conic Solver
 (c) Brendan O'Donoghue, Stanford University, 2012
----------------------------------------------------------------------------
Lin-sys: sparse-direct, nnz in A = 160
eps = 1.00e-04, alpha = 1.50, max_iters = 5000, normalize = 1, scale = 1.00
acceleration_lookback = 10, rho_x = 1.00e-03
Variables n = 37, constraints m = 160
Cones: primal zero / dual free vars: 58
 soc vars: 66, soc blks: 1
 sd vars: 36, sd blks: 1
Setup time: 9.69e-03s
----------------------------------------------------------------------------
 Iter | pri res | dua res | rel gap | pri obj | dua obj | kap/tau | time (s)
----------------------------------------------------------------------------
     0| 4.05e+19  7.57e+19  1.00e+00 -3.12e+19  1.92e+20  1.21e+20  1.53e-02 
    40| 2.74e-10  1.01e-09  4.51e-11  1.71e+00  1.71e+00  1.96e-17  1.88e-02 
----------------------------------------------------------------------------
Status: Solved
Timing: Solve time: 1.89e-02s
 Lin-sys: nnz in L factor: 357, avg solve time: 1.54e-06s
 Cones: avg projection time: 1.52e-04s
 Acceleration: avg step time: 1.66e-05s
----------------------------------------------------------------------------
Error metrics:
dist(s, K) = 4.7898e-17, dist(y, K*) = 1.5753e-09, s'y/|s||y| = 3.7338e-12
primal res: |Ax + s - b|_2 / (1 + |b|_2) = 2.7439e-10
dual res:   |A'y + c|_2 / (1 + |c|_2) = 1.0103e-09
rel gap:    |c'x + b'y| / (1 + |c'x| + |b'y|) = 4.5078e-11
----------------------------------------------------------------------------
c'x = 1.7145, -b'y = 1.7145
============================================================================
Status: optimal
Objective: 1.714544257213233
          0         1         2         3         4         5         6  \
0  0.300457 -0.158889  0.080241 -0.143750  0.072844 -0.032968  0.077836   
1 -0.158889  0.399624 -0.084196  0.109056  0.082858 -0.045462 -0.124045   
2  0.080241 -0.084196  0.198446 -0.031902 -0.081455  0.098212  0.243131   
3 -0.143750  0.109056 -0.031902  0.386109 -0.058051  0.060246  0.082420   
4  0.072844  0.082858 -0.081455 -0.058051  0.135981 -0.041927 -0.119530   
5 -0.032968 -0.045462  0.098212  0.060246 -0.041927  0.400641  0.051103   
6  0.077836 -0.124045  0.243131  0.082420 -0.119530  0.051103  0.543407   
7  0.049272 -0.132096  0.120404  0.125786 -0.054881  0.007308  0.121709   

          7  
0  0.049272  
1 -0.132096  
2  0.120404  
3  0.125786  
4 -0.054881  
5  0.007308  
6  0.121709  
7  0.481395  
----------------------------------------------------------------------------
 SCS v2.1.1 - Splitting Conic Solver
 (c) Brendan O'Donoghue, Stanford University, 2012
----------------------------------------------------------------------------
Lin-sys: sparse-direct, nnz in A = 162
eps = 1.00e-04, alpha = 1.50, max_iters = 5000, normalize = 1, scale = 1.00
acceleration_lookback = 10, rho_x = 1.00e-03
Variables n = 38, constraints m = 168
Cones: primal zero / dual free vars: 64
 soc vars: 68, soc blks: 2
 sd vars: 36, sd blks: 1
Setup time: 1.02e-02s
----------------------------------------------------------------------------
 Iter | pri res | dua res | rel gap | pri obj | dua obj | kap/tau | time (s)
----------------------------------------------------------------------------
     0| 3.67e+19  5.42e+19  1.00e+00 -2.45e+19  1.28e+20  1.04e+20  9.84e-03 
    40| 5.85e-10  1.47e-09  7.31e-10  1.71e+00  1.71e+00  8.09e-17  1.29e-02 
----------------------------------------------------------------------------
Status: Solved
Timing: Solve time: 1.31e-02s
 Lin-sys: nnz in L factor: 368, avg solve time: 2.56e-06s
 Cones: avg projection time: 3.03e-05s
 Acceleration: avg step time: 2.45e-05s
----------------------------------------------------------------------------
Error metrics:
dist(s, K) = 4.4409e-16, dist(y, K*) = 1.5216e-09, s'y/|s||y| = 4.2866e-12
primal res: |Ax + s - b|_2 / (1 + |b|_2) = 5.8496e-10
dual res:   |A'y + c|_2 / (1 + |c|_2) = 1.4729e-09
rel gap:    |c'x + b'y| / (1 + |c'x| + |b'y|) = 7.3074e-10
----------------------------------------------------------------------------
c'x = 1.7145, -b'y = 1.7145
============================================================================
Status: optimal
Objective: 1.714544261472336
          0         1         2         3         4         5         6  \
0  0.300457 -0.158889  0.080241 -0.143750  0.072844 -0.032968  0.077836   
1 -0.158889  0.399624 -0.084196  0.109056  0.082858 -0.045462 -0.124045   
2  0.080241 -0.084196  0.198446 -0.031902 -0.081455  0.098212  0.243131   
3 -0.143750  0.109056 -0.031902  0.386109 -0.058051  0.060246  0.082420   
4  0.072844  0.082858 -0.081455 -0.058051  0.135981 -0.041927 -0.119530   
5 -0.032968 -0.045462  0.098212  0.060246 -0.041927  0.400641  0.051103   
6  0.077836 -0.124045  0.243131  0.082420 -0.119530  0.051103  0.543407   
7  0.049272 -0.132096  0.120404  0.125786 -0.054881  0.007308  0.121709   

          7  
0  0.049272  
1 -0.132096  
2  0.120404  
3  0.125786  
4 -0.054881  
5  0.007308  
6  0.121709  
7  0.481395  

As a sanity check we can confirm that the eigenvalues of the solution matrix are non-negative:


w,v = np.linalg.eig(Sigma.value)
print(w)

[9.46355900e-01 6.34465779e-01 2.35993549e-10 5.30366506e-02
 1.69999646e-01 2.29670882e-01 4.36623248e-01 3.75907704e-01]
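
We can also list just the imputed entries by reusing the indicator matrix M. A small sketch, assuming the Sigma and M objects from the script above:

ii, jj = np.nonzero(M)
for i, j in zip(ii, jj):
    print(f"imputed Sigma[{i},{j}] = {Sigma.value[i,j]:.6f}")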


Practice: propagation of NAs and pairwise covariances


I don't think this is a practical way of dealing with missing values. First of all, missing values in the original data propagate into the covariance matrix: a single NA in the data leads to lots of NAs in the covariance matrix.


----     28 PARAMETER cov  effect of a single NA in the data

            j1          j2          j3          j4          j5          j6          j7          j8

j1    0.300457   -0.158889          NA   -0.143750    0.072844   -0.032968    0.077836    0.049272
j2   -0.158889    0.399624          NA    0.109056    0.082858   -0.045462   -0.124045   -0.132096
j3          NA          NA          NA          NA          NA          NA          NA          NA
j4   -0.143750    0.109056          NA    0.386109   -0.058051    0.060246    0.082420    0.125786
j5    0.072844    0.082858          NA   -0.058051    0.354627   -0.129507   -0.119530   -0.054881
j6   -0.032968   -0.045462          NA    0.060246   -0.129507    0.400641    0.051103    0.007308
j7    0.077836   -0.124045          NA    0.082420   -0.119530    0.051103    0.543407    0.121709
j8    0.049272   -0.132096          NA    0.125786   -0.054881    0.007308    0.121709    0.481395


This propagation results from the fact that any operation on NA yields NA, i.e. \[ \mathbf{NA}+ x=\mathbf{NA}\cdot x = \mathbf{NA}\] When applying the standard formula for the covariance \[cov_{j,k} = \frac{1}{N-1} \sum_i (x_{i,j}-\mu_j)(x_{i,k}-\mu_k) \] we see this propagation in action. The above results were obtained in GAMS using this exact formula. This problem is, of course, difficult to fix after the fact in the covariance matrix: too much damage has been done.
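
This propagation is easy to reproduce with numpy, whose np.cov applies the textbook formula verbatim:

import numpy as np

x = np.array([1.0, 2.0, np.nan, 4.0])
y = np.array([2.0, 1.0, 0.0, 3.0])

# every entry of the covariance matrix that involves the NaN column is NaN
print(np.cov(x, y))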

A second problem with our SDP model is that we are not staying close to reasonable values for missing correlations. The model only looks at the PSD constraint.

Basically, the answer is that we need to look more carefully at the original data.

A simple remedy is just to throw away any record with an NA. If you have lots of data and relatively few NAs, this is a reasonable approach. However, there is a trick we can use. Instead of throwing away a whole row of observations when we have an NA, we inspect pairs of columns \((j,k)\) individually. For the two columns \(j\) and \(k\), throw away the observations with NAs in these columns and then calculate the covariance \(cov_{j,k}\). Repeat for all combinations \((j,k)\) with \(j \lt k\). This means the number of usable observations varies per column combination \((j,k)\). R has this pairwise covariance calculation built-in; a pandas equivalent is sketched after the R session below.


> cov(a)
              [,1]         [,2]         [,3]        [,4]          [,5]          [,6]         [,7]          [,8]
[1,]  0.3269524261 -0.022335922  0.024062915 0.026460677 -0.0003735916  0.0021383397 0.0544727640 -0.0008417817
[2,] -0.0223359222  0.313444259  0.036135413 0.027115454 -0.0045955942  0.0286659334 0.0558610843  0.0222590384
[3,]  0.0240629148  0.036135413  0.308443182 0.003663338  0.0014232064 -0.0158431246 0.0308769925 -0.0177244600
[4,]  0.0264606771  0.027115454  0.003663338 0.322801448  0.0057221934  0.0175051722 0.0152804438  0.0034349411
[5,] -0.0003735916 -0.004595594  0.001423206 0.005722193  0.2920368646 -0.0069213567 0.0227153919  0.0163823701
[6,]  0.0021383397  0.028665933 -0.015843125 0.017505172 -0.0069213567  0.3095935603 0.0009359271  0.0506571760
[7,]  0.0544727640  0.055861084  0.030876993 0.015280444  0.0227153919  0.0009359271 0.3635080311  0.0322080200
[8,] -0.0008417817  0.022259038 -0.017724460 0.003434941  0.0163823701  0.0506571760 0.0322080200  0.2992700098
> a[2,3]=NA
> cov(a)
              [,1]         [,2] [,3]        [,4]          [,5]          [,6]         [,7]          [,8]
[1,]  0.3269524261 -0.022335922   NA 0.026460677 -0.0003735916  0.0021383397 0.0544727640 -0.0008417817
[2,] -0.0223359222  0.313444259   NA 0.027115454 -0.0045955942  0.0286659334 0.0558610843  0.0222590384
[3,]            NA           NA   NA          NA            NA            NA           NA            NA
[4,]  0.0264606771  0.027115454   NA 0.322801448  0.0057221934  0.0175051722 0.0152804438  0.0034349411
[5,] -0.0003735916 -0.004595594   NA 0.005722193  0.2920368646 -0.0069213567 0.0227153919  0.0163823701
[6,]  0.0021383397  0.028665933   NA 0.017505172 -0.0069213567  0.3095935603 0.0009359271  0.0506571760
[7,]  0.0544727640  0.055861084   NA 0.015280444  0.0227153919  0.0009359271 0.3635080311  0.0322080200
[8,] -0.0008417817  0.022259038   NA 0.003434941  0.0163823701  0.0506571760 0.0322080200  0.2992700098
> cov(a,use="pairwise")
              [,1]         [,2]         [,3]        [,4]          [,5]          [,6]         [,7]          [,8]
[1,]  0.3269524261 -0.022335922  0.024077969 0.026460677 -0.0003735916  0.0021383397 0.0544727640 -0.0008417817
[2,] -0.0223359222  0.313444259  0.036895996 0.027115454 -0.0045955942  0.0286659334 0.0558610843  0.0222590384
[3,]  0.0240779693  0.036895996  0.311573754 0.003377392  0.0013694087 -0.0162231609 0.0310202082 -0.0180617800
[4,]  0.0264606771  0.027115454  0.003377392 0.322801448  0.0057221934  0.0175051722 0.0152804438  0.0034349411
[5,] -0.0003735916 -0.004595594  0.001369409 0.005722193  0.2920368646 -0.0069213567 0.0227153919  0.0163823701
[6,]  0.0021383397  0.028665933 -0.016223161 0.017505172 -0.0069213567  0.3095935603 0.0009359271  0.0506571760
[7,]  0.0544727640  0.055861084  0.031020208 0.015280444  0.0227153919  0.0009359271 0.3635080311  0.0322080200
[8,] -0.0008417817  0.022259038 -0.018061780 0.003434941  0.0163823701  0.0506571760 0.0322080200  0.2992700098
> 
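
As an aside, pandas gives the same pairwise behavior out of the box: DataFrame.cov excludes NA values per column pair. A small sketch with randomly generated (hypothetical) data:

import numpy as np
import pandas as pd

rng = np.random.default_rng(123)
a = pd.DataFrame(rng.normal(size=(500, 8)))
a.iloc[1, 2] = np.nan      # a single missing observation

# one NaN no longer wipes out a whole row and column of the result
print(a.cov())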


The disadvantage of pairwise covariances is that the resulting covariance matrix may not be positive semi-definite (this is possible even in theory, not just through numerical noise). We can repair this with R's nearPD function. The next section discusses some strategies to achieve the same thing ourselves.

Finding a nearby positive definite matrix


An algorithm to find a positive definite matrix \(A\) that is close to the original matrix \(A_0\) is as follows:


  1. Form an eigen decomposition of \(A_0\) using something like \[(\lambda,V) := \mathbf{eig}(A_0)\]
  2. Replace negative eigenvalues: \[\lambda_i := \max(\lambda_i, 10^{-6})\]
  3. Reassemble the matrix: \[A := V \cdot \mathbf{diag}(\lambda) \cdot V^T\]

We can also formulate this as a Semi-Definite Program:

Closest Positive Semi-Definite Matrix
\[\begin{align} \min\>& ||\color{darkred}A - \color{darkblue}A_0|| \\ & \color{darkred}A \succeq 0 \end{align} \]

The eigen decomposition approach will not, in general, find the closest matrix (in terms of squared deviations). Let's compare the results of these two approaches on a small problem. Here is the Python code:


import numpy as np
import pandas as pd
import cvxpy as cp

A0 = [[ 0.130457, -0.158889,  0.080241, -0.143750,  0.072844, -0.032968,  0.077836,  0.049272],
      [-0.158889,  0.229624, -0.219293,  0.109056,  0.082858, -0.045462, -0.124045, -0.132096],
      [ 0.080241, -0.219293,  0.308231, -0.031902, -0.081455,  0.098212,  0.243131,  0.120404],
      [-0.143750,  0.109056, -0.031902,  0.216109, -0.058051,  0.060246,  0.082420,  0.125786],
      [ 0.072844,  0.082858, -0.081455, -0.058051,  0.184627, -0.129507, -0.119530, -0.054881],
      [-0.032968, -0.045462,  0.098212,  0.060246, -0.129507,  0.230641,  0.051103,  0.007308],
      [ 0.077836, -0.124045,  0.243131,  0.082420, -0.119530,  0.051103,  0.373407,  0.121709],
      [ 0.049272, -0.132096,  0.120404,  0.125786, -0.054881,  0.007308,  0.121709,  0.311395]]


#--------------- eigenvalue approach ------------------

# eigen decomposition
u,v= np.linalg.eig(A0)
print("Eigenvalues:")
print(np.sort(u))

# replace negative eigenvalues
u = np.maximum(1.0e-6,u)

# reassemble matrix
A = v @ np.diag(u) @ v.T
print("*** Sum of squared differences:",np.sum((A-A0)**2))

#--------------- SDP model ------------------

dim = np.shape(A0)
A = cp.Variable(dim,symmetric=True)

prob = cp.Problem(cp.Minimize(cp.norm(A-A0,"fro")**2),
                  [A >> 0])
SSE = prob.solve(solver=cp.SCS, verbose=True)
print("*** Sum of squared differences:",SSE)


I used as objective \[\min\> ||A-A_0||_F^2\] which is just the sum of squares. Obviously, I could have used the cp.sum_squares function instead. The difference between the two methods, in terms of the sum of squares, is very small:


Eigenvalues:
[-0.06620331 -0.022672    0.06166647  0.0944498   0.21259723  0.30736467
  0.49401996  0.90326818]
*** Sum of squared differences: 0.004897075316543839
----------------------------------------------------------------------------
 SCS v2.1.1 - Splitting Conic Solver
 (c) Brendan O'Donoghue, Stanford University, 2012
----------------------------------------------------------------------------
Lin-sys: sparse-direct, nnz in A = 104
eps = 1.00e-04, alpha = 1.50, max_iters = 5000, normalize = 1, scale = 1.00
acceleration_lookback = 10, rho_x = 1.00e-03
Variables n = 38, constraints m = 104
Cones: soc vars: 68, soc blks: 2
 sd vars: 36, sd blks: 1
Setup time: 8.19e-03s
----------------------------------------------------------------------------
 Iter | pri res | dua res | rel gap | pri obj | dua obj | kap/tau | time (s)
----------------------------------------------------------------------------
     0| 4.97e+19  4.58e+19  1.00e+00 -3.53e+19  2.93e+19  6.25e+19  1.27e-02 
    20| 1.92e-10  3.09e-10  6.99e-10  4.90e-03  4.90e-03  6.36e-17  1.41e-02 
----------------------------------------------------------------------------
Status: Solved
Timing: Solve time: 1.42e-02s
 Lin-sys: nnz in L factor: 246, avg solve time: 1.96e-06s
 Cones: avg projection time: 4.11e-05s
 Acceleration: avg step time: 2.01e-05s
----------------------------------------------------------------------------
Error metrics:
dist(s, K) = 9.6655e-10, dist(y, K*) = 1.9088e-09, s'y/|s||y| = -1.7574e-12
primal res: |Ax + s - b|_2 / (1 + |b|_2) = 1.9164e-10
dual res:   |A'y + c|_2 / (1 + |c|_2) = 3.0911e-10
rel gap:    |c'x + b'y| / (1 + |c'x| + |b'y|) = 6.9872e-10
----------------------------------------------------------------------------
c'x = 0.0049, -b'y = 0.0049
============================================================================
*** Sum of squared differences: 0.004896897751827667


Conclusion


The model presented in [1] is interesting: it is not quite obvious how to implement it in matrix notation for use in CVXPY (and the code shown below the example in [1] is not directly related). However, it should be mentioned that better methods are available for the underlying problem of handling missing values in a covariance matrix. Pairwise construction of the covariance matrix is an easy fix. In case a non-PSD covariance matrix is created, we can repair it in different ways.

References


  1. Semidefinite program, https://www.cvxpy.org/examples/basic/sdp.html
  2. Missing Data Functionality in NumPy, https://docs.scipy.org/doc/numpy-1.10.1/neps/missing-data.html
  3. Covariance matrix not positive definite in portfolio models, https://yetanothermathprogrammingconsultant.blogspot.com/2018/04/covariance-matrix-not-positive-definite.html
  4. Higham, Nick (2002), Computing the nearest correlation matrix - a problem from finance, IMA Journal of Numerical Analysis 22, 329–343

Wednesday, November 20, 2019

MIP solver stopping criteria

For larger MIP models we often don't wait for a proven optimal solution. That just takes too long, and we would be spending a lot of time proving optimality without much return in terms of better solutions. A number of stopping criteria are typically available:

  1. Time Limit : stop after \(x\) seconds (or hours)
  2. Relative Gap: stop if the gap between the best possible bound and the best integer solution found becomes less than \(x\%\). Different solvers use different definitions (especially regarding the denominator).
  3. Absolute Gap: similar to relative gap, but can be used when the relative gap cannot be computed (division by zero or small number).
  4. Node Limit: stop on number of explored branch & bound nodes.
  5. Iteration Limit: stop on number of Simplex iterations. This number can be huge.

I have ordered these stopping criteria in how useful they are (to me). Time limit is by far the most important: just tell the solver how long we are willing to wait. Stopping on an achieved gap is also useful. I don't remember ever using a node or iteration limit.

If you specify several limits, typically a solver will stop as soon as it hits any one of the specified limits. In other words: multiple stopping criteria are combined in an "or" fashion.

When stopping on a time limit, it is still important to inspect the final gap. A small gap gives a guaranteed bound on the quality of the solution.
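
For example, with Gurobi we can combine a time limit with a relative gap; the solver stops at whichever limit it hits first. A sketch, assuming a model file model.mps:

import gurobipy as gp

m = gp.read("model.mps")    # hypothetical model file
m.Params.TimeLimit = 3600   # stop after one hour, or ...
m.Params.MIPGap = 0.01      # ... when the relative gap drops below 1%
m.optimize()

# inspect why we stopped and what the final gap is
print(m.Status, m.MIPGap)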




For large models the tail is often very long, and we see hardly any movement: no new integer solutions are found, and the best bound moves very slowly (and less and less over time). So I really want to stop when there is not much hope for a better solution.

I would suggest another possible stopping criterion:

stop if the time since the last new (and improving) integer solution exceeds a time limit

If the time since the last new integer solution is large, we can infer that the probability of finding a better solution is small. We can also interpret this as resetting the clock after each new integer solution. I don't think any solver offers this out of the box. Of course, for some solvers we can implement it ourselves using a callback function, as sketched below.
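
Here is a sketch of such a criterion for Gurobi, using a callback that resets the clock whenever a new incumbent is found. The 600-second no-improvement limit and the model file are just examples:

import gurobipy as gp
from gurobipy import GRB

def no_improvement_cb(limit):
    state = {"last": 0.0}                 # time of the last incumbent
    def cb(model, where):
        if where == GRB.Callback.MIPSOL:  # new incumbent: reset the clock
            state["last"] = model.cbGet(GRB.Callback.RUNTIME)
        elif where == GRB.Callback.MIP:   # periodic progress callback
            t = model.cbGet(GRB.Callback.RUNTIME)
            if t - state["last"] > limit:
                model.terminate()         # no improvement for too long: stop
    return cb

m = gp.read("model.mps")                  # hypothetical model file
m.optimize(no_improvement_cb(600))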

Monday, November 11, 2019

Interesting constraint

Problem statement


In [1] the following problem is proposed:

I'm trying to solve a knapsack-style optimization problem with additional complexity.
Here is a simple example. I'm trying to select 5 items that maximize value. 2 of the items must be orange, one must be blue, one must be yellow, and one must be red. This is straightforward. However, I want to add a constraint that the selected yellow, red, and orange items can only have one shape in common with the selected blue item.

Data


The example data looks like:


 item   color     shape  value
    A    blue    circle   0.454
    B  yellow    square   0.570
    C     red  triangle   0.789
    D     red    circle   0.718
    E     red    square   0.828
    F  orange    square   0.709
    G    blue    circle   0.696
    H  orange    square   0.285
    I  orange    square   0.698
    J  orange  triangle   0.861
    K    blue  triangle   0.658
    L  yellow    circle   0.819
    M    blue    square   0.352
    N  orange    circle   0.883
    O  yellow  triangle   0.755


The complicated constraint is a bit difficult to grasp right away. The two solutions listed below may help to understand how I interpreted this.

Derived data


Let's see if we can model this. First we slice and dice the data to make the modeling a bit easier. Here is some derived data:


----     65 SET i  item

A,    B,    C,    D,    E,    F,    G,    H,    I,    J,    K,    L,    M,    N,    O


----     65 SET c  color

blue  ,    yellow,    red   ,    orange


----     65 SET s  shape

circle  ,    square  ,    triangle


----     65 SET ICS  (i,c,s) combinations

              circle      square    triangle

A.blue           YES
B.yellow                     YES
C.red                                    YES
D.red            YES
E.red                        YES
F.orange                     YES
G.blue           YES
H.orange                     YES
I.orange                     YES
J.orange                                 YES
K.blue                                   YES
L.yellow         YES
M.blue                       YES
N.orange         YES
O.yellow                                 YES


----     65 SET IC  (i,c) combinations

         blue      yellow         red      orange

A         YES
B                     YES
C                                 YES
D                                 YES
E                                 YES
F                                             YES
G         YES
H                                             YES
I                                             YES
J                                             YES
K         YES
L                     YES
M         YES
N                                             YES
O                     YES


----     65 SET IS  (i,s) combinations

       circle      square    triangle

A         YES
B                     YES
C                                 YES
D         YES
E                     YES
F                     YES
G         YES
H                     YES
I                     YES
J                                 YES
K                                 YES
L         YES
M                     YES
N         YES
O                                 YES


----     65 SET blue  blue colored items

A,    G,    K,    M


----     65 PARAMETER value  value of item

A 0.454,    B 0.570,    C 0.789,    D 0.718,    E 0.828,    F 0.709,    G 0.696,    H 0.285,    I 0.698,    J 0.861
K 0.658,    L 0.819,    M 0.352,    N 0.883,    O 0.755


I often introduce data in more convenient formats. This helps in writing the model in a more readable way, and we can develop and debug this derived data before we work on the model equations themselves.

Note that the set CS(c,s) is complete. However, I will assume that there is a possibility that this set has some missing entries. In other words, I will not assume that all combinations of colors and shapes exist in the data.

High-level model I


Let's introduce the following zero-one variables:\[\begin{align} & x_i = \begin{cases} 1 & \text{if item $i$ is selected}\\ 0 & \text{otherwise}\end{cases} \\ & y_{c,s} = \begin{cases} 1 & \text{if items with color $c$ and shape $s$ are selected}\\ 0 & \text{otherwise} \end{cases}\end{align}\]


My high-level model is:

High-level Model 1
\[\begin{align}\max & \sum_i \color{darkblue}{\mathit{Value}}_i \cdot \color{darkred}x_i \\ &\sum_i \color{darkred}x_i = \color{darkblue}{\mathit{NumItems}}\\ &\sum_{i | \color{darkblue}{\mathit{IC}}(i,c)} \color{darkred}x_i = \color{darkblue}{\mathit{NumColor}}_c && \forall c\\ & \color{darkred}y_{c,s} = \max_{i|\color{darkblue}{\mathit{ICS}}(i,c,s)} \color{darkred}x_i && \forall c,s|\color{darkblue}{\mathit{CS}}(c,s)\\  & \color{darkred}y_{\color{darkblue}{\mathit{blue}},s} = 1 \Rightarrow \sum_{c|\color{darkblue}{\mathit{YRO}}(c)} \color{darkred}y_{c,s} \le 1 && \forall s \\&\color{darkred}x_i, \color{darkred}y_{c,s} \in \{0,1\} \end{align}\]

The max constraint implements the definition of the \(y_{c,s}\) variables: if any selected item has color/shape combination \((c,s)\), then \(y_{c,s}=1\) (otherwise it stays zero). The implication constraint says: if a blue item with shape \(s\) is selected, then at most one of the other colors may have a selected item with shape \(s\). The model is a bit complicated, with all the subsets and indexing, because I wanted to be precise. No hand-waving. This helps when implementing it.

MIP Model I


This is not yet a MIP model, but translation of the above model into a normal MIP is not too difficult.


Mixed Integer Programming Model 1
\[\begin{align}\max & \sum_i \color{darkblue}{\mathit{Value}}_i \cdot \color{darkred}x_i \\ &\sum_i \color{darkred}x_i = \color{darkblue}{\mathit{NumItems}}\\ &\sum_{i | \color{darkblue}{\mathit{IC}}(i,c)} \color{darkred}x_i = \color{darkblue}{\mathit{NumColor}}_c && \forall c\\ & \color{darkred}y_{c,s} \ge \color{darkred}x_i && \forall i,c,s|\color{darkblue}{\mathit{ICS}}(i,c,s)\\  & \sum_{c|\color{darkblue}{\mathit{YRO}}(c)} \color{darkred}y_{c,s} \le 1 + \color{darkblue}M(1-\color{darkred}y_{\color{darkblue}{\mathit{blue}},s}) && \forall s \\&\color{darkred}x_i, \color{darkred}y_{c,s} \in \{0,1\} \end{align}\]

The max constraint has been replaced by a single inequality. This works here because we are only interested in \(y\)'s that are forced to be one: those are the ones that drive the blue shape constraint. This type of one-sided bounding is quite common, but it always requires a bit of thinking to establish whether it is allowed in the specific case. The blue shape implication itself is rewritten as a big-M inequality. A good value for \(M\) is not difficult to establish: the number of colors, minus one for blue, minus one for the right-hand-side constant, i.e. \(M = 4-1-1 = 2\).
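
For reference, a full linearization of \(\color{darkred}y = \max_i \color{darkred}x_i\) over binary variables needs inequalities in both directions: \[\begin{align} & \color{darkred}y \ge \color{darkred}x_i && \forall i\\ & \color{darkred}y \le \sum_i \color{darkred}x_i \end{align}\] Here we can drop the second inequality: any solution remains feasible after lowering \(y\) to its forced minimum, so no optimal solutions are lost.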


MIP Model 2


We can actually bypass the variable \(y\) and directly formulate the limit on selected shapes in terms of variables \(x\). For this we develop a set SameShape(i,j) with:

  1. i is an item with color blue
  2. j is an item with color not blue
  3. i and j have the same shape

The set SameShape looks like:


----     71 SET SameShape  blue i and non-blue j with same shape

            B           C           D           E           F           H           I           J           L

A                                 YES                                                                     YES
G                                 YES                                                                     YES
K                     YES                                                                     YES
M         YES                                 YES         YES         YES         YES

+           N           O

A         YES
G         YES
K                     YES


For the first row this means: if item A (a blue circle) is selected, the non-blue circles to watch are items D, L and N. With this we can write:


Mixed Integer Programming Model 2
\[\begin{align}\max & \sum_i \color{darkblue}{\mathit{Value}}_i \cdot \color{darkred}x_i \\ &\sum_i \color{darkred}x_i = \color{darkblue}{\mathit{NumItems}}\\ &\sum_{i | \color{darkblue}{\mathit{IC}}(i,c)} \color{darkred}x_i = \color{darkblue}{\mathit{NumColor}}_c && \forall c\\   &  \sum_{j|\color{darkblue}{\mathit{SameShape}}(i,j)} \color{darkred}x_j \le 1 + \color{darkblue}M(1-\color{darkred}x_i) && \forall i | \color{darkblue}{\mathit{Blue}}_i  \\&\color{darkred}x_i \in \{0,1\} \end{align}\]

Solution without "blue shape" constraint


For comparison, let's first run the model without the complex "blue" shape constraints (the last two constraints). That gives:


----     90 VARIABLE z.L                   =        4.087  obj

----     90 VARIABLE x.L  select item

E 1.000,    G 1.000,    J 1.000,    L 1.000,    N 1.000


----     90 SET selected  

              circle      square    triangle

E.red                        YES
G.blue           YES
J.orange                                 YES
L.yellow         YES
N.orange         YES

We see that the shape of the selected blue item (a circle) also appears among the selected orange and yellow items.

Solution with "blue shape" constraint


With the additional constraints, we see:


----     95 VARIABLE z.L                   =        4.049  obj

----     95 VARIABLE x.L  select item

E 1.000,    J 1.000,    K 1.000,    L 1.000,    N 1.000


----     95 SET selected  

              circle      square    triangle

E.red                        YES
J.orange                                 YES
K.blue                                   YES
L.yellow         YES
N.orange         YES

Now the selected blue item is a triangle, and only one other selected item (an orange one) is a triangle.

As a sanity check: the objective decreases a bit when we introduce this constraint, as expected.


Appendix: Model formulated in GAMS


The complete GAMS model looks like:

$ontext

  I'm trying to solve a knapsack-style optimization problem with additional complexity.

  Here is a simple example. I'm trying to select 5 items that maximize value. 2 of the
  items must be orange, one must be blue, one must be yellow, and one must be red.
  This is straightforward. However, I want to add a constraint that the selected yellow,
  red, and orange items can only have one shape in common with the selected blue item.

$offtext

set
   i 'item'  /A*O/
   c 'color' /blue,yellow,red,orange/
   s 'shape' /circle,square,triangle/
;

parameters
    data(i,c,s) 'value' /
        A . blue   .  circle     0.454
        B . yellow .  square     0.570
        C . red    .  triangle   0.789
        D . red    .  circle     0.718
        E . red    .  square     0.828
        F . orange .  square     0.709
        G . blue   .  circle     0.696
        H . orange .  square     0.285
        I . orange .  square     0.698
        J . orange .  triangle   0.861
        K . blue   .  triangle   0.658
        L . yellow .  circle     0.819
        M . blue   .  square     0.352
        N . orange .  circle     0.883
        O . yellow .  triangle   0.755
        /
    NumItems    'number of items to select' /5/
    NumColor(c) 'required number of each color' /
        orange 2
        red    1
        blue   1
        yellow 1
        /
;

alias(i,j);

sets
  ICS(i,c,s) '(i,c,s) combinations'
  IC(i,c)    '(i,c) combinations'
  IS(i,s)    '(i,s) combinations'
  blue(i)    'blue colored items'
;
parameter
  value(i) 'value of item'
  M        'big-M'
;

ICS(i,c,s) = data(i,c,s);
IC(i,c) = sum(ICS(i,c,s),1);
IS(i,s) = sum(ICS(i,c,s),1);
blue(i) = IC(i,"blue");
value(i) = sum((c,s),data(i,c,s));
display i,c,s,ICS,IC,IS,blue,value;

* big-M for the limit equation below: when blue item i is not selected,
* the constraint must not bind. At most NumItems-NumColor('blue') non-blue
* items can be selected, so this value is always safe.
M = NumItems - NumColor('blue') - 1;

set SameShape(i,j) 'blue i and non-blue j with same shape';
SameShape(i,j)$(blue(i) and not blue(j)) = sum(s, IS(i,s) and IS(j,s));
display SameShape;

binary variable x(i) 'select item';
variable z 'obj';

equations
   obj           'objective'
   count         'count number of selected items'
   countcolor(c) 'count selected items for each color'
   limit(i)      'blue shape constraint'
;
obj..           z =e= sum(i, value(i)*x(i));
count..         sum(i, x(i)) =e= numitems;
countcolor(c).. sum(IC(i,c), x(i)) =e= numcolor(c);
limit(i)$blue(i).. sum(sameshape(i,j), x(j)) =l= 1 + M*(1-x(i));

option optcr=0;
set selected(i,c,s) 'reporting';

model m1 /obj,count,countcolor/;
solve m1 maximizing z using mip;
selected(i,c,s)$ICS(i,c,s) = x.l(i)>0.5;
display z.l,x.l,selected;

model m2 /all/;
solve m2 maximizing z using mip;
selected(i,c,s)$ICS(i,c,s) = x.l(i)>0.5;
display z.l,x.l,selected;



A second exercise would be to write this in Python using PuLP.


Appendix: PuLP model


import pulp
import io
import pandas

# -------- data ----------------

df = pandas.read_csv(io.StringIO('''
item,color,shape,value
A,blue,circle,0.454
B,yellow,square,0.570
C,red,triangle,0.789
D,red,circle,0.718
E,red,square,0.828
F,orange,square,0.709
G,blue,circle,0.696
H,orange,square,0.285
I,orange,square,0.698
J,orange,triangle,0.861
K,blue,triangle,0.658
L,yellow,circle,0.819
M,blue,square,0.352
N,orange,circle,0.883
O,yellow,triangle,0.755
'''))
print(df)

numItems = 5
numColor = {'orange':2, 'red':1, 'blue':1, 'yellow':1}

# --------- derived data ------------

Items = df["item"]
Colors = df["color"].unique()
Shapes = df["shape"].unique()
YRO = Colors[Colors != 'blue']

value = dict(zip(Items,df["value"]))
color = dict(zip(Items,df["color"]))
shape = dict(zip(Items,df["shape"]))

# big-M for the blue shape constraint: it must not bind when blue item i
# is unselected; at most numItems-numColor['blue'] non-blue items can be
# selected, so this value is always safe
M = numItems - numColor['blue'] - 1

SameShape = { (i,j) for i in Items for j in Items
              if (color[i]=='blue') and (color[j]!='blue') and
                 (shape[j]==shape[i]) }
print('SameShape:',SameShape)

# ----------- Model -----------

p = pulp.LpProblem("MIP problem", pulp.LpMaximize)
x = pulp.LpVariable.dicts("x",Items,0,1,pulp.LpInteger)

# objective: total value of the selected items
p += pulp.lpSum([x[i]*value[i] for i in Items])
# select exactly numItems items
p += pulp.lpSum([x[i] for i in Items]) == numItems

# required number of items of each color
for c in Colors:
    p += pulp.lpSum([x[i] for i in Items if color[i]==c]) == numColor[c]

# blue shape constraint (big-M formulation)
for i in Items:
    if color[i]=="blue":
        p += pulp.lpSum([x[j] for j in Items if (i,j) in SameShape]) <= 1 + M*(1-x[i])

p.solve(pulp.PULP_CBC_CMD(msg=1))

#----------- results --------------

print("Status:", pulp.LpStatus[p.status])
print("Objective:",pulp.value(p.objective))

selected = []
for i in Items:
    if pulp.value(x[i]) > 0.5:
        selected += [[i, color[i], shape[i]]]
selected = pandas.DataFrame(selected, columns=['item','color','shape'])
print(selected)



Note that SameShape is a Python set. We could have dropped SameShape and implemented its logic directly inside the blue shape constraint. However, I believe it is often a good idea to keep constraints as simple as possible: constraints are more difficult to debug than data. The set SameShape can easily be inspected with a print statement, even before we have written the rest of the MIP model.

The output looks like:


   item   color     shape  value
0     A    blue    circle  0.454
1     B  yellow    square  0.570
2     C     red  triangle  0.789
3     D     red    circle  0.718
4     E     red    square  0.828
5     F  orange    square  0.709
6     G    blue    circle  0.696
7     H  orange    square  0.285
8     I  orange    square  0.698
9     J  orange  triangle  0.861
10    K    blue  triangle  0.658
11    L  yellow    circle  0.819
12    M    blue    square  0.352
13    N  orange    circle  0.883
14    O  yellow  triangle  0.755
Welcome to the CBC MILP Solver
Version: 2.9.0
Build Date: Feb 12 2015

command line - D:\Python\Python37\lib\site-packages\pulp\solverdir\cbc\win\64\cbc.exe 3c0bdd177daf444ba924d26442c171ba-pulp.mps max branch printingOptions all solution 3c0bdd177daf444ba924d26442c171ba-pulp.sol (default strategy 1)
At line 2 NAME          MODEL
At line 3 ROWS
At line 14 COLUMNS
At line 108 RHS
At line 118 BOUNDS
At line 134 ENDATA
Problem MODEL has 9 rows, 15 columns and 48 elements
Coin0008I MODEL read with 0 errors
Continuous objective value is 4.049 - 0.00 seconds
Cgl0004I processed model has 9 rows, 15 columns (15 integer (15 of which binary)) and 48 elements
Cutoff increment increased from 1e-005 to 0.000999
Cbc0038I Initial state - 0 integers unsatisfied sum - 0
Cbc0038I Solution found of -4.049
Cbc0038I Before mini branch and bound, 15 integers at bound fixed and 0 continuous
Cbc0038I Mini branch and bound did not improve solution (0.01 seconds)
Cbc0038I After 0.01 seconds - Feasibility pump exiting with objective of -4.049 - took 0.00 seconds
Cbc0012I Integer solution of -4.049 found by feasibility pump after 0 iterations and 0 nodes (0.01 seconds)
Cbc0001I Search completed - best objective -4.049, took 0 iterations and 0 nodes (0.01 seconds)
Cbc0035I Maximum depth 0, 0 variables fixed on reduced cost
Cuts at root node changed objective from -4.049 to -4.049
Probing was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
Gomory was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
Knapsack was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
Clique was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
MixedIntegerRounding2 was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
FlowCover was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
TwoMirCuts was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)

Result - Optimal solution found

Objective value:                4.04900000
Enumerated nodes:               0
Total iterations:               0
Time (CPU seconds):             0.02
Time (Wallclock seconds):       0.02

Option for printingOptions changed from normal to all
Total time (CPU seconds):       0.07   (Wallclock seconds):       0.07

Status: Optimal
Objective: 4.0489999999999995
  item   color     shape
0    E     red    square
1    J  orange  triangle
2    K    blue  triangle
3    L  yellow    circle
4    N  orange    circle



Some notes:

  • This does not look so bad. The PuLP model follows the mathematical model quite closely.
  • The main differences with the GAMS model: we need explicit for loops here and there (GAMS has implicit loops), and a variable like y would have to be dense. I don't think PuLP has direct facilities for sparse variables. In this case this is not a problem.
  • The CBC version that comes with PuLP is rather old.
  • Having said that, CBC is doing pretty well on this model: 0 nodes and 0 simplex iterations.

References