Yet Another Math Programming Consultant: September 2020

Tuesday, September 29, 2020

Fix non positive semi-definite covariance matrix in R

In [1], the following R code is proposed to solve a Mean-Variance portfolio optimization problem:

set.seed(123)

nO <- 10L  ## number of observations
nA <- 100L  ## number of assets
mData <- array(rnorm(nO * nA, sd = 0.05), 
               dim = c(nO, nA)) #Creating sample stock observations

library("quadprog")
aMat <- array(1, dim = c(1,nA))
bVec <- 1
zeros <- array(0, dim = c(nA,1))

solQP <- solve.QP(cov(mData), zeros, t(aMat), bVec, meq = 1) #Minimize optimization
solQP$solution

I added the line that explicitly sets the seed to make the runs reproducible.

The model is a bit difficult to decipher in this R code, but it is a simplified traditional MV model:

Mean-Variance Portfolio Optimization Model
\[\begin{align} \min\> & \color{darkred}x'\color{darkblue}Q\color{darkred}x - \color{darkblue}\lambda \bar{\color{darkblue}r}'\color{darkred}x\\ & \sum_i \color{darkred}x_i = 1 \\ & \color{darkred}x_i \ge 0\end{align}\]

Instead of having the expected returns in the objective, we also see versions of this model with a linear constraint that puts a minimum value on the expected return of the portfolio.

In the example, we have $\lambda=0$, so the objective only has a quadratic term to minimize the risk. We don't care about returns.

Note that the data only has 10 observations but 100 instruments. This situation can often lead to some problems. Indeed when we run this, we see:

> solQP <- solve.QP(cov(mData), zeros, t(aMat), bVec, meq = 1) #Minimize optimization
Error in solve.QP(cov(mData), zeros, t(aMat), bVec, meq = 1) : 
  matrix D in quadratic function is not positive definite!
>

As a side note: the data is just completely random, so we cannot expect useful solutions. However, at least, we should be able to produce some solutions.

Solution #1

The covariance matrix is in theory positive-semi definite. We can see this from using the definition of the covariance [2]. However, due to floating-point inaccuracies, we can end up with a covariance matrix that is just slightly non-positive semi-definite. We can see this by inspecting the eigenvalues:

> Q <- cov(mData)
> eigen(Q)$values
  [1]  3.977635e-02  3.789523e-02  3.258164e-02  3.093782e-02  2.605389e-02  2.466463e-02  2.127899e-02
  [8]  1.897506e-02  1.627900e-02  1.871945e-17  1.454315e-17  8.114384e-18  5.109196e-18  4.392427e-18
 [15]  3.680766e-18  2.728485e-18  2.649302e-18  2.192073e-18  1.937634e-18  1.914487e-18  1.856811e-18
 [22]  1.449321e-18  1.391434e-18  1.211216e-18  1.196721e-18  1.127119e-18  9.776146e-19  9.670197e-19
 [29]  8.257462e-19  7.572554e-19  6.283522e-19  6.030820e-19  4.056541e-19  3.610128e-19  3.604800e-19
 [36]  2.770106e-19  2.530797e-19  2.263314e-19  2.133854e-19  2.089700e-19  2.024130e-19  2.006560e-19
 [43]  1.688435e-19  9.412675e-20  7.033666e-20  7.012531e-20  5.942729e-20  5.673501e-20  4.444257e-20
 [50]  4.259787e-20  3.124233e-21 -2.017011e-20 -2.269857e-20 -3.034668e-20 -3.075493e-20 -3.262642e-20
 [57] -4.345710e-20 -6.351674e-20 -7.347496e-20 -7.739535e-20 -8.065131e-20 -9.318415e-20 -1.004645e-19
 [64] -1.048992e-19 -1.185271e-19 -1.315539e-19 -1.362383e-19 -1.376705e-19 -1.456396e-19 -1.697860e-19
 [71] -1.700317e-19 -1.745068e-19 -1.794480e-19 -1.835296e-19 -1.960933e-19 -2.017457e-19 -2.332259e-19
 [78] -2.377082e-19 -3.232830e-19 -3.468225e-19 -3.844707e-19 -4.310383e-19 -4.467493e-19 -4.540707e-19
 [85] -4.977900e-19 -6.159103e-19 -6.338311e-19 -6.484161e-19 -6.489508e-19 -6.551306e-19 -6.664641e-19
 [92] -7.276688e-19 -7.422977e-19 -7.927766e-19 -1.106209e-18 -1.707077e-18 -1.742870e-18 -2.465438e-18
 [99] -1.859346e-17 -3.473638e-17

Most of the eigenvalues are basically zero with quite a few just a bit negative. A positive-definite matrix would have only positive values. A positive semi-definite (PSD) matrix would allow positive and zero eigenvalues but no negative ones.

There are algorithms to find a close-by positive-definite matrix. In R this is available as nearPD() in the Matrix package. Let's try this out:

> Q <- nearPD(cov(mData))$mat
> solQP <- solve.QP(Q, zeros, t(aMat), bVec, meq = 1) 
> solQP$solution
  [1]  0.007861808  0.004649581  0.012677143  0.009135964  0.011926439  0.013624409  0.007916747  0.004307772
  [9]  0.011808486  0.011085302  0.010229029  0.013080240  0.010085504  0.010998879  0.011099784  0.008520520
 [17]  0.007451730  0.012424371  0.012677330  0.007013515  0.009070013  0.010422999  0.011041407  0.008768456
 [25]  0.007594051  0.010845828  0.008091472  0.010219700  0.017100764  0.005583947  0.011981802  0.007366402
 [33]  0.013770522  0.009063828  0.010744898  0.007836608  0.009580356  0.007065588  0.008219370  0.010592597
 [41]  0.011775598  0.011112051  0.005555992  0.014596684  0.016019481  0.008870919  0.010940604  0.005662200
 [49]  0.007056997  0.012274299  0.006510091  0.012764160  0.008258651  0.009945869  0.009891680  0.007318763
 [57]  0.014960558  0.008255708  0.009655097  0.010054760  0.007309219  0.008761360  0.011616297  0.010300254
 [65]  0.012043733  0.011963495  0.009991699  0.010353212  0.013896332  0.011619403  0.007160011  0.007202327
 [73]  0.008530819  0.010918658  0.004754215  0.008877062  0.008526937  0.010251250  0.008190446  0.006754629
 [81]  0.013209816  0.014849748  0.012213860  0.008888433  0.018188014  0.009756690  0.014786487  0.014893706
 [89]  0.009737739  0.013256532  0.009990737 -0.001155889  0.009264992  0.007733719  0.014505131  0.011040198
 [97]  0.012859850  0.008987104  0.008458713  0.004497735

Much better. When printing the eigenvalues of the new Q matrix, we see:

> eigen(Q)$values
  [1] 3.977635e-02 3.789522e-02 3.258164e-02 3.093782e-02 2.605389e-02 2.466462e-02 2.127899e-02 1.897506e-02
  [9] 1.627900e-02 3.977635e-10 3.977635e-10 3.977635e-10 3.977635e-10 3.977635e-10 3.977635e-10 3.977635e-10
 [17] 3.977635e-10 3.977635e-10 3.977635e-10 3.977635e-10 3.977635e-10 3.977635e-10 3.977635e-10 3.977635e-10
 [25] 3.977635e-10 3.977635e-10 3.977635e-10 3.977635e-10 3.977635e-10 3.977635e-10 3.977635e-10 3.977635e-10
 [33] 3.977635e-10 3.977635e-10 3.977635e-10 3.977635e-10 3.977635e-10 3.977635e-10 3.977635e-10 3.977635e-10
 [41] 3.977635e-10 3.977635e-10 3.977635e-10 3.977635e-10 3.977635e-10 3.977635e-10 3.977635e-10 3.977635e-10
 [49] 3.977635e-10 3.977635e-10 3.977635e-10 3.977635e-10 3.977635e-10 3.977635e-10 3.977635e-10 3.977635e-10
 [57] 3.977635e-10 3.977635e-10 3.977635e-10 3.977635e-10 3.977635e-10 3.977635e-10 3.977635e-10 3.977635e-10
 [65] 3.977635e-10 3.977635e-10 3.977635e-10 3.977635e-10 3.977635e-10 3.977635e-10 3.977635e-10 3.977635e-10
 [73] 3.977635e-10 3.977635e-10 3.977635e-10 3.977635e-10 3.977634e-10 3.977634e-10 3.977634e-10 3.977634e-10
 [81] 3.977634e-10 3.977634e-10 3.977634e-10 3.977634e-10 3.977634e-10 3.977634e-10 3.977634e-10 3.977634e-10
 [89] 3.977634e-10 3.977634e-10 3.977634e-10 3.977634e-10 3.977634e-10 3.977634e-10 3.977634e-10 3.977634e-10
 [97] 3.977634e-10 3.977634e-10 3.977633e-10 3.977633e-10

All small (and negative) eigenvalues are now mapped to some tolerance. This means the Q matrix is now numerically positive definite.

Solution #2

A simple DIY approach would be to fix the Q matrix by adding a small number to the diagonal. The following code shows a small perturbation is enough:

> Q <- cov(mData) + diag(1e-6,nA)

Some solvers do something like this automatically.

Fixing the model

Let's first fix our model a bit:

We want to pay attention to returns. So let's add a constraint that the expected portfolio returns are at least 0.035. This also has the effect that just a few assets will be in our solution. Which makes them easier to compare. The constraint can be stated as \[\sum_i \bar{r}_i \ge 0.035\] where \[\bar{r}_i = \frac{\sum_t r_{t,i}}{T}\] are the mean returns. Here $T$ indicates the number of observations, and $r_{t,i}$ are the returns.
We also want to explicitly state that variables are non-negative: $x_i \ge 0$. This is called "no short-selling",

The portfolio model becomes:

Portfolio Model algebraic formulation	matrix notation
\[\begin{align} \min & \sum_{i,j} \color{darkblue}q_{i,j} \color{darkred}x_i \color{darkred}x_j \\ & \sum_i \color{darkred}x_i = 1 \\ & \sum_i \bar{\color{darkblue}r}_i \color{darkred}x_i \ge 0.035\\ & \color{darkred}x_i \ge 0\end{align}\]	\[\begin{align} \min \>& \color{darkred}x' \color{darkblue}Q \color{darkred}x \\ & \color{darkblue}e'\color{darkred}x = 1 \\ & \bar{\color{darkblue}r}' \color{darkred}x \ge 0.035\\ & \color{darkred}x \ge 0\end{align}\]

QuadProg does not have facilities for bounds on the variables so we need to form explicit constraints. Its input-model looks like: \[\begin{align} \min \> & -d'b + \frac{1}{2} b'Db \\ & A'b\ge b_0\end{align}\] where the first meq constraints are equalities. The factor 0.5 is a relic from quadratic optimization (it makes the gradient a bit easier). If the model includes linear objective coefficients, it is important to take this 0.5 factor into account. Our model has only quadratic objective coefficients, so we don't care in this case. All in all, this is often a bit of an inconvenient input format.

R trick. In the examples below we want to display a sparse vector. A simple trick using names can help here.

> v <- c(0,0,3,0,0,0,1,0,0,0,2,0,0,0,0)
> v
 [1] 0 0 3 0 0 0 1 0 0 0 2 0 0 0 0
> names(v) <- paste("v",1:length(v),sep="")
> v[v>0]
 v3  v7 v11 
  3   1   2

The R code for our optimization model can look like:

> Q <- nearPD(cov(mData))$mat
> aMat <- rbind(
+     c(rep(1,nA)),     # sum(x)=1
+     colMeans(mData),  # sum(mu(i)*x(i))>=0.035   
+     diag(1,nA))       # for bounds x(i)>=0
> bVec <- c(1,
+          0.035,
+          rep(0,nA))
> solQP <- solve.QP(Q, zeros, t(aMat), bVec, meq = 1) 
> xsol <- solQP$solution
> names(xsol) = paste("x",1:100,sep="")
> xsol[xsol>1e-4]
      x30       x75       x88 
0.1465990 0.5314489 0.3219522

Another issue with QuadProg is that requires a strictly positive definite matrix. This also means it does not allow linear variables: all variables need appear to quadratically in the objective (and without all zero quadratic coefficients). After all, if the quadratic coefficient matrix would look like \[D = \left[\begin{array}{c|c}Q & 0 \\ \hline 0 & 0 \end{array} \right]\] we would end up with a matrix that is not strictly positive-definite.

Often, for these types of models, CVXR is a more agreeable tool. Here is the same model in CVXR, and solved with the OSQP QP solver:

> library(CVXR)
> x <- Variable(nA)
> w <- Variable(nO)
> prob <- Problem(
+     Minimize(quad_form(x,cov(mData))),
+     list(sum(x) == 1,
+          t(colMeans(mData)) %*% x >= 0.035,
+          x >= 0)
+     )
> result <- solve(prob, verbose=T)
-----------------------------------------------------------------
           OSQP v0.6.0  -  Operator Splitting QP Solver
              (c) Bartolomeo Stellato,  Goran Banjac
        University of Oxford  -  Stanford University 2019
-----------------------------------------------------------------
problem:  variables n = 100, constraints m = 102
          nnz(P) + nnz(A) = 5350
settings: linear system solver = qdldl,
          eps_abs = 1.0e-05, eps_rel = 1.0e-05,
          eps_prim_inf = 1.0e-04, eps_dual_inf = 1.0e-04,
          rho = 1.00e-01 (adaptive),
          sigma = 1.00e-06, alpha = 1.60, max_iter = 10000
          check_termination: on (interval 25),
          scaling: on, scaled_termination: off
          warm start: on, polish: on, time_limit: off

iter  objective    pri res    dua res    rho        time
   1   0.0000e+00   1.00e+00   1.00e+02   1.00e-01   2.88e-03s
 200   8.7535e-04   5.96e-04   1.36e-06   1.75e-02   9.38e-03s
 400   9.2707e-04   5.09e-04   1.16e-06   1.75e-02   1.41e-02s
 600   9.7703e-04   4.36e-04   9.94e-07   1.75e-02   1.99e-02s
 800   1.0239e-03   3.72e-04   8.50e-07   1.75e-02   2.58e-02s
1000   1.0671e-03   3.18e-04   7.27e-07   1.75e-02   3.11e-02s
1200   1.1062e-03   2.72e-04   6.21e-07   1.75e-02   3.76e-02s
1400   1.1413e-03   2.33e-04   5.31e-07   1.75e-02   4.35e-02s
1600   1.1726e-03   1.99e-04   4.54e-07   1.75e-02   4.90e-02s
1800   1.2001e-03   1.70e-04   3.88e-07   1.75e-02   5.51e-02s
2000   1.2243e-03   1.45e-04   3.32e-07   1.75e-02   6.20e-02s
2200   1.2455e-03   1.24e-04   2.84e-07   1.75e-02   6.81e-02s
2400   1.2639e-03   1.06e-04   2.43e-07   1.75e-02   7.33e-02s
2600   1.2799e-03   9.09e-05   2.07e-07   1.75e-02   7.83e-02s
2800   1.2938e-03   7.77e-05   1.77e-07   1.75e-02   8.51e-02s
3000   1.3058e-03   6.65e-05   1.52e-07   1.75e-02   9.20e-02s
3200   1.3161e-03   5.68e-05   1.30e-07   1.75e-02   1.00e-01s
3400   1.3251e-03   4.86e-05   1.11e-07   1.75e-02   1.08e-01s
3600   1.3327e-03   4.15e-05   9.48e-08   1.75e-02   1.21e-01s
3800   1.3394e-03   3.55e-05   8.11e-08   1.75e-02   1.28e-01s
4000   1.3450e-03   3.04e-05   6.93e-08   1.75e-02   1.35e-01s
4200   1.3499e-03   2.60e-05   5.92e-08   1.75e-02   1.43e-01s
4400   1.3541e-03   2.22e-05   5.07e-08   1.75e-02   1.50e-01s
4550   1.3568e-03   1.97e-05   4.50e-08   1.75e-02   1.56e-01s
plsh   1.3790e-03   3.33e-16   1.21e-15  ---------   1.58e-01s

status:               solved
solution polish:      successful
number of iterations: 4550
optimal objective:    0.0014
run time:             1.58e-01s
optimal rho estimate: 2.58e-02

> xsol <- result$getValue(x)
> names(xsol) = paste("x",1:100,sep="")
> xsol[xsol>1e-4]
      x30       x75       x88 
0.1465989 0.5314489 0.3219522 
>

Here we use our covariance matrix directly. Both CVXR and OSQP accept it, even though it has tiny negative numbers. However, the OSQP documentation warns that this may be a bit dangerous [3]:

We recommend the user to check the convexity of their problem before passing it to OSQP! If the user passes a non-convex problem we do not assure the solver will be able to detect it.
OSQP will try to detect non-convex problems by checking if the residuals diverge or if there are any issues in the initial factorization (if a direct method is used). It will detect non-convex problems when one or more of the eigenvalues of P are “clearly” negative, i.e., when P + sigma * I is not positive semidefinite. However, it might fail to detect non-convexity when P has slightly negative eigenvalues, i.e., when P + sigma * I is positive semidefinite and P is not.

Notice that CVXR will complain if the Q matrix is really non-positive semi-definite.

Solution #3

In [2] a different model is proposed for these situations where we have more instruments than observations.

Means-adjusted-returns Portfolio Optimization Model
\[\begin{align} \min\> & \sum_t \color{darkred}w_t^2 \\ &\color{darkred}w_t = \sum_i (\color{darkblue}r_{t,i} - \bar{\color{darkblue}r}_i)\color{darkred}x_i \\ & \sum_i \color{darkred}x_i = 1 \\ & \sum_i \bar{\color{darkblue}r}_i \color{darkred}x_i \ge 0.035\\ & \color{darkred}x_i \ge 0\end{align}\]

Some of the advantages of this model are:

The quadratic objective is benign. It is sparse and safely positive definite.
The data is smaller than a full-blown covariance matrix given that the number of observations is smaller than the number of instruments.
The number of quadratic variables is smaller (again, because $T<N$).

This model is easily implemented in CVXR:

> library(CVXR)
> x <- Variable(nA)
> w <- Variable(nO)
> adjRet <- mData - matrix(colMeans(mData),nrow=nO,ncol=nA,byrow=T)
> prob <- Problem(
+     Minimize(sum_squares(w)),
+     list(w == adjRet %*% x,
+          sum(x)==1,
+          t(colMeans(mData)) %*% x >= 0.035,
+          x >= 0)
+     )
> result <- solve(prob, verbose=T)
-----------------------------------------------------------------
           OSQP v0.6.0  -  Operator Splitting QP Solver
              (c) Bartolomeo Stellato,  Goran Banjac
        University of Oxford  -  Stanford University 2019
-----------------------------------------------------------------
problem:  variables n = 110, constraints m = 112
          nnz(P) + nnz(A) = 1320
settings: linear system solver = qdldl,
          eps_abs = 1.0e-05, eps_rel = 1.0e-05,
          eps_prim_inf = 1.0e-04, eps_dual_inf = 1.0e-04,
          rho = 1.00e-01 (adaptive),
          sigma = 1.00e-06, alpha = 1.60, max_iter = 10000
          check_termination: on (interval 25),
          scaling: on, scaled_termination: off
          warm start: on, polish: on, time_limit: off

iter  objective    pri res    dua res    rho        time
   1   0.0000e+00   1.00e+00   1.00e+02   1.00e-01   1.15e-03s
 200   5.3515e-03   1.49e-03   4.04e-05   1.00e-01   4.33e-03s
 400   7.0787e-03   7.87e-04   6.50e-06   1.00e-01   8.41e-03s
 600   7.3521e-03   7.13e-04   5.90e-06   1.00e-01   1.23e-02s
 800   7.6413e-03   6.45e-04   5.34e-06   1.00e-01   1.61e-02s
1000   7.9373e-03   5.84e-04   4.84e-06   1.00e-01   1.93e-02s
1200   8.2334e-03   5.29e-04   4.38e-06   1.00e-01   2.27e-02s
1400   8.5244e-03   4.79e-04   3.96e-06   1.00e-01   2.64e-02s
1600   8.8067e-03   4.33e-04   3.59e-06   1.00e-01   3.00e-02s
1800   9.0778e-03   3.92e-04   3.25e-06   1.00e-01   3.34e-02s
2000   9.3358e-03   3.55e-04   2.94e-06   1.00e-01   3.69e-02s
2200   9.5798e-03   3.22e-04   2.66e-06   1.00e-01   4.05e-02s
2400   9.8091e-03   2.91e-04   2.41e-06   1.00e-01   4.43e-02s
2600   1.0024e-02   2.64e-04   2.18e-06   1.00e-01   4.81e-02s
2800   1.0224e-02   2.39e-04   1.98e-06   1.00e-01   5.12e-02s
3000   1.0410e-02   2.16e-04   1.79e-06   1.00e-01   5.48e-02s
3200   1.0582e-02   1.96e-04   1.62e-06   1.00e-01   5.84e-02s
3400   1.0741e-02   1.77e-04   1.47e-06   1.00e-01   6.19e-02s
3600   1.0887e-02   1.60e-04   1.33e-06   1.00e-01   6.55e-02s
3800   1.1022e-02   1.45e-04   1.20e-06   1.00e-01   6.90e-02s
4000   1.1145e-02   1.31e-04   1.09e-06   1.00e-01   7.53e-02s
4200   1.1259e-02   1.19e-04   9.85e-07   1.00e-01   8.18e-02s
4400   1.1362e-02   1.08e-04   8.92e-07   1.00e-01   8.95e-02s
4600   1.1457e-02   9.75e-05   8.07e-07   1.00e-01   9.44e-02s
4800   1.1544e-02   8.83e-05   7.31e-07   1.00e-01   1.04e-01s
5000   1.1623e-02   7.99e-05   6.62e-07   1.00e-01   1.09e-01s
5200   1.1695e-02   7.24e-05   5.99e-07   1.00e-01   1.13e-01s
5400   1.1761e-02   6.55e-05   5.42e-07   1.00e-01   1.20e-01s
5600   1.1821e-02   5.93e-05   4.91e-07   1.00e-01   1.26e-01s
5800   1.1876e-02   5.37e-05   4.45e-07   1.00e-01   1.31e-01s
6000   1.1925e-02   4.86e-05   4.02e-07   1.00e-01   1.37e-01s
6200   1.1970e-02   4.40e-05   3.64e-07   1.00e-01   1.45e-01s
6400   1.2011e-02   3.98e-05   3.30e-07   1.00e-01   1.58e-01s
6600   1.2049e-02   3.61e-05   2.99e-07   1.00e-01   1.72e-01s
6800   1.2082e-02   3.27e-05   2.70e-07   1.00e-01   1.81e-01s
7000   1.2113e-02   2.96e-05   2.45e-07   1.00e-01   1.90e-01s
7200   1.2141e-02   2.68e-05   2.22e-07   1.00e-01   1.98e-01s
7400   1.2166e-02   2.42e-05   2.01e-07   1.00e-01   2.01e-01s
7600   1.2189e-02   2.19e-05   1.82e-07   1.00e-01   2.05e-01s
7800   1.2210e-02   1.99e-05   1.64e-07   1.00e-01   2.08e-01s
plsh   1.2411e-02   6.66e-16   1.28e-15  ---------   2.09e-01s

status:               solved
solution polish:      successful
number of iterations: 7800
optimal objective:    0.0124
run time:             2.09e-01s
optimal rho estimate: 3.27e-01

> xsol <- result$getValue(x)
> names(xsol) = paste("x",1:100,sep="")
> xsol[xsol>1e-4]
      x30       x75       x88 
0.1465989 0.5314489 0.3219522

Using QuadProg this is a much more difficult exercise. The quadratic coefficient matrix $D$ will look like: \[D = \left[\begin{array}{c|c}Q & 0 \\ \hline 0 & \varepsilon I \end{array} \right]\] The R code can look like:

> Q <- nearPD(cov(mData))$mat
> D <- rbind(
+   cbind(Q,matrix(0,nrow=nA,ncol=nO)),
+   cbind(matrix(0,nrow=nO,ncol=nA),diag(1e-6,nO))
+ )
> aMat <- rbind(
+   cbind(adjRet,diag(-1,nO)),     # sum(i,adjRet(t,i)*x(i))-w(t)=0
+   c(rep(1,nA),rep(0,nO)),        # sum(x)=1
+   c(colMeans(mData),rep(0,nO)),  # sum(mu(i)*x(i))>=0.035   
+   cbind(diag(1,nA),matrix(0,nrow=nA,ncol=nO)) # for bounds x(i)>=0
+ )
> bVec <- c(rep(0,nO),
+           1,
+           0.035,
+           rep(0,nA))
> zeros <- rep(0,nA+nO)
> solQP <- solve.QP(D, zeros, t(aMat), bVec, meq = nO+1) 
> xsol <- solQP$solution[1:nA]
> names(xsol) = paste("x",1:100,sep="")
> xsol[xsol>1e-4]
      x30       x75       x88 
0.1465990 0.5314489 0.3219522

Conclusion

QuadProg is a well-known QP solver for R. Unfortunately it has some drawbacks:

It is not very suited for more complex models: the matrix oriented input format is just too cumbersome.
The quadratic coefficient matrix must be strictly positive definite.
All variables are quadratic, linear variables are not available.

For quick experimentation with these types of models, CVXR is likely a better environment. This exercise showed:

The models are more readable, assuming you are familiar with matrix notation.
We can formulate models with linear variables without a problem.
Models with a quadratic coefficient matrix that are borderline positive semi-definite are accepted by both CVXR and the solver I used: OSQP.

Two useful R tricks are discussed:

nearPD(A) to find a positive-definite matrix close to A.
Printing a sparse vector using names.

References

R minimum variance portfolio: solve non invertible matrix, https://stackoverflow.com/questions/64067591/r-minimum-variance-portfolio-solve-non-invertible-matrix
Covariance Matrix not positive definite in portfolio models, https://yetanothermathprogrammingconsultant.blogspot.com/2018/04/covariance-matrix-not-positive-definite.html
OSQP, Status values and errors, https://osqp.org/docs/interfaces/status_values.html

Wednesday, September 23, 2020

Limiting the number of changes in the operating mode of a machine

Limiting the number of changes in the operating mode of a machine in a scheduling problem can be seen as a direct extension of limiting the number of start-ups.

Limit number of start-ups

Suppose we have a machine scheduling problem, where we use the decision variable \[x_t = \begin{cases} 1 & \text{if machine is operating at time slot $t$}\\ 0 & \text{if machine is idle at time slot $t$}\end{cases}\] To limit the number number of start-ups we introduce a binary variable $\mathit{start}_t \in \{0,1\}$ and write: \[\begin{align} & \mathit{start}_t \ge x_t - x_{t-1}\\ & \sum_t \mathit{start}_t \le U\end{align}\] The idea is as follows:

Actually we are only sure that \[x_{t-1}=0, x_t = 1 \Rightarrow \mathit{start}_t =1\] so a better picture is:

as the $\mathit{start}_t$ is undetermined when we do not switch from 0 to 1.

Notes:

This formulation works for $\sum_t \mathit{start}_t \le U$ or minimize $\sum_t \mathit{start}_t $
It is not suited for $\sum_t \mathit{start}_t = U$, $\sum_t \mathit{start}_t \ge U$ or maximize $\sum_t \mathit{start}_t $
We can relax $\mathit{start}_t$ to $\mathit{start}_t \in [0,1]$, i.e. continuous between 0 and 1

Limit the number of mode changes

Let's now assume that the machine can operate in different modes. One could be inclined to use an integer variable: $\mathit{mode}_t \in \{0,1,2,\dots,M\}$. Easier is to use a binary variable \[\mathit{mode}_{m,t} = \begin{cases} 1 & \text{if the machine is operating in mode $m$ in time slot $t$} \\ 0 & \text{otherwise}\end{cases}\] A mode switch from $m1$ to $m2$ means:

In other words, by just focusing on the mode that is being activated, limiting the number of mode switches can be interpreted as limiting the number start-ups for each mode: \[\begin{align} & \mathit{switch}_t \ge \mathit{mode}_{m,t} - \mathit{mode}_{m,t-1} && \forall m,t\\ & \sum_t \mathit{switch}_t \le U\end{align}\] Again, this works whether we have an upper limit on the number of mode changes or just minimize this number.

References

Maximizing number of consecutive days with the same operation mode for a scheduling problem, https://stackoverflow.com/questions/64012796/maximizing-number-of-consecutive-days-with-the-same-operation-mode-for-a-schedul

Monday, September 14, 2020

Cover points with tiles

In [1] a problem is posted:

Given $n$ points and a collection of tiles (rectangles) of given size, try to cover all all points with the minimum number of tiles.

There are a number of things to consider:

Are the tiles of equal size or have they different sizes?
Can tiles overlap?
After minimizing the number of tiles, should we minimize the total size of the selected tiles, so we select the smaller tiles when possible.

Data

Let's generate some data, assuming tiles have different sizes:

----     19 PARAMETER p  location of points

              x           y

p1       17.175      84.327
p2       55.038      30.114
p3       29.221      22.405
p4       34.983      85.627
p5        6.711      50.021
p6       99.812      57.873
p7       99.113      76.225
p8       13.069      63.972
p9       15.952      25.008
p10      66.893      43.536
p11      35.970      35.144
p12      13.149      15.010
p13      58.911      83.089
p14      23.082      66.573
p15      77.586      30.366
p16      11.049      50.238
p17      16.017      87.246
p18      26.511      28.581
p19      59.396      72.272
p20      62.825      46.380
p21      41.331      11.770
p22      31.421       4.655
p23      33.855      18.210
p24      64.573      56.075
p25      76.996      29.781
p26      66.111      75.582
p27      62.745      28.386
p28       8.642      10.251
p29      64.125      54.531
p30       3.152      79.236


----     19 PARAMETER r  size of rectangles

              x           y

r1       12.183      15.270
r2       25.769      32.506
r3       15.344      11.024
r4       27.554      28.637
r5       21.681      20.761
r6       17.291      17.393
r7       13.915      38.003
r8       21.398      33.502
r9       19.001      13.764
r10      32.466      12.077

Random points

Model

To model this problem, we introduce binary variables $\mathit{select}_j \in \{0,1\}$: \[\mathit{select}_j = \begin{cases} 1 & \text{if tile $j$ is selected}\\ 0 & \text{otherwise}\end{cases}\] and a continuous variable $\mathit{place}_{j,c}$ indicating the $x$ and $y$ coordinates of (selected) tile $j$. To be precise: it is the left-lower corner of the rectangle. The set $c$ is simply $c=\{x,y\}$. In addition we need a variable indicating if a point is covered by a tile:\[\mathit{cov}_{i,j}=\begin{cases} 1 & \text{if point $i$ is covered by tile $j$} \\ 0 &\text{otherwise}\end{cases} \]

The model can look like:

MIP Model
\[\begin{align}\min &\sum_j \color{darkred}{\mathit{select}}_j \\ & \color{darkred}{\mathit{place}}_{j,c} \le \color{darkblue}p_{i,c}+\color{darkblue}M\cdot(1-\color{darkred}{\mathit{cov}}_{i,j}) && \forall i,j,c \\ &\color{darkred}{\mathit{place}}_{j,c}+\color{darkblue}r_{j,c}\ge\color{darkblue}p_{i,c}-\color{darkblue}M\cdot(1-\color{darkred}{\mathit{cov}}_{i,j}) && \forall i,j,c \\ & \color{darkred}{\mathit{cov}}_{i,j} \le \color{darkred}{\mathit{select}}_j && \forall i,j \\ & \sum_j \color{darkred}{\mathit{cov}}_{i,j} \ge 1 && \forall i \\ & \color{darkred}{\mathit{select}}_j \in \{0,1\} \\ & \color{darkred}{\mathit{cov}}_{i,j} \in \{0,1\} \end{align}\]

MIP Model

\[\begin{align}\min &\sum_j \color{darkred}{\mathit{select}}_j \\ & \color{darkred}{\mathit{place}}_{j,c} \le \color{darkblue}p_{i,c}+\color{darkblue}M\cdot(1-\color{darkred}{\mathit{cov}}_{i,j}) && \forall i,j,c \\ &\color{darkred}{\mathit{place}}_{j,c}+\color{darkblue}r_{j,c}\ge\color{darkblue}p_{i,c}-\color{darkblue}M\cdot(1-\color{darkred}{\mathit{cov}}_{i,j}) && \forall i,j,c \\ & \color{darkred}{\mathit{cov}}_{i,j} \le \color{darkred}{\mathit{select}}_j && \forall i,j \\ & \sum_j \color{darkred}{\mathit{cov}}_{i,j} \ge 1 && \forall i \\ & \color{darkred}{\mathit{select}}_j \in \{0,1\} \\ & \color{darkred}{\mathit{cov}}_{i,j} \in \{0,1\} \end{align}\]

Our points are drawn from \[p_{i,c}\sim U(0,100)\] So it make sense to restrict the placement of tiles to: \[0 \le \mathit{place}_{j,c} \le 100-r_{j,c}\] Using this, a good value for $M$ is $M=100$.

Results

----     55 VARIABLE z.L                   =        7.000  objective

----     55 PARAMETER tile  placement of tiles

              x           y           w           h

r1       53.928      67.819      12.183      15.270
r2        5.652                  25.769      32.506
r4       27.484       6.507      27.554      28.637
r5       78.131      55.464      21.681      20.761
r6        5.791      49.181      17.291      17.393
r8       56.188      22.573      21.398      33.502
r10       2.517      75.169      32.466      12.077

Points are covered by tiles

No overlap

We can add no-overlap constraints [2] to the model soo that no pair of tiles can overlap. Just to be sure, I'll allow tiles to go outside the area $[0,100]\times [0,100]$. Of course, that also has an impact on the value for $M$. The new solution looks like:

With no-overlap constraints

Same-sized tiles

If all tiles are identical, we just have a special case of our original problem. It may make sense to add an ordering constraint: \[\mathit{select}_j \le \mathit{select}_{j-1}\] We can also consider additional symmetry-breaking constraints, such as \[\mathit{place}_{j,x} \ge \mathit{place}_{j,x-1}\] It requires some experimentation if these constraints pay off in performance.

All tiles have the same size

In this example all the times are $(20 \times 20)$. Below is a larger example with $n=50$ points and $(10 \times 10)$ tiles.

Larger example

Set-covering formulation for equal-sized tiles

If the problem has all equal-sized tiles, we can employ a very different algorithm. For this we need to generate candidate locations in advance. One way to generate those is as follows:

$k:=0$
For each point with coordinates $(x_1,y_1)$:

For each point with coordinates $(x_2,y_2)$ such that $x_1 \le x_2 \le x_1+w$ and $y_1-h\le y_2\le y_1+h$:

Form the candidate rectangle $(x_1,y_2)-(x_1+w,y_2+h)$
$k := k+1$
Store as $\mathit{loc}_{k,c}=(x_1,y_2)$
Find all points $i$ inside the candidate rectangle, and store this inside a mapping $\mathit{contain}_{k,i}$

Now we can solve the set-covering problem [3]:

Set-covering Model
\[\begin{align}\min & \sum_j \color{darkred}{\mathit{select}}_j \\ & \sum_j \color{darkblue}{\mathit{contain}}_{j,i} \cdot \color{darkred}{\mathit{select}}_j \ge 1 && \forall i \\ & \color{darkred}{\mathit{select}}_j \in \{0,1\}\end{align}\]

where \[\mathit{contain}_{j,i} = \begin{cases} 1 & \text{if candidate rectangle $j$ contains point $i$}\\ 0 & \text{otherwise}\end{cases}\]

This model solves very fast.

A multi-criteria version

Instead of just minimizing the number of tiles, we can subsequently minimize the total area covered by the tiles. I.e. use smaller tiles if possible. This is essentially a multi-objective problem. We used here a lexicographic approach (i.e. do two solves in sequence). In our experiment, we reduced the total size of the selected tiles from 3672.5809 to 3004.0812.

Multi-objective model results

Obviously, this additional step makes only sense if the tiles have different sizes.

GAMS Models

As requested, here are some of the source files.

Model for tiles with different sizes.

set
i 'points to cover' /p1*p30/
j 'rectangles' /r1*r10/
c 'coordinates' /x,y/
;

parameter
   A 'side of area' / 100 /
   M 'big-M'
   p(i,c) 'location of points'
   r(j,c) 'size of rectangles'
;
p(i,c) = uniform(0,A);
r(j,c) = uniform(10,40);
M = A;
display p,r;

binary variable select(j) 'select tile';
positive variable place(j,c) 'placement of tile';
place.up(j,c) = A-r(j,c);
variable z 'objective';
binary variable cov(i,j);

equations
   obj
   cover1
   cover2
   cover3
   cover4
;

obj.. z =e= sum(j, select(j));
cover1(i,j,c).. place(j,c) =l= p(i,c)+M*(1-cov(i,j));
cover2(i,j,c).. place(j,c)+r(j,c) =g= p(i,c)-M*(1-cov(i,j));
cover3(i,j).. cov(i,j) =l= select(j);
cover4(i).. sum(j,cov(i,j)) =g= 1;

model m1 /all/;
option optcr=0, threads=8;
solve m1 minimizing z using mip;

select.l(j) = round(select.l(j));

parameter tile(j,*) 'placement of tiles';
tile(j,c) = place.l(j,c)*select.l(j);
tile(j,'w')$select.l(j) = r(j,'x');
tile(j,'h')$select.l(j) = r(j,'y');
display z.l,tile;

Set-covering model for equal-sized tiles.

set
i 'points to cover' /p1*p50/
j 'candidate rectangles' /r1*r1000/
contain(j,i) 'points contained in rectangle j'
c 'coordinates' /x,y/
;

parameter
   A 'side of area' / 100 /
   p(i,c) 'location of points'
   r(c) 'size of rectangles' /x 10, y 10/
   loc(j,c) 'locations of candidate rectangles'
;
p(i,c) = uniform(0,A);
display p,r;

*------------------------------------------------------------
* generate candidate rectangle locations
*------------------------------------------------------------

alias (i,ii,iii);
scalar k / 0 /;

loop(i,
   loop(ii$(p(ii,'x') >= p(i,'x') and p(ii,'x') <= p(i,'x')+r('x') and
               abs(p(ii,'y')-p(i,'y')) <= r('y')),
      k = k + 1;
      loop(j$(ord(j)=k),
         loc(j,'x') = p(i,'x');
         loc(j,'y') = p(ii,'y')-r('y');
         contain(j,iii) = p(iii,'x') >= loc(j,'x') and p(iii,'x') <= loc(j,'x')+r('x') and
                          p(iii,'y') >= loc(j,'y') and p(iii,'y') <= loc(j,'y')+r('y')
      );
   );
);

*------------------------------------------------------------
* set covering model
*------------------------------------------------------------

binary variable select(j) 'select tile';
variable z 'objective';

equations
   obj
   cover(i)
;

obj.. z =e= sum(j, select(j));
cover(i).. sum(contain(j,i),select(j)) =g= 1;

model m1 /all/;
option optcr=0;
solve m1 minimizing z using mip;

*------------------------------------------------------------
* reporting
*------------------------------------------------------------

parameter tile(j,*) 'placement of tiles';
tile(j,c) = loc(j,c)*select.l(j);
tile(j,'w')$select.l(j) = r('x');
tile(j,'h')$select.l(j) = r('y');
display z.l,tile;

The R code to plot the results looks like:

library(ggplot2)

pt <- read.table(text="p x y
p1       17.175      84.327
p2       55.038      30.114
p3       29.221      22.405
p4       34.983      85.627
p5        6.711      50.021
p6       99.812      57.873
p7       99.113      76.225
p8       13.069      63.972
p9       15.952      25.008
p10      66.893      43.536
p11      35.970      35.144
p12      13.149      15.010
p13      58.911      83.089
p14      23.082      66.573
p15      77.586      30.366
p16      11.049      50.238
p17      16.017      87.246
p18      26.511      28.581
p19      59.396      72.272
p20      62.825      46.380
p21      41.331      11.770
p22      31.421       4.655
p23      33.855      18.210
p24      64.573      56.075
p25      76.996      29.781
p26      66.111      75.582
p27      62.745      28.386
p28       8.642      10.251
p29      64.125      54.531
p30       3.152      79.236
",header=T)


r <- read.table(text="r x y w h
r1       53.928      67.819      12.183      15.270
r2       15.562       2.638      25.769      32.506
r4       50.032      27.438      27.554      28.637
r5        7.540       4.247      21.681      20.761
r6        5.791      49.181      17.291      17.393
r7       85.897      38.222      13.915      38.003
r10       2.517      75.169      32.466      12.077
",header=T)


ggplot(pt, aes(x=x , y=y)) + geom_point() + 
  geom_rect(data=r, aes(xmin=x,xmax=x+w,ymin=y,ymax=y+h,color=r,fill=r,alpha=0.1))+
  guides(alpha=FALSE)

Conclusion

The small MIP model presented above can be used as a basis for alternative, related problems. For equal-sized tiles, a faster set-covering model is shown.

References

Tiling fixed-size rectangles to cover a given set of points, https://stackoverflow.com/questions/63872910/tiling-fixed-size-rectangles-to-cover-a-given-set-of-points
Rectangles: no-overlap constraints, https://yetanothermathprogrammingconsultant.blogspot.com/2017/07/rectangles-no-overlap-constraints.html
Set cover problem, https://en.wikipedia.org/wiki/Set_cover_problem

Saturday, September 5, 2020

Optimal double exponential smoothing

Holt-Winters Double Exponential Smoothing method is a well-known method for smoothing data with a trend [1]. In [2] an Excel implementation was demonstrated. The formulas (from [2]) are: \[\begin{align} &u_1 = y_1 \\ & v_1 = 0 \\& u_t = \alpha y_t + (1-\alpha) (u_{t-1}+v_{t-1})&& t\gt 1 \\ &v_t = \beta (u_t - u_{t-1}) + (1-\beta)v_{t-1}&& t\gt 1 \\ & \hat{y}_{t+1} = u_{t}+v_{t} \end{align}\] The symbols are:

$y_t$ is the data,
$u_t$ and $v_t$ are estimates of the smoothed value and the trend,
$\alpha$ and $\beta$ are parameters. We have $0 \le \alpha \le 1$ and $0 \le \beta \le 1$.
$\hat{y}_t$ is the estimate for $y$.

In [2] the MAE (Mean Absolute Error) is minimized. This is \[ \min \>\mathit{MAE}= \frac{\displaystyle\sum_{t\gt1} |\hat{y}_t - y_t|}{T-1} \] where $T$ is the number of time periods. Of course, in practice a more popular measure is to minimize the Mean Squared Error. One reason for choosing a least absolute value criterion is that this provides robustness when outliers are present.

Data

I use the small data set from [2]:

----     23 PARAMETER y  data

t1   3.000,    t2   5.000,    t3   9.000,    t4  20.000,    t5  12.000,    t6  17.000,    t7  22.000,    t8  23.000
t9  51.000,    t10 41.000,    t11 56.000,    t12 75.000,    t13 60.000,    t14 75.000,    t15 88.000

Model

Often, in statistical software, the MSE version of the problem is modeled as a black-box function of just $\alpha$ and $\beta$ and solved using say a BFGS-B method (usually without analytic gradients, so the solver will use finite-differences). They typically ignore that we can end up in a local minimum (although most routines allow us to specify a starting point for $\alpha$ and $\beta$). Here I want to state an explicit non-convex quadratic model that we can solve with different types of solvers. This approach will give us a much larger model, with variables $\alpha$, $\beta$, $u$, $v$, and $\hat{y}$. In addition, when using an MAE objective, we need extra variables to linearize the absolute value.

Non-convex Quadratic Model
\[\begin{align}\min\>&\color{darkred}{\mathit{MAE}}=\sum_{t\gt 1} \color{darkred}{\mathit{abserr}}_t \\ & \color{darkred}u_1 = \color{darkblue}y_1 \\ & \color{darkred}v_1 = 0 \\ & \color{darkred}u_t = \color{darkred}\alpha \cdot \color{darkblue}y_t + (1-\color{darkred}\alpha) \cdot (\color{darkred}u_{t-1}+\color{darkred}v_{t-1})&& t\gt 1\\& \color{darkred}v_t = \color{darkred}\beta \cdot (\color{darkred}u_t - \color{darkred}u_{t-1}) + (1-\color{darkred}\beta)\cdot \color{darkred}v_{t-1}&& t\gt 1 \\ & \color{darkred}{\hat{y}}_{t+1} = \color{darkred}u_{t}+\color{darkred}v_{t}\\ & -\color{darkred}{\mathit{abserr}}_t \le \color{darkred}{\hat{y}}_{t}- \color{darkblue}y_t \le \color{darkred}{\mathit{abserr}}_t && t\gt 1\\ & \color{darkred}\alpha \in [0,1] \\ & \color{darkred}\beta \in [0,1]\\ & \color{darkred}{\mathit{abserr}}_t \ge 0\end{align}\]

Non-convex Quadratic Model

\[\begin{align}\min\>&\color{darkred}{\mathit{MAE}}=\sum_{t\gt 1} \color{darkred}{\mathit{abserr}}_t \\ & \color{darkred}u_1 = \color{darkblue}y_1 \\ & \color{darkred}v_1 = 0 \\ & \color{darkred}u_t = \color{darkred}\alpha \cdot \color{darkblue}y_t + (1-\color{darkred}\alpha) \cdot (\color{darkred}u_{t-1}+\color{darkred}v_{t-1})&& t\gt 1\\& \color{darkred}v_t = \color{darkred}\beta \cdot (\color{darkred}u_t - \color{darkred}u_{t-1}) + (1-\color{darkred}\beta)\cdot \color{darkred}v_{t-1}&& t\gt 1 \\ & \color{darkred}{\hat{y}}_{t+1} = \color{darkred}u_{t}+\color{darkred}v_{t}\\ & -\color{darkred}{\mathit{abserr}}_t \le \color{darkred}{\hat{y}}_{t}- \color{darkblue}y_t \le \color{darkred}{\mathit{abserr}}_t && t\gt 1\\ & \color{darkred}\alpha \in [0,1] \\ & \color{darkred}\beta \in [0,1]\\ & \color{darkred}{\mathit{abserr}}_t \ge 0\end{align}\]

The first thing to do is to see if we can reproduce the results for $\alpha=0.4$ and $\beta=0.7$ as they are reported in [1]. We do this by fixing the variables $\alpha$ and $\beta$ and throw the model into my favorite NLP solver: CONOPT. We only need to fix these two variables, as all other levels will be pinned down by this. This fixed problem solves very fast:

   Iter Phase Ninf   Infeasibility   RGmax    NSB   Step InItr MX OK
      0   0        7.8250000000E+02 (Input point)
 
                   Pre-triangular equations:   42
                   Post-triangular equations:  29
 
      1   0        0.0000000000E+00 (After pre-processing)
      2   0        0.0000000000E+00 (After scaling)
 
 ** Feasible solution. Value of objective =    7.40965962794
 
   Iter Phase Ninf     Objective     RGmax    NSB   Step InItr MX OK
      4   3        7.4096596279E+00 0.0E+00     0
 
 ** Optimal solution. There are no superbasic variables.

Basically, CONOPT recovers the optimal solution during preprocessing. A few more iterations are needed, probably to get the basis right. The objective value is the same as reported in [2]. With this little experiment, we have learned a lot. Most likely our constraints and objective are correctly formulated.

Solution

We are now ready to try to solve the problem. When using $\alpha=0.4$ and $\beta=0.7$ as starting point, CONOPT quickly finds a very good solution. A solution report can look like:

----     75 PARAMETER results  

              y           u           v        yhat         |e|

t1      3.00000     3.00000
t2      5.00000     3.42764     0.37003     3.00000     2.00000
t3      9.00000     4.91004     1.33254     3.79767     5.20233
t4     20.00000     9.18421     3.87786     6.24258    13.75742
t5     12.00000    12.83498     3.68136    13.06207     1.06207
t6     17.00000    16.61976     3.77085    16.51634     0.48366
t7     22.00000    20.73473     4.06861    20.39061     1.60939
t8     23.00000    24.41775     3.73497    24.80334     1.80334
t9     51.00000    33.03795     7.96205    28.15271    22.84729
t10    41.00000    41.00000     7.96205    41.00000
t11    56.00000    50.46691     9.26417    48.96205     7.03795
t12    75.00000    62.99591    12.08915    59.73109    15.26891
t13    60.00000    71.85955     9.29819    75.08506    15.08506
t14    75.00000    79.84108     8.15892    81.15774     6.15774
t15    88.00000    88.00000     8.15892    88.00000


----     76 VARIABLE alpha.L               =      0.21382  
            VARIABLE beta.L                =      0.86528  
            VARIABLE MAE.L                 =      6.59394

Conopt found this solution very quickly:

   Iter Phase Ninf   Infeasibility   RGmax    NSB   Step InItr MX OK
      0   0        7.8250000000E+02 (Input point)
 
                   Pre-triangular equations:   1
                   Post-triangular equations:  5
                   Definitional equations:     39
 
      1   0        1.0173523479E+02 (After pre-processing)
      2   0        3.6729535718E+00 (After scaling)
     10   0    12  3.3409268983E+00               0.0E+00      F  F
 
 ** Feasible solution. Value of objective =    7.07090607760
 
   Iter Phase Ninf     Objective     RGmax    NSB   Step InItr MX OK
     16   3        6.5939399026E+00 2.1E+00     3 1.0E+00    3 T  T
     18   3        6.5939398862E+00 0.0E+00     0
 
 ** Optimal solution. There are no superbasic variables.

When we compare this with the Excel solution from [2], we see the CONOPT solution is actually better.

Optimal Excel Solver Solution (from [2])

What we see is:

Variable	Excel solution [2]	Conopt solution
$\alpha$	0.271817	0.21382
$\beta$	0.598161	0.86528
MAE	6.740208	6.59394

This is typical for non-convex problems: local NLP solvers may converge to a local optimum. One way to get a better handle on this is to use different starting points. Another way is to try a global solver. Using global solvers, we can show that indeed Conopt found the global optimum.

Starting points

Here we try quickly a few local solvers, with different starting points. The QCP (Quadratically Constrained Program) is non-convex. This means we should be prepared for local optimal. When different solvers starting from different initial points converge to the same solution, we have some (limited) confidence that this point is a global optimum. Note that when solving as a general-purpose NLP model, the solver does not know the model is quadratic. The local solver also does not know whether the optimal solution it finds is just a local optimum or a global one. It is noted that with this extended QCP formulation, the solver is provided with exact gradients.

Solver	Starting point	Objective	Iterations
Conopt	$\alpha=0.4,\beta=0.7$	6.5939	18
Conopt	$\alpha=0,\beta=0$	6.5939	29
Conopt	$\alpha=1,\beta=1$	6.5939	17
Minos	$\alpha=0.4,\beta=0.7$	6.5939	8
Minos	$\alpha=0,\beta=0$	6.5939	14
Minos	$\alpha=1,\beta=1$	6.5939	16
Snopt	$\alpha=0.4,\beta=0.7$	6.5939	20
Snopt	$\alpha=0,\beta=0$	6.5939	14
Snopt	$\alpha=1,\beta=1$	6.5939	17
Ipopt	$\alpha=0.4,\beta=0.7$	6.5939	23
Ipopt	$\alpha=0,\beta=0$	6.5939	14
Ipopt	$\alpha=1,\beta=1$	6.5939	26

Local solvers have in general no problem in finding the same optimal solution. This is some evidence this is the global optimum.

Note: I was a bit lazy and left $u_t$ and $v_t$ at their default initial point of zero.

A global approach

The first thing I did, was to throw the above model unmodified at some global NLP solvers. This was a big disappointment for a small model like this:

Solver	Objective	Time	Gap	Note
Baron	6.5939	3600	95%	Time limit
Couenne	6.5939	3600	78%	Time limit
Antigone	6.5939	184	0%	Optimal
Gurobi	6.5946	218	0%	Optimal

Note: The runs with solvers Baron/Couenne/Antigone were done on a somewhat slow laptop. Gurobi ran on a faster (virtual) machine. So timings are not directly comparable between these solvers but are just indicative.

Looking again at our model, we can see that the model is way more non-linear than it needs to be. To fix this, we make the model even larger (more variables, more equations). But also less non-linear. The idea is that we replace terms like \[(x+y)z=xz+yz\] by \[\begin{align} & w=x+y \\ & wz \end{align}\] When we do this for our model, we get:

Non-convex Quadratic Model v2
\[\begin{align}\min\>&\color{darkred}{\mathit{MAE}}=\sum_{t\gt 1} \color{darkred}{\mathit{abserr}}_t \\ & \color{darkred}u_1 = \color{darkblue}y_1 \\ & \color{darkred}v_1 = 0 \\ & \color{darkred}w_t = \color{darkred}u_t+\color{darkred}v_t\\ & \color{darkred}u_t = \color{darkred}\alpha \cdot \color{darkblue}y_t + \color{darkred}w_{t-1} -\color{darkred}\alpha \cdot \color{darkred}w_{t-1}&& t\gt 1\\ & \color{darkred}x_t = \color{darkred}u_t - \color{darkred}u_{t-1} -\color{darkred}v_{t-1} && t\gt 1 \\& \color{darkred}v_t = \color{darkred}\beta \cdot \color{darkred}x_t + \color{darkred}v_{t-1}&& t\gt 1 \\ & \color{darkred}{\hat{y}}_{t+1} = \color{darkred}u_{t}+\color{darkred}v_{t}\\ & -\color{darkred}{\mathit{abserr}}_t \le \color{darkred}{\hat{y}}_{t}- \color{darkblue}y_t \le \color{darkred}{\mathit{abserr}}_t && t\gt 1\\ & \color{darkred}\alpha \in [0,1] \\ & \color{darkred}\beta \in [0,1]\\ & \color{darkred}{\mathit{abserr}}_t \ge 0\end{align}\]

Non-convex Quadratic Model v2

\[\begin{align}\min\>&\color{darkred}{\mathit{MAE}}=\sum_{t\gt 1} \color{darkred}{\mathit{abserr}}_t \\ & \color{darkred}u_1 = \color{darkblue}y_1 \\ & \color{darkred}v_1 = 0 \\ & \color{darkred}w_t = \color{darkred}u_t+\color{darkred}v_t\\ & \color{darkred}u_t = \color{darkred}\alpha \cdot \color{darkblue}y_t + \color{darkred}w_{t-1} -\color{darkred}\alpha \cdot \color{darkred}w_{t-1}&& t\gt 1\\ & \color{darkred}x_t = \color{darkred}u_t - \color{darkred}u_{t-1} -\color{darkred}v_{t-1} && t\gt 1 \\& \color{darkred}v_t = \color{darkred}\beta \cdot \color{darkred}x_t + \color{darkred}v_{t-1}&& t\gt 1 \\ & \color{darkred}{\hat{y}}_{t+1} = \color{darkred}u_{t}+\color{darkred}v_{t}\\ & -\color{darkred}{\mathit{abserr}}_t \le \color{darkred}{\hat{y}}_{t}- \color{darkblue}y_t \le \color{darkred}{\mathit{abserr}}_t && t\gt 1\\ & \color{darkred}\alpha \in [0,1] \\ & \color{darkred}\beta \in [0,1]\\ & \color{darkred}{\mathit{abserr}}_t \ge 0\end{align}\]

This version of the model behaves better:

Solver	Obj	Time	Gap	Note
Baron	6.5939	60	0%	Optimal
Couenne	6.5939	3600	49%	Time limit
Antigone	6.5939	157	0%	Optimal
Gurobi	6.5939	157	0%	Optimal

This reformulation is a big help for Baron, but also helps Gurobi delivering more accurate solutions.

How difficult is this problem?

We saw that Conopt (and other local NLP solvers) found the global optimal solution without any problems. Of course, Conopt does not know whether it is a local or a global solution. So the question arises how difficult is this problem? As it turns out, the problem is actually very well-behaved. Any good local NLP solver will likely find the global optimum, even when feeding it our non-convex QCP problem. The following plot will illuminate this:

Conclusion

The problem of finding optimal parameters $\alpha$ and $\beta$ for the Holt-Winters double exponential smoothing method leads to a non-convex quadratic optimization problem. Local solvers may deliver solutions that are only locally optimal (and thus not necessarily globally optimal). When trying to solve the global problem, it may help to reformulate the model in order to minimize the number of quadratic terms.

In practice, the problem is quite well-behaved, and a good NLP solver should be able to find the global optimum solution quite easily. It looks like Excel's solver is just stopping a bit prematurely. Interestingly quadratic solvers like Cplex and Gurobi are not as fast or suitable as a general-purpose local NLP solver for this problem.

The GAMS source of the first model can be found in [4].

References

Exponential Smoothing, https://en.wikipedia.org/wiki/Exponential_smoothing
Holt's Linear Trend, https://www.real-statistics.com/time-series-analysis/basic-time-series-forecasting/holt-linear-trend/
Handanhal V. Ravinder, Determining The Optimal Values Of Exponential Smoothing Constants – Does Solver Really Work?, American Journal Of Business Education, Volume 6, Number 3, 2013, https://files.eric.ed.gov/fulltext/EJ1054363.pdf. This paper shows some computational experience with Excel on these types of models.
How to setup a minimization problem in GAMS, https://stackoverflow.com/questions/63738161/how-to-setup-a-minimization-problem-in-gams

Variable	Excel solution [2]	Conopt solution
\(\alpha\)	0.271817	0.21382
\(\beta\)	0.598161	0.86528
MAE	6.740208	6.59394

Solver	Starting point	Objective	Iterations
Conopt	\(\alpha=0.4,\beta=0.7\)	6.5939	18
Conopt	\(\alpha=0,\beta=0\)	6.5939	29
Conopt	\(\alpha=1,\beta=1\)	6.5939	17
Minos	\(\alpha=0.4,\beta=0.7\)	6.5939	8
Minos	\(\alpha=0,\beta=0\)	6.5939	14
Minos	\(\alpha=1,\beta=1\)	6.5939	16
Snopt	\(\alpha=0.4,\beta=0.7\)	6.5939	20
Snopt	\(\alpha=0,\beta=0\)	6.5939	14
Snopt	\(\alpha=1,\beta=1\)	6.5939	17
Ipopt	\(\alpha=0.4,\beta=0.7\)	6.5939	23
Ipopt	\(\alpha=0,\beta=0\)	6.5939	14
Ipopt	\(\alpha=1,\beta=1\)	6.5939	26