Tuesday, September 29, 2020

Fix non positive semi-definite covariance matrix in R

In [1], the following R code is proposed to solve a Mean-Variance portfolio optimization problem:

set.seed(123)

nO <- 10L  ## number of observations
nA <- 100L  ## number of assets
mData <- array(rnorm(nO * nA, sd = 0.05),
dim = c(nO, nA)) #Creating sample stock observations

aMat <- array(1, dim = c(1,nA))
bVec <- 1
zeros <- array(0, dim = c(nA,1))

solQP <- solve.QP(cov(mData), zeros, t(aMat), bVec, meq = 1) #Minimize optimization
solQPsolution I added the line that explicitly sets the seed to make the runs reproducible. The model is a bit difficult to decipher in this R code, but it is a simplified traditional MV model: Mean-Variance Portfolio Optimization Model \begin{align} \min\> & \color{darkred}x'\color{darkblue}Q\color{darkred}x - \color{darkblue}\lambda \bar{\color{darkblue}r}'\color{darkred}x\\ & \sum_i \color{darkred}x_i = 1 \\ & \color{darkred}x_i \ge 0\end{align} Instead of having the expected returns in the objective, we also see versions of this model with a linear constraint that puts a minimum value on the expected return of the portfolio. In the example, we have $$\lambda=0$$, so the objective only has a quadratic term to minimize the risk. We don't care about returns. Note that the data only has 10 observations but 100 instruments. This situation can often lead to some problems. Indeed when we run this, we see: > solQP <- solve.QP(cov(mData), zeros, t(aMat), bVec, meq = 1) #Minimize optimization Error in solve.QP(cov(mData), zeros, t(aMat), bVec, meq = 1) : matrix D in quadratic function is not positive definite! > As a side note: the data is just completely random, so we cannot expect useful solutions. However, at least, we should be able to produce some solutions. Solution #1 The covariance matrix is in theory positive-semi definite. We can see this from using the definition of the covariance [2]. However, due to floating-point inaccuracies, we can end up with a covariance matrix that is just slightly non-positive semi-definite. We can see this by inspecting the eigenvalues: > Q <- cov(mData) > eigen(Q)values
[1]  3.977635e-02  3.789523e-02  3.258164e-02  3.093782e-02  2.605389e-02  2.466463e-02  2.127899e-02
[8]  1.897506e-02  1.627900e-02  1.871945e-17  1.454315e-17  8.114384e-18  5.109196e-18  4.392427e-18
[15]  3.680766e-18  2.728485e-18  2.649302e-18  2.192073e-18  1.937634e-18  1.914487e-18  1.856811e-18
[22]  1.449321e-18  1.391434e-18  1.211216e-18  1.196721e-18  1.127119e-18  9.776146e-19  9.670197e-19
[29]  8.257462e-19  7.572554e-19  6.283522e-19  6.030820e-19  4.056541e-19  3.610128e-19  3.604800e-19
[36]  2.770106e-19  2.530797e-19  2.263314e-19  2.133854e-19  2.089700e-19  2.024130e-19  2.006560e-19
[43]  1.688435e-19  9.412675e-20  7.033666e-20  7.012531e-20  5.942729e-20  5.673501e-20  4.444257e-20
[50]  4.259787e-20  3.124233e-21 -2.017011e-20 -2.269857e-20 -3.034668e-20 -3.075493e-20 -3.262642e-20
[57] -4.345710e-20 -6.351674e-20 -7.347496e-20 -7.739535e-20 -8.065131e-20 -9.318415e-20 -1.004645e-19
[64] -1.048992e-19 -1.185271e-19 -1.315539e-19 -1.362383e-19 -1.376705e-19 -1.456396e-19 -1.697860e-19
[71] -1.700317e-19 -1.745068e-19 -1.794480e-19 -1.835296e-19 -1.960933e-19 -2.017457e-19 -2.332259e-19
[78] -2.377082e-19 -3.232830e-19 -3.468225e-19 -3.844707e-19 -4.310383e-19 -4.467493e-19 -4.540707e-19
[85] -4.977900e-19 -6.159103e-19 -6.338311e-19 -6.484161e-19 -6.489508e-19 -6.551306e-19 -6.664641e-19
[92] -7.276688e-19 -7.422977e-19 -7.927766e-19 -1.106209e-18 -1.707077e-18 -1.742870e-18 -2.465438e-18
[99] -1.859346e-17 -3.473638e-17

Most of the eigenvalues are basically zero with quite a few just a bit negative. A positive-definite matrix would have only positive values. A positive semi-definite (PSD) matrix would allow positive and zero eigenvalues but no negative ones.

There are algorithms to find a close-by positive-definite matrix. In R this is available as nearPD() in the Matrix package. Let's try this out:

> Q <- nearPD(cov(mData))$mat > solQP <- solve.QP(Q, zeros, t(aMat), bVec, meq = 1) > solQP$solution
[1]  0.007861808  0.004649581  0.012677143  0.009135964  0.011926439  0.013624409  0.007916747  0.004307772
[9]  0.011808486  0.011085302  0.010229029  0.013080240  0.010085504  0.010998879  0.011099784  0.008520520
[17]  0.007451730  0.012424371  0.012677330  0.007013515  0.009070013  0.010422999  0.011041407  0.008768456
[25]  0.007594051  0.010845828  0.008091472  0.010219700  0.017100764  0.005583947  0.011981802  0.007366402
[33]  0.013770522  0.009063828  0.010744898  0.007836608  0.009580356  0.007065588  0.008219370  0.010592597
[41]  0.011775598  0.011112051  0.005555992  0.014596684  0.016019481  0.008870919  0.010940604  0.005662200
[49]  0.007056997  0.012274299  0.006510091  0.012764160  0.008258651  0.009945869  0.009891680  0.007318763
[57]  0.014960558  0.008255708  0.009655097  0.010054760  0.007309219  0.008761360  0.011616297  0.010300254
[65]  0.012043733  0.011963495  0.009991699  0.010353212  0.013896332  0.011619403  0.007160011  0.007202327
[73]  0.008530819  0.010918658  0.004754215  0.008877062  0.008526937  0.010251250  0.008190446  0.006754629
[81]  0.013209816  0.014849748  0.012213860  0.008888433  0.018188014  0.009756690  0.014786487  0.014893706
[89]  0.009737739  0.013256532  0.009990737 -0.001155889  0.009264992  0.007733719  0.014505131  0.011040198
[97]  0.012859850  0.008987104  0.008458713  0.004497735

Much better. When printing the eigenvalues of the new Q matrix, we see:

> eigen(Q)values [1] 3.977635e-02 3.789522e-02 3.258164e-02 3.093782e-02 2.605389e-02 2.466462e-02 2.127899e-02 1.897506e-02 [9] 1.627900e-02 3.977635e-10 3.977635e-10 3.977635e-10 3.977635e-10 3.977635e-10 3.977635e-10 3.977635e-10 [17] 3.977635e-10 3.977635e-10 3.977635e-10 3.977635e-10 3.977635e-10 3.977635e-10 3.977635e-10 3.977635e-10 [25] 3.977635e-10 3.977635e-10 3.977635e-10 3.977635e-10 3.977635e-10 3.977635e-10 3.977635e-10 3.977635e-10 [33] 3.977635e-10 3.977635e-10 3.977635e-10 3.977635e-10 3.977635e-10 3.977635e-10 3.977635e-10 3.977635e-10 [41] 3.977635e-10 3.977635e-10 3.977635e-10 3.977635e-10 3.977635e-10 3.977635e-10 3.977635e-10 3.977635e-10 [49] 3.977635e-10 3.977635e-10 3.977635e-10 3.977635e-10 3.977635e-10 3.977635e-10 3.977635e-10 3.977635e-10 [57] 3.977635e-10 3.977635e-10 3.977635e-10 3.977635e-10 3.977635e-10 3.977635e-10 3.977635e-10 3.977635e-10 [65] 3.977635e-10 3.977635e-10 3.977635e-10 3.977635e-10 3.977635e-10 3.977635e-10 3.977635e-10 3.977635e-10 [73] 3.977635e-10 3.977635e-10 3.977635e-10 3.977635e-10 3.977634e-10 3.977634e-10 3.977634e-10 3.977634e-10 [81] 3.977634e-10 3.977634e-10 3.977634e-10 3.977634e-10 3.977634e-10 3.977634e-10 3.977634e-10 3.977634e-10 [89] 3.977634e-10 3.977634e-10 3.977634e-10 3.977634e-10 3.977634e-10 3.977634e-10 3.977634e-10 3.977634e-10 [97] 3.977634e-10 3.977634e-10 3.977633e-10 3.977633e-10 All small (and negative) eigenvalues are now mapped to some tolerance. This means the Q matrix is now numerically positive definite. Solution #2 A simple DIY approach would be to fix the Q matrix by adding a small number to the diagonal. The following code shows a small perturbation is enough: > Q <- cov(mData) + diag(1e-6,nA) Some solvers do something like this automatically. Fixing the model Let's first fix our model a bit: • We want to pay attention to returns. So let's add a constraint that the expected portfolio returns are at least 0.035. This also has the effect that just a few assets will be in our solution. Which makes them easier to compare. The constraint can be stated as $\sum_i \bar{r}_i \ge 0.035$ where $\bar{r}_i = \frac{\sum_t r_{t,i}}{T}$ are the mean returns. Here $$T$$ indicates the number of observations, and $$r_{t,i}$$ are the returns. • We also want to explicitly state that variables are non-negative: $$x_i \ge 0$$. This is called "no short-selling", The portfolio model becomes: Portfolio Model algebraic formulation matrix notation \begin{align} \min & \sum_{i,j} \color{darkblue}q_{i,j} \color{darkred}x_i \color{darkred}x_j \\ & \sum_i \color{darkred}x_i = 1 \\ & \sum_i \bar{\color{darkblue}r}_i \color{darkred}x_i \ge 0.035\\ & \color{darkred}x_i \ge 0\end{align}\begin{align} \min \>& \color{darkred}x' \color{darkblue}Q \color{darkred}x \\ & \color{darkblue}e'\color{darkred}x = 1 \\ & \bar{\color{darkblue}r}' \color{darkred}x \ge 0.035\\ & \color{darkred}x \ge 0\end{align} QuadProg does not have facilities for bounds on the variables so we need to form explicit constraints. Its input-model looks like: \begin{align} \min \> & -d'b + \frac{1}{2} b'Db \\ & A'b\ge b_0\end{align} where the first meq constraints are equalities. The factor 0.5 is a relic from quadratic optimization (it makes the gradient a bit easier). If the model includes linear objective coefficients, it is important to take this 0.5 factor into account. Our model has only quadratic objective coefficients, so we don't care in this case. All in all, this is often a bit of an inconvenient input format. R trick. In the examples below we want to display a sparse vector. A simple trick using names can help here. > v <- c(0,0,3,0,0,0,1,0,0,0,2,0,0,0,0) > v [1] 0 0 3 0 0 0 1 0 0 0 2 0 0 0 0 > names(v) <- paste("v",1:length(v),sep="") > v[v>0] v3 v7 v11 3 1 2 The R code for our optimization model can look like: > Q <- nearPD(cov(mData))mat
> aMat <- rbind(
+     c(rep(1,nA)),     # sum(x)=1
+     colMeans(mData),  # sum(mu(i)*x(i))>=0.035
+     diag(1,nA))       # for bounds x(i)>=0
> bVec <- c(1,
+          0.035,
+          rep(0,nA))
> solQP <- solve.QP(Q, zeros, t(aMat), bVec, meq = 1)
> xsol <- solQP$solution > names(xsol) = paste("x",1:100,sep="") > xsol[xsol>1e-4] x30 x75 x88 0.1465990 0.5314489 0.3219522 Another issue with QuadProg is that requires a strictly positive definite matrix. This also means it does not allow linear variables: all variables need appear to quadratically in the objective (and without all zero quadratic coefficients). After all, if the quadratic coefficient matrix would look like $D = \left[\begin{array}{c|c}Q & 0 \\ \hline 0 & 0 \end{array} \right]$ we would end up with a matrix that is not strictly positive-definite. Often, for these types of models, CVXR is a more agreeable tool. Here is the same model in CVXR, and solved with the OSQP QP solver: > library(CVXR) > x <- Variable(nA) > w <- Variable(nO) > prob <- Problem( + Minimize(quad_form(x,cov(mData))), + list(sum(x) == 1, + t(colMeans(mData)) %*% x >= 0.035, + x >= 0) + ) > result <- solve(prob, verbose=T) ----------------------------------------------------------------- OSQP v0.6.0 - Operator Splitting QP Solver (c) Bartolomeo Stellato, Goran Banjac University of Oxford - Stanford University 2019 ----------------------------------------------------------------- problem: variables n = 100, constraints m = 102 nnz(P) + nnz(A) = 5350 settings: linear system solver = qdldl, eps_abs = 1.0e-05, eps_rel = 1.0e-05, eps_prim_inf = 1.0e-04, eps_dual_inf = 1.0e-04, rho = 1.00e-01 (adaptive), sigma = 1.00e-06, alpha = 1.60, max_iter = 10000 check_termination: on (interval 25), scaling: on, scaled_termination: off warm start: on, polish: on, time_limit: off iter objective pri res dua res rho time 1 0.0000e+00 1.00e+00 1.00e+02 1.00e-01 2.88e-03s 200 8.7535e-04 5.96e-04 1.36e-06 1.75e-02 9.38e-03s 400 9.2707e-04 5.09e-04 1.16e-06 1.75e-02 1.41e-02s 600 9.7703e-04 4.36e-04 9.94e-07 1.75e-02 1.99e-02s 800 1.0239e-03 3.72e-04 8.50e-07 1.75e-02 2.58e-02s 1000 1.0671e-03 3.18e-04 7.27e-07 1.75e-02 3.11e-02s 1200 1.1062e-03 2.72e-04 6.21e-07 1.75e-02 3.76e-02s 1400 1.1413e-03 2.33e-04 5.31e-07 1.75e-02 4.35e-02s 1600 1.1726e-03 1.99e-04 4.54e-07 1.75e-02 4.90e-02s 1800 1.2001e-03 1.70e-04 3.88e-07 1.75e-02 5.51e-02s 2000 1.2243e-03 1.45e-04 3.32e-07 1.75e-02 6.20e-02s 2200 1.2455e-03 1.24e-04 2.84e-07 1.75e-02 6.81e-02s 2400 1.2639e-03 1.06e-04 2.43e-07 1.75e-02 7.33e-02s 2600 1.2799e-03 9.09e-05 2.07e-07 1.75e-02 7.83e-02s 2800 1.2938e-03 7.77e-05 1.77e-07 1.75e-02 8.51e-02s 3000 1.3058e-03 6.65e-05 1.52e-07 1.75e-02 9.20e-02s 3200 1.3161e-03 5.68e-05 1.30e-07 1.75e-02 1.00e-01s 3400 1.3251e-03 4.86e-05 1.11e-07 1.75e-02 1.08e-01s 3600 1.3327e-03 4.15e-05 9.48e-08 1.75e-02 1.21e-01s 3800 1.3394e-03 3.55e-05 8.11e-08 1.75e-02 1.28e-01s 4000 1.3450e-03 3.04e-05 6.93e-08 1.75e-02 1.35e-01s 4200 1.3499e-03 2.60e-05 5.92e-08 1.75e-02 1.43e-01s 4400 1.3541e-03 2.22e-05 5.07e-08 1.75e-02 1.50e-01s 4550 1.3568e-03 1.97e-05 4.50e-08 1.75e-02 1.56e-01s plsh 1.3790e-03 3.33e-16 1.21e-15 --------- 1.58e-01s status: solved solution polish: successful number of iterations: 4550 optimal objective: 0.0014 run time: 1.58e-01s optimal rho estimate: 2.58e-02 > xsol <- result$getValue(x)
> names(xsol) = paste("x",1:100,sep="")
> xsol[xsol>1e-4]
x30       x75       x88
0.1465989 0.5314489 0.3219522
>

Here we use our covariance matrix directly. Both CVXR and OSQP accept it, even though it has tiny negative numbers. However, the OSQP documentation warns that this may be a bit dangerous [3]:

We recommend the user to check the convexity of their problem before passing it to OSQP! If the user passes a non-convex problem we do not assure the solver will be able to detect it.

OSQP will try to detect non-convex problems by checking if the residuals diverge or if there are any issues in the initial factorization (if a direct method is used). It will detect non-convex problems when one or more of the eigenvalues of P are “clearly” negative, i.e., when P + sigma * I is not positive semidefinite. However, it might fail to detect non-convexity when P has slightly negative eigenvalues, i.e., when P + sigma * I is positive semidefinite and P is not.

Notice that CVXR will complain if the Q matrix is really non-positive semi-definite.

Solution #3

In [2] a different model is proposed for these situations where we have more instruments than observations.

\begin{align} \min\> & \sum_t \color{darkred}w_t^2 \\ &\color{darkred}w_t = \sum_i (\color{darkblue}r_{t,i} - \bar{\color{darkblue}r}_i)\color{darkred}x_i \\ & \sum_i \color{darkred}x_i = 1 \\ & \sum_i \bar{\color{darkblue}r}_i \color{darkred}x_i \ge 0.035\\ & \color{darkred}x_i \ge 0\end{align}

Some of the advantages of this model are:
• The quadratic objective is benign. It is sparse and safely positive definite.
• The data is smaller than a full-blown covariance matrix given that the number of observations is smaller than the number of instruments.
• The number of quadratic variables is smaller (again, because $$T<N$$).

This model is easily implemented in CVXR:

> library(CVXR)
> x <- Variable(nA)
> w <- Variable(nO)
> adjRet <- mData - matrix(colMeans(mData),nrow=nO,ncol=nA,byrow=T)
> prob <- Problem(
+     Minimize(sum_squares(w)),
+     list(w == adjRet %*% x,
+          sum(x)==1,
+          t(colMeans(mData)) %*% x >= 0.035,
+          x >= 0)
+     )
> result <- solve(prob, verbose=T)
-----------------------------------------------------------------
OSQP v0.6.0  -  Operator Splitting QP Solver
(c) Bartolomeo Stellato,  Goran Banjac
University of Oxford  -  Stanford University 2019
-----------------------------------------------------------------
problem:  variables n = 110, constraints m = 112
nnz(P) + nnz(A) = 1320
settings: linear system solver = qdldl,
eps_abs = 1.0e-05, eps_rel = 1.0e-05,
eps_prim_inf = 1.0e-04, eps_dual_inf = 1.0e-04,
sigma = 1.00e-06, alpha = 1.60, max_iter = 10000
check_termination: on (interval 25),
scaling: on, scaled_termination: off
warm start: on, polish: on, time_limit: off

iter  objective    pri res    dua res    rho        time
1   0.0000e+00   1.00e+00   1.00e+02   1.00e-01   1.15e-03s
200   5.3515e-03   1.49e-03   4.04e-05   1.00e-01   4.33e-03s
400   7.0787e-03   7.87e-04   6.50e-06   1.00e-01   8.41e-03s
600   7.3521e-03   7.13e-04   5.90e-06   1.00e-01   1.23e-02s
800   7.6413e-03   6.45e-04   5.34e-06   1.00e-01   1.61e-02s
1000   7.9373e-03   5.84e-04   4.84e-06   1.00e-01   1.93e-02s
1200   8.2334e-03   5.29e-04   4.38e-06   1.00e-01   2.27e-02s
1400   8.5244e-03   4.79e-04   3.96e-06   1.00e-01   2.64e-02s
1600   8.8067e-03   4.33e-04   3.59e-06   1.00e-01   3.00e-02s
1800   9.0778e-03   3.92e-04   3.25e-06   1.00e-01   3.34e-02s
2000   9.3358e-03   3.55e-04   2.94e-06   1.00e-01   3.69e-02s
2200   9.5798e-03   3.22e-04   2.66e-06   1.00e-01   4.05e-02s
2400   9.8091e-03   2.91e-04   2.41e-06   1.00e-01   4.43e-02s
2600   1.0024e-02   2.64e-04   2.18e-06   1.00e-01   4.81e-02s
2800   1.0224e-02   2.39e-04   1.98e-06   1.00e-01   5.12e-02s
3000   1.0410e-02   2.16e-04   1.79e-06   1.00e-01   5.48e-02s
3200   1.0582e-02   1.96e-04   1.62e-06   1.00e-01   5.84e-02s
3400   1.0741e-02   1.77e-04   1.47e-06   1.00e-01   6.19e-02s
3600   1.0887e-02   1.60e-04   1.33e-06   1.00e-01   6.55e-02s
3800   1.1022e-02   1.45e-04   1.20e-06   1.00e-01   6.90e-02s
4000   1.1145e-02   1.31e-04   1.09e-06   1.00e-01   7.53e-02s
4200   1.1259e-02   1.19e-04   9.85e-07   1.00e-01   8.18e-02s
4400   1.1362e-02   1.08e-04   8.92e-07   1.00e-01   8.95e-02s
4600   1.1457e-02   9.75e-05   8.07e-07   1.00e-01   9.44e-02s
4800   1.1544e-02   8.83e-05   7.31e-07   1.00e-01   1.04e-01s
5000   1.1623e-02   7.99e-05   6.62e-07   1.00e-01   1.09e-01s
5200   1.1695e-02   7.24e-05   5.99e-07   1.00e-01   1.13e-01s
5400   1.1761e-02   6.55e-05   5.42e-07   1.00e-01   1.20e-01s
5600   1.1821e-02   5.93e-05   4.91e-07   1.00e-01   1.26e-01s
5800   1.1876e-02   5.37e-05   4.45e-07   1.00e-01   1.31e-01s
6000   1.1925e-02   4.86e-05   4.02e-07   1.00e-01   1.37e-01s
6200   1.1970e-02   4.40e-05   3.64e-07   1.00e-01   1.45e-01s
6400   1.2011e-02   3.98e-05   3.30e-07   1.00e-01   1.58e-01s
6600   1.2049e-02   3.61e-05   2.99e-07   1.00e-01   1.72e-01s
6800   1.2082e-02   3.27e-05   2.70e-07   1.00e-01   1.81e-01s
7000   1.2113e-02   2.96e-05   2.45e-07   1.00e-01   1.90e-01s
7200   1.2141e-02   2.68e-05   2.22e-07   1.00e-01   1.98e-01s
7400   1.2166e-02   2.42e-05   2.01e-07   1.00e-01   2.01e-01s
7600   1.2189e-02   2.19e-05   1.82e-07   1.00e-01   2.05e-01s
7800   1.2210e-02   1.99e-05   1.64e-07   1.00e-01   2.08e-01s
plsh   1.2411e-02   6.66e-16   1.28e-15  ---------   2.09e-01s

status:               solved
solution polish:      successful
number of iterations: 7800
optimal objective:    0.0124
run time:             2.09e-01s
optimal rho estimate: 3.27e-01

> xsol <- result$getValue(x) > names(xsol) = paste("x",1:100,sep="") > xsol[xsol>1e-4] x30 x75 x88 0.1465989 0.5314489 0.3219522 Using QuadProg this is a much more difficult exercise. The quadratic coefficient matrix $$D$$ will look like: $D = \left[\begin{array}{c|c}Q & 0 \\ \hline 0 & \varepsilon I \end{array} \right]$ The R code can look like: > Q <- nearPD(cov(mData))$mat
> D <- rbind(
+   cbind(Q,matrix(0,nrow=nA,ncol=nO)),
+   cbind(matrix(0,nrow=nO,ncol=nA),diag(1e-6,nO))
+ )
> aMat <- rbind(
+   c(rep(1,nA),rep(0,nO)),        # sum(x)=1
+   c(colMeans(mData),rep(0,nO)),  # sum(mu(i)*x(i))>=0.035
+   cbind(diag(1,nA),matrix(0,nrow=nA,ncol=nO)) # for bounds x(i)>=0
+ )
> bVec <- c(rep(0,nO),
+           1,
+           0.035,
+           rep(0,nA))
> zeros <- rep(0,nA+nO)
> solQP <- solve.QP(D, zeros, t(aMat), bVec, meq = nO+1)
> xsol <- solQPsolution[1:nA] > names(xsol) = paste("x",1:100,sep="") > xsol[xsol>1e-4] x30 x75 x88 0.1465990 0.5314489 0.3219522 Conclusion QuadProg is a well-known QP solver for R. Unfortunately it has some drawbacks: • It is not very suited for more complex models: the matrix oriented input format is just too cumbersome. • The quadratic coefficient matrix must be strictly positive definite. • All variables are quadratic, linear variables are not available. For quick experimentation with these types of models, CVXR is likely a better environment. This exercise showed: • The models are more readable, assuming you are familiar with matrix notation. • We can formulate models with linear variables without a problem. • Models with a quadratic coefficient matrix that are borderline positive semi-definite are accepted by both CVXR and the solver I used: OSQP. Two useful R tricks are discussed: • nearPD(A) to find a positive-definite matrix close to A. • Printing a sparse vector using names. References Wednesday, September 23, 2020 Limiting the number of changes in the operating mode of a machine Limiting the number of changes in the operating mode of a machine in a scheduling problem can be seen as a direct extension of limiting the number of start-ups. Limit number of start-ups Suppose we have a machine scheduling problem, where we use the decision variable $x_t = \begin{cases} 1 & \text{if machine is operating at time slot t}\\ 0 & \text{if machine is idle at time slot t}\end{cases}$ To limit the number number of start-ups we introduce a binary variable $$\mathit{start}_t \in \{0,1\}$$ and write: \begin{align} & \mathit{start}_t \ge x_t - x_{t-1}\\ & \sum_t \mathit{start}_t \le U\end{align} The idea is as follows: Actually we are only sure that $x_{t-1}=0, x_t = 1 \Rightarrow \mathit{start}_t =1$ so a better picture is: as the $$\mathit{start}_t$$ is undetermined when we do not switch from 0 to 1. Notes: • This formulation works for $$\sum_t \mathit{start}_t \le U$$ or minimize $$\sum_t \mathit{start}_t$$ • It is not suited for $$\sum_t \mathit{start}_t = U$$, $$\sum_t \mathit{start}_t \ge U$$ or maximize $$\sum_t \mathit{start}_t$$ • We can relax $$\mathit{start}_t$$ to $$\mathit{start}_t \in [0,1]$$, i.e. continuous between 0 and 1 Limit the number of mode changes Let's now assume that the machine can operate in different modes. One could be inclined to use an integer variable: $$\mathit{mode}_t \in \{0,1,2,\dots,M\}$$. Easier is to use a binary variable $\mathit{mode}_{m,t} = \begin{cases} 1 & \text{if the machine is operating in mode $$m$$ in time slot $$t$$} \\ 0 & \text{otherwise}\end{cases}$ A mode switch from $$m1$$ to $$m2$$ means: In other words, by just focusing on the mode that is being activated, limiting the number of mode switches can be interpreted as limiting the number start-ups for each mode: \begin{align} & \mathit{switch}_t \ge \mathit{mode}_{m,t} - \mathit{mode}_{m,t-1} && \forall m,t\\ & \sum_t \mathit{switch}_t \le U\end{align} Again, this works whether we have an upper limit on the number of mode changes or just minimize this number. References 1. Maximizing number of consecutive days with the same operation mode for a scheduling problem, https://stackoverflow.com/questions/64012796/maximizing-number-of-consecutive-days-with-the-same-operation-mode-for-a-schedul Monday, September 14, 2020 Cover points with tiles In [1] a problem is posted: Given $$n$$ points and a collection of tiles (rectangles) of given size, try to cover all all points with the minimum number of tiles. There are a number of things to consider: • Are the tiles of equal size or have they different sizes? • Can tiles overlap? • After minimizing the number of tiles, should we minimize the total size of the selected tiles, so we select the smaller tiles when possible. Data Let's generate some data, assuming tiles have different sizes: ---- 19 PARAMETER p location of points x y p1 17.175 84.327 p2 55.038 30.114 p3 29.221 22.405 p4 34.983 85.627 p5 6.711 50.021 p6 99.812 57.873 p7 99.113 76.225 p8 13.069 63.972 p9 15.952 25.008 p10 66.893 43.536 p11 35.970 35.144 p12 13.149 15.010 p13 58.911 83.089 p14 23.082 66.573 p15 77.586 30.366 p16 11.049 50.238 p17 16.017 87.246 p18 26.511 28.581 p19 59.396 72.272 p20 62.825 46.380 p21 41.331 11.770 p22 31.421 4.655 p23 33.855 18.210 p24 64.573 56.075 p25 76.996 29.781 p26 66.111 75.582 p27 62.745 28.386 p28 8.642 10.251 p29 64.125 54.531 p30 3.152 79.236 ---- 19 PARAMETER r size of rectangles x y r1 12.183 15.270 r2 25.769 32.506 r3 15.344 11.024 r4 27.554 28.637 r5 21.681 20.761 r6 17.291 17.393 r7 13.915 38.003 r8 21.398 33.502 r9 19.001 13.764 r10 32.466 12.077  Random points Model To model this problem, we introduce binary variables $$\mathit{select}_j \in \{0,1\}$$: $\mathit{select}_j = \begin{cases} 1 & \text{if tile j is selected}\\ 0 & \text{otherwise}\end{cases}$ and a continuous variable $$\mathit{place}_{j,c}$$ indicating the $$x$$ and $$y$$ coordinates of (selected) tile $$j$$. To be precise: it is the left-lower corner of the rectangle. The set $$c$$ is simply $$c=\{x,y\}$$. In addition we need a variable indicating if a point is covered by a tile:$\mathit{cov}_{i,j}=\begin{cases} 1 & \text{if point i is covered by tile j} \\ 0 &\text{otherwise}\end{cases}$ The model can look like: MIP Model \begin{align}\min &\sum_j \color{darkred}{\mathit{select}}_j \\ & \color{darkred}{\mathit{place}}_{j,c} \le \color{darkblue}p_{i,c}+\color{darkblue}M\cdot(1-\color{darkred}{\mathit{cov}}_{i,j}) && \forall i,j,c \\ &\color{darkred}{\mathit{place}}_{j,c}+\color{darkblue}r_{j,c}\ge\color{darkblue}p_{i,c}-\color{darkblue}M\cdot(1-\color{darkred}{\mathit{cov}}_{i,j}) && \forall i,j,c \\ & \color{darkred}{\mathit{cov}}_{i,j} \le \color{darkred}{\mathit{select}}_j && \forall i,j \\ & \sum_j \color{darkred}{\mathit{cov}}_{i,j} \ge 1 && \forall i \\ & \color{darkred}{\mathit{select}}_j \in \{0,1\} \\ & \color{darkred}{\mathit{cov}}_{i,j} \in \{0,1\} \end{align} Our points are drawn from $p_{i,c}\sim U(0,100)$ So it make sense to restrict the placement of tiles to: $0 \le \mathit{place}_{j,c} \le 100-r_{j,c}$ Using this, a good value for $$M$$ is $$M=100$$. Results ---- 55 VARIABLE z.L = 7.000 objective ---- 55 PARAMETER tile placement of tiles x y w h r1 53.928 67.819 12.183 15.270 r2 5.652 25.769 32.506 r4 27.484 6.507 27.554 28.637 r5 78.131 55.464 21.681 20.761 r6 5.791 49.181 17.291 17.393 r8 56.188 22.573 21.398 33.502 r10 2.517 75.169 32.466 12.077  Points are covered by tiles No overlap We can add no-overlap constraints [2] to the model soo that no pair of tiles can overlap. Just to be sure, I'll allow tiles to go outside the area $$[0,100]\times [0,100]$$. Of course, that also has an impact on the value for $$M$$. The new solution looks like:  With no-overlap constraints Same-sized tiles If all tiles are identical, we just have a special case of our original problem. It may make sense to add an ordering constraint: $\mathit{select}_j \le \mathit{select}_{j-1}$ We can also consider additional symmetry-breaking constraints, such as $\mathit{place}_{j,x} \ge \mathit{place}_{j,x-1}$ It requires some experimentation if these constraints pay off in performance.  All tiles have the same size In this example all the times are $$(20 \times 20)$$. Below is a larger example with $$n=50$$ points and $$(10 \times 10)$$ tiles.  Larger example Set-covering formulation for equal-sized tiles If the problem has all equal-sized tiles, we can employ a very different algorithm. For this we need to generate candidate locations in advance. One way to generate those is as follows: 1. $$k:=0$$ 2. For each point with coordinates $$(x_1,y_1)$$: 1. For each point with coordinates $$(x_2,y_2)$$ such that $$x_1 \le x_2 \le x_1+w$$ and $$y_1-h\le y_2\le y_1+h$$: 1. Form the candidate rectangle $$(x_1,y_2)-(x_1+w,y_2+h)$$ 2. $$k := k+1$$ 3. Store as $$\mathit{loc}_{k,c}=(x_1,y_2)$$ 4. Find all points $$i$$ inside the candidate rectangle, and store this inside a mapping $$\mathit{contain}_{k,i}$$ Now we can solve the set-covering problem [3]: Set-covering Model \begin{align}\min & \sum_j \color{darkred}{\mathit{select}}_j \\ & \sum_j \color{darkblue}{\mathit{contain}}_{j,i} \cdot \color{darkred}{\mathit{select}}_j \ge 1 && \forall i \\ & \color{darkred}{\mathit{select}}_j \in \{0,1\}\end{align} where $\mathit{contain}_{j,i} = \begin{cases} 1 & \text{if candidate rectangle j contains point i}\\ 0 & \text{otherwise}\end{cases}$ This model solves very fast. A multi-criteria version Instead of just minimizing the number of tiles, we can subsequently minimize the total area covered by the tiles. I.e. use smaller tiles if possible. This is essentially a multi-objective problem. We used here a lexicographic approach (i.e. do two solves in sequence). In our experiment, we reduced the total size of the selected tiles from 3672.5809 to 3004.0812.  Multi-objective model results Obviously, this additional step makes only sense if the tiles have different sizes. GAMS Models As requested, here are some of the source files. Model for tiles with different sizes.  set i 'points to cover' /p1*p30/ j 'rectangles' /r1*r10/ c 'coordinates' /x,y/ ; parameter A 'side of area' / 100 / M 'big-M' p(i,c) 'location of points' r(j,c) 'size of rectangles' ; p(i,c) = uniform(0,A); r(j,c) = uniform(10,40); M = A; display p,r; binary variable select(j) 'select tile'; positive variable place(j,c) 'placement of tile'; place.up(j,c) = A-r(j,c); variable z 'objective'; binary variable cov(i,j); equations obj cover1 cover2 cover3 cover4 ; obj.. z =e= sum(j, select(j)); cover1(i,j,c).. place(j,c) =l= p(i,c)+M*(1-cov(i,j)); cover2(i,j,c).. place(j,c)+r(j,c) =g= p(i,c)-M*(1-cov(i,j)); cover3(i,j).. cov(i,j) =l= select(j); cover4(i).. sum(j,cov(i,j)) =g= 1; model m1 /all/; option optcr=0, threads=8; solve m1 minimizing z using mip; select.l(j) = round(select.l(j)); parameter tile(j,*) 'placement of tiles'; tile(j,c) = place.l(j,c)*select.l(j); tile(j,'w')select.l(j) =  r(j,'x'); tile(j,'h')$select.l(j) = r(j,'y'); display z.l,tile; Set-covering model for equal-sized tiles.  set i 'points to cover' /p1*p50/ j 'candidate rectangles' /r1*r1000/ contain(j,i) 'points contained in rectangle j' c 'coordinates' /x,y/ ; parameter A 'side of area' / 100 / p(i,c) 'location of points' r(c) 'size of rectangles' /x 10, y 10/ loc(j,c) 'locations of candidate rectangles' ; p(i,c) = uniform(0,A); display p,r; *------------------------------------------------------------ * generate candidate rectangle locations *------------------------------------------------------------ alias (i,ii,iii); scalar k / 0 /; loop(i, loop(ii$(p(ii,'x') >= p(i,'x') and p(ii,'x') <= p(i,'x')+r('x') and                abs(p(ii,'y')-p(i,'y')) <= r('y')),       k = k + 1;       loop(j$(ord(j)=k), loc(j,'x') = p(i,'x'); loc(j,'y') = p(ii,'y')-r('y'); contain(j,iii) = p(iii,'x') >= loc(j,'x') and p(iii,'x') <= loc(j,'x')+r('x') and p(iii,'y') >= loc(j,'y') and p(iii,'y') <= loc(j,'y')+r('y') ); ); ); *------------------------------------------------------------ * set covering model *------------------------------------------------------------ binary variable select(j) 'select tile'; variable z 'objective'; equations obj cover(i) ; obj.. z =e= sum(j, select(j)); cover(i).. sum(contain(j,i),select(j)) =g= 1; model m1 /all/; option optcr=0; solve m1 minimizing z using mip; *------------------------------------------------------------ * reporting *------------------------------------------------------------ parameter tile(j,*) 'placement of tiles'; tile(j,c) = loc(j,c)*select.l(j); tile(j,'w')$select.l(j) =  r('x'); tile(j,'h')\$select.l(j) =  r('y'); display z.l,tile;
The R code to plot the results looks like:

library(ggplot2)

p1       17.175      84.327
p2       55.038      30.114
p3       29.221      22.405
p4       34.983      85.627
p5        6.711      50.021
p6       99.812      57.873
p7       99.113      76.225
p8       13.069      63.972
p9       15.952      25.008
p10      66.893      43.536
p11      35.970      35.144
p12      13.149      15.010
p13      58.911      83.089
p14      23.082      66.573
p15      77.586      30.366
p16      11.049      50.238
p17      16.017      87.246
p18      26.511      28.581
p19      59.396      72.272
p20      62.825      46.380
p21      41.331      11.770
p22      31.421       4.655
p23      33.855      18.210
p24      64.573      56.075
p25      76.996      29.781
p26      66.111      75.582
p27      62.745      28.386
p28       8.642      10.251
p29      64.125      54.531
p30       3.152      79.236

r <- read.table(text="r x y w h
r1       53.928      67.819      12.183      15.270
r2       15.562       2.638      25.769      32.506
r4       50.032      27.438      27.554      28.637
r5        7.540       4.247      21.681      20.761
r6        5.791      49.181      17.291      17.393
r7       85.897      38.222      13.915      38.003
r10       2.517      75.169      32.466      12.077

ggplot(pt, aes(x=x , y=y)) + geom_point() +
geom_rect(data=r, aes(xmin=x,xmax=x+w,ymin=y,ymax=y+h,color=r,fill=r,alpha=0.1))+
guides(alpha=FALSE)

Conclusion

The small MIP model presented above can be used as a basis for alternative, related problems. For equal-sized tiles, a faster set-covering model is shown.

Saturday, September 5, 2020

Optimal double exponential smoothing

Holt-Winters Double Exponential Smoothing method is a well-known method for smoothing data with a trend [1].  In [2] an Excel implementation was demonstrated. The formulas (from [2]) are: \begin{align} &u_1 = y_1 \\ & v_1 = 0 \\& u_t = \alpha y_t + (1-\alpha) (u_{t-1}+v_{t-1})&& t\gt 1 \\ &v_t = \beta (u_t - u_{t-1}) + (1-\beta)v_{t-1}&& t\gt 1 \\ & \hat{y}_{t+1} = u_{t}+v_{t} \end{align} The symbols are:

• $$y_t$$ is the data,
• $$u_t$$ and $$v_t$$ are estimates of the smoothed value and the trend,
• $$\alpha$$ and $$\beta$$ are parameters. We have $$0 \le \alpha \le 1$$ and $$0 \le \beta \le 1$$.
• $$\hat{y}_t$$ is the estimate for $$y$$.

In [2] the MAE (Mean Absolute Error) is minimized. This is $\min \>\mathit{MAE}= \frac{\displaystyle\sum_{t\gt1} |\hat{y}_t - y_t|}{T-1}$ where $$T$$ is the number of time periods. Of course, in practice a more popular measure is to minimize the Mean Squared Error. One reason for choosing a least absolute value criterion is that this provides robustness when outliers are present.

Data

I use the small data set from [2]:

----     23 PARAMETER y  data

t1   3.000,    t2   5.000,    t3   9.000,    t4  20.000,    t5  12.000,    t6  17.000,    t7  22.000,    t8  23.000
t9  51.000,    t10 41.000,    t11 56.000,    t12 75.000,    t13 60.000,    t14 75.000,    t15 88.000

Model

Often, in statistical software, the MSE version of the problem is modeled as a black-box function of just  $$\alpha$$ and $$\beta$$ and solved using say a BFGS-B method (usually without analytic gradients, so the solver will use finite-differences). They typically ignore that we can end up in a local minimum (although most routines allow us to specify a starting point for $$\alpha$$ and $$\beta$$). Here I want to state an explicit non-convex quadratic model that we can solve with different types of solvers. This approach will give us a much larger model, with variables $$\alpha$$, $$\beta$$, $$u$$, $$v$$, and  $$\hat{y}$$. In addition, when using an MAE objective, we need extra variables to linearize the absolute value.

\begin{align}\min\>&\color{darkred}{\mathit{MAE}}=\sum_{t\gt 1} \color{darkred}{\mathit{abserr}}_t \\ & \color{darkred}u_1 = \color{darkblue}y_1 \\ & \color{darkred}v_1 = 0 \\ & \color{darkred}u_t = \color{darkred}\alpha \cdot \color{darkblue}y_t + (1-\color{darkred}\alpha) \cdot (\color{darkred}u_{t-1}+\color{darkred}v_{t-1})&& t\gt 1\\& \color{darkred}v_t = \color{darkred}\beta \cdot (\color{darkred}u_t - \color{darkred}u_{t-1}) + (1-\color{darkred}\beta)\cdot \color{darkred}v_{t-1}&& t\gt 1 \\ & \color{darkred}{\hat{y}}_{t+1} = \color{darkred}u_{t}+\color{darkred}v_{t}\\ & -\color{darkred}{\mathit{abserr}}_t \le \color{darkred}{\hat{y}}_{t}- \color{darkblue}y_t \le \color{darkred}{\mathit{abserr}}_t && t\gt 1\\ & \color{darkred}\alpha \in [0,1] \\ & \color{darkred}\beta \in [0,1]\\ & \color{darkred}{\mathit{abserr}}_t \ge 0\end{align}

The first thing to do is to see if we can reproduce the results for $$\alpha=0.4$$ and $$\beta=0.7$$ as they are reported in [1]. We do this by fixing the variables $$\alpha$$ and $$\beta$$ and throw the model into my favorite NLP solver: CONOPT. We only need to fix these two variables, as all other levels will be pinned down by this. This fixed problem solves very fast:

Iter Phase Ninf   Infeasibility   RGmax    NSB   Step InItr MX OK
0   0        7.8250000000E+02 (Input point)

Pre-triangular equations:   42
Post-triangular equations:  29

1   0        0.0000000000E+00 (After pre-processing)
2   0        0.0000000000E+00 (After scaling)

** Feasible solution. Value of objective =    7.40965962794

Iter Phase Ninf     Objective     RGmax    NSB   Step InItr MX OK
4   3        7.4096596279E+00 0.0E+00     0

** Optimal solution. There are no superbasic variables.

Basically, CONOPT recovers the optimal solution during preprocessing. A few more iterations are needed, probably to get the basis right. The objective value is the same as reported in [2]. With this little experiment, we have learned a lot. Most likely our constraints and objective are correctly formulated.

Solution

We are now ready to try to solve the problem. When using  $$\alpha=0.4$$ and $$\beta=0.7$$ as starting point, CONOPT quickly finds a very good solution. A solution report can look like:

----     75 PARAMETER results

y           u           v        yhat         |e|

t1      3.00000     3.00000
t2      5.00000     3.42764     0.37003     3.00000     2.00000
t3      9.00000     4.91004     1.33254     3.79767     5.20233
t4     20.00000     9.18421     3.87786     6.24258    13.75742
t5     12.00000    12.83498     3.68136    13.06207     1.06207
t6     17.00000    16.61976     3.77085    16.51634     0.48366
t7     22.00000    20.73473     4.06861    20.39061     1.60939
t8     23.00000    24.41775     3.73497    24.80334     1.80334
t9     51.00000    33.03795     7.96205    28.15271    22.84729
t10    41.00000    41.00000     7.96205    41.00000
t11    56.00000    50.46691     9.26417    48.96205     7.03795
t12    75.00000    62.99591    12.08915    59.73109    15.26891
t13    60.00000    71.85955     9.29819    75.08506    15.08506
t14    75.00000    79.84108     8.15892    81.15774     6.15774
t15    88.00000    88.00000     8.15892    88.00000

----     76 VARIABLE alpha.L               =      0.21382
VARIABLE beta.L                =      0.86528
VARIABLE MAE.L                 =      6.59394

Conopt found this solution very quickly:

Iter Phase Ninf   Infeasibility   RGmax    NSB   Step InItr MX OK
0   0        7.8250000000E+02 (Input point)

Pre-triangular equations:   1
Post-triangular equations:  5
Definitional equations:     39

1   0        1.0173523479E+02 (After pre-processing)
2   0        3.6729535718E+00 (After scaling)
10   0    12  3.3409268983E+00               0.0E+00      F  F

** Feasible solution. Value of objective =    7.07090607760

Iter Phase Ninf     Objective     RGmax    NSB   Step InItr MX OK
16   3        6.5939399026E+00 2.1E+00     3 1.0E+00    3 T  T
18   3        6.5939398862E+00 0.0E+00     0

** Optimal solution. There are no superbasic variables.

When we compare this with the Excel solution from [2], we see the CONOPT solution is actually better.

 Optimal Excel Solver Solution (from [2])

What we see is:

VariableExcel solution [2]Conopt solution
$$\alpha$$0.2718170.21382
$$\beta$$0.5981610.86528
MAE6.7402086.59394

This is typical for non-convex problems: local NLP solvers may converge to a local optimum. One way to get a better handle on this is to use different starting points. Another way is to try a global solver. Using global solvers, we can show that indeed Conopt found the global optimum.

Starting points

Here we try quickly a few local solvers, with different starting points. The QCP (Quadratically Constrained Program) is non-convex. This means we should be prepared for local optimal. When different solvers starting from different initial points converge to the same solution, we have some (limited) confidence that this point is a global optimum. Note that when solving as a general-purpose NLP  model, the solver does not know the model is quadratic. The local solver also does not know whether the optimal solution it finds is just a local optimum or a global one. It is noted that with this extended QCP formulation, the solver is provided with exact gradients.

SolverStarting pointObjectiveTimeIterations
Conopt$$\alpha=0.4,\beta=0.7$$6.5939018
Conopt$$\alpha=0,\beta=0$$6.5939029
Conopt$$\alpha=1,\beta=1$$6.5939017
Minos$$\alpha=0.4,\beta=0.7$$6.593908
Minos$$\alpha=0,\beta=0$$6.5939014
Minos$$\alpha=1,\beta=1$$6.5939016
Snopt$$\alpha=0.4,\beta=0.7$$6.5939020
Snopt$$\alpha=0,\beta=0$$6.5939014
Snopt$$\alpha=1,\beta=1$$6.5939017
Ipopt$$\alpha=0.4,\beta=0.7$$6.5939023
Ipopt$$\alpha=0,\beta=0$$6.5939014
Ipopt$$\alpha=1,\beta=1$$6.5939026

Local solvers have in general no problem in finding the same optimal solution. This is some evidence this is the global optimum.

Note: I was a bit lazy and left $$u_t$$ and $$v_t$$ at their default initial point of zero.

A global approach

The first thing I did, was to throw the above model unmodified at some global NLP solvers. This was a big disappointment for a small model like this:

SolverObjectiveTimeGapNote
Baron6.5939360095%Time limit
Couenne6.5939360078%Time limit
Antigone6.59391840%Optimal
Gurobi6.59462180%Optimal

Note: The runs with solvers Baron/Couenne/Antigone were done on a somewhat slow laptop. Gurobi ran on a faster (virtual) machine. So timings are not directly comparable between these solvers but are just indicative.

Looking again at our model, we can see that the model is way more non-linear than it needs to be. To fix this, we make the model even larger (more variables, more equations). But also less non-linear. The idea is that we replace terms like $(x+y)z=xz+yz$ by \begin{align} & w=x+y \\ & wz \end{align} When we do this for our model, we get:

\begin{align}\min\>&\color{darkred}{\mathit{MAE}}=\sum_{t\gt 1} \color{darkred}{\mathit{abserr}}_t \\ & \color{darkred}u_1 = \color{darkblue}y_1 \\ & \color{darkred}v_1 = 0 \\ & \color{darkred}w_t = \color{darkred}u_t+\color{darkred}v_t\\ & \color{darkred}u_t = \color{darkred}\alpha \cdot \color{darkblue}y_t + \color{darkred}w_{t-1} -\color{darkred}\alpha \cdot \color{darkred}w_{t-1}&& t\gt 1\\ & \color{darkred}x_t = \color{darkred}u_t - \color{darkred}u_{t-1} -\color{darkred}v_{t-1} && t\gt 1 \\& \color{darkred}v_t = \color{darkred}\beta \cdot \color{darkred}x_t + \color{darkred}v_{t-1}&& t\gt 1 \\ & \color{darkred}{\hat{y}}_{t+1} = \color{darkred}u_{t}+\color{darkred}v_{t}\\ & -\color{darkred}{\mathit{abserr}}_t \le \color{darkred}{\hat{y}}_{t}- \color{darkblue}y_t \le \color{darkred}{\mathit{abserr}}_t && t\gt 1\\ & \color{darkred}\alpha \in [0,1] \\ & \color{darkred}\beta \in [0,1]\\ & \color{darkred}{\mathit{abserr}}_t \ge 0\end{align}

This version of the model behaves better:

SolverObjTimeGapNote
Baron6.5939600%Optimal
Couenne6.5939360049%Time limit
Antigone6.59391570%Optimal
Gurobi6.59391570%Optimal

This reformulation is a big help for Baron, but also helps Gurobi delivering more accurate solutions.

How difficult is this problem?

We saw that Conopt (and other local NLP solvers) found the global optimal solution without any problems. Of course, Conopt does not know whether it is a local or a global solution. So the question arises how difficult is this problem? As it turns out, the problem is actually very well-behaved. Any good local NLP solver will likely find the global optimum, even when feeding it our non-convex QCP problem. The following plot will illuminate this:

Conclusion

The problem of finding optimal parameters $$\alpha$$ and $$\beta$$ for the Holt-Winters double exponential smoothing method leads to a non-convex quadratic optimization problem. Local solvers may deliver solutions that are only locally optimal (and thus not necessarily globally optimal). When trying to solve the global problem, it may help to reformulate the model in order to minimize the number of quadratic terms.

In practice, the problem is quite well-behaved, and a good NLP solver should be able to find the global optimum solution quite easily. It looks like Excel's solver is just stopping a bit prematurely. Interestingly quadratic solvers like Cplex and Gurobi are not as fast or suitable as a general-purpose local NLP solver for this problem.

The GAMS source of the first model can be found in [4].

References

1. Exponential Smoothing, https://en.wikipedia.org/wiki/Exponential_smoothing
2. Holt's Linear Trend, https://www.real-statistics.com/time-series-analysis/basic-time-series-forecasting/holt-linear-trend/
3. Handanhal V. Ravinder, Determining The Optimal Values Of Exponential Smoothing Constants – Does Solver Really Work?, American Journal Of Business Education, Volume 6, Number 3, 2013, https://files.eric.ed.gov/fulltext/EJ1054363.pdf. This paper shows some computational experience with Excel on these types of models.
4. How to setup a minimization problem in GAMS, https://stackoverflow.com/questions/63738161/how-to-setup-a-minimization-problem-in-gams