Suppose we have data like:
| worker | weight | height | weight2 | height2 |
|---|---|---|---|---|
| 1 | 120 | 60 | 125 | 60 |
| 2 | 152 | 55 | 156 | 66 |
| 3 | 222 | 55 | 100 | 20 |
For each worker we can pick either (weight, height) or (weight2, height2). The problem is to make this choice for each row such that the correlation between the selected weights and heights is maximized.
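For a data set this small we can simply enumerate all selections. Below is a minimal brute-force sketch in R (the language used in the Genetic Algorithm section later on); the variable names are mine. It tries all \(2^3=8\) combinations for the toy table above. Of course, complete enumeration grows as \(2^n\), which is why larger instances call for the optimization models below.

```r
# toy data from the table above
weight1 <- c(120, 152, 222); height1 <- c(60, 55, 55)
weight2 <- c(125, 156, 100); height2 <- c(60, 66, 20)

best <- -Inf
for (bits in 0:7) {                       # all 2^3 selections
  x <- as.integer(intToBits(bits))[1:3]   # x[i]=0: first pair, x[i]=1: second pair
  w <- weight1*(1-x) + weight2*x
  h <- height1*(1-x) + height2*x
  r <- cor(w, h)
  if (!is.na(r) && r > best) { best <- r; xbest <- x }
}
cat("best correlation:", best, "selection:", xbest, "\n")
```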
Data
First, let's invent some data to use in our experiments.
```
----     27 PARAMETER data

         height1     weight1     height2     weight2

i1     67.433285  168.262871   67.445523  163.692389
i2     70.638374  174.437750   68.649190  160.084811
i3     71.317794  159.909672   69.503911  164.720010
i4     59.850261  145.704159   61.175728  142.708300
i5     65.341938  155.586984   68.483909  165.564991
i6     64.142009  154.335001   68.568683  166.169507
i7     67.030368  158.768813   65.780803  153.721717
i8     73.672863  175.126951   73.236515  164.704340
i9     65.203516  157.593587   63.279277  149.784500
i10    69.001848  160.063428   68.786656  162.278007
i11    64.455422  159.039195   63.930208  152.827710
i12    70.719334  164.885704   69.666096  157.356595
i13    65.688428  151.223468   63.614565  150.071072
i14    66.569252  160.978671   70.533320  160.722483
i15    78.417676  172.298652   80.070076  172.695207
i16    65.396154  158.234709   67.404942  158.310596
i17    62.504967  150.899428   61.000439  154.094647
i18    62.122630  150.024298   63.634554  153.644324
i19    70.598400  165.086523   72.999194  166.771223
i20    74.935107  170.820610   76.622182  169.013550
i21    63.233956  154.331546   60.372876  149.152520
i22    72.550105  173.961915   76.748649  167.462369
i23    74.086553  168.190867   75.433331  171.773607
i24    65.379648  163.577697   65.717553  160.134888
i25    64.003038  155.357607   67.301426  158.713710
```
The optimal solution that has the highest correlation coefficient between height and weight is:
```
----     92 PARAMETER result  optimal selected observations

         height1     weight1     height2     weight2

i1                            67.445523  163.692389
i2                            68.649190  160.084811
i3                            69.503911  164.720010
i4     59.850261  145.704159
i5     65.341938  155.586984
i6     64.142009  154.335001
i7     67.030368  158.768813
i8                            73.236515  164.704340
i9                            63.279277  149.784500
i10    69.001848  160.063428
i11                           63.930208  152.827710
i12    70.719334  164.885704
i13                           63.614565  150.071072
i14                           70.533320  160.722483
i15    78.417676  172.298652
i16                           67.404942  158.310596
i17    62.504967  150.899428
i18    62.122630  150.024298
i19                           72.999194  166.771223
i20    74.935107  170.820610
i21                           60.372876  149.152520
i22                           76.748649  167.462369
i23                           75.433331  171.773607
i24                           65.717553  160.134888
i25                           67.301426  158.713710
```
The impact of being allowed to choose between observations (height1,weight1) and (height2,weight2) is significant:
| | Correlation coefficient |
|---|---|
| height1-weight1 | 0.868691 |
| height2-weight2 | 0.894532 |
| optimal | 0.956452 |
Below we shall see how we arrive at this conclusion.
MINLP Model
A high-level Mixed-Integer Non-Linear Programming model is simply:
MINLP Model |
---|
\[ \begin{align}\max\> &\mathbf{cor}(\color{darkred}h,\color{darkred}w) \\ & \color{darkred}h_i = \color{darkblue}{\mathit{height1}}_i\cdot(1-\color{darkred}x_i)+ \color{darkblue}{\mathit{height2}}_i\cdot\color{darkred}x_i\\ & \color{darkred}w_i = \color{darkblue}{\mathit{weight1}}_i\cdot(1-\color{darkred}x_i)+ \color{darkblue}{\mathit{weight2}}_i\cdot\color{darkred}x_i \\ & \color{darkred}x_i \in \{0,1\} \end{align}\] |
Here \(\mathbf{cor}(h,w)\) indicates the (Pearson) correlation between vectors \(h\) and \(w\). It is noted that height and weight are positively correlated, so maximizing makes sense. Of course, GAMS does not know about correlations, so we implement this model as:
Implementation of MINLP Model |
---|
\[ \begin{align}\max\> & \color{darkred}z = \frac{\displaystyle\sum_i (\color{darkred}h_i-\bar{\color{darkred}h})(\color{darkred}w_i-\bar{\color{darkred}w})}{\sqrt{\displaystyle\sum_i(\color{darkred}h_i-\bar{\color{darkred}h})^2}\cdot \sqrt{\displaystyle\sum_i(\color{darkred}w_i-\bar{\color{darkred}w})^2}} \\ & \color{darkred}h_i = \color{darkblue}{\mathit{height1}}_i\cdot(1-\color{darkred}x_i)+ \color{darkblue}{\mathit{height2}}_i\cdot\color{darkred}x_i \\ & \color{darkred}w_i = \color{darkblue}{\mathit{weight1}}_i\cdot(1-\color{darkred}x_i)+ \color{darkblue}{\mathit{weight2}}_i\cdot\color{darkred}x_i \\ & \bar{\color{darkred}h} = \frac{1}{n}\displaystyle\sum_i \color{darkred}h_i \\ & \bar{\color{darkred}w} = \frac{1}{n}\displaystyle\sum_i \color{darkred}w_i \\ & \color{darkred}x_i \in \{0,1\} \end{align}\] |
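The objective here is just the textbook Pearson formula. As a quick sanity check outside of GAMS (plain R, with random data of my own):

```r
# the objective above coincides with R's built-in cor()
set.seed(1)
h <- rnorm(10); w <- 2*h + rnorm(10)
num <- sum((h - mean(h)) * (w - mean(w)))
den <- sqrt(sum((h - mean(h))^2)) * sqrt(sum((w - mean(w))^2))
c(formula = num/den, builtin = cor(h, w))  # both print the same value
```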
We need to be careful with divisions. I typically like to reformulate divisions as multiplications: \[\begin{align} \max \>& \color{darkred}z\\ &\color{darkred}z\cdot \sqrt{\displaystyle\sum_i(\color{darkred}h_i-\bar{\color{darkred}h})^2}\cdot \sqrt{\displaystyle\sum_i(\color{darkred}w_i-\bar{\color{darkred}w})^2} = \displaystyle\sum_i (\color{darkred}h_i-\bar{\color{darkred}h})(\color{darkred}w_i-\bar{\color{darkred}w})\end{align}\]
The advantage of this formulation is in the protection against division by zero. A disadvantage is that non-linearities are moved to the constraints. Many NLP solvers are happier when constraints are linear, and only the objective is non-linear. My answer: experiment with formulations.
When we solve this model with GAMS/Baron, we see:
```
----     79 VARIABLE x.L  select 1 or 2

i1  1.000000,    i2  1.000000,    i3  1.000000,    i8  1.000000,    i9  1.000000,    i11 1.000000,    i13 1.000000
i14 1.000000,    i16 1.000000,    i19 1.000000,    i21 1.000000,    i22 1.000000,    i23 1.000000,    i24 1.000000
i25 1.000000

----     79 VARIABLE z.L  =  0.956452  objective

----     83 PARAMETER corr

all1    0.868691,    all2    0.894532,    optimal 0.956452
```
The parameter corr shows different correlations:
- all1: all \(x_i=0\), i.e., the correlation between height1 and weight1.
- all2: all \(x_i=1\), i.e., the correlation between height2 and weight2.
- optimal: an optimal solution for \(x\).
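These numbers are easy to reproduce outside GAMS. A small R check, assuming the data frame df as read in the Genetic Algorithm section below, and the x.L solution shown above:

```r
# rows with x=1 according to the x.L solution above
sel <- c("i1","i2","i3","i8","i9","i11","i13","i14","i16","i19",
         "i21","i22","i23","i24","i25")
x <- as.numeric(df$id %in% sel)
h <- df$height1*(1-x) + df$height2*x
w <- df$weight1*(1-x) + df$weight2*x
c(all1    = cor(df$weight1, df$height1),  # 0.868691
  all2    = cor(df$weight2, df$height2),  # 0.894532
  optimal = cor(w, h))                    # 0.956452
```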
Nonconvex MIQCP
Gurobi can solve non-convex quadratic models quite efficiently. To try this out, I reformulated the MINLP model into a Mixed-Integer Quadratically-Constrained Programming model. We need to add some variables and constraints to make this happen.
Non-convex MIQCP Model |
---|
\[ \begin{align}\max\> & \color{darkred}z\\ & \color{darkred}z\cdot \color{darkred}{\mathit{denom}} = \sum_i (\color{darkred}h_i-\bar{\color{darkred}h})(\color{darkred}w_i-\bar{\color{darkred}w})\\ & \color{darkred}s^2_h = \sum_i(\color{darkred}h_i-\bar{\color{darkred}h})^2 \\ & \color{darkred}s^2_w = \sum_i(\color{darkred}w_i-\bar{\color{darkred}w})^2 \\ & \color{darkred}{\mathit{denom}} = \color{darkred}s_h \cdot \color{darkred}s_w\\ & \color{darkred}h_i = \color{darkblue}{\mathit{height1}}_i\cdot(1-\color{darkred}x_i)+ \color{darkblue}{\mathit{height2}}_i\cdot\color{darkred}x_i \\ & \color{darkred}w_i = \color{darkblue}{\mathit{weight1}}_i\cdot(1-\color{darkred}x_i)+ \color{darkblue}{\mathit{weight2}}_i\cdot\color{darkred}x_i \\ & \bar{\color{darkred}h} = \frac{1}{\color{darkblue}n}\displaystyle\sum_i \color{darkred}h_i \\ & \bar{\color{darkred}w} = \frac{1}{\color{darkblue}n}\displaystyle\sum_i \color{darkred}w_i \\ & \color{darkred}x_i \in \{0,1\} \\ & \color{darkred}s_h, \color{darkred}s_w \ge 0 \end{align}\] |
The final model is still quite small, but I encountered severe problems:
```
Gurobi Optimizer version 9.0.1 build v9.0.1rc0 (win64)
Optimize a model with 52 rows, 81 columns and 152 nonzeros
Model fingerprint: 0xc1d6aa23
Model has 4 quadratic constraints
Variable types: 56 continuous, 25 integer (25 binary)
Coefficient statistics:
  Matrix range     [1e-02, 1e+01]
  QMatrix range    [1e+00, 2e+01]
  QLMatrix range   [1e+00, 1e+00]
  Objective range  [1e+00, 1e+00]
  Bounds range     [1e+00, 2e+02]
  RHS range        [6e+01, 2e+02]
Presolve removed 50 rows and 50 columns
Presolve time: 0.02s
Presolved: 179 rows, 88 columns, 637 nonzeros
Presolved model has 7 bilinear constraint(s)
Variable types: 63 continuous, 25 integer (25 binary)

Root relaxation: unbounded, 0 iterations, 0.00 seconds

    Nodes    |    Current Node    |     Objective Bounds      |     Work
 Expl Unexpl |  Obj  Depth IntInf | Incumbent    BestBd   Gap | It/Node Time

      0     2  postponed    0                -        -      -     -    0s
  50873 43644  postponed  872                -        -      -   0.0    5s
 124683 118126 postponed  805                -        -      -   0.0   10s
 196009 189074 postponed 1409                -        -      -   0.0   15s
 268890 261269 postponed 1771                -        -      -   0.0   20s
 341602 333841 postponed 2653                -        -      -   0.0   25s
 416698 408963 postponed 2652                -        -      -   0.0   30s
 491794 484945 postponed 2764                -        -      -   0.0   35s
 566890 559155 postponed 3533                -        -      -   0.0   40s
 641986 634407 postponed 4413                -        -      -   0.0   45s
 717082 709397 postponed 5294                -        -      -   0.0   50s
 792178 784445 postponed 5296                -        -      -   0.0   55s
 867274 859517 postponed 5295                -        -      -   0.0   60s
 942370 934643 postponed 6177                -        -      -   0.0   65s
 1016274 1009505 postponed 6177              -        -      -   0.0   70s
 1091370 1083743 postponed 7052              -        -      -   0.0   75s
 1166466 1158805 postponed 7939              -        -      -   0.0   80s
 1241562 1233913 postponed 7938              -        -      -   0.0   85s
 1316658 1308953 postponed 7939              -        -      -   0.0   90s
 1391754 1384013 postponed 8820              -        -      -   0.0   95s
 1465658 1457969 postponed 8820              -        -      -   0.0  100s
 . . .
```
Root relaxation: unbounded
This unbounded relaxation is the cause of the problem. The relaxation Gurobi solves is not a non-convex QCP model (that would not be unbounded), but rather something even more "relaxed". Luckily, we know a good bound on the objective. Correlation coefficients are always between \(-1\) and \(+1\). This means we can add the bound: \(z \le 1\). After adding this, we see:
```
Root relaxation: objective 1.000000e+00, 65 iterations, 0.00 seconds

    Nodes    |    Current Node    |     Objective Bounds      |     Work
 Expl Unexpl |  Obj  Depth IntInf | Incumbent    BestBd   Gap | It/Node Time

     0     0    1.00000    0    7          -    1.00000      -     -    0s
     0     0    1.00000    0    5          -    1.00000      -     -    0s
     0     0    1.00000    0    5          -    1.00000      -     -    0s
     0     0    1.00000    0    5          -    1.00000      -     -    0s
     0     2    1.00000    0    6          -    1.00000      -     -    0s
*  324   297              44       0.9195020    1.00000  8.75%   7.7    0s
*  416   297              42       0.9400483    1.00000  6.38%   7.5    0s
*  556   511              28       0.9409507    1.00000  6.28%   7.6    0s
*  598   511              27       0.9457779    1.00000  5.73%   7.5    0s
*  603   511              28       0.9473171    1.00000  5.56%   7.4    0s
*  643   511              29       0.9483207    1.00000  5.45%   7.2    0s
* 1195   620              33       0.9494881    1.00000  5.32%   7.0    0s
*10829  2567              40       0.9499222    1.00000  5.27%   7.7    0s
*12064  2834              41       0.9526072    1.00000  4.98%   7.6    0s
*13333  3393              39       0.9530905    1.00000  4.92%   7.4    0s
*13341  3392              39       0.9531092    1.00000  4.92%   7.4    0s
H27924  7927                       0.9539760    1.00000  4.82%   6.4    1s
*52593 15598              41       0.9543576    1.00000  4.78%   5.9    1s
H53294 15799                       0.9548335    1.00000  4.73%   5.9    2s
H96539 25687                       0.9564515    0.99711  4.25%   5.9    3s
 155894 36175     cutoff   33        0.95645    0.99403  3.93%   5.2    5s
 335621 57101    0.98359   32   10   0.95645    0.98804  3.30%   4.7   10s
 518727 68913    0.95912   31   13   0.95645    0.98354  2.83%   4.5   15s
 703811 73266     cutoff   31        0.95645    0.97999  2.46%   4.3   20s
 890371 75019     cutoff   33        0.95645    0.97695  2.14%   4.3   25s
 1082662 73850    cutoff   37        0.95645    0.97425  1.86%   4.2   30s
 1270873 69325   0.96318   34    9   0.95645    0.97177  1.60%   4.1   35s
 1451055 61577    cutoff   37        0.95645    0.96956  1.37%   4.1   40s
 1628515 49114    cutoff   35        0.95645    0.96730  1.13%   4.0   45s
 1800508 30288   0.95954   35    9   0.95645    0.96460  0.85%   4.0   50s

Cutting planes:
  Gomory: 1
  MIR: 2
  Flow cover: 3
  RLT: 5

Explored 1916074 nodes (7618484 simplex iterations) in 53.76 seconds
Thread count was 8 (of 16 available processors)

Solution count 10: 0.956452 0.954834 0.954358 ... 0.948321
No other solutions better than 0.956452
```
This saved Gurobi! Now we see that Gurobi finds the optimal solution with a correlation coefficient of 0.956452.
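For reference, adding this bound in GAMS is a one-liner: `z.up = 1;`.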
Another possible improvement is to introduce mean-adjusted values (this is a linear operation). This will make the quadratic constraints a bit simpler, at the expense of extra variables and (linear) constraints.
Mean-adjusted non-convex MIQCP Model |
---|
\[ \begin{align}\max\> & \color{darkred}z\\ & \color{darkred}z\cdot \color{darkred}{\mathit{denom}} = \sum_i \tilde{\color{darkred}h_i} \cdot \tilde{\color{darkred}w_i}\\ & \color{darkred}s^2_h = \sum_i \tilde{\color{darkred}h}^2_i \\ & \color{darkred}s^2_w = \sum_i \tilde{\color{darkred}w}^2_i \\ & \color{darkred}{\mathit{denom}} = \color{darkred}s_h \cdot \color{darkred}s_w\\ & \color{darkred}h_i = \color{darkblue}{\mathit{height1}}_i\cdot(1-\color{darkred}x_i)+ \color{darkblue}{\mathit{height2}}_i\cdot\color{darkred}x_i \\ & \color{darkred}w_i = \color{darkblue}{\mathit{weight1}}_i\cdot(1-\color{darkred}x_i)+ \color{darkblue}{\mathit{weight2}}_i\cdot\color{darkred}x_i \\ & \bar{\color{darkred}h} = \frac{1}{\color{darkblue}n}\displaystyle\sum_i \color{darkred}h_i \\ & \bar{\color{darkred}w} = \frac{1}{\color{darkblue}n}\displaystyle\sum_i \color{darkred}w_i \\ & \tilde{\color{darkred}h_i}= \color{darkred}h_i - \bar{\color{darkred}h}\\ & \tilde{\color{darkred}w_i} = \color{darkred}w_i - \bar{\color{darkred}w} \\ & \color{darkred}x_i \in \{0,1\} \\ & \color{darkred}s_h, \color{darkred}s_w \ge 0 \end{align}\] |
This formulation does not need the bound \(z\le 1\), although it still performs better with it.
Genetic Algorithm
It is interesting to see how a simple meta-heuristic would do on this problem. Here are the results using the ga function from the GA package [2]. This was actually quite easy to implement.
```
> df <- read.table(text="
+ id height1 weight1 height2 weight2
+ i1 67.433285 168.262871 67.445523 163.692389
+ i2 70.638374 174.437750 68.649190 160.084811
+ i3 71.317794 159.909672 69.503911 164.720010
+ i4 59.850261 145.704159 61.175728 142.708300
+ i5 65.341938 155.586984 68.483909 165.564991
+ i6 64.142009 154.335001 68.568683 166.169507
+ i7 67.030368 158.768813 65.780803 153.721717
+ i8 73.672863 175.126951 73.236515 164.704340
+ i9 65.203516 157.593587 63.279277 149.784500
+ i10 69.001848 160.063428 68.786656 162.278007
+ i11 64.455422 159.039195 63.930208 152.827710
+ i12 70.719334 164.885704 69.666096 157.356595
+ i13 65.688428 151.223468 63.614565 150.071072
+ i14 66.569252 160.978671 70.533320 160.722483
+ i15 78.417676 172.298652 80.070076 172.695207
+ i16 65.396154 158.234709 67.404942 158.310596
+ i17 62.504967 150.899428 61.000439 154.094647
+ i18 62.122630 150.024298 63.634554 153.644324
+ i19 70.598400 165.086523 72.999194 166.771223
+ i20 74.935107 170.820610 76.622182 169.013550
+ i21 63.233956 154.331546 60.372876 149.152520
+ i22 72.550105 173.961915 76.748649 167.462369
+ i23 74.086553 168.190867 75.433331 171.773607
+ i24 65.379648 163.577697 65.717553 160.134888
+ i25 64.003038 155.357607 67.301426 158.713710
+ ", header=T)
>
> #
> # print obvious cases
> #
> cor(df$weight1,df$height1)
[1] 0.8686908
> cor(df$weight2,df$height2)
[1] 0.894532
>
> #
> # fitness function
> #
> f <- function(x) {
+   w <- df$weight1*(1-x) + df$weight2*x
+   h <- df$height1*(1-x) + df$height2*x
+   cor(w,h)
+ }
>
> library(GA)
> res <- ga(type=c("binary"),fitness=f,nBits=25,seed=123)
GA | iter = 1 | Mean = 0.8709318 | Best = 0.9237155
GA | iter = 2 | Mean = 0.8742004 | Best = 0.9237155
GA | iter = 3 | Mean = 0.8736450 | Best = 0.9237155
GA | iter = 4 | Mean = 0.8742228 | Best = 0.9384788
GA | iter = 5 | Mean = 0.8746517 | Best = 0.9384788
GA | iter = 6 | Mean = 0.8792048 | Best = 0.9486227
GA | iter = 7 | Mean = 0.8844841 | Best = 0.9486227
GA | iter = 8 | Mean = 0.8816874 | Best = 0.9486227
GA | iter = 9 | Mean = 0.8805522 | Best = 0.9486227
GA | iter = 10 | Mean = 0.8820974 | Best = 0.9486227
GA | iter = 11 | Mean = 0.8859074 | Best = 0.9486227
GA | iter = 12 | Mean = 0.8956467 | Best = 0.9486227
GA | iter = 13 | Mean = 0.8989140 | Best = 0.9486227
GA | iter = 14 | Mean = 0.9069327 | Best = 0.9486227
GA | iter = 15 | Mean = 0.9078787 | Best = 0.9486227
GA | iter = 16 | Mean = 0.9069163 | Best = 0.9489443
GA | iter = 17 | Mean = 0.9104712 | Best = 0.9489443
GA | iter = 18 | Mean = 0.9169900 | Best = 0.9489443
GA | iter = 19 | Mean = 0.9175285 | Best = 0.9489443
GA | iter = 20 | Mean = 0.9207076 | Best = 0.9489443
GA | iter = 21 | Mean = 0.9210288 | Best = 0.9489443
GA | iter = 22 | Mean = 0.9206928 | Best = 0.9489443
GA | iter = 23 | Mean = 0.9210399 | Best = 0.9489443
GA | iter = 24 | Mean = 0.9208985 | Best = 0.9489443
GA | iter = 25 | Mean = 0.9183778 | Best = 0.9511446
GA | iter = 26 | Mean = 0.9217391 | Best = 0.9511446
GA | iter = 27 | Mean = 0.9274271 | Best = 0.9522764
GA | iter = 28 | Mean = 0.9271156 | Best = 0.9522764
GA | iter = 29 | Mean = 0.9275347 | Best = 0.9522764
GA | iter = 30 | Mean = 0.9278315 | Best = 0.9522764
GA | iter = 31 | Mean = 0.9300289 | Best = 0.9522764
GA | iter = 32 | Mean = 0.9306409 | Best = 0.9528777
GA | iter = 33 | Mean = 0.9309087 | Best = 0.9528777
GA | iter = 34 | Mean = 0.9327691 | Best = 0.9528777
GA | iter = 35 | Mean = 0.9309344 | Best = 0.9549574
GA | iter = 36 | Mean = 0.9341977 | Best = 0.9549574
GA | iter = 37 | Mean = 0.9374437 | Best = 0.9559043
GA | iter = 38 | Mean = 0.9394410 | Best = 0.9559043
GA | iter = 39 | Mean = 0.9405482 | Best = 0.9559043
GA | iter = 40 | Mean = 0.9432749 | Best = 0.9564515
GA | iter = 41 | Mean = 0.9441814 | Best = 0.9564515
GA | iter = 42 | Mean = 0.9458232 | Best = 0.9564515
GA | iter = 43 | Mean = 0.9469625 | Best = 0.9564515
GA | iter = 44 | Mean = 0.9462313 | Best = 0.9564515
GA | iter = 45 | Mean = 0.9449716 | Best = 0.9564515
GA | iter = 46 | Mean = 0.9444071 | Best = 0.9564515
GA | iter = 47 | Mean = 0.9437149 | Best = 0.9564515
GA | iter = 48 | Mean = 0.9446355 | Best = 0.9564515
GA | iter = 49 | Mean = 0.9455424 | Best = 0.9564515
GA | iter = 50 | Mean = 0.9456497 | Best = 0.9564515
GA | iter = 51 | Mean = 0.9461382 | Best = 0.9564515
GA | iter = 52 | Mean = 0.9444960 | Best = 0.9564515
GA | iter = 53 | Mean = 0.9434671 | Best = 0.9564515
GA | iter = 54 | Mean = 0.9451851 | Best = 0.9564515
GA | iter = 55 | Mean = 0.9481903 | Best = 0.9564515
GA | iter = 56 | Mean = 0.9477778 | Best = 0.9564515
GA | iter = 57 | Mean = 0.9481829 | Best = 0.9564515
GA | iter = 58 | Mean = 0.9490952 | Best = 0.9564515
GA | iter = 59 | Mean = 0.9505670 | Best = 0.9564515
GA | iter = 60 | Mean = 0.9499329 | Best = 0.9564515
GA | iter = 61 | Mean = 0.9509299 | Best = 0.9564515
GA | iter = 62 | Mean = 0.9505341 | Best = 0.9564515
GA | iter = 63 | Mean = 0.9519624 | Best = 0.9564515
GA | iter = 64 | Mean = 0.9518618 | Best = 0.9564515
GA | iter = 65 | Mean = 0.9523598 | Best = 0.9564515
GA | iter = 66 | Mean = 0.9516766 | Best = 0.9564515
GA | iter = 67 | Mean = 0.9521926 | Best = 0.9564515
GA | iter = 68 | Mean = 0.9524419 | Best = 0.9564515
GA | iter = 69 | Mean = 0.9532865 | Best = 0.9564515
GA | iter = 70 | Mean = 0.9535871 | Best = 0.9564515
GA | iter = 71 | Mean = 0.9536049 | Best = 0.9564515
GA | iter = 72 | Mean = 0.9534035 | Best = 0.9564515
GA | iter = 73 | Mean = 0.9532859 | Best = 0.9564515
GA | iter = 74 | Mean = 0.9521064 | Best = 0.9564515
GA | iter = 75 | Mean = 0.9534997 | Best = 0.9564515
GA | iter = 76 | Mean = 0.9539987 | Best = 0.9564515
GA | iter = 77 | Mean = 0.9536670 | Best = 0.9564515
GA | iter = 78 | Mean = 0.9526224 | Best = 0.9564515
GA | iter = 79 | Mean = 0.9531871 | Best = 0.9564515
GA | iter = 80 | Mean = 0.9527495 | Best = 0.9564515
GA | iter = 81 | Mean = 0.9526061 | Best = 0.9564515
GA | iter = 82 | Mean = 0.9525577 | Best = 0.9564515
GA | iter = 83 | Mean = 0.9525084 | Best = 0.9564515
GA | iter = 84 | Mean = 0.9519052 | Best = 0.9564515
GA | iter = 85 | Mean = 0.9518549 | Best = 0.9564515
GA | iter = 86 | Mean = 0.9511299 | Best = 0.9564515
GA | iter = 87 | Mean = 0.9505129 | Best = 0.9564515
GA | iter = 88 | Mean = 0.9518203 | Best = 0.9564515
GA | iter = 89 | Mean = 0.9537234 | Best = 0.9564515
GA | iter = 90 | Mean = 0.9531017 | Best = 0.9564515
GA | iter = 91 | Mean = 0.9514525 | Best = 0.9564515
GA | iter = 92 | Mean = 0.9505517 | Best = 0.9564515
GA | iter = 93 | Mean = 0.9524752 | Best = 0.9564515
GA | iter = 94 | Mean = 0.9533879 | Best = 0.9564515
GA | iter = 95 | Mean = 0.9519166 | Best = 0.9564515
GA | iter = 96 | Mean = 0.9524416 | Best = 0.9564515
GA | iter = 97 | Mean = 0.9526676 | Best = 0.9564515
GA | iter = 98 | Mean = 0.9523745 | Best = 0.9564515
GA | iter = 99 | Mean = 0.9523710 | Best = 0.9564515
GA | iter = 100 | Mean = 0.9519255 | Best = 0.9564515
> res@solution
     x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12 x13 x14 x15 x16 x17 x18 x19 x20 x21 x22 x23 x24 x25
[1,]  1  1  1  0  0  0  0  1  1   0   1   0   1   1   0   1   0   0   1   0   1   1   1   1   1
> res@fitnessValue
[1] 0.9564515
```
Comparing with the proven optimal solution from Baron and Gurobi, we see that the ga function finds the optimal solution. Of course, we would not know this had we not solved the model with a global solver.
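Because the GA is stochastic, a single run with one seed proves little. A cheap robustness check is to rerun it under several seeds; a sketch, reusing df and f from the transcript above (monitor=FALSE just suppresses the per-iteration trace):

```r
# rerun the GA under several seeds to see how reliably it reaches 0.9564515
best <- sapply(1:10, function(s)
  ga(type = "binary", fitness = f, nBits = 25,
     seed = s, monitor = FALSE)@fitnessValue)
summary(best)
```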
Conclusions
- Allowing us to pick and choose observations between two data sets can have a significant effect on the correlation coefficient.
- The problem of finding the optimal mix is a difficult combinatorial problem.
- The obvious approach is to formulate this as a non-convex MINLP.
- An alternative is a non-convex MIQCP formulation. This brings Gurobi into the picture. However, if we don't impose an upper bound on the objective, Gurobi is rendered totally helpless. After adding the bound, things look much better.
- This problem looks quite suited for a meta-heuristic. Meta-heuristics are fast and simple to implement, but you only get good solutions (occasionally even optimal ones) without any quality assurance.
References
1. Selecting sample that maximizes correlation, https://stackoverflow.com/questions/64251740/selecting-sample-that-maximizes-correlation
2. Package 'GA', https://cran.r-project.org/web/packages/GA/GA.pdf
Appendix: Gurobi LP file
This LP file causes real trouble for Gurobi due to its unbounded relaxation. Note this has to be solved with the option nonconvex 2.
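For example, with the Gurobi command-line tool this would be something like `gurobi_cl NonConvex=2 model.lp` (the file name here is hypothetical).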
```
\ Model GAMS Model
\ LP format - for model browsing. Use MPS format to capture full model detail.
Maximize
  z
Subject To
 defwx(i1): 4.57048155290812 x(i1) + hwx(i1,weight) = 168.2628705894044
 defwx(i2): 14.35293953397428 x(i2) + hwx(i2,weight) = 174.4377502565614
 defwx(i3): - 4.810337240480266 x(i3) + hwx(i3,weight) = 159.9096723101254
 defwx(i4): 2.995859558106929 x(i4) + hwx(i4,weight) = 145.7041593291271
 defwx(i5): - 9.978006854397336 x(i5) + hwx(i5,weight) = 155.5869839889373
 defwx(i6): - 11.83450629445829 x(i6) + hwx(i6,weight) = 154.335000536442
 defwx(i7): 5.047095974632526 x(i7) + hwx(i7,weight) = 158.7688132554227
 defwx(i8): 10.42261080570944 x(i8) + hwx(i8,weight) = 175.1269512206012
 defwx(i9): 7.809086784903798 x(i9) + hwx(i9,weight) = 157.5935866164602
 defwx(i10): - 2.214578422082724 x(i10) + hwx(i10,weight) = 160.0634282485366
 defwx(i11): 6.211485020463414 x(i11) + hwx(i11,weight) = 159.0391947749885
 defwx(i12): 7.529108790630033 x(i12) + hwx(i12,weight) = 164.8857039875973
 defwx(i13): 1.152396641792848 x(i13) + hwx(i13,weight) = 151.2234684137615
 defwx(i14): 0.2561878995556413 x(i14) + hwx(i14,weight) = 160.9786711234614
 defwx(i15): - 0.3965543777673872 x(i15) + hwx(i15,weight) = 172.2986522599465
 defwx(i16): - 0.0758869132981204 x(i16) + hwx(i16,weight) = 158.2347086210175
 defwx(i17): - 3.195219014258043 x(i17) + hwx(i17,weight) = 150.8994283190054
 defwx(i18): - 3.620025521749142 x(i18) + hwx(i18,weight) = 150.0242984063181
 defwx(i19): - 1.684699946033305 x(i19) + hwx(i19,weight) = 165.0865228272624
 defwx(i20): 1.807060121523165 x(i20) + hwx(i20,weight) = 170.8206099631946
 defwx(i21): 5.179025632721846 x(i21) + hwx(i21,weight) = 154.3315455729319
 defwx(i22): 6.499545761101757 x(i22) + hwx(i22,weight) = 173.9619147338905
 defwx(i23): - 3.582740184390104 x(i23) + hwx(i23,weight) = 168.190866891775
 defwx(i24): 3.442809290323368 x(i24) + hwx(i24,weight) = 163.5776971098937
 defwx(i25): - 3.356103037296691 x(i25) + hwx(i25,weight) = 155.3576067743665
 defhx(i1): - 0.012238100107453 x(i1) + hwx(i1,height) = 67.43328535705723
 defhx(i2): 1.98918429361234 x(i2) + hwx(i2,height) = 70.6383740411321
 defhx(i3): 1.813883224886396 x(i3) + hwx(i3,height) = 71.3177939118135
 defhx(i4): - 1.325467115496522 x(i4) + hwx(i4,height) = 59.85026108005525
 defhx(i5): - 3.141971238432788 x(i5) + hwx(i5,height) = 65.34193793182213
 defhx(i6): - 4.426674002923676 x(i6) + hwx(i6,height) = 64.14200864133288
 defhx(i7): 1.249564935900537 x(i7) + hwx(i7,height) = 67.03036779841625
 defhx(i8): 0.4363485690300877 x(i8) + hwx(i8,height) = 73.67286344172466
 defhx(i9): 1.924239239167797 x(i9) + hwx(i9,height) = 65.20351620466388
 defhx(i10): 0.2151915024726918 x(i10) + hwx(i10,height) = 69.00184760175402
 defhx(i11): 0.5252136910261171 x(i11) + hwx(i11,height) = 64.45542162911404
 defhx(i12): 1.053237234263008 x(i12) + hwx(i12,height) = 70.71933367432223
 defhx(i13): 2.073862555158748 x(i13) + hwx(i13,height) = 65.68842782277154
 defhx(i14): - 3.964068185905262 x(i14) + hwx(i14,height) = 66.5692517581452
 defhx(i15): - 1.652399901482397 x(i15) + hwx(i15,height) = 78.41767636858228
 defhx(i16): - 2.008787567894402 x(i16) + hwx(i16,height) = 65.39615405205915
 defhx(i17): 1.504528370533244 x(i17) + hwx(i17,height) = 62.50496743343454
 defhx(i18): - 1.511924170502624 x(i18) + hwx(i18,height) = 62.12263020481269
 defhx(i19): - 2.400794170619122 x(i19) + hwx(i19,height) = 70.5983998511028
 defhx(i20): - 1.68707570335097 x(i20) + hwx(i20,height) = 74.93510670286524
 defhx(i21): 2.861079762565417 x(i21) + hwx(i21,height) = 63.23395556183407
 defhx(i22): - 4.198544675012556 x(i22) + hwx(i22,height) = 72.55010450534969
 defhx(i23): - 1.346778063106839 x(i23) + hwx(i23,height) = 74.08655337740214
 defhx(i24): - 0.3379045824620164 x(i24) + hwx(i24,height) = 65.37964799513293
 defhx(i25): - 3.298388577170783 x(i25) + hwx(i25,height) = 64.0030375549771
 defmean(height): - 0.04 hwx(i1,height) - 0.04 hwx(i2,height) - 0.04 hwx(i3,height)
   - 0.04 hwx(i4,height) - 0.04 hwx(i5,height) - 0.04 hwx(i6,height)
   - 0.04 hwx(i7,height) - 0.04 hwx(i8,height) - 0.04 hwx(i9,height)
   - 0.04 hwx(i10,height) - 0.04 hwx(i11,height) - 0.04 hwx(i12,height)
   - 0.04 hwx(i13,height) - 0.04 hwx(i14,height) - 0.04 hwx(i15,height)
   - 0.04 hwx(i16,height) - 0.04 hwx(i17,height) - 0.04 hwx(i18,height)
   - 0.04 hwx(i19,height) - 0.04 hwx(i20,height) - 0.04 hwx(i21,height)
   - 0.04 hwx(i22,height) - 0.04 hwx(i23,height) - 0.04 hwx(i24,height)
   - 0.04 hwx(i25,height) + meanx(height) = 0
 defmean(weight): - 0.04 hwx(i1,weight) - 0.04 hwx(i2,weight) - 0.04 hwx(i3,weight)
   - 0.04 hwx(i4,weight) - 0.04 hwx(i5,weight) - 0.04 hwx(i6,weight)
   - 0.04 hwx(i7,weight) - 0.04 hwx(i8,weight) - 0.04 hwx(i9,weight)
   - 0.04 hwx(i10,weight) - 0.04 hwx(i11,weight) - 0.04 hwx(i12,weight)
   - 0.04 hwx(i13,weight) - 0.04 hwx(i14,weight) - 0.04 hwx(i15,weight)
   - 0.04 hwx(i16,weight) - 0.04 hwx(i17,weight) - 0.04 hwx(i18,weight)
   - 0.04 hwx(i19,weight) - 0.04 hwx(i20,weight) - 0.04 hwx(i21,weight)
   - 0.04 hwx(i22,weight) - 0.04 hwx(i23,weight) - 0.04 hwx(i24,weight)
   - 0.04 hwx(i25,weight) + meanx(weight) = 0
 sigma(height): [ - hwx(i1,height) ^2 + 2 hwx(i1,height) * meanx(height)
   - hwx(i2,height) ^2 + 2 hwx(i2,height) * meanx(height)
   - hwx(i3,height) ^2 + 2 hwx(i3,height) * meanx(height)
   - hwx(i4,height) ^2 + 2 hwx(i4,height) * meanx(height)
   - hwx(i5,height) ^2 + 2 hwx(i5,height) * meanx(height)
   - hwx(i6,height) ^2 + 2 hwx(i6,height) * meanx(height)
   - hwx(i7,height) ^2 + 2 hwx(i7,height) * meanx(height)
   - hwx(i8,height) ^2 + 2 hwx(i8,height) * meanx(height)
   - hwx(i9,height) ^2 + 2 hwx(i9,height) * meanx(height)
   - hwx(i10,height) ^2 + 2 hwx(i10,height) * meanx(height)
   - hwx(i11,height) ^2 + 2 hwx(i11,height) * meanx(height)
   - hwx(i12,height) ^2 + 2 hwx(i12,height) * meanx(height)
   - hwx(i13,height) ^2 + 2 hwx(i13,height) * meanx(height)
   - hwx(i14,height) ^2 + 2 hwx(i14,height) * meanx(height)
   - hwx(i15,height) ^2 + 2 hwx(i15,height) * meanx(height)
   - hwx(i16,height) ^2 + 2 hwx(i16,height) * meanx(height)
   - hwx(i17,height) ^2 + 2 hwx(i17,height) * meanx(height)
   - hwx(i18,height) ^2 + 2 hwx(i18,height) * meanx(height)
   - hwx(i19,height) ^2 + 2 hwx(i19,height) * meanx(height)
   - hwx(i20,height) ^2 + 2 hwx(i20,height) * meanx(height)
   - hwx(i21,height) ^2 + 2 hwx(i21,height) * meanx(height)
   - hwx(i22,height) ^2 + 2 hwx(i22,height) * meanx(height)
   - hwx(i23,height) ^2 + 2 hwx(i23,height) * meanx(height)
   - hwx(i24,height) ^2 + 2 hwx(i24,height) * meanx(height)
   - hwx(i25,height) ^2 + 2 hwx(i25,height) * meanx(height)
   - 25 meanx(height) ^2 + s(height) ^2 ] = 0
 sigma(weight): [ - hwx(i1,weight) ^2 + 2 hwx(i1,weight) * meanx(weight)
   - hwx(i2,weight) ^2 + 2 hwx(i2,weight) * meanx(weight)
   - hwx(i3,weight) ^2 + 2 hwx(i3,weight) * meanx(weight)
   - hwx(i4,weight) ^2 + 2 hwx(i4,weight) * meanx(weight)
   - hwx(i5,weight) ^2 + 2 hwx(i5,weight) * meanx(weight)
   - hwx(i6,weight) ^2 + 2 hwx(i6,weight) * meanx(weight)
   - hwx(i7,weight) ^2 + 2 hwx(i7,weight) * meanx(weight)
   - hwx(i8,weight) ^2 + 2 hwx(i8,weight) * meanx(weight)
   - hwx(i9,weight) ^2 + 2 hwx(i9,weight) * meanx(weight)
   - hwx(i10,weight) ^2 + 2 hwx(i10,weight) * meanx(weight)
   - hwx(i11,weight) ^2 + 2 hwx(i11,weight) * meanx(weight)
   - hwx(i12,weight) ^2 + 2 hwx(i12,weight) * meanx(weight)
   - hwx(i13,weight) ^2 + 2 hwx(i13,weight) * meanx(weight)
   - hwx(i14,weight) ^2 + 2 hwx(i14,weight) * meanx(weight)
   - hwx(i15,weight) ^2 + 2 hwx(i15,weight) * meanx(weight)
   - hwx(i16,weight) ^2 + 2 hwx(i16,weight) * meanx(weight)
   - hwx(i17,weight) ^2 + 2 hwx(i17,weight) * meanx(weight)
   - hwx(i18,weight) ^2 + 2 hwx(i18,weight) * meanx(weight)
   - hwx(i19,weight) ^2 + 2 hwx(i19,weight) * meanx(weight)
   - hwx(i20,weight) ^2 + 2 hwx(i20,weight) * meanx(weight)
   - hwx(i21,weight) ^2 + 2 hwx(i21,weight) * meanx(weight)
   - hwx(i22,weight) ^2 + 2 hwx(i22,weight) * meanx(weight)
   - hwx(i23,weight) ^2 + 2 hwx(i23,weight) * meanx(weight)
   - hwx(i24,weight) ^2 + 2 hwx(i24,weight) * meanx(weight)
   - hwx(i25,weight) ^2 + 2 hwx(i25,weight) * meanx(weight)
   - 25 meanx(weight) ^2 + s(weight) ^2 ] = 0
 denom: shw + [ - s(height) * s(weight) ] = 0
 obj: [ - hwx(i1,height) * hwx(i1,weight) + hwx(i1,height) * meanx(weight) + hwx(i1,weight) * meanx(height)
   - hwx(i2,height) * hwx(i2,weight) + hwx(i2,height) * meanx(weight) + hwx(i2,weight) * meanx(height)
   - hwx(i3,height) * hwx(i3,weight) + hwx(i3,height) * meanx(weight) + hwx(i3,weight) * meanx(height)
   - hwx(i4,height) * hwx(i4,weight) + hwx(i4,height) * meanx(weight) + hwx(i4,weight) * meanx(height)
   - hwx(i5,height) * hwx(i5,weight) + hwx(i5,height) * meanx(weight) + hwx(i5,weight) * meanx(height)
   - hwx(i6,height) * hwx(i6,weight) + hwx(i6,height) * meanx(weight) + hwx(i6,weight) * meanx(height)
   - hwx(i7,height) * hwx(i7,weight) + hwx(i7,height) * meanx(weight) + hwx(i7,weight) * meanx(height)
   - hwx(i8,height) * hwx(i8,weight) + hwx(i8,height) * meanx(weight) + hwx(i8,weight) * meanx(height)
   - hwx(i9,height) * hwx(i9,weight) + hwx(i9,height) * meanx(weight) + hwx(i9,weight) * meanx(height)
   - hwx(i10,height) * hwx(i10,weight) + hwx(i10,height) * meanx(weight) + hwx(i10,weight) * meanx(height)
   - hwx(i11,height) * hwx(i11,weight) + hwx(i11,height) * meanx(weight) + hwx(i11,weight) * meanx(height)
   - hwx(i12,height) * hwx(i12,weight) + hwx(i12,height) * meanx(weight) + hwx(i12,weight) * meanx(height)
   - hwx(i13,height) * hwx(i13,weight) + hwx(i13,height) * meanx(weight) + hwx(i13,weight) * meanx(height)
   - hwx(i14,height) * hwx(i14,weight) + hwx(i14,height) * meanx(weight) + hwx(i14,weight) * meanx(height)
   - hwx(i15,height) * hwx(i15,weight) + hwx(i15,height) * meanx(weight) + hwx(i15,weight) * meanx(height)
   - hwx(i16,height) * hwx(i16,weight) + hwx(i16,height) * meanx(weight) + hwx(i16,weight) * meanx(height)
   - hwx(i17,height) * hwx(i17,weight) + hwx(i17,height) * meanx(weight) + hwx(i17,weight) * meanx(height)
   - hwx(i18,height) * hwx(i18,weight) + hwx(i18,height) * meanx(weight) + hwx(i18,weight) * meanx(height)
   - hwx(i19,height) * hwx(i19,weight) + hwx(i19,height) * meanx(weight) + hwx(i19,weight) * meanx(height)
   - hwx(i20,height) * hwx(i20,weight) + hwx(i20,height) * meanx(weight) + hwx(i20,weight) * meanx(height)
   - hwx(i21,height) * hwx(i21,weight) + hwx(i21,height) * meanx(weight) + hwx(i21,weight) * meanx(height)
   - hwx(i22,height) * hwx(i22,weight) + hwx(i22,height) * meanx(weight) + hwx(i22,weight) * meanx(height)
   - hwx(i23,height) * hwx(i23,weight) + hwx(i23,height) * meanx(weight) + hwx(i23,weight) * meanx(height)
   - hwx(i24,height) * hwx(i24,weight) + hwx(i24,height) * meanx(weight) + hwx(i24,weight) * meanx(height)
   - hwx(i25,height) * hwx(i25,weight) + hwx(i25,height) * meanx(weight) + hwx(i25,weight) * meanx(height)
   - 25 meanx(height) * meanx(weight) + z * shw ] = 0
Bounds
\ z <= 1
Binaries
 x(i1) x(i2) x(i3) x(i4) x(i5) x(i6) x(i7) x(i8) x(i9) x(i10) x(i11) x(i12) x(i13)
 x(i14) x(i15) x(i16) x(i17) x(i18) x(i19) x(i20) x(i21) x(i22) x(i23) x(i24) x(i25)
End
```
Hi Erwin. Thanks for the interesting blog post. I formulated the problem as a non-convex MIQP and solved it with Gurobi 9.0.3 as well. I got *very* different results. Here's the log:
```
Optimize a model with 52 rows, 133 columns and 202 nonzeros
Model fingerprint: 0xf968fe80
Model has 6 quadratic constraints
Model has 100 general constraints
Variable types: 108 continuous, 25 integer (25 binary)
Coefficient statistics:
  Matrix range     [4e-02, 1e+00]
  QMatrix range    [1e+00, 1e+00]
  QLMatrix range   [1e+00, 1e+00]
  Objective range  [1e+00, 1e+00]
  Bounds range     [1e+00, 1e+06]
  RHS range        [0e+00, 0e+00]
Presolve removed 0 rows and 50 columns
Presolve time: 0.01s
Presolved: 321 rows, 161 columns, 968 nonzeros
Presolved model has 79 bilinear constraint(s)
Variable types: 136 continuous, 25 integer (25 binary)

Root relaxation: objective 1.000000e+00, 189 iterations, 0.00 seconds

    Nodes    |    Current Node    |     Objective Bounds      |     Work
 Expl Unexpl |  Obj  Depth IntInf | Incumbent    BestBd   Gap | It/Node Time

     0     0    1.00000    0   85          -    1.00000      -     -    0s
     0     0    1.00000    0   82          -    1.00000      -     -    0s
     0     0    1.00000    0   80          -    1.00000      -     -    0s
     0     0    1.00000    0   80          -    1.00000      -     -    0s
     0     0    1.00000    0   80          -    1.00000      -     -    0s
     0     0    1.00000    0   78          -    1.00000      -     -    0s
     0     0    1.00000    0   78          -    1.00000      -     -    0s
     0     0    1.00000    0   78          -    1.00000      -     -    0s
     0     0    1.00000    0   78          -    1.00000      -     -    0s
     0     2    1.00000    0   78          -    1.00000      -     -    0s
*  285   184              39       0.9333298    1.00000  7.14%  25.8    0s
*  322   197              31       0.9384552    1.00000  6.56%  26.1    0s
...
 49221  3823    0.97881   29   85    0.95289    1.00000  4.94%  24.7   35s
*49503  3713              34       0.9554842    1.00000  4.66%  24.7   35s
 55321  3829    0.98481   27   87    0.95548    0.99728  4.37%  24.9   40s
*60201  3345              33       0.9559043    0.99209  3.79%  25.6   44s
 60617  3247    0.95599   23   88    0.95590    0.99170  3.74%  25.7   45s
*61614  3123              33       0.9564515    0.99071  3.58%  25.9   45s
 65850  2317 infeasible   29         0.95645    0.98537  3.02%  26.8   50s
 71208   659     cutoff   27         0.95645    0.96873  1.28%  28.7   55s

Cutting planes:
  RLT: 42

Explored 72714 nodes (2139918 simplex iterations) in 56.59 seconds
Thread count was 4 (of 4 available processors)
```
Thanks. Yes, I know what is happening. In my formulation I did not have an upper bound on z (the objective). Somehow, this leads to an unbounded relaxation, from which Gurobi never recovers. After adding the bound z<=1 (we know the correlation coefficient is less than 1), I get similar results to yours. I will add some notes to the post about this.
Interesting! Solvers can be weird...
But it can't be that alone. When I remove all variable bounds from my model, the root relaxation of my model is "3.225447". After 1s the solver finds "0.9481638". (The optimum is found after 640s, and proving optimality takes more than 1000s.)
I have attached the Gurobi LP file demonstrating the problem.
Thanks for attaching the LP. The number of explored nodes is very different for our models. I suspect that the "sigma"-constraints are not formulated ideally for Gurobi in your model. Instead of summing "(a-b)^2" directly, one can create a new intermediate variable "c=a-b" and sum "c^2" instead. Usually, this helps Gurobi's performance tremendously. Couldn't test it, because I suspect I hit a Gurobi bug...