## Tuesday, October 27, 2020

### Finding a cluster of closest points

#### Problem statement

From [1]:

Consider $$n=100$$ points in a 12-dimensional space. Find $$m=8$$ points that are as close together as possible.

#### Models

The problem can be stated as a simple MIQP (Mixed Integer Quadratic Programming) model.

MINSUM MIQP Model
\begin{align} \min & \sum_{i\lt j} \color{darkred}x_i \cdot \color{darkred}x_j \cdot \color{darkblue}{\mathit{dist}}_{i,j} \\ & \sum_i \color{darkred}x_i = \color{darkblue}m \\ & \color{darkred}x_i \in \{0,1\} \end{align}

This is a simple model. The main wrinkle is that we want to make sure that we only count each distance once. For this reason, we only consider distances with $$i \lt j$$. Of course, we can exploit this also in calculating the distance matrix $$\color{darkblue}{\mathit{dist}}_{i,j}$$, and make this a strictly upper-triangular matrix. This saves time and memory.
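To make this concrete, here is a small numpy sketch (not the GAMS code from the appendix) that builds such a strictly upper-triangular Euclidean distance matrix:

import numpy as np

rng = np.random.default_rng(42)
n = 50
coord = rng.random((n, 2))                   # n random 2d points

# pairwise Euclidean distances via broadcasting
diff = coord[:, None, :] - coord[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=2))

# keep only i < j: zero out the diagonal and the lower triangle
dist = np.triu(dist, k=1)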

As the $$\color{darkred}x$$ variables are binary, we can easily linearize this problem:

MINSUM MIP Model
\begin{align} \min & \sum_{i\lt j} \color{darkred}y_{i,j} \cdot \color{darkblue}{\mathit{dist}}_{i,j} \\ & \color{darkred}y_{i,j} \ge \color{darkred}x_i + \color{darkred}x_j - 1 && \forall i \lt j\\ & \sum_i \color{darkred}x_i = \color{darkblue}m \\ & \color{darkred}x_i \in \{0,1\} \\ &\color{darkred}y_{i,j} \in [0,1] \end{align}

The inequality implements the implication $$\color{darkred}x_i= 1 \textbf{ and } \color{darkred}x_j = 1 \Rightarrow \color{darkred}y_{i,j} = 1$$. The variables $$\color{darkred}y_{i,j}$$ can be binary or can be relaxed to be continuous between 0 and 1. The MIQP and MIP models should deliver the same results.
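As an illustration, here is a minimal sketch of the MINSUM MIP in PuLP, assuming the dist matrix and n from the numpy snippet above and m=8 (this is not the GAMS model from the appendix):

import pulp as p

m = 8
pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]

x = p.LpVariable.dicts("x", range(n), cat="Binary")
y = p.LpVariable.dicts("y", pairs, lowBound=0, upBound=1)

prob = p.LpProblem("minsum", p.LpMinimize)
prob += p.lpSum(dist[i, j] * y[i, j] for (i, j) in pairs)
for (i, j) in pairs:
    prob += y[i, j] >= x[i] + x[j] - 1     # x(i)=x(j)=1 forces y(i,j)=1
prob += p.lpSum(x[i] for i in range(n)) == m

prob.solve()
print([i for i in range(n) if x[i].value() > 0.5])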

Finally, we can also consider a slightly different problem. Instead of minimizing the sum of the distances between points inside the cluster of the selected points, we can also minimize the maximum distance between points inside the cluster. The (linear) model can look like:

MINMAX MIP Model
\begin{align} \min\> & \color{darkred}z \\ & \color{darkred}z \ge \color{darkred}y_{i,j} \cdot \color{darkblue}{\mathit{dist}}_{i,j} && \forall i \lt j \\ & \color{darkred}y_{i,j} \ge \color{darkred}x_i + \color{darkred}x_j - 1 && \forall i \lt j\\ & \sum_i \color{darkred}x_i = \color{darkblue}m \\ & \color{darkred}x_i \in \{0,1\} \\ &\color{darkred}y_{i,j} \in [0,1] \end{align}

We re-use our linearization here.

Finally, I also tried a very simple greedy heuristic:
1. Select the two points that are closest to each other.
2. Select a new unselected point that is closest to our already selected points.
3. Repeat step 2 until we have selected $$m=8$$ points.
We can expect our optimization models to do a bit better than this simplistic approach.
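A compact Python sketch of this heuristic, reusing m, n, and the upper-triangular dist matrix from the earlier snippets:

import numpy as np

D = dist + dist.T                # symmetrize the upper-triangular matrix
np.fill_diagonal(D, np.inf)      # exclude self-distances

# step 1: start with the closest pair
i, j = np.unravel_index(np.argmin(D), D.shape)
cluster = {int(i), int(j)}

# steps 2-3: grow the cluster with the nearest outside point
while len(cluster) < m:
    outside = [k for k in range(n) if k not in cluster]
    best = min(outside, key=lambda k: min(D[k, c] for c in cluster))
    cluster.add(best)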

In our methods here, we did not make any assumptions about the distance metric being used; we just need the distance matrix. This means you can use Euclidean, Manhattan, or other metrics. In addition, you can apply some normalization or weighting before calculating the distances, which may be useful if the features have different units. These models also do not change whether one uses low- or high-dimensional data.
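For example, with scipy one can swap the metric or normalize the features without touching the models (a sketch; "cityblock" is scipy's name for the Manhattan metric):

import numpy as np
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(42)
points = rng.random((50, 2))

d_euclid = squareform(pdist(points))                      # Euclidean
d_manhat = squareform(pdist(points, metric="cityblock"))  # Manhattan

# normalize first when features have different units
z = (points - points.mean(axis=0)) / points.std(axis=0)
d_scaled = squareform(pdist(z))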

There are alternative models one could envision, such as finding the smallest enclosing sphere or box. These methods do not make things simpler and will likely not improve upon our models.

#### Small data set

Let's start with selecting $$m=8$$ points from  $$n=50$$, using random 2d coordinates.

----     10 PARAMETER coord  coordinates

x           y

point1        0.172       0.843
point2        0.550       0.301
point3        0.292       0.224
point4        0.350       0.856
point5        0.067       0.500
point6        0.998       0.579
point7        0.991       0.762
point8        0.131       0.640
point9        0.160       0.250
point10       0.669       0.435
point11       0.360       0.351
point12       0.131       0.150
point13       0.589       0.831
point14       0.231       0.666
point15       0.776       0.304
point16       0.110       0.502
point17       0.160       0.872
point18       0.265       0.286
point19       0.594       0.723
point20       0.628       0.464
point21       0.413       0.118
point22       0.314       0.047
point23       0.339       0.182
point24       0.646       0.561
point25       0.770       0.298
point26       0.661       0.756
point27       0.627       0.284
point28       0.086       0.103
point29       0.641       0.545
point30       0.032       0.792
point31       0.073       0.176
point32       0.526       0.750
point33       0.178       0.034
point34       0.585       0.621
point35       0.389       0.359
point36       0.243       0.246
point37       0.131       0.933
point38       0.380       0.783
point39       0.300       0.125
point40       0.749       0.069
point41       0.202       0.005
point42       0.270       0.500
point43       0.151       0.174
point44       0.331       0.317
point45       0.322       0.964
point46       0.994       0.370
point47       0.373       0.772
point48       0.397       0.913
point49       0.120       0.735
point50       0.055       0.576


The MINSUM MIQP and MIP models give the same solution, but (as expected) the MINMAX solution is slightly different. Here are the results with Cplex:

          HEURISTIC        MIQP         MIP      MINMAX

point2        1.000
point3                    1.000       1.000       1.000
point9                                            1.000
point10       1.000
point11                   1.000       1.000
point12                                           1.000
point15       1.000
point18                   1.000       1.000       1.000
point20       1.000
point23                   1.000       1.000       1.000
point24       1.000
point25       1.000
point27       1.000
point29       1.000
point35                   1.000       1.000
point36                   1.000       1.000       1.000
point39                   1.000       1.000       1.000
point43                                           1.000
point44                   1.000       1.000
status                  Optimal     Optimal     Optimal
obj                       3.447       3.447       0.210
sum           4.997       3.447       3.447       3.522
max           0.291       0.250       0.250       0.210
time                     33.937       2.125       0.562


When we plot the results, we see that our optimization models do much better than the heuristic. The MINSUM and MINMAX models have quite an overlap in the selected points.

To compare the MINSUM and MINMAX objectives, we can plot them against each other. The single MINSUM point is delivered by both the MIQP and MIP models. Obviously, our heuristic is not doing that great.
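The plots themselves were made outside the models; a minimal matplotlib sketch, assuming coord is the n-by-2 coordinate array and sel a 0/1 selection vector, could look like:

import numpy as np
import matplotlib.pyplot as plt

def plot_cluster(coord, sel, title):
    """Highlight the selected cluster (red) against all points (gray)."""
    sel = np.asarray(sel, dtype=bool)
    plt.scatter(coord[~sel, 0], coord[~sel, 1], c="lightgray")
    plt.scatter(coord[sel, 0], coord[sel, 1], c="red")
    plt.title(title)
    plt.show()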

#### Large data set

Here we select $$m=8$$ points from $$n=100$$, using random 12-dimensional coordinates. Using Gurobi on a faster machine, we see:

----    112 PARAMETER results

HEURISTIC        MIQP         MIP      MINMAX

point5        1.000                               1.000
point8                    1.000       1.000
point17       1.000                               1.000
point19                   1.000       1.000
point24                                           1.000
point35                   1.000       1.000
point38                   1.000       1.000
point39                                           1.000
point42       1.000                               1.000
point43                                           1.000
point45       1.000       1.000       1.000
point51       1.000
point56       1.000
point76       1.000
point81                   1.000       1.000
point89                   1.000       1.000
point94       1.000       1.000       1.000       1.000
point97                                           1.000
status                TimeLimit     Optimal     Optimal
obj                      26.230      26.230       1.147
sum          30.731      26.230      26.230      27.621
max           1.586       1.357       1.357       1.147
time                   1015.984      49.000       9.594
gap%                     23.379


The MINSUM MIQP model hit a time limit of 1000 seconds but actually found the optimal solution; it just did not have the time to prove optimality. The overlap between the MINSUM models and the MINMAX model is very small (just one shared point, point94). I expected more overlap.

#### Conclusions

The MINMAX model seems to solve faster than the MINSUM versions. This is a little surprising: often these max or min objectives are slower than direct sums. It always pays off to do some experiments with different formulations; slightly different models can show large differences in solution times, and this is a great example of that.

These models are independent of the dimensionality of the data and the distance metric being used.

#### References

1. Given a set of points or vectors, find the set of N points that are closest to each other, https://stackoverflow.com/questions/64542442/given-a-set-of-points-or-vectors-find-the-set-of-n-points-that-are-closest-to-e

#### Appendix: GAMS code

Some interesting features:
• There are not that many models where I can use xor operators. Here xor is used in the greedy heuristic, where we want to consider pairs of points $$i$$ and $$j$$ with exactly one of them inside the cluster.
• Macros are used to prevent repetitive code in the reporting of results.
• Acronyms are used in reporting.

$ontext

   find cluster of m=8 points closest to each other

   compare:
      simple greedy heuristic (HEURISTIC)
      quadratic model (MIQP)
      linearized model (MIP)
      minmax model (MINMAX)

$offtext

set
   c 'coordinates' /x,y/
   i 'points' /point1*point50/
;
scalar m 'number of points to select' /8/;

parameter coord(i,c) 'random coordinates';
coord(i,c) = uniform(0,1);
display coord;

alias(i,j);

set ij(i,j) 'upper triangular structure';
ij(i,j) = ord(i) < ord(j);

parameter dist(i,j) 'euclidean distances';
dist(ij(i,j)) = sqrt(sum(c, sqr(coord(i,c)-coord(j,c))));

binary variables x(i) 'selected points';
positive variable y(i,j) 'x(i)*x(j), relaxed to be continuous';
y.up(ij) = 1;
variable z 'objective';

equations
   obj1   'quadratic objective'
   obj2   'linearized objective'
   obj3   'minmax objective'
   select 'number of selected points'
   bound  'implication x(i)=x(j)=1 ==> y(i,j)=1'
;

obj1..  z =e= sum(ij(i,j), x(i)*x(j)*dist(ij));
obj2..  z =e= sum(ij, y(ij)*dist(ij));
obj3(ij)..  z =g= y(ij)*dist(ij);
select..  sum(i, x(i)) =e= m;
bound(ij(i,j))..  y(i,j) =g= x(i) + x(j) - 1;

model m1 /obj1,select/;
model m2 /obj2,select,bound/;
model m3 /obj3,select,bound/;

option optcr=0, mip=cplex, miqcp=cplex, threads=8;

*------------------------------------------------
* reporting macros
*------------------------------------------------

parameter results(*,*);
set dummy 'ordering for output' /status, obj, sum, max, time, 'gap%'/;
acronym TimeLimit;
acronym Optimal;

* macros for reporting
$macro sumdist    sum(ij(i,j), x.l(i)*x.l(j)*dist(ij))
$macro maxdist    smax(ij(i,j), x.l(i)*x.l(j)*dist(ij))
$macro report(m,label)  \
    x.l(i) = round(x.l(i)); \
    results(i,label) = x.l(i); \
    results('status',label)$(m.solvestat=1) = Optimal; \
    results('status',label)$(m.solvestat=3) = TimeLimit; \
    results('obj',label) = z.l; \
    results('sum',label) = sumdist; \
    results('max',label) = maxdist; \
    results('time',label) = m.resusd; \
    results('gap%',label)$(m.solvestat=3) = 100*abs(m.objest - m.objval)/abs(m.objest);

*------------------------------------------------
* heuristic
*------------------------------------------------

* step 1 : select points 1 and 2 by minimizing distance
scalar dmin;
dmin = smin(ij, dist(ij));
loop(ij(i,j)$(dist(ij)=dmin),
   x.l(i) = 1;
   x.l(j) = 1;
   break;
);

* add points 3..m, closest to points selected earlier
scalar k;
for(k = 3 to m,
   dmin = smin(ij(i,j)$(x.l(i) xor x.l(j)), dist(ij));
   loop(ij(i,j)$(dist(ij)=dmin),
      x.l(i) = 1;
      x.l(j) = 1;
      break;
   );
);

results(i,'HEURISTIC') = x.l(i);
results('sum','HEURISTIC') = sumdist;
results('max','HEURISTIC') = maxdist;

*------------------------------------------------
* run models m1 through m3
*------------------------------------------------

solve m1 minimizing z using miqcp;
report(m1,'MIQP')

solve m2 minimizing z using mip;
report(m2,'MIP')

solve m3 minimizing z using mip;
report(m3,'MINMAX')

display results;

## Monday, October 26, 2020

### Solution methods for linear bilevel problems

For some classes of problems, a useful strategy is to form the explicit KKT optimality conditions and convert the resulting complementarity conditions into big-M constraints. If the original problem was linear (or in some cases quadratic), the resulting model is a MIP. One such application is solving non-convex QP models [1]. Another application is a bilevel LP, where we replace the inner problem by its optimality conditions.
In [2], such a problem is given:

Bilevel LP
\begin{align}\max\> &\color{darkred}z = \color{darkred}x+\color{darkred}y \\ & \color{darkred}x \in [0,2] \\ & \min \>\color{darkred}y\\ & \qquad \color{darkred}y \ge 0 \perp \color{darkred} \lambda_1 \\ & \qquad \color{darkred}x - 0.01\color{darkred}y \le 1 \perp \color{darkred}\lambda_2 \end{align}

The notation $$\perp$$ indicates the name of the dual variable associated with each inner constraint. After the inner problem is replaced by its optimality conditions, we have:

Inner LP as optimality conditions
\begin{align}\max\> & \color{darkred}z = \color{darkred}x+\color{darkred}y \\ & \color{darkred}x \in [0,2]\\ & 1 - \color{darkred}\lambda_1 - 0.01 \color{darkred}\lambda_2 = 0 \\ & \color{darkred}y \ge 0 \\ & \color{darkred}x - 0.01\color{darkred}y \le 1\\ & \color{darkred}\lambda_1 \cdot \color{darkred}y = 0 \\ & \color{darkred}\lambda_2 \cdot \left( \color{darkred}x -0.01\color{darkred}y - 1 \right) = 0 \\ & \color{darkred}\lambda_1,\color{darkred}\lambda_2\ge 0 \end{align}

The products can be interpreted as "or" conditions. These can be linearized using binary variables and big-M constraints:

Linearized problem
\begin{align}\max\> & \color{darkred}z = \color{darkred}x+\color{darkred}y \\ & \color{darkred}x \in [0,2]\\ & 1 - \color{darkred}\lambda_1 - 0.01 \color{darkred}\lambda_2 = 0 \\ & \color{darkred}y \ge 0 \\ & \color{darkred}x - 0.01\color{darkred}y \le 1\\ & \color{darkred}\lambda_1 \le \color{darkblue} M_1^d \color{darkred}\delta_1 \\ & \color{darkred}y \le \color{darkblue} M_1^p (1-\color{darkred}\delta_1) \\ & \color{darkred}\lambda_2 \le \color{darkblue} M_2^d \color{darkred}\delta_2 \\ & -\color{darkred}x +0.01\color{darkred}y + 1 \le \color{darkblue} M_2^p (1-\color{darkred}\delta_2) \\ & \color{darkred}\lambda_1,\color{darkred}\lambda_2\ge 0 \\ & \color{darkred}\delta_1,\color{darkred}\delta_2 \in \{0,1\} \end{align}

The superscripts $$p$$ and $$d$$ in the big-M constants indicate primal and dual. Note that we flipped the sign of the second constraint of the inner problem, to make $$\color{darkred}\lambda_2$$ a non-negative variable.

#### Choosing big-M values

One big issue is choosing appropriate values for the big-M constants. The values $$\color{darkblue} M_1^d,\color{darkblue} M_1^p,\color{darkblue} M_2^d,\color{darkblue} M_2^p = 200$$ are valid bounds on $$\color{darkred}\lambda_1, \color{darkred}y, \color{darkred}\lambda_2, -\color{darkred}x +0.01\color{darkred}y + 1$$. This gives the solution:

                     LOWER     LEVEL     UPPER

---- VAR x             .       2.0000    2.0000
---- VAR y             .     100.0000     +INF
---- VAR lambda1       .        .         +INF
---- VAR lambda2       .     100.0000     +INF
---- VAR delta1        .        .        1.0000
---- VAR delta2        .       1.0000    1.0000
---- VAR z           -INF    102.0000     +INF
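As a cross-check, here is a minimal sketch of the linearized model in PuLP (an illustration, not the code used in the original post); with all big-M values at 200 it reproduces x=2, y=100, z=102:

import pulp as p

M = 200
x  = p.LpVariable("x", 0, 2)
y  = p.LpVariable("y", 0)
l1 = p.LpVariable("lambda1", 0)
l2 = p.LpVariable("lambda2", 0)
d1 = p.LpVariable("delta1", cat="Binary")
d2 = p.LpVariable("delta2", cat="Binary")

prob = p.LpProblem("bilevel", p.LpMaximize)
prob += x + y                               # outer objective
prob += 1 - l1 - 0.01 * l2 == 0             # inner stationarity
prob += x - 0.01 * y <= 1                   # inner primal feasibility
prob += l1 <= M * d1                        # lambda1 * y = 0
prob += y <= M * (1 - d1)
prob += l2 <= M * d2                        # lambda2 * (x - 0.01y - 1) = 0
prob += -x + 0.01 * y + 1 <= M * (1 - d2)

prob.solve()
print(x.value(), y.value(), p.value(prob.objective))   # 2.0 100.0 102.0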
Choosing the right big-M values is important. Values that are too small can cut off the optimal solution. Values that are too large lead to other problems, such as numerical issues and trickle flow. In [2] it is argued that many modelers use the following algorithm to detect whether a big-M value is too small:

1. Select initial values for $$\color{darkblue} M_j^d, \color{darkblue} M_j^p$$.
2. Solve the linearized MIP model.
3. Look for binding big-M constraints that should be inactive. If none are found, stop: we have found the optimal solution.
4. Increase the corresponding big-M values.
5. Go to step 2.

Unfortunately, this algorithm is not guaranteed to work. [2] gives an example: use as starting values $$\color{darkblue} M_j^d = 50, \color{darkblue} M_j^p = 200$$. The solution looks like:

                     LOWER     LEVEL     UPPER

---- VAR x             .       2.0000    2.0000
---- VAR y             .        .         +INF
---- VAR lambda1       .       1.0000     +INF
---- VAR lambda2       .        .         +INF
---- VAR delta1        .       1.0000    1.0000
---- VAR delta2        .       1.0000    1.0000
---- VAR z           -INF      2.0000     +INF

The values $$\color{darkred}\delta_j=1$$ indicate we should look at the constraints $$\color{darkred}\lambda_j \le \color{darkblue}M_j^d$$. As $$\color{darkred}\lambda_1=1$$ and $$\color{darkred}\lambda_2=0$$, these are not binding. Hence the erroneous conclusion that this solution is optimal. Our conclusion: we cannot reliably detect that big-M values were chosen too small by searching for binding constraints.

#### Alternatives

In [3], it is argued that SOS1 (special ordered sets of type 1) variables should be used instead of binary variables. For our example, this means we would replace our big-M constraints by:

SOS1 representation
\begin{align} & \color{darkred}\lambda_1, \color{darkred}y \in SOS1 \\ & \color{darkred}\lambda_2, \color{darkred}s \in SOS1 \\ &\color{darkred}s = -\color{darkred}x+0.01\color{darkred}y+1 \\ & \color{darkred}s \ge 0 \end{align}

Many MIP solvers (but not all) support SOS1 variables. Binary variables can sometimes be faster, but SOS1 variables have the advantage that no big-M constants are needed. The same can be said for indicator constraints. A formulation using indicator constraints can look like:

Indicator constraints
\begin{align} & \color{darkred} \delta_1 = 0 \Rightarrow \color{darkred}\lambda_1 =0 \\ & \color{darkred} \delta_1 = 1\Rightarrow \color{darkred}y = 0 \\ & \color{darkred} \delta_2 = 0 \Rightarrow \color{darkred}\lambda_2 = 0 \\ & \color{darkred} \delta_2 = 1 \Rightarrow \color{darkred}x-0.01\color{darkred}y = 1 \\ & \color{darkred}x-0.01\color{darkred}y \le 1 \\ & \color{darkred} \delta_1,\color{darkred}\delta_2 \in \{0,1\} \end{align}

We can also solve the quadratic model directly using global solvers like Antigone, Baron, Couenne, or Gurobi. Sometimes reasonable bounds are needed to help global solvers. Note that the GAMS EMP tool can generate the quadratic model for you; I still had to specify a global solver for it to find the optimal solution.

#### Conclusion

Instead of using big-M constraints to handle complementarity conditions in linear bilevel optimization problems, there are a few alternatives:
• SOS1 variables
• Indicator constraints
• Global solvers for non-convex quadratic problems

#### References

1. Solving non-convex QP problems as a MIP, https://yetanothermathprogrammingconsultant.blogspot.com/2016/06/solving-non-convex-qp-problems-as-mip.html
2. Salvador Pineda and Juan Miguel Morales, Solving Linear Bilevel Problems Using Big-Ms: Not All That Glitters Is Gold, September 2018, https://arxiv.org/pdf/1809.10448.pdf
3. Thomas Kleinert and Martin Schmidt, Why there is no need to use a big-M in linear bilevel optimization: A computational study of two ready-to-use approaches, October 2020, http://www.optimization-online.org/DB_FILE/2020/10/8065.pdf

## Sunday, October 18, 2020

### PuLP mystery

PuLP is a popular Python-based modeling tool for LP and MIP models. In [1], a user asked a (somewhat trivial) question about PuLP syntax. But there was an interesting wrinkle in the model that was unfamiliar to me. The basic issue is that sometimes PuLP accepts a NumPy array in expressions:

import pulp as p
import numpy as np

a = np.array([1,2,3])
x = p.LpVariable('x')
y = p.LpVariable('y')

prob = p.LpProblem("test", p.LpMaximize)
prob += x+(a-y)
prob

This fragment creates an objective.
But we have a strange term here: a-y is funny, because a is a NumPy array (vector) and y is a scalar PuLP variable. Usually, adding a scalar to a vector is interpreted as elementwise addition: $$\begin{pmatrix} a_0 - y \\ a_1 - y \\ a_2-y \end{pmatrix}$$ How one would interpret an objective like $$x + \begin{pmatrix} a_0 - y \\ a_1 - y \\ a_2-y \end{pmatrix}$$ is not clear to me. I would say this is an error. However, PuLP accepts this, and prints the model as:

test:
MAXIMIZE
1*x + -3*y + 6
VARIABLES
x free Continuous
y free Continuous

So, apparently, the interpretation is $$x + \sum_i (a_i-y)=x + \sum_i a_i - n\cdot y$$ where $$n$$ is the length of the vector $$a$$. But then we would also expect PuLP to accept

prob += (a-y)

as objective. However, this produces the error message:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-6-bf3a3a41e9da> in <module>()
      8
      9 prob = p.LpProblem("test",p.LpMaximize)
---> 10 prob += (a-y)
     11 prob

/usr/local/lib/python3.6/dist-packages/pulp/pulp.py in __iadd__(self, other)
   1528                 self.objective.name = name
   1529             else:
-> 1530                 raise TypeError("Can only add LpConstraintVar, LpConstraint, LpAffineExpression or True objects")
   1531         return self
   1532

TypeError: Can only add LpConstraintVar, LpConstraint, LpAffineExpression or True objects

I have no idea what is going on here. It could all just be a bug, but even that cannot easily explain the behavior of sometimes accepting (a-y) and sometimes refusing it. The underlying problem can actually be traced back to operator overloading, as shown by [2]: who owns the + or -? The following experiment demonstrates the issue in stark terms:

>>> a+x
array([1*x + 1, 1*x + 2, 1*x + 3], dtype=object)
>>> x+a
1*x + 6

In the first case, NumPy handles the +, leading to the interpretation $$\begin{pmatrix} a_0 \\ a_1 \\ a_2 \end{pmatrix} + x = \begin{pmatrix} x+a_0 \\ x+a_1 \\x+a_2 \end{pmatrix}$$ In the second case, the + is dealt with by PuLP. This gives $$x+\begin{pmatrix} a_0 \\ a_1 \\ a_2 \end{pmatrix}=x+\sum_i a_i$$

#### Conclusions

This is mind-bending (a.k.a. crazy). It would be better if things were a bit more predictable ("orthogonal" is the term used in programming language design). Preferably, we don't want to spend time figuring out how to interpret a + or a -.

Advice: it is better not to mix PuLP with NumPy.

#### References

## Thursday, October 8, 2020

### Maximum Correlation, Global MINLP vs GA

In [1], the following question is posed:

Suppose we have data like:

worker  weight  height  weight2  height2
1       120     60      125      60
2       152     55      156      66
3       222     55      100      20

We can pick for each row either (weight,height) or (weight2,height2). The problem is to pick one of these observations for each row such that the correlation between weight and height is maximized.

#### Data

First, let's invent some data that we can use for some experiments.
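The actual numbers below come from a GAMS run; purely as an illustration, a numpy sketch with assumed distribution parameters could generate similar data:

import numpy as np

rng = np.random.default_rng(126)
n = 25

# two noisy (height, weight) measurements per person; all distribution
# parameters here are assumptions for illustration only
height  = rng.normal(68, 4, n)
height1 = height + rng.normal(0, 1, n)
height2 = height + rng.normal(0, 1, n)
weight1 = 2.3 * height1 + rng.normal(0, 4, n)
weight2 = 2.3 * height2 + rng.normal(0, 4, n)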
----     27 PARAMETER data

            height1     weight1     height2     weight2

i1        67.433285  168.262871   67.445523  163.692389
i2        70.638374  174.437750   68.649190  160.084811
i3        71.317794  159.909672   69.503911  164.720010
i4        59.850261  145.704159   61.175728  142.708300
i5        65.341938  155.586984   68.483909  165.564991
i6        64.142009  154.335001   68.568683  166.169507
i7        67.030368  158.768813   65.780803  153.721717
i8        73.672863  175.126951   73.236515  164.704340
i9        65.203516  157.593587   63.279277  149.784500
i10       69.001848  160.063428   68.786656  162.278007
i11       64.455422  159.039195   63.930208  152.827710
i12       70.719334  164.885704   69.666096  157.356595
i13       65.688428  151.223468   63.614565  150.071072
i14       66.569252  160.978671   70.533320  160.722483
i15       78.417676  172.298652   80.070076  172.695207
i16       65.396154  158.234709   67.404942  158.310596
i17       62.504967  150.899428   61.000439  154.094647
i18       62.122630  150.024298   63.634554  153.644324
i19       70.598400  165.086523   72.999194  166.771223
i20       74.935107  170.820610   76.622182  169.013550
i21       63.233956  154.331546   60.372876  149.152520
i22       72.550105  173.961915   76.748649  167.462369
i23       74.086553  168.190867   75.433331  171.773607
i24       65.379648  163.577697   65.717553  160.134888
i25       64.003038  155.357607   67.301426  158.713710

The optimal solution, with the highest correlation coefficient between height and weight, is:

----     92 PARAMETER result  optimal selected observations

            height1     weight1     height2     weight2

i1                                 67.445523  163.692389
i2                                 68.649190  160.084811
i3                                 69.503911  164.720010
i4        59.850261  145.704159
i5        65.341938  155.586984
i6        64.142009  154.335001
i7        67.030368  158.768813
i8                                 73.236515  164.704340
i9                                 63.279277  149.784500
i10       69.001848  160.063428
i11                                63.930208  152.827710
i12       70.719334  164.885704
i13                                63.614565  150.071072
i14                                70.533320  160.722483
i15       78.417676  172.298652
i16                                67.404942  158.310596
i17       62.504967  150.899428
i18       62.122630  150.024298
i19                                72.999194  166.771223
i20       74.935107  170.820610
i21                                60.372876  149.152520
i22                                76.748649  167.462369
i23                                75.433331  171.773607
i24                                65.717553  160.134888
i25                                67.301426  158.713710

The impact of being allowed to choose between observations (height1,weight1) and (height2,weight2) is significant:

Correlation coefficients

height1-weight1    0.868691
height2-weight2    0.894532
optimal            0.956452

Below we shall see how we arrive at this conclusion.

#### MINLP Model

A high-level Mixed-Integer Non-Linear Programming model is simply:

MINLP Model
\begin{align}\max\> &\mathbf{cor}(\color{darkred}h,\color{darkred}w) \\ & \color{darkred}h_i = \color{darkblue}{\mathit{height1}}_i\cdot(1-\color{darkred}x_i)+ \color{darkblue}{\mathit{height2}}_i\cdot\color{darkred}x_i\\ & \color{darkred}w_i = \color{darkblue}{\mathit{weight1}}_i\cdot(1-\color{darkred}x_i)+ \color{darkblue}{\mathit{weight2}}_i\cdot\color{darkred}x_i \\ & \color{darkred}x_i \in \{0,1\} \end{align}

Here $$\mathbf{cor}(\color{darkred}h,\color{darkred}w)$$ indicates the (Pearson) correlation between the vectors $$\color{darkred}h$$ and $$\color{darkred}w$$. Note that height and weight are positively correlated, so maximizing makes sense.
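As a quick check, a small numpy sketch can evaluate this objective for any 0/1 selection vector (assuming the columns of the table above are loaded into arrays):

import numpy as np

def corr_for_selection(x, height1, weight1, height2, weight2):
    """Pearson correlation for a 0/1 selection vector x."""
    h = height1 * (1 - x) + height2 * x
    w = weight1 * (1 - x) + weight2 * x
    return np.corrcoef(h, w)[0, 1]

# all zeros gives 0.868691; all ones gives 0.894532 (values from the post)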
Of course, GAMS does not know about correlations, so we implement this model as:

Implementation of MINLP Model
\begin{align}\max\> & \color{darkred}z = \frac{\displaystyle\sum_i (\color{darkred}h_i-\bar{\color{darkred}h})(\color{darkred}w_i-\bar{\color{darkred}w})}{\sqrt{\displaystyle\sum_i(\color{darkred}h_i-\bar{\color{darkred}h})^2}\cdot \sqrt{\displaystyle\sum_i(\color{darkred}w_i-\bar{\color{darkred}w})^2}} \\ & \color{darkred}h_i = \color{darkblue}{\mathit{height1}}_i\cdot(1-\color{darkred}x_i)+ \color{darkblue}{\mathit{height2}}_i\cdot\color{darkred}x_i \\ & \color{darkred}w_i = \color{darkblue}{\mathit{weight1}}_i\cdot(1-\color{darkred}x_i)+ \color{darkblue}{\mathit{weight2}}_i\cdot\color{darkred}x_i \\ & \bar{\color{darkred}h} = \frac{1}{n}\displaystyle\sum_i \color{darkred}h_i \\ & \bar{\color{darkred}w} = \frac{1}{n}\displaystyle\sum_i \color{darkred}w_i \\ & \color{darkred}x_i \in \{0,1\} \end{align}

We need to be careful with divisions. I typically like to reformulate divisions as multiplications:

\begin{align} \max \>& \color{darkred}z\\ &\color{darkred}z\cdot \sqrt{\displaystyle\sum_i(\color{darkred}h_i-\bar{\color{darkred}h})^2}\cdot \sqrt{\displaystyle\sum_i(\color{darkred}w_i-\bar{\color{darkred}w})^2} = \displaystyle\sum_i (\color{darkred}h_i-\bar{\color{darkred}h})(\color{darkred}w_i-\bar{\color{darkred}w})\end{align}

The advantage of this formulation is the protection against division by zero. A disadvantage is that the non-linearities are moved into the constraints. Many NLP solvers are happier when constraints are linear and only the objective is non-linear. My advice: experiment with formulations.

When we solve this model with GAMS/Baron, we see:

----     79 VARIABLE x.L  select 1 or 2

i1  1.000000,    i2  1.000000,    i3  1.000000,    i8  1.000000,    i9  1.000000,    i11 1.000000,    i13 1.000000
i14 1.000000,    i16 1.000000,    i19 1.000000,    i21 1.000000,    i22 1.000000,    i23 1.000000,    i24 1.000000
i25 1.000000

----     79 VARIABLE z.L                   =        0.956452  objective

----     83 PARAMETER corr

all1    0.868691,    all2    0.894532,    optimal 0.956452

The parameter corr shows the different correlations:
• when all $$\color{darkred}x_i=0$$, that is, comparing height1 vs weight1,
• when all $$\color{darkred}x_i=1$$, comparing height2 vs weight2,
• when using an optimal solution for $$\color{darkred}x$$.

#### Nonconvex MIQCP

Gurobi can solve non-convex quadratic models quite efficiently. To try this out, I reformulated the MINLP model into a Mixed-Integer Quadratically-Constrained Programming model. We need to add some variables and constraints to make this happen.
Non-convex MIQCP Model
\begin{align}\max\> & \color{darkred}z\\ & \color{darkred}z\cdot \color{darkred}{\mathit{denom}} = \sum_i (\color{darkred}h_i-\bar{\color{darkred}h})(\color{darkred}w_i-\bar{\color{darkred}w})\\ & \color{darkred}s^2_h = \sum_i(\color{darkred}h_i-\bar{\color{darkred}h})^2 \\ & \color{darkred}s^2_w = \sum_i(\color{darkred}w_i-\bar{\color{darkred}w})^2 \\ & \color{darkred}{\mathit{denom}} = \color{darkred}s_h \cdot \color{darkred}s_w\\ & \color{darkred}h_i = \color{darkblue}{\mathit{height1}}_i\cdot(1-\color{darkred}x_i)+ \color{darkblue}{\mathit{height2}}_i\cdot\color{darkred}x_i \\ & \color{darkred}w_i = \color{darkblue}{\mathit{weight1}}_i\cdot(1-\color{darkred}x_i)+ \color{darkblue}{\mathit{weight2}}_i\cdot\color{darkred}x_i \\ & \bar{\color{darkred}h} = \frac{1}{\color{darkblue}n}\displaystyle\sum_i \color{darkred}h_i \\ & \bar{\color{darkred}w} = \frac{1}{\color{darkblue}n}\displaystyle\sum_i \color{darkred}w_i \\ & \color{darkred}x_i \in \{0,1\} \\ & \color{darkred}s_h, \color{darkred}s_w \ge 0 \end{align}

The final model is still quite small, but I encountered severe problems:

Gurobi Optimizer version 9.0.1 build v9.0.1rc0 (win64)
Optimize a model with 52 rows, 81 columns and 152 nonzeros
Model fingerprint: 0xc1d6aa23
Model has 4 quadratic constraints
Variable types: 56 continuous, 25 integer (25 binary)
Coefficient statistics:
  Matrix range     [1e-02, 1e+01]
  QMatrix range    [1e+00, 2e+01]
  QLMatrix range   [1e+00, 1e+00]
  Objective range  [1e+00, 1e+00]
  Bounds range     [1e+00, 2e+02]
  RHS range        [6e+01, 2e+02]
Presolve removed 50 rows and 50 columns
Presolve time: 0.02s
Presolved: 179 rows, 88 columns, 637 nonzeros
Presolved model has 7 bilinear constraint(s)
Variable types: 63 continuous, 25 integer (25 binary)

Root relaxation: unbounded, 0 iterations, 0.00 seconds

    Nodes    |    Current Node    |     Objective Bounds      |     Work
 Expl Unexpl |  Obj  Depth IntInf | Incumbent    BestBd   Gap | It/Node Time

      0     2  postponed    0                -        -     -     -    0s
  50873 43644  postponed  872                -        -     -   0.0    5s
 124683 118126 postponed  805                -        -     -   0.0   10s
 196009 189074 postponed 1409                -        -     -   0.0   15s
 268890 261269 postponed 1771                -        -     -   0.0   20s
 341602 333841 postponed 2653                -        -     -   0.0   25s
 416698 408963 postponed 2652                -        -     -   0.0   30s
 491794 484945 postponed 2764                -        -     -   0.0   35s
 566890 559155 postponed 3533                -        -     -   0.0   40s
 641986 634407 postponed 4413                -        -     -   0.0   45s
 717082 709397 postponed 5294                -        -     -   0.0   50s
 792178 784445 postponed 5296                -        -     -   0.0   55s
 867274 859517 postponed 5295                -        -     -   0.0   60s
 942370 934643 postponed 6177                -        -     -   0.0   65s
 1016274 1009505 postponed 6177              -        -     -   0.0   70s
 1091370 1083743 postponed 7052              -        -     -   0.0   75s
 1166466 1158805 postponed 7939              -        -     -   0.0   80s
 1241562 1233913 postponed 7938              -        -     -   0.0   85s
 1316658 1308953 postponed 7939              -        -     -   0.0   90s
 1391754 1384013 postponed 8820              -        -     -   0.0   95s
 1465658 1457969 postponed 8820              -        -     -   0.0  100s
 . . .

This goes on like this for much longer: no feasible solution is found. This looks very bad. Very strange, because this MIQCP model can be solved without change by solvers like Baron, Antigone, and Couenne. Here is a clue:

Root relaxation: unbounded

This unbounded relaxation is the cause of the problem. The relaxation Gurobi solves is not a non-convex QCP model (that would not be unbounded), but rather something even more "relaxed". Luckily, we know a good bound on the objective: correlation coefficients are always between $$-1$$ and $$+1$$. This means we can add the bound $$\color{darkred}z \le 1$$.
After adding this bound, we see:

Root relaxation: objective 1.000000e+00, 65 iterations, 0.00 seconds

    Nodes    |    Current Node    |     Objective Bounds      |     Work
 Expl Unexpl |  Obj  Depth IntInf | Incumbent    BestBd   Gap | It/Node Time

     0     0    1.00000    0    7          -    1.00000     -     -    0s
     0     0    1.00000    0    5          -    1.00000     -     -    0s
     0     0    1.00000    0    5          -    1.00000     -     -    0s
     0     0    1.00000    0    5          -    1.00000     -     -    0s
     0     2    1.00000    0    6          -    1.00000     -     -    0s
 *  324   297             44     0.9195020    1.00000  8.75%   7.7    0s
 *  416   297             42     0.9400483    1.00000  6.38%   7.5    0s
 *  556   511             28     0.9409507    1.00000  6.28%   7.6    0s
 *  598   511             27     0.9457779    1.00000  5.73%   7.5    0s
 *  603   511             28     0.9473171    1.00000  5.56%   7.4    0s
 *  643   511             29     0.9483207    1.00000  5.45%   7.2    0s
 * 1195   620             33     0.9494881    1.00000  5.32%   7.0    0s
 *10829  2567             40     0.9499222    1.00000  5.27%   7.7    0s
 *12064  2834             41     0.9526072    1.00000  4.98%   7.6    0s
 *13333  3393             39     0.9530905    1.00000  4.92%   7.4    0s
 *13341  3392             39     0.9531092    1.00000  4.92%   7.4    0s
 H27924  7927                    0.9539760    1.00000  4.82%   6.4    1s
 *52593 15598             41     0.9543576    1.00000  4.78%   5.9    1s
 H53294 15799                    0.9548335    1.00000  4.73%   5.9    2s
 H96539 25687                    0.9564515    0.99711  4.25%   5.9    3s
 155894 36175    cutoff   33     0.95645      0.99403  3.93%   5.2    5s
 335621 57101   0.98359   32 10  0.95645      0.98804  3.30%   4.7   10s
 518727 68913   0.95912   31 13  0.95645      0.98354  2.83%   4.5   15s
 703811 73266    cutoff   31     0.95645      0.97999  2.46%   4.3   20s
 890371 75019    cutoff   33     0.95645      0.97695  2.14%   4.3   25s
 1082662 73850   cutoff   37     0.95645      0.97425  1.86%   4.2   30s
 1270873 69325  0.96318   34  9  0.95645      0.97177  1.60%   4.1   35s
 1451055 61577   cutoff   37     0.95645      0.96956  1.37%   4.1   40s
 1628515 49114   cutoff   35     0.95645      0.96730  1.13%   4.0   45s
 1800508 30288  0.95954   35  9  0.95645      0.96460  0.85%   4.0   50s

Cutting planes:
  Gomory: 1
  MIR: 2
  Flow cover: 3
  RLT: 5

Explored 1916074 nodes (7618484 simplex iterations) in 53.76 seconds
Thread count was 8 (of 16 available processors)

Solution count 10: 0.956452 0.954834 0.954358 ... 0.948321
No other solutions better than 0.956452

This saved Gurobi! Now Gurobi finds the optimal solution, with a correlation coefficient of 0.956452.

Another possible improvement is to introduce mean-adjusted values (a linear operation). This makes the quadratic constraints a bit simpler, at the expense of extra variables and (linear) constraints.

Mean-adjusted non-convex MIQCP Model
\begin{align}\max\> & \color{darkred}z\\ & \color{darkred}z\cdot \color{darkred}{\mathit{denom}} = \sum_i \tilde{\color{darkred}h_i} \cdot \tilde{\color{darkred}w_i}\\ & \color{darkred}s^2_h = \sum_i \tilde{\color{darkred}h}^2_i \\ & \color{darkred}s^2_w = \sum_i \tilde{\color{darkred}w}^2_i \\ & \color{darkred}{\mathit{denom}} = \color{darkred}s_h \cdot \color{darkred}s_w\\ & \color{darkred}h_i = \color{darkblue}{\mathit{height1}}_i\cdot(1-\color{darkred}x_i)+ \color{darkblue}{\mathit{height2}}_i\cdot\color{darkred}x_i \\ & \color{darkred}w_i = \color{darkblue}{\mathit{weight1}}_i\cdot(1-\color{darkred}x_i)+ \color{darkblue}{\mathit{weight2}}_i\cdot\color{darkred}x_i \\ & \bar{\color{darkred}h} = \frac{1}{\color{darkblue}n}\displaystyle\sum_i \color{darkred}h_i \\ & \bar{\color{darkred}w} = \frac{1}{\color{darkblue}n}\displaystyle\sum_i \color{darkred}w_i \\ & \tilde{\color{darkred}h_i}= \color{darkred}h_i - \bar{\color{darkred}h}\\ & \tilde{\color{darkred}w_i} = \color{darkred}w_i - \bar{\color{darkred}w} \\ & \color{darkred}x_i \in \{0,1\} \\ & \color{darkred}s_h, \color{darkred}s_w \ge 0 \end{align}

This formulation does not need the bound $$\color{darkred}z\le 1$$, although it still performs better with it.

#### Genetic Algorithm

It is interesting to see how a simple meta-heuristic would do on this problem.
Here are the results using the ga function from the GA package [2]. This was actually quite easy to implement.

> df <- read.table(text="
+ id height1 weight1 height2 weight2
+ i1 67.433285 168.262871 67.445523 163.692389
+ i2 70.638374 174.437750 68.649190 160.084811
+ i3 71.317794 159.909672 69.503911 164.720010
+ i4 59.850261 145.704159 61.175728 142.708300
+ i5 65.341938 155.586984 68.483909 165.564991
+ i6 64.142009 154.335001 68.568683 166.169507
+ i7 67.030368 158.768813 65.780803 153.721717
+ i8 73.672863 175.126951 73.236515 164.704340
+ i9 65.203516 157.593587 63.279277 149.784500
+ i10 69.001848 160.063428 68.786656 162.278007
+ i11 64.455422 159.039195 63.930208 152.827710
+ i12 70.719334 164.885704 69.666096 157.356595
+ i13 65.688428 151.223468 63.614565 150.071072
+ i14 66.569252 160.978671 70.533320 160.722483
+ i15 78.417676 172.298652 80.070076 172.695207
+ i16 65.396154 158.234709 67.404942 158.310596
+ i17 62.504967 150.899428 61.000439 154.094647
+ i18 62.122630 150.024298 63.634554 153.644324
+ i19 70.598400 165.086523 72.999194 166.771223
+ i20 74.935107 170.820610 76.622182 169.013550
+ i21 63.233956 154.331546 60.372876 149.152520
+ i22 72.550105 173.961915 76.748649 167.462369
+ i23 74.086553 168.190867 75.433331 171.773607
+ i24 65.379648 163.577697 65.717553 160.134888
+ i25 64.003038 155.357607 67.301426 158.713710
+ ", header=T)
>
> #
> # print obvious cases
> #
> cor(df$weight1,df$height1)
[1] 0.8686908
> cor(df$weight2,df$height2)
[1] 0.894532
>
> #
> # fitness function
> #
> f <- function(x) {
+   w <- df$weight1*(1-x) + df$weight2*x
+   h <- df$height1*(1-x) + df$height2*x
+   cor(w,h)
+ }
>
> library(GA)
> res <- ga(type=c("binary"),fitness=f,nBits=25,seed=123)
GA | iter = 1 | Mean = 0.8709318 | Best = 0.9237155
GA | iter = 2 | Mean = 0.8742004 | Best = 0.9237155
GA | iter = 3 | Mean = 0.8736450 | Best = 0.9237155
GA | iter = 4 | Mean = 0.8742228 | Best = 0.9384788
GA | iter = 5 | Mean = 0.8746517 | Best = 0.9384788
GA | iter = 6 | Mean = 0.8792048 | Best = 0.9486227
GA | iter = 7 | Mean = 0.8844841 | Best = 0.9486227
GA | iter = 8 | Mean = 0.8816874 | Best = 0.9486227
GA | iter = 9 | Mean = 0.8805522 | Best = 0.9486227
GA | iter = 10 | Mean = 0.8820974 | Best = 0.9486227
GA | iter = 11 | Mean = 0.8859074 | Best = 0.9486227
GA | iter = 12 | Mean = 0.8956467 | Best = 0.9486227
GA | iter = 13 | Mean = 0.8989140 | Best = 0.9486227
GA | iter = 14 | Mean = 0.9069327 | Best = 0.9486227
GA | iter = 15 | Mean = 0.9078787 | Best = 0.9486227
GA | iter = 16 | Mean = 0.9069163 | Best = 0.9489443
GA | iter = 17 | Mean = 0.9104712 | Best = 0.9489443
GA | iter = 18 | Mean = 0.9169900 | Best = 0.9489443
GA | iter = 19 | Mean = 0.9175285 | Best = 0.9489443
GA | iter = 20 | Mean = 0.9207076 | Best = 0.9489443
GA | iter = 21 | Mean = 0.9210288 | Best = 0.9489443
GA | iter = 22 | Mean = 0.9206928 | Best = 0.9489443
GA | iter = 23 | Mean = 0.9210399 | Best = 0.9489443
GA | iter = 24 | Mean = 0.9208985 | Best = 0.9489443
GA | iter = 25 | Mean = 0.9183778 | Best = 0.9511446
GA | iter = 26 | Mean = 0.9217391 | Best = 0.9511446
GA | iter = 27 | Mean = 0.9274271 | Best = 0.9522764
GA | iter = 28 | Mean = 0.9271156 | Best = 0.9522764
GA | iter = 29 | Mean = 0.9275347 | Best = 0.9522764
GA | iter = 30 | Mean = 0.9278315 | Best = 0.9522764
GA | iter = 31 | Mean = 0.9300289 | Best = 0.9522764
GA | iter = 32 | Mean = 0.9306409 | Best = 0.9528777
GA | iter = 33 | Mean = 0.9309087 | Best = 0.9528777
GA | iter = 34 | Mean = 0.9327691 | Best = 0.9528777
GA | iter = 35 | Mean = 0.9309344 | Best = 0.9549574
GA | iter = 36 | Mean = 0.9341977 | Best = 0.9549574
GA | iter = 37 | Mean = 0.9374437 | Best = 0.9559043
GA | iter = 38 | Mean = 0.9394410 | Best = 0.9559043
GA | iter = 39 | Mean = 0.9405482 | Best = 0.9559043
GA | iter = 40 | Mean = 0.9432749 | Best = 0.9564515
GA | iter = 41 | Mean = 0.9441814 | Best = 0.9564515
GA | iter = 42 | Mean = 0.9458232 | Best = 0.9564515
GA | iter = 43 | Mean = 0.9469625 | Best = 0.9564515
GA | iter = 44 | Mean = 0.9462313 | Best = 0.9564515
GA | iter = 45 | Mean = 0.9449716 | Best = 0.9564515
GA | iter = 46 | Mean = 0.9444071 | Best = 0.9564515
GA | iter = 47 | Mean = 0.9437149 | Best = 0.9564515
GA | iter = 48 | Mean = 0.9446355 | Best = 0.9564515
GA | iter = 49 | Mean = 0.9455424 | Best = 0.9564515
GA | iter = 50 | Mean = 0.9456497 | Best = 0.9564515
GA | iter = 51 | Mean = 0.9461382 | Best = 0.9564515
GA | iter = 52 | Mean = 0.9444960 | Best = 0.9564515
GA | iter = 53 | Mean = 0.9434671 | Best = 0.9564515
GA | iter = 54 | Mean = 0.9451851 | Best = 0.9564515
GA | iter = 55 | Mean = 0.9481903 | Best = 0.9564515
GA | iter = 56 | Mean = 0.9477778 | Best = 0.9564515
GA | iter = 57 | Mean = 0.9481829 | Best = 0.9564515
GA | iter = 58 | Mean = 0.9490952 | Best = 0.9564515
GA | iter = 59 | Mean = 0.9505670 | Best = 0.9564515
GA | iter = 60 | Mean = 0.9499329 | Best = 0.9564515
GA | iter = 61 | Mean = 0.9509299 | Best = 0.9564515
GA | iter = 62 | Mean = 0.9505341 | Best = 0.9564515
GA | iter = 63 | Mean = 0.9519624 | Best = 0.9564515
GA | iter = 64 | Mean = 0.9518618 | Best = 0.9564515
GA | iter = 65 | Mean = 0.9523598 | Best = 0.9564515
GA | iter = 66 | Mean = 0.9516766 | Best = 0.9564515
GA | iter = 67 | Mean = 0.9521926 | Best = 0.9564515
GA | iter = 68 | Mean = 0.9524419 | Best = 0.9564515
GA | iter = 69 | Mean = 0.9532865 | Best = 0.9564515
GA | iter = 70 | Mean = 0.9535871 | Best = 0.9564515
GA | iter = 71 | Mean = 0.9536049 | Best = 0.9564515
GA | iter = 72 | Mean = 0.9534035 | Best = 0.9564515
GA | iter = 73 | Mean = 0.9532859 | Best = 0.9564515
GA | iter = 74 | Mean = 0.9521064 | Best = 0.9564515
GA | iter = 75 | Mean = 0.9534997 | Best = 0.9564515
GA | iter = 76 | Mean = 0.9539987 | Best = 0.9564515
GA | iter = 77 | Mean = 0.9536670 | Best = 0.9564515
GA | iter = 78 | Mean = 0.9526224 | Best = 0.9564515
GA | iter = 79 | Mean = 0.9531871 | Best = 0.9564515
GA | iter = 80 | Mean = 0.9527495 | Best = 0.9564515
GA | iter = 81 | Mean = 0.9526061 | Best = 0.9564515
GA | iter = 82 | Mean = 0.9525577 | Best = 0.9564515
GA | iter = 83 | Mean = 0.9525084 | Best = 0.9564515
GA | iter = 84 | Mean = 0.9519052 | Best = 0.9564515
GA | iter = 85 | Mean = 0.9518549 | Best = 0.9564515
GA | iter = 86 | Mean = 0.9511299 | Best = 0.9564515
GA | iter = 87 | Mean = 0.9505129 | Best = 0.9564515
GA | iter = 88 | Mean = 0.9518203 | Best = 0.9564515
GA | iter = 89 | Mean = 0.9537234 | Best = 0.9564515
GA | iter = 90 | Mean = 0.9531017 | Best = 0.9564515
GA | iter = 91 | Mean = 0.9514525 | Best = 0.9564515
GA | iter = 92 | Mean = 0.9505517 | Best = 0.9564515
GA | iter = 93 | Mean = 0.9524752 | Best = 0.9564515
GA | iter = 94 | Mean = 0.9533879 | Best = 0.9564515
GA | iter = 95 | Mean = 0.9519166 | Best = 0.9564515
GA | iter = 96 | Mean = 0.9524416 | Best = 0.9564515
GA | iter = 97 | Mean = 0.9526676 | Best = 0.9564515
GA | iter = 98 | Mean = 0.9523745 | Best = 0.9564515
GA | iter = 99 | Mean = 0.9523710 | Best = 0.9564515
GA | iter = 100 | Mean = 0.9519255 | Best = 0.9564515
> res@solution
x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12 x13 x14 x15 x16 x17 x18 x19 x20 x21 x22 x23 x24 x25
[1,]  1  1  1  0  0  0  0  1  1   0   1   0   1   1   0   1   0   0   1   0   1   1   1   1   1
> res@fitnessValue
[1] 0.9564515


When comparing to our proven optimal Baron and Gurobi solutions, we see that the ga function finds the optimal solution. Of course, we would not know this if we had not solved the model with a global solver.

#### Conclusions

• Allowing us to pick and choose observations between two data sets can have a significant effect on the correlation coefficient.
• The problem to find the optimal mix is a difficult combinatorial problem.
• The obvious approach is to formulate this as a non-convex MINLP.
• An alternative is to use a non-convex MIQCP formulation. This brings Gurobi into the picture. However, if we don't impose an upper bound on the objective, Gurobi is rendered totally helpless. After adding the bound, things look much better.
• This problem looks quite suited for a meta-heuristic. Meta-heuristics are fast and simple to implement, but you only get good solutions (occasionally optimal ones), without any quality assurance.

#### Appendix: Gurobi LP file

This LP file causes real trouble for Gurobi due to its unbounded relaxation. Note that it has to be solved with the option nonconvex 2.
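For example, with gurobipy one could solve the file like this (a sketch, assuming the LP file below is saved as corr.lp; NonConvex=2 enables Gurobi's bilinear solver):

import gurobipy as gp

model = gp.read("corr.lp")       # read the LP file below
model.Params.NonConvex = 2       # required for non-convex quadratic models
model.optimize()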

\ Model GAMS Model
\ LP format - for model browsing. Use MPS format to capture full model detail.
Maximize
z
Subject To
defwx(i1): 4.57048155290812 x(i1) + hwx(i1,weight) = 168.2628705894044
defwx(i2): 14.35293953397428 x(i2) + hwx(i2,weight) = 174.4377502565614
defwx(i3): - 4.810337240480266 x(i3) + hwx(i3,weight) = 159.9096723101254
defwx(i4): 2.995859558106929 x(i4) + hwx(i4,weight) = 145.7041593291271
defwx(i5): - 9.978006854397336 x(i5) + hwx(i5,weight) = 155.5869839889373
defwx(i6): - 11.83450629445829 x(i6) + hwx(i6,weight) = 154.335000536442
defwx(i7): 5.047095974632526 x(i7) + hwx(i7,weight) = 158.7688132554227
defwx(i8): 10.42261080570944 x(i8) + hwx(i8,weight) = 175.1269512206012
defwx(i9): 7.809086784903798 x(i9) + hwx(i9,weight) = 157.5935866164602
defwx(i10): - 2.214578422082724 x(i10) + hwx(i10,weight) = 160.0634282485366
defwx(i11): 6.211485020463414 x(i11) + hwx(i11,weight) = 159.0391947749885
defwx(i12): 7.529108790630033 x(i12) + hwx(i12,weight) = 164.8857039875973
defwx(i13): 1.152396641792848 x(i13) + hwx(i13,weight) = 151.2234684137615
defwx(i14): 0.2561878995556413 x(i14) + hwx(i14,weight) = 160.9786711234614
defwx(i15): - 0.3965543777673872 x(i15) + hwx(i15,weight) = 172.2986522599465
defwx(i16): - 0.0758869132981204 x(i16) + hwx(i16,weight) = 158.2347086210175
defwx(i17): - 3.195219014258043 x(i17) + hwx(i17,weight) = 150.8994283190054
defwx(i18): - 3.620025521749142 x(i18) + hwx(i18,weight) = 150.0242984063181
defwx(i19): - 1.684699946033305 x(i19) + hwx(i19,weight) = 165.0865228272624
defwx(i20): 1.807060121523165 x(i20) + hwx(i20,weight) = 170.8206099631946
defwx(i21): 5.179025632721846 x(i21) + hwx(i21,weight) = 154.3315455729319
defwx(i22): 6.499545761101757 x(i22) + hwx(i22,weight) = 173.9619147338905
defwx(i23): - 3.582740184390104 x(i23) + hwx(i23,weight) = 168.190866891775
defwx(i24): 3.442809290323368 x(i24) + hwx(i24,weight) = 163.5776971098937
defwx(i25): - 3.356103037296691 x(i25) + hwx(i25,weight) = 155.3576067743665
defhx(i1): - 0.012238100107453 x(i1) + hwx(i1,height) = 67.43328535705723
defhx(i2): 1.98918429361234 x(i2) + hwx(i2,height) = 70.6383740411321
defhx(i3): 1.813883224886396 x(i3) + hwx(i3,height) = 71.3177939118135
defhx(i4): - 1.325467115496522 x(i4) + hwx(i4,height) = 59.85026108005525
defhx(i5): - 3.141971238432788 x(i5) + hwx(i5,height) = 65.34193793182213
defhx(i6): - 4.426674002923676 x(i6) + hwx(i6,height) = 64.14200864133288
defhx(i7): 1.249564935900537 x(i7) + hwx(i7,height) = 67.03036779841625
defhx(i8): 0.4363485690300877 x(i8) + hwx(i8,height) = 73.67286344172466
defhx(i9): 1.924239239167797 x(i9) + hwx(i9,height) = 65.20351620466388
defhx(i10): 0.2151915024726918 x(i10) + hwx(i10,height) = 69.00184760175402
defhx(i11): 0.5252136910261171 x(i11) + hwx(i11,height) = 64.45542162911404
defhx(i12): 1.053237234263008 x(i12) + hwx(i12,height) = 70.71933367432223
defhx(i13): 2.073862555158748 x(i13) + hwx(i13,height) = 65.68842782277154
defhx(i14): - 3.964068185905262 x(i14) + hwx(i14,height) = 66.5692517581452
defhx(i15): - 1.652399901482397 x(i15) + hwx(i15,height) = 78.41767636858228
defhx(i16): - 2.008787567894402 x(i16) + hwx(i16,height) = 65.39615405205915
defhx(i17): 1.504528370533244 x(i17) + hwx(i17,height) = 62.50496743343454
defhx(i18): - 1.511924170502624 x(i18) + hwx(i18,height) = 62.12263020481269
defhx(i19): - 2.400794170619122 x(i19) + hwx(i19,height) = 70.5983998511028
defhx(i20): - 1.68707570335097 x(i20) + hwx(i20,height) = 74.93510670286524
defhx(i21): 2.861079762565417 x(i21) + hwx(i21,height) = 63.23395556183407
defhx(i22): - 4.198544675012556 x(i22) + hwx(i22,height) = 72.55010450534969
defhx(i23): - 1.346778063106839 x(i23) + hwx(i23,height) = 74.08655337740214
defhx(i24): - 0.3379045824620164 x(i24) + hwx(i24,height) = 65.37964799513293
defhx(i25): - 3.298388577170783 x(i25) + hwx(i25,height) = 64.0030375549771
defmean(height): - 0.04 hwx(i1,height) - 0.04 hwx(i2,height) - 0.04 hwx(i3,height) - 0.04 hwx(i4,height) - 0.04 hwx(i5,height)
- 0.04 hwx(i6,height) - 0.04 hwx(i7,height) - 0.04 hwx(i8,height) - 0.04 hwx(i9,height) - 0.04 hwx(i10,height) - 0.04 hwx(i11,height)
- 0.04 hwx(i12,height) - 0.04 hwx(i13,height) - 0.04 hwx(i14,height) - 0.04 hwx(i15,height) - 0.04 hwx(i16,height) - 0.04 hwx(i17,height)
- 0.04 hwx(i18,height) - 0.04 hwx(i19,height) - 0.04 hwx(i20,height) - 0.04 hwx(i21,height) - 0.04 hwx(i22,height) - 0.04 hwx(i23,height)
- 0.04 hwx(i24,height) - 0.04 hwx(i25,height) + meanx(height) = 0
defmean(weight): - 0.04 hwx(i1,weight) - 0.04 hwx(i2,weight) - 0.04 hwx(i3,weight) - 0.04 hwx(i4,weight) - 0.04 hwx(i5,weight)
- 0.04 hwx(i6,weight) - 0.04 hwx(i7,weight) - 0.04 hwx(i8,weight) - 0.04 hwx(i9,weight) - 0.04 hwx(i10,weight) - 0.04 hwx(i11,weight)
- 0.04 hwx(i12,weight) - 0.04 hwx(i13,weight) - 0.04 hwx(i14,weight) - 0.04 hwx(i15,weight) - 0.04 hwx(i16,weight) - 0.04 hwx(i17,weight)
- 0.04 hwx(i18,weight) - 0.04 hwx(i19,weight) - 0.04 hwx(i20,weight) - 0.04 hwx(i21,weight) - 0.04 hwx(i22,weight) - 0.04 hwx(i23,weight)
- 0.04 hwx(i24,weight) - 0.04 hwx(i25,weight) + meanx(weight) = 0
sigma(height): [ - hwx(i1,height) ^2 + 2 hwx(i1,height) * meanx(height) - hwx(i2,height) ^2 + 2 hwx(i2,height) * meanx(height)
- hwx(i3,height) ^2 + 2 hwx(i3,height) * meanx(height) - hwx(i4,height) ^2 + 2 hwx(i4,height) * meanx(height)
- hwx(i5,height) ^2 + 2 hwx(i5,height) * meanx(height) - hwx(i6,height) ^2 + 2 hwx(i6,height) * meanx(height)
- hwx(i7,height) ^2 + 2 hwx(i7,height) * meanx(height) - hwx(i8,height) ^2 + 2 hwx(i8,height) * meanx(height)
- hwx(i9,height) ^2 + 2 hwx(i9,height) * meanx(height) - hwx(i10,height) ^2 + 2 hwx(i10,height) * meanx(height)
- hwx(i11,height) ^2 + 2 hwx(i11,height) * meanx(height) - hwx(i12,height) ^2 + 2 hwx(i12,height) * meanx(height)
- hwx(i13,height) ^2 + 2 hwx(i13,height) * meanx(height) - hwx(i14,height) ^2 + 2 hwx(i14,height) * meanx(height)
- hwx(i15,height) ^2 + 2 hwx(i15,height) * meanx(height) - hwx(i16,height) ^2 + 2 hwx(i16,height) * meanx(height)
- hwx(i17,height) ^2 + 2 hwx(i17,height) * meanx(height) - hwx(i18,height) ^2 + 2 hwx(i18,height) * meanx(height)
- hwx(i19,height) ^2 + 2 hwx(i19,height) * meanx(height) - hwx(i20,height) ^2 + 2 hwx(i20,height) * meanx(height)
- hwx(i21,height) ^2 + 2 hwx(i21,height) * meanx(height) - hwx(i22,height) ^2 + 2 hwx(i22,height) * meanx(height)
- hwx(i23,height) ^2 + 2 hwx(i23,height) * meanx(height) - hwx(i24,height) ^2 + 2 hwx(i24,height) * meanx(height)
- hwx(i25,height) ^2 + 2 hwx(i25,height) * meanx(height) - 25 meanx(height) ^2 + s(height) ^2 ] = 0
sigma(weight): [ - hwx(i1,weight) ^2 + 2 hwx(i1,weight) * meanx(weight) - hwx(i2,weight) ^2 + 2 hwx(i2,weight) * meanx(weight)
- hwx(i3,weight) ^2 + 2 hwx(i3,weight) * meanx(weight) - hwx(i4,weight) ^2 + 2 hwx(i4,weight) * meanx(weight)
- hwx(i5,weight) ^2 + 2 hwx(i5,weight) * meanx(weight) - hwx(i6,weight) ^2 + 2 hwx(i6,weight) * meanx(weight)
- hwx(i7,weight) ^2 + 2 hwx(i7,weight) * meanx(weight) - hwx(i8,weight) ^2 + 2 hwx(i8,weight) * meanx(weight)
- hwx(i9,weight) ^2 + 2 hwx(i9,weight) * meanx(weight) - hwx(i10,weight) ^2 + 2 hwx(i10,weight) * meanx(weight)
- hwx(i11,weight) ^2 + 2 hwx(i11,weight) * meanx(weight) - hwx(i12,weight) ^2 + 2 hwx(i12,weight) * meanx(weight)
- hwx(i13,weight) ^2 + 2 hwx(i13,weight) * meanx(weight) - hwx(i14,weight) ^2 + 2 hwx(i14,weight) * meanx(weight)
- hwx(i15,weight) ^2 + 2 hwx(i15,weight) * meanx(weight) - hwx(i16,weight) ^2 + 2 hwx(i16,weight) * meanx(weight)
- hwx(i17,weight) ^2 + 2 hwx(i17,weight) * meanx(weight) - hwx(i18,weight) ^2 + 2 hwx(i18,weight) * meanx(weight)
- hwx(i19,weight) ^2 + 2 hwx(i19,weight) * meanx(weight) - hwx(i20,weight) ^2 + 2 hwx(i20,weight) * meanx(weight)
- hwx(i21,weight) ^2 + 2 hwx(i21,weight) * meanx(weight) - hwx(i22,weight) ^2 + 2 hwx(i22,weight) * meanx(weight)
- hwx(i23,weight) ^2 + 2 hwx(i23,weight) * meanx(weight) - hwx(i24,weight) ^2 + 2 hwx(i24,weight) * meanx(weight)
- hwx(i25,weight) ^2 + 2 hwx(i25,weight) * meanx(weight) - 25 meanx(weight) ^2 + s(weight) ^2 ] = 0
denom: shw + [ - s(height) * s(weight) ] = 0
obj: [ - hwx(i1,height) * hwx(i1,weight) + hwx(i1,height) * meanx(weight) + hwx(i1,weight) * meanx(height) - hwx(i2,height) * hwx(i2,weight)
+ hwx(i2,height) * meanx(weight) + hwx(i2,weight) * meanx(height) - hwx(i3,height) * hwx(i3,weight) + hwx(i3,height) * meanx(weight)
+ hwx(i3,weight) * meanx(height) - hwx(i4,height) * hwx(i4,weight) + hwx(i4,height) * meanx(weight) + hwx(i4,weight) * meanx(height)
- hwx(i5,height) * hwx(i5,weight) + hwx(i5,height) * meanx(weight) + hwx(i5,weight) * meanx(height) - hwx(i6,height) * hwx(i6,weight)
+ hwx(i6,height) * meanx(weight) + hwx(i6,weight) * meanx(height)  - hwx(i7,height) * hwx(i7,weight) + hwx(i7,height) * meanx(weight)
+ hwx(i7,weight) * meanx(height) - hwx(i8,height) * hwx(i8,weight) + hwx(i8,height) * meanx(weight) + hwx(i8,weight) * meanx(height)
- hwx(i9,height) * hwx(i9,weight) + hwx(i9,height) * meanx(weight) + hwx(i9,weight) * meanx(height) - hwx(i10,height) * hwx(i10,weight)
+ hwx(i10,height) * meanx(weight) + hwx(i10,weight) * meanx(height) - hwx(i11,height) * hwx(i11,weight) + hwx(i11,height) * meanx(weight)
+ hwx(i11,weight) * meanx(height) - hwx(i12,height) * hwx(i12,weight) + hwx(i12,height) * meanx(weight) + hwx(i12,weight) * meanx(height)
- hwx(i13,height) * hwx(i13,weight) + hwx(i13,height) * meanx(weight) + hwx(i13,weight) * meanx(height) - hwx(i14,height) * hwx(i14,weight)
+ hwx(i14,height) * meanx(weight) + hwx(i14,weight) * meanx(height) - hwx(i15,height) * hwx(i15,weight) + hwx(i15,height) * meanx(weight)
+ hwx(i15,weight) * meanx(height) - hwx(i16,height) * hwx(i16,weight) + hwx(i16,height) * meanx(weight) + hwx(i16,weight) * meanx(height)
- hwx(i17,height) * hwx(i17,weight) + hwx(i17,height) * meanx(weight) + hwx(i17,weight) * meanx(height) - hwx(i18,height) * hwx(i18,weight)
+ hwx(i18,height) * meanx(weight) + hwx(i18,weight) * meanx(height) - hwx(i19,height) * hwx(i19,weight) + hwx(i19,height) * meanx(weight)
+ hwx(i19,weight) * meanx(height) - hwx(i20,height) * hwx(i20,weight) + hwx(i20,height) * meanx(weight) + hwx(i20,weight) * meanx(height)
- hwx(i21,height) * hwx(i21,weight) + hwx(i21,height) * meanx(weight) + hwx(i21,weight) * meanx(height) - hwx(i22,height) * hwx(i22,weight)
+ hwx(i22,height) * meanx(weight) + hwx(i22,weight) * meanx(height) - hwx(i23,height) * hwx(i23,weight) + hwx(i23,height) * meanx(weight)
+ hwx(i23,weight) * meanx(height) - hwx(i24,height) * hwx(i24,weight) + hwx(i24,height) * meanx(weight) + hwx(i24,weight) * meanx(height)
- hwx(i25,height) * hwx(i25,weight) + hwx(i25,height) * meanx(weight) + hwx(i25,weight) * meanx(height) - 25 meanx(height) * meanx(weight)
+ z * shw ] = 0
Bounds
\ z <= 1
Binaries
x(i1) x(i2) x(i3) x(i4) x(i5) x(i6) x(i7) x(i8) x(i9) x(i10) x(i11) x(i12)
x(i13) x(i14) x(i15) x(i16) x(i17) x(i18) x(i19) x(i20) x(i21) x(i22)
x(i23) x(i24) x(i25)
End