Tuesday, October 27, 2020

Finding a cluster of closest points

Problem statement


From [1]:

Consider \(n=100\) points in a 12-dimensional space. Find \(m=8\) points such that they are as close to each other as possible.


Models


The problem can be stated as a simple MIQP (Mixed Integer Quadratic Programming) model.

 
MINSUM MIQP Model
\[\begin{align} \min & \sum_{i\lt j} \color{darkred}x_i \cdot \color{darkred}x_j \cdot \color{darkblue}{\mathit{dist}}_{i,j} \\ & \sum_i \color{darkred}x_i = \color{darkblue}m \\ & \color{darkred}x_i \in \{0,1\} \end{align}\]


This is a simple model. The main wrinkle is that we want to count each distance only once. For this reason, we only consider distances with \(i \lt j\). Of course, we can also exploit this when calculating the distance matrix \( \color{darkblue}{\mathit{dist}}_{i,j}\), making it a strictly upper-triangular matrix. This saves time and memory.
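
A side note on computing this distance data in Python: SciPy's pdist returns exactly the strictly upper-triangular part (each pair \(i \lt j\) once). A minimal sketch, assuming NumPy and SciPy are available:

import numpy as np
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(0)
points = rng.random((100, 12))      # n=100 points in 12 dimensions
d_upper = pdist(points)             # condensed vector: one distance per pair i < j
d_full = squareform(d_upper)        # full symmetric matrix, if ever needed
print(d_upper.shape, d_full.shape)  # (4950,), (100, 100)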

As the \( \color{darkred}x\) variables are binary, we can easily linearize this problem:


MINSUM MIP Model
\[\begin{align} \min & \sum_{i\lt j} \color{darkred}y_{i,j} \cdot \color{darkblue}{\mathit{dist}}_{i,j} \\ & \color{darkred}y_{i,j} \ge \color{darkred}x_i + \color{darkred}x_j - 1 && \forall i \lt j\\ & \sum_i \color{darkred}x_i = \color{darkblue}m \\ & \color{darkred}x_i \in \{0,1\} \\ &\color{darkred}y_{i,j} \in [0,1] \end{align}\]


The inequality implements the implication \[\color{darkred}x_i= 1 \textbf{ and } \color{darkred}x_j = 1 \Rightarrow \color{darkred}y_{i,j} = 1 \] The variables \(\color{darkred}y_{i,j}\) can be binary or can be relaxed to be continuous between 0 and 1. The MIQP and MIP models should deliver the same results. 
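
To illustrate the linearized MINSUM model outside GAMS, here is a minimal sketch in PuLP with the default CBC solver (random data, and a smaller \(n\) than in the post to keep CBC quick; this is not the GAMS code from the appendix):

import itertools
import math
import random

import pulp

random.seed(42)
n, m = 30, 8
pts = [(random.random(), random.random()) for _ in range(n)]
pairs = list(itertools.combinations(range(n), 2))             # pairs with i < j only
dist = {ij: math.dist(pts[ij[0]], pts[ij[1]]) for ij in pairs}

prob = pulp.LpProblem("minsum_cluster", pulp.LpMinimize)
x = pulp.LpVariable.dicts("x", range(n), cat="Binary")        # point selected?
y = pulp.LpVariable.dicts("y", pairs, lowBound=0, upBound=1)  # relaxed product x_i*x_j

prob += pulp.lpSum(dist[ij] * y[ij] for ij in pairs)          # sum of in-cluster distances
prob += pulp.lpSum(x[i] for i in range(n)) == m               # pick exactly m points
for i, j in pairs:
    prob += y[(i, j)] >= x[i] + x[j] - 1                      # x_i = x_j = 1  =>  y_ij = 1

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print("selected:", [i for i in range(n) if x[i].value() > 0.5])
print("objective:", pulp.value(prob.objective))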

Finally, we can also consider a slightly different problem. Instead of minimizing the sum of the distances between the selected points, we can minimize the maximum distance between them. The (linear) model can look like:

MINMAX MIP Model
\[\begin{align} \min\> & \color{darkred}z \\ & \color{darkred}z \ge \color{darkred}y_{i,j} \cdot \color{darkblue}{\mathit{dist}}_{i,j} && \forall i \lt j \\ & \color{darkred}y_{i,j} \ge \color{darkred}x_i + \color{darkred}x_j - 1 && \forall i \lt j\\ & \sum_i \color{darkred}x_i = \color{darkblue}m \\ & \color{darkred}x_i \in \{0,1\} \\ &\color{darkred}y_{i,j} \in [0,1] \end{align}\]

We re-use our linearization here.

Finally, I also tried a very simple greedy heuristic (a small Python sketch follows after this list):
  1. Select the two points that are closest to each other.
  2. Select a new unselected point that is closest to our already selected points.
  3. Repeat step 2 until we have selected 8 points.
We can expect our optimization models to do a bit better than this simplistic approach. 
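
For reference, here is a small Python sketch of this greedy heuristic (my own translation, not the GAMS loop from the appendix):

import itertools
import math
import random

def greedy_cluster(dist, m):
    """dist: full symmetric n x n distance matrix (list of lists); returns m point indices."""
    n = len(dist)
    # step 1: start with the closest pair
    i0, j0 = min(itertools.combinations(range(n), 2), key=lambda ij: dist[ij[0]][ij[1]])
    selected = {i0, j0}
    # steps 2-3: repeatedly add the unselected point closest to any already selected point
    while len(selected) < m:
        best = min((p for p in range(n) if p not in selected),
                   key=lambda p: min(dist[p][q] for q in selected))
        selected.add(best)
    return sorted(selected)

# tiny usage example with random 2d points
random.seed(1)
pts = [(random.random(), random.random()) for _ in range(50)]
dist = [[math.dist(a, b) for b in pts] for a in pts]
print(greedy_cluster(dist, 8))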

In these models, we did not make any assumptions about the distance metric: we just need the distance matrix. This means you can use Euclidean, Manhattan, or other metrics. In addition, you can apply normalization or weighting before calculating the distances, which may be useful if the features have different units. The models also do not change whether one uses low- or high-dimensional data.
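
For example, with SciPy switching metrics is a one-argument change, and standardizing the features first is a simple preprocessing step (a small illustration, assuming NumPy and SciPy):

import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
points = rng.random((100, 12))

d_euclid = pdist(points)                        # Euclidean (the default metric)
d_manhat = pdist(points, metric="cityblock")    # Manhattan
scaled   = (points - points.mean(axis=0)) / points.std(axis=0)
d_scaled = pdist(scaled)                        # distances on standardized features
print(d_euclid[:3], d_manhat[:3], d_scaled[:3])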

There are alternative models one could envision, such as finding the smallest enclosing sphere or box. These methods do not make things simpler and will likely not improve upon our models. 

Small data set


Let's start with selecting \(m=8\) points from  \(n=50\), using random 2d coordinates.


----     10 PARAMETER coord  coordinates

                  x           y

point1        0.172       0.843
point2        0.550       0.301
point3        0.292       0.224
point4        0.350       0.856
point5        0.067       0.500
point6        0.998       0.579
point7        0.991       0.762
point8        0.131       0.640
point9        0.160       0.250
point10       0.669       0.435
point11       0.360       0.351
point12       0.131       0.150
point13       0.589       0.831
point14       0.231       0.666
point15       0.776       0.304
point16       0.110       0.502
point17       0.160       0.872
point18       0.265       0.286
point19       0.594       0.723
point20       0.628       0.464
point21       0.413       0.118
point22       0.314       0.047
point23       0.339       0.182
point24       0.646       0.561
point25       0.770       0.298
point26       0.661       0.756
point27       0.627       0.284
point28       0.086       0.103
point29       0.641       0.545
point30       0.032       0.792
point31       0.073       0.176
point32       0.526       0.750
point33       0.178       0.034
point34       0.585       0.621
point35       0.389       0.359
point36       0.243       0.246
point37       0.131       0.933
point38       0.380       0.783
point39       0.300       0.125
point40       0.749       0.069
point41       0.202       0.005
point42       0.270       0.500
point43       0.151       0.174
point44       0.331       0.317
point45       0.322       0.964
point46       0.994       0.370
point47       0.373       0.772
point48       0.397       0.913
point49       0.120       0.735
point50       0.055       0.576


The MINSUM MIQP and MIP models give the same solution, but (as expected) the MINMAX model is slightly different. Here are the results with Cplex:

 
          HEURISTIC        MIQP         MIP      MINMAX

point2        1.000
point3                    1.000       1.000       1.000
point9                                            1.000
point10       1.000
point11                   1.000       1.000
point12                                           1.000
point15       1.000
point18                   1.000       1.000       1.000
point20       1.000
point23                   1.000       1.000       1.000
point24       1.000
point25       1.000
point27       1.000
point29       1.000
point35                   1.000       1.000
point36                   1.000       1.000       1.000
point39                   1.000       1.000       1.000
point43                                           1.000
point44                   1.000       1.000
status                  Optimal     Optimal     Optimal
obj                       3.447       3.447       0.210
sum           4.997       3.447       3.447       3.522
max           0.291       0.250       0.250       0.210
time                     33.937       2.125       0.562


When we plot the results, we see:




Obviously, our optimization models do much better than the heuristic. The MINSUM and MINMAX models have quite an overlap in the selected points. To compare the MINSUM and MINMAX objectives, we can plot them against each other:



The MINSUM point is delivered by the MIQP and MIP models. Obviously, our heuristic is not doing that great.

Large data set 


Here we select \(m=8\) points from  \(n=100\), using random 12d coordinates. Using Gurobi on a faster machine we see:


----    112 PARAMETER results  

          HEURISTIC        MIQP         MIP      MINMAX

point5        1.000                               1.000
point8                    1.000       1.000
point17       1.000                               1.000
point19                   1.000       1.000
point24                                           1.000
point35                   1.000       1.000
point38                   1.000       1.000
point39                                           1.000
point42       1.000                               1.000
point43                                           1.000
point45       1.000       1.000       1.000
point51       1.000
point56       1.000
point76       1.000
point81                   1.000       1.000
point89                   1.000       1.000
point94       1.000       1.000       1.000       1.000
point97                                           1.000
status                TimeLimit     Optimal     Optimal
obj                      26.230      26.230       1.147
sum          30.731      26.230      26.230      27.621
max           1.586       1.357       1.357       1.147
time                   1015.984      49.000       9.594
gap%                     23.379


The MINSUM MIQP model hit a time limit of 1000 seconds but actually found the optimal solution. It just did not have the time to prove it. The overlap between the MINSUM models and the MINMAX model is very small (just one shared point). I expected more overlap.

Conclusions


The MINMAX model seems to solve faster than the MINSUM versions. This is a little bit surprising: often these max or min objectives are slower than direct sums. It always pays off to do some experiments with different formulations. Slightly different models can show a large difference in solution times. This is a great example of that.

These models are independent of the dimensionality of the data and the distance metric being used.

References


  1. Given a set of points or vectors, find the set of N points that are closest to each other, https://stackoverflow.com/questions/64542442/given-a-set-of-points-or-vectors-find-the-set-of-n-points-that-are-closest-to-e


Appendix: GAMS code


Some interesting features:
  • There are not that many models where I can use the xor operator. Here it is used in the greedy heuristic, where we want to consider points \(i\) and \(j\) such that one of them is in the cluster and the other is outside. 
  • Macros are used to prevent repetitive code in the reporting of results.
  • Acronyms are used in reporting. 


$ontext

   find cluster of m=8 points closest to each other

   compare:
       simple greedy heuristic (HEURISTIC)
       quadratic model (MIQP)
       linearized model (MIP)
       minmax model (MINMAX)

$offtext

set
  c 'coordinates' /x,y/
  i 'points' /point1*point50/
;

scalar m 'number of points to select' /8/;

parameter coord(i,c) 'random coordinates';
coord(i,c) = uniform(0,1);
display coord;

alias(i,j);
set ij(i,j) 'upper triangular structure';
ij(i,j) = ord(i) < ord(j);

parameter dist(i,j) 'euclidean distances';
dist(ij(i,j)) = sqrt(sum(c, sqr(coord(i,c)-coord(j,c))));

binary variables x(i) 'selected points';
positive variable y(i,j) 'x(i)*x(j), relaxed to be continuous';
y.up(ij)=1;

variable z 'objective';

equations
    obj1   'quadratic objective'
    obj2   'linearized objective'
    obj3   'minmax objective'
    select 'number of selected points'
    bound  'implication x(i)=x(j)=1 ==> y(i,j)=1'
;

obj1.. z =e= sum(ij(i,j), x(i)*x(j)*dist(ij));
obj2.. z =e= sum(ij, y(ij)*dist(ij));
obj3(ij).. z =g= y(ij)*dist(ij);
select.. sum(i, x(i)) =e= m;
bound(ij(i,j)).. y(i,j) =g= x(i) + x(j) - 1;

model m1 /obj1,select/;
model m2 /obj2,select,bound/;
model m3 /obj3,select,bound/;
option optcr=0,mip=cplex,miqcp=cplex,threads=8;

*------------------------------------------------
* reporting macros
*------------------------------------------------

parameter results(*,*);

set dummy 'ordering for output' /  status, obj, sum, max, time, 'gap%' /;

acronym TimeLimit;
acronym Optimal;

* macros for reporting
$macro sumdist    sum(ij(i,j), x.l(i)*x.l(j)*dist(ij))
$macro maxdist    smax(ij(i,j), x.l(i)*x.l(j)*dist(ij))
$macro report(m,label)  \
    x.l(i) = round(x.l(i));  \
    results(i,label) = x.l(i); \
    results('status',label)$(m.solvestat=1) = Optimal; \
    results('status',label)$(m.solvestat=3) = TimeLimit; \
    results('obj',label) = z.l; \
    results('sum',label) = sumdist; \
    results('max',label) = maxdist; \
    results('time',label) = m.resusd; \
    results('gap%',label)$(m.solvestat=3) = 100*abs(m.objest - m.objval)/abs(m.objest);


*------------------------------------------------
* heuristic
*------------------------------------------------


* step 1 : select points 1 and 2 by minimizing distance
scalar dmin,dmin1,dmin2;
dmin = smin(ij, dist(ij));
loop(ij(i,j)$(dist(ij)=dmin),
   x.l(i) = 1;
   x.l(j) = 1;
   break;
);

* add points 3..m, closest to points selected earlier
scalar k;
for(k = 3 to m,
   dmin = smin(ij(i,j)$(x.l(i) xor x.l(j)), dist(ij));
   loop(ij(i,j)$(dist(ij)=dmin),
      x.l(i) = 1;
      x.l(j) = 1;
      break;
   );
);

results(i,'HEURISTIC') = x.l(i);
results('sum','HEURISTIC') = sumdist;
results('max','HEURISTIC') = maxdist;


*------------------------------------------------
* run models m1 through m3
*------------------------------------------------

solve m1 minimizing z using miqcp;
report(m1,'MIQP')

solve m2 minimizing z using mip;
report(m2,'MIP')

solve m3 minimizing z using mip;
report(m3,'MINMAX')

display results;


Monday, October 26, 2020

Solution methods for linear bilevel problems

For some classes of problems, a useful strategy is to form explicit KKT optimality conditions and convert the resulting complementarity conditions into big-M constraints. If the original problem was linear (or in some cases quadratic), the resulting model is a MIP. One such application is solving non-convex QP models [1]. Another is a bilevel LP, where we replace the inner problem by its optimality conditions. In [2], such a problem is given: 



bilevel LP
\[\begin{align}\max\> &\color{darkred}z = \color{darkred}x+\color{darkred}y \\ & \color{darkred}x \in [0,2] \\ & \min \>\color{darkred}y\\ & \qquad \color{darkred}y \ge 0 \perp \color{darkred} \lambda_1 \\ & \qquad \color{darkred}x - 0.01\color{darkred}y \le 1 \perp \color{darkred}\lambda_2 \end{align} \]

The notation \(\perp\) indicates the dual variable associated with each inner constraint. After the inner problem is replaced by its optimality conditions, we have:

Inner LP as optimality conditions
\[\begin{align}\max\> & \color{darkred}z = \color{darkred}x+\color{darkred}y \\ & \color{darkred}x \in [0,2]\\ & 1 - \color{darkred}\lambda_1 - 0.01 \color{darkred}\lambda_2 = 0 \\ & \color{darkred}y \ge 0 \\ & \color{darkred}x - 0.01\color{darkred}y \le 1\\ & \color{darkred}\lambda_1 \cdot \color{darkred}y = 0 \\ & \color{darkred}\lambda_2 \cdot \left( \color{darkred}x -0.01\color{darkred}y - 1 \right) = 0 \\ & \color{darkred}\lambda_1,\color{darkred}\lambda_2\ge 0 \end{align} \]

The products can be interpreted as an "or" condition. These can be linearized using binary variables and big-M constraints:

Linearized problem
\[\begin{align}\max\> & \color{darkred}z = \color{darkred}x+\color{darkred}y \\ & \color{darkred}x \in [0,2]\\ & 1 - \color{darkred}\lambda_1 - 0.01 \color{darkred}\lambda_2 = 0 \\ & \color{darkred}y \ge 0 \\ & \color{darkred}x - 0.01\color{darkred}y \le 1\\ & \color{darkred}\lambda_1 \le \color{darkblue} M_1^d \color{darkred}\delta_1 \\ & \color{darkred}y \le \color{darkblue} M_1^p (1-\color{darkred}\delta_1) \\ & \color{darkred}\lambda_2 \le \color{darkblue} M_2^d \color{darkred}\delta_2 \\ & -\color{darkred}x +0.01\color{darkred}y + 1 \le \color{darkblue} M_2^p (1-\color{darkred}\delta_2) \\ & \color{darkred}\lambda_1,\color{darkred}\lambda_2\ge 0 \\ & \color{darkred}\delta_1,\color{darkred}\delta_2 \in \{0,1\} \end{align} \]

The letters \(p\) and \(d\) in the big-M constants indicate primal and dual.  Note that we flipped the sign of the second constraint of the inner problem. This was to make \(\color{darkred}\lambda_2\) a non-negative variable.

Choosing big-M values


One big issue is choosing appropriate values for the big-M constants. The values \(\color{darkblue} M_1^d,\color{darkblue} M_1^p,\color{darkblue} M_2^d,\color{darkblue} M_2^p = 200\) are valid bounds on \(\color{darkred}\lambda_1, \color{darkred}y, \color{darkred}\lambda_2, -\color{darkred}x +0.01\color{darkred}y + 1\), respectively. This gives the solution (a small gurobipy sketch of this linearized model follows after the listing):

                           LOWER          LEVEL          UPPER

---- VAR x                   .             2.0000         2.0000      
---- VAR y                   .           100.0000        +INF         
---- VAR lambda1             .              .            +INF         
---- VAR lambda2             .           100.0000        +INF         
---- VAR delta1              .              .             1.0000      
---- VAR delta2              .             1.0000         1.0000      
---- VAR z                 -INF          102.0000        +INF         
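
To make this concrete, here is a sketch of the linearized big-M model in gurobipy with all big-M constants set to 200 (assuming gurobipy is installed and licensed; variable names are mine). It should reproduce the solution listed above:

import gurobipy as gp
from gurobipy import GRB

Md = Mp = 200.0                                   # all big-M constants set to 200

m = gp.Model("bilevel_bigM")
x    = m.addVar(lb=0.0, ub=2.0, name="x")
y    = m.addVar(lb=0.0, name="y")
lam1 = m.addVar(lb=0.0, name="lambda1")
lam2 = m.addVar(lb=0.0, name="lambda2")
d1   = m.addVar(vtype=GRB.BINARY, name="delta1")
d2   = m.addVar(vtype=GRB.BINARY, name="delta2")

m.setObjective(x + y, GRB.MAXIMIZE)
m.addConstr(1 - lam1 - 0.01 * lam2 == 0)          # stationarity of the inner problem
m.addConstr(x - 0.01 * y <= 1)                    # inner primal feasibility
m.addConstr(lam1 <= Md * d1)                      # lambda1 * y = 0, linearized
m.addConstr(y <= Mp * (1 - d1))
m.addConstr(lam2 <= Md * d2)                      # lambda2 * (x - 0.01 y - 1) = 0, linearized
m.addConstr(-x + 0.01 * y + 1 <= Mp * (1 - d2))

m.optimize()
print(x.X, y.X, m.ObjVal)                         # expected: 2.0 100.0 102.0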


Choosing the right big-M values is important. Values that are too small can cut off optimal solutions. Values that are too large cause other problems, such as numerical issues and trickle flow. In [2] it is argued that many modelers use the following algorithm to detect whether a big-M value is too small:

  1. Select initial values for \( \color{darkblue} M_j^d,  \color{darkblue} M_j^p \).
  2. Solve the linearized MIP model.
  3. Look for big-M constraints that should be inactive (switched off by their binary variable) but are binding. If none are found, stop: we declare the solution optimal.
  4. Increase the corresponding big-M value.
  5. Go to step 2.

Unfortunately, this algorithm is not guaranteed to work. [2] gives an example: use as starting values \(\color{darkblue} M_j^d = 50, \color{darkblue} M_j^p = 200\). The solution looks like:


                           LOWER          LEVEL          UPPER

---- VAR x                   .             2.0000         2.0000      
---- VAR y                   .              .            +INF         
---- VAR lambda1             .             1.0000        +INF         
---- VAR lambda2             .              .            +INF         
---- VAR delta1              .             1.0000         1.0000      
---- VAR delta2              .             1.0000         1.0000      
---- VAR z                 -INF            2.0000        +INF         

The values \(\color{darkred}\delta_j=1\) indicate we should look at the constraints: \(\color{darkred}\lambda_j \le \color{darkblue}M_j^d\). As \(\color{darkred}\lambda_1=1, \color{darkred}\lambda_2=0\) these are not binding. Hence, the erroneous conclusion is that this solution is optimal. 

Our conclusion is: we cannot reliably detect that big-M values are chosen too small by searching for binding constraints.

Alternatives


In [3], it is argued that SOS1 (special ordered sets of type 1) variables should be used instead of binary variables. For our example, this means we would replace our big-M constraints by: 


SOS1 representation
\[\begin{align} & \color{darkred}\lambda_1, \color{darkred}y \in SOS1 \\ & \color{darkred}\lambda_2, \color{darkred}s \in SOS1 \\ &\color{darkred}s = -\color{darkred}x+0.01\color{darkred}y+1 \\ & \color{darkred}s \ge 0 \end{align} \]


Many MIP solvers (but not all) support SOS1 variables. Binary variables can sometimes be faster, but SOS1 variables have the advantage that no big-M constants are needed. The same can be said for indicator constraints. A formulation using indicator constraints can look like:
 

Indicator constraints
\[\begin{align} & \color{darkred} \delta_1 = 0 \Rightarrow \color{darkred}\lambda_1 =0 \\ & \color{darkred} \delta_1 = 1\Rightarrow \color{darkred}y = 0 \\ & \color{darkred} \delta_2 = 0 \Rightarrow \color{darkred}\lambda_2 = 0 \\ & \color{darkred} \delta_2 = 1 \Rightarrow \color{darkred}x-0.01\color{darkred}y = 1 \\ & \color{darkred}x-0.01\color{darkred}y \le 1 \\ & \color{darkred} \delta_1,\color{darkred}\delta_2 \in \{0,1\} \end{align} \]
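
As an illustration, here is a sketch of this indicator-constraint formulation using gurobipy's general constraints (assuming gurobipy is available; other solver APIs offer similar functionality). Note that no big-M constants appear anywhere:

import gurobipy as gp
from gurobipy import GRB

m = gp.Model("bilevel_indicator")
x    = m.addVar(lb=0.0, ub=2.0, name="x")
y    = m.addVar(lb=0.0, name="y")
lam1 = m.addVar(lb=0.0, name="lambda1")
lam2 = m.addVar(lb=0.0, name="lambda2")
d1   = m.addVar(vtype=GRB.BINARY, name="delta1")
d2   = m.addVar(vtype=GRB.BINARY, name="delta2")

m.setObjective(x + y, GRB.MAXIMIZE)
m.addConstr(1 - lam1 - 0.01 * lam2 == 0)                 # stationarity of the inner problem
m.addConstr(x - 0.01 * y <= 1)                           # inner primal feasibility
m.addGenConstrIndicator(d1, False, lam1 == 0)            # delta1 = 0  =>  lambda1 = 0
m.addGenConstrIndicator(d1, True,  y == 0)               # delta1 = 1  =>  y = 0
m.addGenConstrIndicator(d2, False, lam2 == 0)            # delta2 = 0  =>  lambda2 = 0
m.addGenConstrIndicator(d2, True,  x - 0.01 * y == 1)    # delta2 = 1  =>  constraint binds

m.optimize()
print(x.X, y.X, m.ObjVal)                                # expected: 2.0 100.0 102.0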


We can also solve the quadratic model directly using global solvers like Antigone, Baron, Couenne, or Gurobi. Sometimes reasonable bounds are needed to help global solvers.

Note that the GAMS EMP tool can generate the quadratic model for you. I still had to specify a global solver for it to find the optimal solution.

Conclusion


Instead of using big-M constraints to handle complementarity conditions in linear bi-level optimization problems, there are a few alternatives:
  • SOS1 variables
  • Indicator constraints
  • Global solvers for non-convex quadratic problems

References


  1. Solving non-convex QP problems as a MIP,  https://yetanothermathprogrammingconsultant.blogspot.com/2016/06/solving-non-convex-qp-problems-as-mip.html 
  2. Salvador Pineda and Juan Miguel Morales, Solving Linear Bilevel Problems Using Big-Ms: Not All That Glitters Is Gold, September 2018,  https://arxiv.org/pdf/1809.10448.pdf
  3. Thomas Kleinert and Martin Schmidt, Why there is no need to use a big-M in linear bilevel optimization: A computational study of two ready-to-use approaches, October 2020,  http://www.optimization-online.org/DB_FILE/2020/10/8065.pdf 

Sunday, October 18, 2020

PuLP mystery



PuLP is a popular Python-based modeling tool for LP and MIP models. In [1], a user asked a (somewhat trivial) question about PuLP syntax. But there was an interesting wrinkle in the model that was unfamiliar to me. The basic issue is that sometimes PuLP accepts a NumPy array in expressions:
 
import pulp as p
import numpy as np

a=np.array([1,2,3])

x=p.LpVariable('x')
y=p.LpVariable('y')

prob = p.LpProblem("test",p.LpMaximize)
prob += x+(a-y)
prob

This fragment creates an objective. But we have a strange term here. a-y is odd because a is a NumPy array (vector) and y is a scalar PuLP variable. Usually, combining a vector and a scalar is interpreted as an elementwise operation: \[\begin{pmatrix} a_0 - y \\ a_1 - y \\ a_2-y \end{pmatrix}\] How one would interpret an objective like \[ x + \begin{pmatrix} a_0 - y \\ a_1 - y \\ a_2-y \end{pmatrix}\] is not clear to me. I would say this is an error. However, PuLP accepts it and prints the model as:


test:
MAXIMIZE
1*x + -3*y + 6
VARIABLES
x free Continuous
y free Continuous

So, apparently the interpretation is: \[ x + \sum_i (a_i-y)=x + \sum_i a_i - n\cdot y\] where \(n\) is the length of vector \(a\). But then again, we would expect PuLP to accept
 
  prob += (a-y)

as objective. However, this produces the error message: 

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-6-bf3a3a41e9da> in <module>()
      8 
      9 prob = p.LpProblem("test",p.LpMaximize)
---> 10 prob += (a-y)
     11 prob

/usr/local/lib/python3.6/dist-packages/pulp/pulp.py in __iadd__(self, other)
   1528             self.objective.name = name
   1529         else:
-> 1530             raise TypeError("Can only add LpConstraintVar, LpConstraint, LpAffineExpression or True objects")
   1531         return self
   1532 

TypeError: Can only add LpConstraintVar, LpConstraint, LpAffineExpression or True objects


I have no idea what is going on here. It could all just be a bug, but even that cannot easily explain why (a-y) is sometimes accepted and sometimes refused.

The underlying problem can actually be traced back to operator overloading as shown by [2]. Who owns the + or -? The following experiment demonstrates the issue in stark terms:


>>> a+x
array([1*x + 1, 1*x + 2, 1*x + 3], dtype=object)
>>> x+a
1*x + 6

In the first case, NumPy handles the +, leading to the interpretation: \[\begin{pmatrix} a_0  \\ a_1  \\ a_2 \end{pmatrix} + x = \begin{pmatrix} x+a_0  \\ x+a_1  \\x+a_2  \end{pmatrix}\] In the second case, the + is dealt with by PuLP. This gives \[x+\begin{pmatrix} a_0  \\ a_1  \\ a_2 \end{pmatrix}=x+\sum_i a_i\]  

Conclusions


This is mind-bending (a.k.a. crazy). It would be better if things were a bit more predictable ("orthogonal" is the term used in programming language design). Preferably, we don't want to spend time figuring out how to interpret a + or a -.

Advice: it is better not to mix PuLP with NumPy.
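
If NumPy data must be used anyway, one defensive pattern (a sketch of my own, not official PuLP guidance) is to convert NumPy scalars to plain Python numbers and build sums explicitly with lpSum, so PuLP never sees a NumPy array in an expression:

import numpy as np
import pulp as p

a = np.array([1, 2, 3])
x = p.LpVariable('x')
y = p.LpVariable('y')

prob = p.LpProblem("test", p.LpMaximize)
terms = [float(ai) - y for ai in a]   # explicit elementwise terms a_i - y, as plain floats
prob += x + p.lpSum(terms)            # intended objective: x + sum_i (a_i - y)
print(prob)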

References


Thursday, October 8, 2020

Maximum Correlation, Global MINLP vs GA

In [1], the following question is posed:


Suppose we have data like:

worker weight height weight2 height2
1      120    60     125     60
2      152    55     156     66
3      222    55     100     20

We can for each case pick (weight,height) or (weight2,height2).  The problem is to pick one of these observations for each row such that the correlation between weight and height is maximized.


Data 


First, let's invent some data that we can use for some experiments. 

----     27 PARAMETER data  

        height1     weight1     height2     weight2

i1    67.433285  168.262871   67.445523  163.692389
i2    70.638374  174.437750   68.649190  160.084811
i3    71.317794  159.909672   69.503911  164.720010
i4    59.850261  145.704159   61.175728  142.708300
i5    65.341938  155.586984   68.483909  165.564991
i6    64.142009  154.335001   68.568683  166.169507
i7    67.030368  158.768813   65.780803  153.721717
i8    73.672863  175.126951   73.236515  164.704340
i9    65.203516  157.593587   63.279277  149.784500
i10   69.001848  160.063428   68.786656  162.278007
i11   64.455422  159.039195   63.930208  152.827710
i12   70.719334  164.885704   69.666096  157.356595
i13   65.688428  151.223468   63.614565  150.071072
i14   66.569252  160.978671   70.533320  160.722483
i15   78.417676  172.298652   80.070076  172.695207
i16   65.396154  158.234709   67.404942  158.310596
i17   62.504967  150.899428   61.000439  154.094647
i18   62.122630  150.024298   63.634554  153.644324
i19   70.598400  165.086523   72.999194  166.771223
i20   74.935107  170.820610   76.622182  169.013550
i21   63.233956  154.331546   60.372876  149.152520
i22   72.550105  173.961915   76.748649  167.462369
i23   74.086553  168.190867   75.433331  171.773607
i24   65.379648  163.577697   65.717553  160.134888
i25   64.003038  155.357607   67.301426  158.713710


The optimal solution that has the highest correlation coefficient between height and weight is:


----     92 PARAMETER result  optimal selected observations

        height1     weight1     height2     weight2

i1                            67.445523  163.692389
i2                            68.649190  160.084811
i3                            69.503911  164.720010
i4    59.850261  145.704159
i5    65.341938  155.586984
i6    64.142009  154.335001
i7    67.030368  158.768813
i8                            73.236515  164.704340
i9                            63.279277  149.784500
i10   69.001848  160.063428
i11                           63.930208  152.827710
i12   70.719334  164.885704
i13                           63.614565  150.071072
i14                           70.533320  160.722483
i15   78.417676  172.298652
i16                           67.404942  158.310596
i17   62.504967  150.899428
i18   62.122630  150.024298
i19                           72.999194  166.771223
i20   74.935107  170.820610
i21                           60.372876  149.152520
i22                           76.748649  167.462369
i23                           75.433331  171.773607
i24                           65.717553  160.134888
i25                           67.301426  158.713710

The impact of being allowed to choose between observations (height1,weight1) and (height2,weight2) is significant:
 
Correlation coefficients

height1-weight1    0.868691
height2-weight2    0.894532
optimal            0.956452


Below we shall see how we arrive at this conclusion.

MINLP Model


A high-level Mixed-Integer Non-Linear Programming model is simply:


MINLP Model
\[ \begin{align}\max\> &\mathbf{cor}(\color{darkred}h,\color{darkred}w) \\ & \color{darkred}h_i = \color{darkblue}{\mathit{height1}}_i\cdot(1-\color{darkred}x_i)+ \color{darkblue}{\mathit{height2}}_i\cdot\color{darkred}x_i\\ & \color{darkred}w_i = \color{darkblue}{\mathit{weight1}}_i\cdot(1-\color{darkred}x_i)+ \color{darkblue}{\mathit{weight2}}_i\cdot\color{darkred}x_i \\ & \color{darkred}x_i \in \{0,1\} \end{align}\]


Here \(\mathbf{cor}(h,w)\) indicates the (Pearson) correlation between vectors \(h\) and \(w\). It is noted that height and weight are positively correlated, so maximizing makes sense.  Of course, GAMS does not know about correlations, so we implement this model as:

 
Implementation of MINLP Model
\[ \begin{align}\max\> & \color{darkred}z = \frac{\displaystyle\sum_i (\color{darkred}h_i-\bar{\color{darkred}h})(\color{darkred}w_i-\bar{\color{darkred}w})}{\sqrt{\displaystyle\sum_i(\color{darkred}h_i-\bar{\color{darkred}h})^2}\cdot \sqrt{\displaystyle\sum_i(\color{darkred}w_i-\bar{\color{darkred}w})^2}} \\ & \color{darkred}h_i = \color{darkblue}{\mathit{height1}}_i\cdot(1-\color{darkred}x_i)+ \color{darkblue}{\mathit{height2}}_i\cdot\color{darkred}x_i \\ & \color{darkred}w_i = \color{darkblue}{\mathit{weight1}}_i\cdot(1-\color{darkred}x_i)+ \color{darkblue}{\mathit{weight2}}_i\cdot\color{darkred}x_i \\ & \bar{\color{darkred}h} = \frac{1}{n}\displaystyle\sum_i \color{darkred}h_i \\ & \bar{\color{darkred}w} = \frac{1}{n}\displaystyle\sum_i \color{darkred}w_i \\ & \color{darkred}x_i \in \{0,1\} \end{align}\]



We need to be careful with divisions. I typically like to reformulate divisions as multiplications: \[\begin{align} \max \>&  \color{darkred}z\\ &\color{darkred}z\cdot \sqrt{\displaystyle\sum_i(\color{darkred}h_i-\bar{\color{darkred}h})^2}\cdot \sqrt{\displaystyle\sum_i(\color{darkred}w_i-\bar{\color{darkred}w})^2} = \displaystyle\sum_i (\color{darkred}h_i-\bar{\color{darkred}h})(\color{darkred}w_i-\bar{\color{darkred}w})\end{align}\]

The advantage of this formulation is that it protects against division by zero. A disadvantage is that the non-linearities are moved into the constraints. Many NLP solvers are happier when constraints are linear and only the objective is non-linear. My advice: experiment with formulations.
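
As a sanity check on the objective, the correlation a model reports can be verified directly for any given 0/1 vector \(x\). A small NumPy sketch with synthetic data (the real table is shown above; this only illustrates the formula):

import numpy as np

rng = np.random.default_rng(0)
n = 25
height1 = rng.normal(67, 4, n)
height2 = rng.normal(67, 4, n)
weight1 = 2.3 * height1 + rng.normal(0, 5, n)
weight2 = 2.3 * height2 + rng.normal(0, 5, n)
x = rng.integers(0, 2, n)                      # some 0/1 selection vector

h = height1 * (1 - x) + height2 * x
w = weight1 * (1 - x) + weight2 * x

num = np.sum((h - h.mean()) * (w - w.mean()))
den = np.sqrt(np.sum((h - h.mean()) ** 2)) * np.sqrt(np.sum((w - w.mean()) ** 2))
print(num / den, np.corrcoef(h, w)[0, 1])      # the two numbers should agree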


When we solve this model with GAMS/Baron, we see:


----     79 VARIABLE x.L  select 1 or 2

i1  1.000000,    i2  1.000000,    i3  1.000000,    i8  1.000000,    i9  1.000000,    i11 1.000000,    i13 1.000000
i14 1.000000,    i16 1.000000,    i19 1.000000,    i21 1.000000,    i22 1.000000,    i23 1.000000,    i24 1.000000
i25 1.000000


----     79 VARIABLE z.L                   =     0.956452  objective

----     83 PARAMETER corr  

all1    0.868691,    all2    0.894532,    optimal 0.956452


The parameter corr shows different correlations:
  • When all \(x_i=0\), that is when we compare height1 vs weight1.
  • When all \(x_i=1\), so we compare  height2 vs weight2.
  • When using an optimal solution for \(x\). 

Nonconvex MIQCP

 
Gurobi can solve non-convex quadratic models quite efficiently. To try this out, I reformulated the MINLP model into a Mixed-Integer Quadratically-Constrained Programming model. We need to add some variables and constraints to make this happen. 


Non-convex MIQCP Model
\[ \begin{align}\max\> & \color{darkred}z\\ & \color{darkred}z\cdot \color{darkred}{\mathit{denom}} = \sum_i (\color{darkred}h_i-\bar{\color{darkred}h})(\color{darkred}w_i-\bar{\color{darkred}w})\\ & \color{darkred}s^2_h = \sum_i(\color{darkred}h_i-\bar{\color{darkred}h})^2 \\ & \color{darkred}s^2_w = \sum_i(\color{darkred}w_i-\bar{\color{darkred}w})^2 \\ & \color{darkred}{\mathit{denom}} = \color{darkred}s_h \cdot \color{darkred}s_w\\ & \color{darkred}h_i = \color{darkblue}{\mathit{height1}}_i\cdot(1-\color{darkred}x_i)+ \color{darkblue}{\mathit{height2}}_i\cdot\color{darkred}x_i \\ & \color{darkred}w_i = \color{darkblue}{\mathit{weight1}}_i\cdot(1-\color{darkred}x_i)+ \color{darkblue}{\mathit{weight2}}_i\cdot\color{darkred}x_i \\ & \bar{\color{darkred}h} = \frac{1}{\color{darkblue}n}\displaystyle\sum_i \color{darkred}h_i \\ & \bar{\color{darkred}w} = \frac{1}{\color{darkblue}n}\displaystyle\sum_i \color{darkred}w_i \\ & \color{darkred}x_i \in \{0,1\} \\ & \color{darkred}s_h, \color{darkred}s_w \ge 0 \end{align}\]


The final model is still quite small, but I encountered severe problems:

Gurobi Optimizer version 9.0.1 build v9.0.1rc0 (win64)
Optimize a model with 52 rows, 81 columns and 152 nonzeros
Model fingerprint: 0xc1d6aa23
Model has 4 quadratic constraints
Variable types: 56 continuous, 25 integer (25 binary)
Coefficient statistics:
  Matrix range     [1e-02, 1e+01]
  QMatrix range    [1e+00, 2e+01]
  QLMatrix range   [1e+00, 1e+00]
  Objective range  [1e+00, 1e+00]
  Bounds range     [1e+00, 2e+02]
  RHS range        [6e+01, 2e+02]
Presolve removed 50 rows and 50 columns
Presolve time: 0.02s
Presolved: 179 rows, 88 columns, 637 nonzeros
Presolved model has 7 bilinear constraint(s)
Variable types: 63 continuous, 25 integer (25 binary)

Root relaxation: unbounded, 0 iterations, 0.00 seconds

    Nodes    |    Current Node    |     Objective Bounds      |     Work
 Expl Unexpl |  Obj  Depth IntInf | Incumbent    BestBd   Gap | It/Node Time

     0     2  postponed    0               -          -      -     -    0s
 50873 43644  postponed  872               -          -      -   0.0    5s
 124683 118126  postponed  805               -          -      -   0.0   10s
 196009 189074  postponed 1409               -          -      -   0.0   15s
 268890 261269  postponed 1771               -          -      -   0.0   20s
 341602 333841  postponed 2653               -          -      -   0.0   25s
 416698 408963  postponed 2652               -          -      -   0.0   30s
 491794 484945  postponed 2764               -          -      -   0.0   35s
 566890 559155  postponed 3533               -          -      -   0.0   40s
 641986 634407  postponed 4413               -          -      -   0.0   45s
 717082 709397  postponed 5294               -          -      -   0.0   50s
 792178 784445  postponed 5296               -          -      -   0.0   55s
 867274 859517  postponed 5295               -          -      -   0.0   60s
 942370 934643  postponed 6177               -          -      -   0.0   65s
 1016274 1009505  postponed 6177               -          -      -   0.0   70s
 1091370 1083743  postponed 7052               -          -      -   0.0   75s
 1166466 1158805  postponed 7939               -          -      -   0.0   80s
 1241562 1233913  postponed 7938               -          -      -   0.0   85s
 1316658 1308953  postponed 7939               -          -      -   0.0   90s
 1391754 1384013  postponed 8820               -          -      -   0.0   95s
 1465658 1457969  postponed 8820               -          -      -   0.0  100s
. . .

This goes on for much longer: no feasible solution is found. This looks very bad, and it is strange, because this MIQCP model can be solved without change by solvers like Baron, Antigone, and Couenne. Here is a clue:
 
     Root relaxation: unbounded 
 
This unbounded relaxation is the cause of the problem. The relaxation Gurobi solves is not a non-convex QCP model (that would not be unbounded), but rather something even more "relaxed".  Luckily, we know a good bound on the objective. Correlation coefficients are always between \(-1\) and \(+1\). This means we can add the bound: \(z \le 1\). After adding this, we see:


Root relaxation: objective 1.000000e+00, 65 iterations, 0.00 seconds

    Nodes    |    Current Node    |     Objective Bounds      |     Work
 Expl Unexpl |  Obj  Depth IntInf | Incumbent    BestBd   Gap | It/Node Time

     0     0    1.00000    0    7          -    1.00000      -     -    0s
     0     0    1.00000    0    5          -    1.00000      -     -    0s
     0     0    1.00000    0    5          -    1.00000      -     -    0s
     0     0    1.00000    0    5          -    1.00000      -     -    0s
     0     2    1.00000    0    6          -    1.00000      -     -    0s
*  324   297              44       0.9195020    1.00000  8.75%   7.7    0s
*  416   297              42       0.9400483    1.00000  6.38%   7.5    0s
*  556   511              28       0.9409507    1.00000  6.28%   7.6    0s
*  598   511              27       0.9457779    1.00000  5.73%   7.5    0s
*  603   511              28       0.9473171    1.00000  5.56%   7.4    0s
*  643   511              29       0.9483207    1.00000  5.45%   7.2    0s
* 1195   620              33       0.9494881    1.00000  5.32%   7.0    0s
*10829  2567              40       0.9499222    1.00000  5.27%   7.7    0s
*12064  2834              41       0.9526072    1.00000  4.98%   7.6    0s
*13333  3393              39       0.9530905    1.00000  4.92%   7.4    0s
*13341  3392              39       0.9531092    1.00000  4.92%   7.4    0s
H27924  7927                       0.9539760    1.00000  4.82%   6.4    1s
*52593 15598              41       0.9543576    1.00000  4.78%   5.9    1s
H53294 15799                       0.9548335    1.00000  4.73%   5.9    2s
H96539 25687                       0.9564515    0.99711  4.25%   5.9    3s
 155894 36175     cutoff   33         0.95645    0.99403  3.93%   5.2    5s
 335621 57101    0.98359   32   10    0.95645    0.98804  3.30%   4.7   10s
 518727 68913    0.95912   31   13    0.95645    0.98354  2.83%   4.5   15s
 703811 73266     cutoff   31         0.95645    0.97999  2.46%   4.3   20s
 890371 75019     cutoff   33         0.95645    0.97695  2.14%   4.3   25s
 1082662 73850     cutoff   37         0.95645    0.97425  1.86%   4.2   30s
 1270873 69325    0.96318   34    9    0.95645    0.97177  1.60%   4.1   35s
 1451055 61577     cutoff   37         0.95645    0.96956  1.37%   4.1   40s
 1628515 49114     cutoff   35         0.95645    0.96730  1.13%   4.0   45s
 1800508 30288    0.95954   35    9    0.95645    0.96460  0.85%   4.0   50s

Cutting planes:
  Gomory: 1
  MIR: 2
  Flow cover: 3
  RLT: 5

Explored 1916074 nodes (7618484 simplex iterations) in 53.76 seconds
Thread count was 8 (of 16 available processors)

Solution count 10: 0.956452 0.954834 0.954358 ... 0.948321
No other solutions better than 0.956452


This saved Gurobi! Now we see that Gurobi finds the optimal solution with a correlation coefficient of 0.956452.

Another possible improvement is to introduce mean-adjusted values (this is a linear operation). This will make the quadratic constraints a bit simpler, at the expense of extra variables and (linear) constraints.



Mean-adjusted non-convex MIQCP Model
\[ \begin{align}\max\> & \color{darkred}z\\ & \color{darkred}z\cdot \color{darkred}{\mathit{denom}} = \sum_i  \tilde{\color{darkred}h_i} \cdot \tilde{\color{darkred}w_i}\\ & \color{darkred}s^2_h = \sum_i \tilde{\color{darkred}h}^2_i \\ & \color{darkred}s^2_w = \sum_i \tilde{\color{darkred}w}^2_i \\ & \color{darkred}{\mathit{denom}} = \color{darkred}s_h \cdot \color{darkred}s_w\\ & \color{darkred}h_i = \color{darkblue}{\mathit{height1}}_i\cdot(1-\color{darkred}x_i)+ \color{darkblue}{\mathit{height2}}_i\cdot\color{darkred}x_i \\ & \color{darkred}w_i = \color{darkblue}{\mathit{weight1}}_i\cdot(1-\color{darkred}x_i)+ \color{darkblue}{\mathit{weight2}}_i\cdot\color{darkred}x_i \\ & \bar{\color{darkred}h} = \frac{1}{\color{darkblue}n}\displaystyle\sum_i \color{darkred}h_i \\ & \bar{\color{darkred}w} = \frac{1}{\color{darkblue}n}\displaystyle\sum_i \color{darkred}w_i \\ & \tilde{\color{darkred}h_i}= \color{darkred}h_i - \bar{\color{darkred}h}\\ & \tilde{\color{darkred}w_i} = \color{darkred}w_i - \bar{\color{darkred}w}  \\ & \color{darkred}x_i \in \{0,1\} \\ & \color{darkred}s_h, \color{darkred}s_w \ge 0 \end{align}\]

This formulation does not need the bound \(z\le 1\), although it still performs better with it.

Genetic Algorithm


It is interesting to see how a simple meta-heuristic would do on this problem. Here are the results using the ga function from the GA package [2]. This was actually quite easy to implement.


> df <- read.table(text="
+ id      height1     weight1     height2     weight2
+ i1    67.433285  168.262871   67.445523  163.692389
+ i2    70.638374  174.437750   68.649190  160.084811
+ i3    71.317794  159.909672   69.503911  164.720010
+ i4    59.850261  145.704159   61.175728  142.708300
+ i5    65.341938  155.586984   68.483909  165.564991
+ i6    64.142009  154.335001   68.568683  166.169507
+ i7    67.030368  158.768813   65.780803  153.721717
+ i8    73.672863  175.126951   73.236515  164.704340
+ i9    65.203516  157.593587   63.279277  149.784500
+ i10   69.001848  160.063428   68.786656  162.278007
+ i11   64.455422  159.039195   63.930208  152.827710
+ i12   70.719334  164.885704   69.666096  157.356595
+ i13   65.688428  151.223468   63.614565  150.071072
+ i14   66.569252  160.978671   70.533320  160.722483
+ i15   78.417676  172.298652   80.070076  172.695207
+ i16   65.396154  158.234709   67.404942  158.310596
+ i17   62.504967  150.899428   61.000439  154.094647
+ i18   62.122630  150.024298   63.634554  153.644324
+ i19   70.598400  165.086523   72.999194  166.771223
+ i20   74.935107  170.820610   76.622182  169.013550
+ i21   63.233956  154.331546   60.372876  149.152520
+ i22   72.550105  173.961915   76.748649  167.462369
+ i23   74.086553  168.190867   75.433331  171.773607
+ i24   65.379648  163.577697   65.717553  160.134888
+ i25   64.003038  155.357607   67.301426  158.713710
+ ", header=T)
> 
> #
> # print obvious cases 
> #
> cor(df$weight1,df$height1)
[1] 0.8686908
> cor(df$weight2,df$height2)
[1] 0.894532
> 
> #
> # fitness function
> #
> f <- function(x) {
+   w <- df$weight1*(1-x) + df$weight2*x
+   h <- df$height1*(1-x) + df$height2*x
+   cor(w,h) 
+ }
> 
> library(GA)
> res <- ga(type=c("binary"),fitness=f,nBits=25,seed=123)
GA | iter = 1 | Mean = 0.8709318 | Best = 0.9237155
GA | iter = 2 | Mean = 0.8742004 | Best = 0.9237155
GA | iter = 3 | Mean = 0.8736450 | Best = 0.9237155
GA | iter = 4 | Mean = 0.8742228 | Best = 0.9384788
GA | iter = 5 | Mean = 0.8746517 | Best = 0.9384788
GA | iter = 6 | Mean = 0.8792048 | Best = 0.9486227
GA | iter = 7 | Mean = 0.8844841 | Best = 0.9486227
GA | iter = 8 | Mean = 0.8816874 | Best = 0.9486227
GA | iter = 9 | Mean = 0.8805522 | Best = 0.9486227
GA | iter = 10 | Mean = 0.8820974 | Best = 0.9486227
GA | iter = 11 | Mean = 0.8859074 | Best = 0.9486227
GA | iter = 12 | Mean = 0.8956467 | Best = 0.9486227
GA | iter = 13 | Mean = 0.8989140 | Best = 0.9486227
GA | iter = 14 | Mean = 0.9069327 | Best = 0.9486227
GA | iter = 15 | Mean = 0.9078787 | Best = 0.9486227
GA | iter = 16 | Mean = 0.9069163 | Best = 0.9489443
GA | iter = 17 | Mean = 0.9104712 | Best = 0.9489443
GA | iter = 18 | Mean = 0.9169900 | Best = 0.9489443
GA | iter = 19 | Mean = 0.9175285 | Best = 0.9489443
GA | iter = 20 | Mean = 0.9207076 | Best = 0.9489443
GA | iter = 21 | Mean = 0.9210288 | Best = 0.9489443
GA | iter = 22 | Mean = 0.9206928 | Best = 0.9489443
GA | iter = 23 | Mean = 0.9210399 | Best = 0.9489443
GA | iter = 24 | Mean = 0.9208985 | Best = 0.9489443
GA | iter = 25 | Mean = 0.9183778 | Best = 0.9511446
GA | iter = 26 | Mean = 0.9217391 | Best = 0.9511446
GA | iter = 27 | Mean = 0.9274271 | Best = 0.9522764
GA | iter = 28 | Mean = 0.9271156 | Best = 0.9522764
GA | iter = 29 | Mean = 0.9275347 | Best = 0.9522764
GA | iter = 30 | Mean = 0.9278315 | Best = 0.9522764
GA | iter = 31 | Mean = 0.9300289 | Best = 0.9522764
GA | iter = 32 | Mean = 0.9306409 | Best = 0.9528777
GA | iter = 33 | Mean = 0.9309087 | Best = 0.9528777
GA | iter = 34 | Mean = 0.9327691 | Best = 0.9528777
GA | iter = 35 | Mean = 0.9309344 | Best = 0.9549574
GA | iter = 36 | Mean = 0.9341977 | Best = 0.9549574
GA | iter = 37 | Mean = 0.9374437 | Best = 0.9559043
GA | iter = 38 | Mean = 0.9394410 | Best = 0.9559043
GA | iter = 39 | Mean = 0.9405482 | Best = 0.9559043
GA | iter = 40 | Mean = 0.9432749 | Best = 0.9564515
GA | iter = 41 | Mean = 0.9441814 | Best = 0.9564515
GA | iter = 42 | Mean = 0.9458232 | Best = 0.9564515
GA | iter = 43 | Mean = 0.9469625 | Best = 0.9564515
GA | iter = 44 | Mean = 0.9462313 | Best = 0.9564515
GA | iter = 45 | Mean = 0.9449716 | Best = 0.9564515
GA | iter = 46 | Mean = 0.9444071 | Best = 0.9564515
GA | iter = 47 | Mean = 0.9437149 | Best = 0.9564515
GA | iter = 48 | Mean = 0.9446355 | Best = 0.9564515
GA | iter = 49 | Mean = 0.9455424 | Best = 0.9564515
GA | iter = 50 | Mean = 0.9456497 | Best = 0.9564515
GA | iter = 51 | Mean = 0.9461382 | Best = 0.9564515
GA | iter = 52 | Mean = 0.9444960 | Best = 0.9564515
GA | iter = 53 | Mean = 0.9434671 | Best = 0.9564515
GA | iter = 54 | Mean = 0.9451851 | Best = 0.9564515
GA | iter = 55 | Mean = 0.9481903 | Best = 0.9564515
GA | iter = 56 | Mean = 0.9477778 | Best = 0.9564515
GA | iter = 57 | Mean = 0.9481829 | Best = 0.9564515
GA | iter = 58 | Mean = 0.9490952 | Best = 0.9564515
GA | iter = 59 | Mean = 0.9505670 | Best = 0.9564515
GA | iter = 60 | Mean = 0.9499329 | Best = 0.9564515
GA | iter = 61 | Mean = 0.9509299 | Best = 0.9564515
GA | iter = 62 | Mean = 0.9505341 | Best = 0.9564515
GA | iter = 63 | Mean = 0.9519624 | Best = 0.9564515
GA | iter = 64 | Mean = 0.9518618 | Best = 0.9564515
GA | iter = 65 | Mean = 0.9523598 | Best = 0.9564515
GA | iter = 66 | Mean = 0.9516766 | Best = 0.9564515
GA | iter = 67 | Mean = 0.9521926 | Best = 0.9564515
GA | iter = 68 | Mean = 0.9524419 | Best = 0.9564515
GA | iter = 69 | Mean = 0.9532865 | Best = 0.9564515
GA | iter = 70 | Mean = 0.9535871 | Best = 0.9564515
GA | iter = 71 | Mean = 0.9536049 | Best = 0.9564515
GA | iter = 72 | Mean = 0.9534035 | Best = 0.9564515
GA | iter = 73 | Mean = 0.9532859 | Best = 0.9564515
GA | iter = 74 | Mean = 0.9521064 | Best = 0.9564515
GA | iter = 75 | Mean = 0.9534997 | Best = 0.9564515
GA | iter = 76 | Mean = 0.9539987 | Best = 0.9564515
GA | iter = 77 | Mean = 0.9536670 | Best = 0.9564515
GA | iter = 78 | Mean = 0.9526224 | Best = 0.9564515
GA | iter = 79 | Mean = 0.9531871 | Best = 0.9564515
GA | iter = 80 | Mean = 0.9527495 | Best = 0.9564515
GA | iter = 81 | Mean = 0.9526061 | Best = 0.9564515
GA | iter = 82 | Mean = 0.9525577 | Best = 0.9564515
GA | iter = 83 | Mean = 0.9525084 | Best = 0.9564515
GA | iter = 84 | Mean = 0.9519052 | Best = 0.9564515
GA | iter = 85 | Mean = 0.9518549 | Best = 0.9564515
GA | iter = 86 | Mean = 0.9511299 | Best = 0.9564515
GA | iter = 87 | Mean = 0.9505129 | Best = 0.9564515
GA | iter = 88 | Mean = 0.9518203 | Best = 0.9564515
GA | iter = 89 | Mean = 0.9537234 | Best = 0.9564515
GA | iter = 90 | Mean = 0.9531017 | Best = 0.9564515
GA | iter = 91 | Mean = 0.9514525 | Best = 0.9564515
GA | iter = 92 | Mean = 0.9505517 | Best = 0.9564515
GA | iter = 93 | Mean = 0.9524752 | Best = 0.9564515
GA | iter = 94 | Mean = 0.9533879 | Best = 0.9564515
GA | iter = 95 | Mean = 0.9519166 | Best = 0.9564515
GA | iter = 96 | Mean = 0.9524416 | Best = 0.9564515
GA | iter = 97 | Mean = 0.9526676 | Best = 0.9564515
GA | iter = 98 | Mean = 0.9523745 | Best = 0.9564515
GA | iter = 99 | Mean = 0.9523710 | Best = 0.9564515
GA | iter = 100 | Mean = 0.9519255 | Best = 0.9564515
> res@solution
     x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12 x13 x14 x15 x16 x17 x18 x19 x20 x21 x22 x23 x24 x25
[1,]  1  1  1  0  0  0  0  1  1   0   1   0   1   1   0   1   0   0   1   0   1   1   1   1   1
> res@fitnessValue
[1] 0.9564515

When comparing to our proven optimal Baron and Gurobi solution, we see that the ga function finds the optimal solution. Of course, we would not know this if we did not solve the model with a global solver.


Conclusions


  • Allowing us to pick and choose observations between two data sets can have a significant effect on the correlation coefficient.
  • The problem to find the optimal mix is a difficult combinatorial problem.
  • The obvious approach is to formulate this as a non-convex MINLP.
  • An alternative is a non-convex MIQCP formulation. This brings Gurobi into the picture. However, if we don't impose an upper bound on the objective, Gurobi is rendered totally helpless. After adding the bound, things look much better.
  • This problem looks quite suited for a meta-heuristic. They are fast and simple to implement, but you only get good solutions (occasionally optimal ones), without any quality assurance.  

References





Appendix: Gurobi LP file


This LP file causes real trouble for Gurobi due to its unbounded relaxation. Note that this has to be solved with the option nonconvex 2.
 

\ Model GAMS Model
\ LP format - for model browsing. Use MPS format to capture full model detail.
Maximize
  z
Subject To
 defwx(i1): 4.57048155290812 x(i1) + hwx(i1,weight) = 168.2628705894044
 defwx(i2): 14.35293953397428 x(i2) + hwx(i2,weight) = 174.4377502565614
 defwx(i3): - 4.810337240480266 x(i3) + hwx(i3,weight) = 159.9096723101254
 defwx(i4): 2.995859558106929 x(i4) + hwx(i4,weight) = 145.7041593291271
 defwx(i5): - 9.978006854397336 x(i5) + hwx(i5,weight) = 155.5869839889373
 defwx(i6): - 11.83450629445829 x(i6) + hwx(i6,weight) = 154.335000536442
 defwx(i7): 5.047095974632526 x(i7) + hwx(i7,weight) = 158.7688132554227
 defwx(i8): 10.42261080570944 x(i8) + hwx(i8,weight) = 175.1269512206012
 defwx(i9): 7.809086784903798 x(i9) + hwx(i9,weight) = 157.5935866164602
 defwx(i10): - 2.214578422082724 x(i10) + hwx(i10,weight) = 160.0634282485366
 defwx(i11): 6.211485020463414 x(i11) + hwx(i11,weight) = 159.0391947749885
 defwx(i12): 7.529108790630033 x(i12) + hwx(i12,weight) = 164.8857039875973
 defwx(i13): 1.152396641792848 x(i13) + hwx(i13,weight) = 151.2234684137615
 defwx(i14): 0.2561878995556413 x(i14) + hwx(i14,weight) = 160.9786711234614
 defwx(i15): - 0.3965543777673872 x(i15) + hwx(i15,weight) = 172.2986522599465
 defwx(i16): - 0.0758869132981204 x(i16) + hwx(i16,weight) = 158.2347086210175
 defwx(i17): - 3.195219014258043 x(i17) + hwx(i17,weight) = 150.8994283190054
 defwx(i18): - 3.620025521749142 x(i18) + hwx(i18,weight) = 150.0242984063181
 defwx(i19): - 1.684699946033305 x(i19) + hwx(i19,weight) = 165.0865228272624
 defwx(i20): 1.807060121523165 x(i20) + hwx(i20,weight) = 170.8206099631946
 defwx(i21): 5.179025632721846 x(i21) + hwx(i21,weight) = 154.3315455729319
 defwx(i22): 6.499545761101757 x(i22) + hwx(i22,weight) = 173.9619147338905
 defwx(i23): - 3.582740184390104 x(i23) + hwx(i23,weight) = 168.190866891775
 defwx(i24): 3.442809290323368 x(i24) + hwx(i24,weight) = 163.5776971098937
 defwx(i25): - 3.356103037296691 x(i25) + hwx(i25,weight) = 155.3576067743665
 defhx(i1): - 0.012238100107453 x(i1) + hwx(i1,height) = 67.43328535705723
 defhx(i2): 1.98918429361234 x(i2) + hwx(i2,height) = 70.6383740411321
 defhx(i3): 1.813883224886396 x(i3) + hwx(i3,height) = 71.3177939118135
 defhx(i4): - 1.325467115496522 x(i4) + hwx(i4,height) = 59.85026108005525
 defhx(i5): - 3.141971238432788 x(i5) + hwx(i5,height) = 65.34193793182213
 defhx(i6): - 4.426674002923676 x(i6) + hwx(i6,height) = 64.14200864133288
 defhx(i7): 1.249564935900537 x(i7) + hwx(i7,height) = 67.03036779841625
 defhx(i8): 0.4363485690300877 x(i8) + hwx(i8,height) = 73.67286344172466
 defhx(i9): 1.924239239167797 x(i9) + hwx(i9,height) = 65.20351620466388
 defhx(i10): 0.2151915024726918 x(i10) + hwx(i10,height) = 69.00184760175402
 defhx(i11): 0.5252136910261171 x(i11) + hwx(i11,height) = 64.45542162911404
 defhx(i12): 1.053237234263008 x(i12) + hwx(i12,height) = 70.71933367432223
 defhx(i13): 2.073862555158748 x(i13) + hwx(i13,height) = 65.68842782277154
 defhx(i14): - 3.964068185905262 x(i14) + hwx(i14,height) = 66.5692517581452
 defhx(i15): - 1.652399901482397 x(i15) + hwx(i15,height) = 78.41767636858228
 defhx(i16): - 2.008787567894402 x(i16) + hwx(i16,height) = 65.39615405205915
 defhx(i17): 1.504528370533244 x(i17) + hwx(i17,height) = 62.50496743343454
 defhx(i18): - 1.511924170502624 x(i18) + hwx(i18,height) = 62.12263020481269
 defhx(i19): - 2.400794170619122 x(i19) + hwx(i19,height) = 70.5983998511028
 defhx(i20): - 1.68707570335097 x(i20) + hwx(i20,height) = 74.93510670286524
 defhx(i21): 2.861079762565417 x(i21) + hwx(i21,height) = 63.23395556183407
 defhx(i22): - 4.198544675012556 x(i22) + hwx(i22,height) = 72.55010450534969
 defhx(i23): - 1.346778063106839 x(i23) + hwx(i23,height) = 74.08655337740214
 defhx(i24): - 0.3379045824620164 x(i24) + hwx(i24,height) = 65.37964799513293
 defhx(i25): - 3.298388577170783 x(i25) + hwx(i25,height) = 64.0030375549771
 defmean(height): - 0.04 hwx(i1,height) - 0.04 hwx(i2,height) - 0.04 hwx(i3,height) - 0.04 hwx(i4,height) - 0.04 hwx(i5,height)
   - 0.04 hwx(i6,height) - 0.04 hwx(i7,height) - 0.04 hwx(i8,height) - 0.04 hwx(i9,height) - 0.04 hwx(i10,height) - 0.04 hwx(i11,height)
   - 0.04 hwx(i12,height) - 0.04 hwx(i13,height) - 0.04 hwx(i14,height) - 0.04 hwx(i15,height) - 0.04 hwx(i16,height) - 0.04 hwx(i17,height)
   - 0.04 hwx(i18,height) - 0.04 hwx(i19,height) - 0.04 hwx(i20,height) - 0.04 hwx(i21,height) - 0.04 hwx(i22,height) - 0.04 hwx(i23,height)
   - 0.04 hwx(i24,height) - 0.04 hwx(i25,height) + meanx(height) = 0
 defmean(weight): - 0.04 hwx(i1,weight) - 0.04 hwx(i2,weight) - 0.04 hwx(i3,weight) - 0.04 hwx(i4,weight) - 0.04 hwx(i5,weight)
   - 0.04 hwx(i6,weight) - 0.04 hwx(i7,weight) - 0.04 hwx(i8,weight) - 0.04 hwx(i9,weight) - 0.04 hwx(i10,weight) - 0.04 hwx(i11,weight)
   - 0.04 hwx(i12,weight) - 0.04 hwx(i13,weight) - 0.04 hwx(i14,weight) - 0.04 hwx(i15,weight) - 0.04 hwx(i16,weight) - 0.04 hwx(i17,weight)
   - 0.04 hwx(i18,weight) - 0.04 hwx(i19,weight) - 0.04 hwx(i20,weight) - 0.04 hwx(i21,weight) - 0.04 hwx(i22,weight) - 0.04 hwx(i23,weight)
   - 0.04 hwx(i24,weight) - 0.04 hwx(i25,weight) + meanx(weight) = 0
 sigma(height): [ - hwx(i1,height) ^2 + 2 hwx(i1,height) * meanx(height) - hwx(i2,height) ^2 + 2 hwx(i2,height) * meanx(height)
   - hwx(i3,height) ^2 + 2 hwx(i3,height) * meanx(height) - hwx(i4,height) ^2 + 2 hwx(i4,height) * meanx(height)
   - hwx(i5,height) ^2 + 2 hwx(i5,height) * meanx(height) - hwx(i6,height) ^2 + 2 hwx(i6,height) * meanx(height)
   - hwx(i7,height) ^2 + 2 hwx(i7,height) * meanx(height) - hwx(i8,height) ^2 + 2 hwx(i8,height) * meanx(height)
   - hwx(i9,height) ^2 + 2 hwx(i9,height) * meanx(height) - hwx(i10,height) ^2 + 2 hwx(i10,height) * meanx(height)
   - hwx(i11,height) ^2 + 2 hwx(i11,height) * meanx(height) - hwx(i12,height) ^2 + 2 hwx(i12,height) * meanx(height)
   - hwx(i13,height) ^2 + 2 hwx(i13,height) * meanx(height) - hwx(i14,height) ^2 + 2 hwx(i14,height) * meanx(height)
   - hwx(i15,height) ^2 + 2 hwx(i15,height) * meanx(height) - hwx(i16,height) ^2 + 2 hwx(i16,height) * meanx(height)
   - hwx(i17,height) ^2 + 2 hwx(i17,height) * meanx(height) - hwx(i18,height) ^2 + 2 hwx(i18,height) * meanx(height)
   - hwx(i19,height) ^2 + 2 hwx(i19,height) * meanx(height) - hwx(i20,height) ^2 + 2 hwx(i20,height) * meanx(height)
   - hwx(i21,height) ^2 + 2 hwx(i21,height) * meanx(height) - hwx(i22,height) ^2 + 2 hwx(i22,height) * meanx(height)
   - hwx(i23,height) ^2 + 2 hwx(i23,height) * meanx(height) - hwx(i24,height) ^2 + 2 hwx(i24,height) * meanx(height)
   - hwx(i25,height) ^2 + 2 hwx(i25,height) * meanx(height) - 25 meanx(height) ^2 + s(height) ^2 ] = 0
 sigma(weight): [ - hwx(i1,weight) ^2 + 2 hwx(i1,weight) * meanx(weight) - hwx(i2,weight) ^2 + 2 hwx(i2,weight) * meanx(weight)
   - hwx(i3,weight) ^2 + 2 hwx(i3,weight) * meanx(weight) - hwx(i4,weight) ^2 + 2 hwx(i4,weight) * meanx(weight)
   - hwx(i5,weight) ^2 + 2 hwx(i5,weight) * meanx(weight) - hwx(i6,weight) ^2 + 2 hwx(i6,weight) * meanx(weight)
   - hwx(i7,weight) ^2 + 2 hwx(i7,weight) * meanx(weight) - hwx(i8,weight) ^2 + 2 hwx(i8,weight) * meanx(weight)
   - hwx(i9,weight) ^2 + 2 hwx(i9,weight) * meanx(weight) - hwx(i10,weight) ^2 + 2 hwx(i10,weight) * meanx(weight)
   - hwx(i11,weight) ^2 + 2 hwx(i11,weight) * meanx(weight) - hwx(i12,weight) ^2 + 2 hwx(i12,weight) * meanx(weight)
   - hwx(i13,weight) ^2 + 2 hwx(i13,weight) * meanx(weight) - hwx(i14,weight) ^2 + 2 hwx(i14,weight) * meanx(weight)
   - hwx(i15,weight) ^2 + 2 hwx(i15,weight) * meanx(weight) - hwx(i16,weight) ^2 + 2 hwx(i16,weight) * meanx(weight)
   - hwx(i17,weight) ^2 + 2 hwx(i17,weight) * meanx(weight) - hwx(i18,weight) ^2 + 2 hwx(i18,weight) * meanx(weight)
   - hwx(i19,weight) ^2 + 2 hwx(i19,weight) * meanx(weight) - hwx(i20,weight) ^2 + 2 hwx(i20,weight) * meanx(weight)
   - hwx(i21,weight) ^2 + 2 hwx(i21,weight) * meanx(weight) - hwx(i22,weight) ^2 + 2 hwx(i22,weight) * meanx(weight)
   - hwx(i23,weight) ^2 + 2 hwx(i23,weight) * meanx(weight) - hwx(i24,weight) ^2 + 2 hwx(i24,weight) * meanx(weight)
   - hwx(i25,weight) ^2 + 2 hwx(i25,weight) * meanx(weight) - 25 meanx(weight) ^2 + s(weight) ^2 ] = 0
 denom: shw + [ - s(height) * s(weight) ] = 0
 obj: [ - hwx(i1,height) * hwx(i1,weight) + hwx(i1,height) * meanx(weight) + hwx(i1,weight) * meanx(height) - hwx(i2,height) * hwx(i2,weight)
   + hwx(i2,height) * meanx(weight) + hwx(i2,weight) * meanx(height) - hwx(i3,height) * hwx(i3,weight) + hwx(i3,height) * meanx(weight)
   + hwx(i3,weight) * meanx(height) - hwx(i4,height) * hwx(i4,weight) + hwx(i4,height) * meanx(weight) + hwx(i4,weight) * meanx(height)
   - hwx(i5,height) * hwx(i5,weight) + hwx(i5,height) * meanx(weight) + hwx(i5,weight) * meanx(height) - hwx(i6,height) * hwx(i6,weight)
   + hwx(i6,height) * meanx(weight) + hwx(i6,weight) * meanx(height)  - hwx(i7,height) * hwx(i7,weight) + hwx(i7,height) * meanx(weight)
   + hwx(i7,weight) * meanx(height) - hwx(i8,height) * hwx(i8,weight) + hwx(i8,height) * meanx(weight) + hwx(i8,weight) * meanx(height)
   - hwx(i9,height) * hwx(i9,weight) + hwx(i9,height) * meanx(weight) + hwx(i9,weight) * meanx(height) - hwx(i10,height) * hwx(i10,weight)
   + hwx(i10,height) * meanx(weight) + hwx(i10,weight) * meanx(height) - hwx(i11,height) * hwx(i11,weight) + hwx(i11,height) * meanx(weight)
   + hwx(i11,weight) * meanx(height) - hwx(i12,height) * hwx(i12,weight) + hwx(i12,height) * meanx(weight) + hwx(i12,weight) * meanx(height)
   - hwx(i13,height) * hwx(i13,weight) + hwx(i13,height) * meanx(weight) + hwx(i13,weight) * meanx(height) - hwx(i14,height) * hwx(i14,weight)
   + hwx(i14,height) * meanx(weight) + hwx(i14,weight) * meanx(height) - hwx(i15,height) * hwx(i15,weight) + hwx(i15,height) * meanx(weight)
   + hwx(i15,weight) * meanx(height) - hwx(i16,height) * hwx(i16,weight) + hwx(i16,height) * meanx(weight) + hwx(i16,weight) * meanx(height)
   - hwx(i17,height) * hwx(i17,weight) + hwx(i17,height) * meanx(weight) + hwx(i17,weight) * meanx(height) - hwx(i18,height) * hwx(i18,weight)
   + hwx(i18,height) * meanx(weight) + hwx(i18,weight) * meanx(height) - hwx(i19,height) * hwx(i19,weight) + hwx(i19,height) * meanx(weight)
   + hwx(i19,weight) * meanx(height) - hwx(i20,height) * hwx(i20,weight) + hwx(i20,height) * meanx(weight) + hwx(i20,weight) * meanx(height)
   - hwx(i21,height) * hwx(i21,weight) + hwx(i21,height) * meanx(weight) + hwx(i21,weight) * meanx(height) - hwx(i22,height) * hwx(i22,weight)
   + hwx(i22,height) * meanx(weight) + hwx(i22,weight) * meanx(height) - hwx(i23,height) * hwx(i23,weight) + hwx(i23,height) * meanx(weight)
   + hwx(i23,weight) * meanx(height) - hwx(i24,height) * hwx(i24,weight) + hwx(i24,height) * meanx(weight) + hwx(i24,weight) * meanx(height)
   - hwx(i25,height) * hwx(i25,weight) + hwx(i25,height) * meanx(weight) + hwx(i25,weight) * meanx(height) - 25 meanx(height) * meanx(weight)
   + z * shw ] = 0
Bounds
\ z <= 1
Binaries
 x(i1) x(i2) x(i3) x(i4) x(i5) x(i6) x(i7) x(i8) x(i9) x(i10) x(i11) x(i12)
 x(i13) x(i14) x(i15) x(i16) x(i17) x(i18) x(i19) x(i20) x(i21) x(i22)
 x(i23) x(i24) x(i25)
End