Tuesday, March 8, 2016

MIP vs Matlab

In this post an interesting problem is posted. Too bad there is little or no background given on this problem. I am trying to suggest an optimization model opposed to some heuristic algorithm in Matlab. Looking at the zero points I get for my answer, without much success. I still think the MIP model is rather elegant. Here is the problem:

image

We have a large matrix with 10 columns and around 250k rows. The values are –1 and +1. We need to select N rows such that if we add up the columns of the selected rows, the average of the absolute values of those sums is minimized.

We can formulate this as very large MIP model with about 250k binary variables indicating if a row is selected.

There is however a data trick we can use. Observe that each cell has two possibilities, so there are only 2^10=1024 different possible rows. Our data is now essentially a fixed 1024 x 10 matrix and a vector of length 1024 of counts that tell us how many rows of each type we saw.

image

Now we only have to use 1024 integer variables. The upper bounds on the integer variables are the counts. Actually we can tighten that a little bit: it is the minimum of the count and N. The complete MIP model can look like:

250

We use a traditional variable splitting technique to model the absolute value. Note also that instead of the average or mean we just minimize the sum.

To test this formulation first we need to generate this 1,024 row matrix. In GAMS you can do this as follows:

set
  i
/n1*n1024/
  j
/j1*j10/
  val
/ '-1','+1'/
  ps(i,j,val)
/ system.PowerSetRight /
;

parameter a(i,j);
a(i,j)$ps(i,j,
'+1'
)=1;
a(i,j)$ps(i,j,
'-1'
)=-1;

The set ps is populated with:

image

Using this set, we form the parameter a as follows:

image

This is exactly the format we are looking for.

For some large values for N, the MIP solver will solve this problem surprisingly fast. Here is a log with the open source solver CBC:

COIN-OR Branch and Cut (CBC Library 2.8)
written by J. Forrest

Calling CBC main solution routine...
Integer solution of 6 found by DiveCoefficient after 0 iterations and 0 nodes (0.06 seconds)
Integer solution of 4 found by RINS after 0 iterations and 0 nodes (0.07 seconds)
Integer solution of 0 found by DiveCoefficient after 1 iterations and 0 nodes (0.15 seconds)
1 added rows had average density of 992
At root node, 1 cuts changed objective from 0 to 0 in 2 passes
Cut generator 0 (Probing) - 0 row cuts average 0.0 elements, 1 column cuts (1 active)  in 0.002 seconds - new frequency is 1
Cut generator 1 (Gomory) - 9 row cuts average 928.6 elements, 0 column cuts (1 active)  in 0.004 seconds - new frequency is -100
Cut generator 2 (Knapsack) - 0 row cuts average 0.0 elements, 0 column cuts (0 active)  in 0.000 seconds - new frequency is -100
Cut generator 3 (Clique) - 0 row cuts average 0.0 elements, 0 column cuts (0 active)  in 0.000 seconds - new frequency is -100
Cut generator 4 (MixedIntegerRounding2) - 0 row cuts average 0.0 elements, 0 column cuts (0 active)  in 0.001 seconds - new frequency is -100
Search completed - best objective 0, took 1 iterations and 0 nodes (0.15 seconds)
Maximum depth 0, 0 variables fixed on reduced cost

Solved to optimality.
MIP solution:   0.000000e+000   (0 nodes, 0.171 seconds)

Very impressive results (other good MIP solvers perform similar).