Thursday, March 19, 2009

MIP model

Hi,
I am dealing with an optimization task as follows:
1) Tom is given a large box which contains 1000 bags of marbles.
2) Inside each bag, there are between 1 and 50 marbles.
3) Within a given bag, there are no duplicate colors of marbles, however there are duplicate colors of marbles across the bags.
4) The bags are labeled and Tom has a list of what color marbles each bag contains.
Objective:  Tom wants to collect as many UNIQUE colors of marbles as possible, but is only allowed to pick out 100 bags from the box.  Tom derives absolutely no value from getting duplicate colors of marbles.  Tell Tom which 100 bags to pick.
My question:
First of all, is this even a problem that's appropriate for linear programming? I was able to solve this quite simply on a small scale in Excel's Solver, but when I moved over to GLPK I couldn't figure out how to avoid making my objective statement non-linear.  Initially I wrote out the problem like below.  However, glpk complained about having a variable in my if/else statement in the objective function.  I'm still rather new to LP so if someone could tell me whether or not I should even be using LP for this, and if so, I'd appreciate some suggestions on how to model the problem such that GLPK doesn't complain. Thanks.
/* sets */
set COLORS;
set BAGS;
/* parameters */
param colors_to_bags {i in COLORS, j in BAGS}
/* decision variables */
var selected_bags {j in BAGS} binary >= 0;
/* objective function */
maximize uniques: sum{i in COLORS} if (sum{j in BAGS} colors_to_bags[i, j] * selected_bags[j]) > 1 then 1 else 0;
/* constraints */
s.t. max_bags: sum{j in BAGS} selected_bags{j} <= 100;
data;
set BAGS := bag1 bag2 .... bag1000;
set COLORS := green yellow blue ...etc; (assume there are many many colors)
/* 1 means the bag contains that color marble, 0 means it does not */
praram colors_to_bags: bag1 bag2 .... bag1000:=
green 0 1 0 0 ...
yellow 0 1 0 1 ...
blue 1 0 0 1 ...
etc. 0 1 1 0 ...;
end;

You need to introduce an extra variable have_color[i]:

var have_color{i in COLORS} binary;

/* all colors_to_bags[i,j]*selected_bags[j] = 0 =>have_color[i]=0 */
eq1{i in COLORS}: have_color[i] <= sum{j in BAGS} colors_to_bags[i,j]*selected_bags[j];

/* any colors_to_bags[i,j]*selected_bags[j] = 1 =>have_color[i]=1 */
eq2{i in COLORS, j in BAGS}: have_color[i] >= colors_to_bags[i,j]*selected_bags[j];

Then the obj becomes:

maximize uniques: sum{i in COLORS} have_color[i];

You can improve performance by relaxing have_color to be a continuous variable between 0 and 1 (it is automatically integer). In addition you may drop eq2 as you are maximizing this anyway. We are lucky: eq2 is adding many equations.