Loading [MathJax]/jax/output/CommonHTML/jax.js

Thursday, January 26, 2017

MIP modeling: Selecting subsets

Consider the following problem, original from (1):

We have a population of items with values having a mean μ. Furthermore we have a number of subsets of this population. Each subset i has a known count ni and mean μi. It can be assumed the subsets do not overlap: the pairwise intersection between two subsets is always empty. We want to select N subsets such that the total average of all selected items is as close to μ as possible.

OK, let’s see. Let:

δi={1if subset i is selected0otherwise

We could model the problem as:

min|μiniμiδiiniδi|iδi=Nδi{0,1}

where |.| indicates `absolute value`.

The objective is somewhat complicated. The ratio is the average of the items in the selected subsets. We can see this as follows. The average is the sum of the values of the items in the selected subsets divided by the number if selected items. Of course, the sum of values in a subset i is niμi which explains the numerator in the fraction.

This model is quite nonlinear, but there is a way to linearize this. The first thing we can do is introduce a variable y defined by

y=iniμiδiiniδi

and reformulate as:

ini(yδi)=iniμiδi

This is nonlinear because of the multiplication yδi. However, a multiplication of a binary variable and a continuous variable can be linearized with a little bit of effort (2). Let’s introduce a new variable xi=yδi, then we can form the linear inequalities:

yLOδixiyUPδiyyUP(1δi)xiyyLO(1δi)

Here yLO,yUP is the lower- and upper-bound on y. We can use:

yLO:=miniμiyUP:=maxiμi

With this we can rewrite the condition ini(yδi)=iniμiδi as:

inixi=iniμiδi

The absolute value can be handled in a standard way, e.g.:

minzzμyz

Now we have all the tools to write the complete linear model:

minzzμyzyLOδixiyUPδiyyUP(1δi)xiyyLO(1δi)inixi=iniμiδiiδi=Nδi{0,1}z0yLOyyUPxifree

What if the subsets can overlap? That means a data element ak can be a member of multiple subsets. If we want it to be counted just once, we need to consider the data elements themselves. This implies the need to introduce a binary variable that indicates whether ak is in any of the selected subsets. I believe the high-level model can look like:

min|μkakαkkαk|αk=isi,kδiiδi=Nδi,αk{0,1}

Here we have data: si,k=1 if value ak is in subset i. This can be linearized in a similar fashion. 

References
  1. http://stackoverflow.com/questions/41879502/find-subset-with-similar-mean-as-full-set
  2. Multiplication of a continuous and a binary variable, http://yetanothermathprogrammingconsultant.blogspot.com/2008/05/multiplication-of-continuous-and-binary.html

No comments:

Post a Comment