Monday, May 1, 2017

Linearizing an average

In (1) a question is asked about averaging a certain allocation.

Let \(i\) be the set of prices to allocate to buckets \(j\). As each price can go to at most one bucket, it makes sense to use a binary variable:

\[x_{i,j} = \begin{cases} 1 & \text{if price $i$ is allocated to bucket $j$}\\
                                   0 & \text{otherwise}\end{cases}\]

The objective is to minimize the spread of the average prices seen in each bucket. A (nonlinear) model can look like:

\[
\bbox[lightcyan,10px,border:3px solid darkblue]
{\begin{align} \min\>& \left(\max_j  \mu_j – \min_j \mu_j \right)\\
                             & \mu_j = \frac{\displaystyle\sum_i p_i x_{i,j}}{\displaystyle\sum_i x_{i,j}}\\
                             &\sum_j x_{i,j} = 1 & \forall i\\
                             &x_{i,j} \in \{0,1\}
\end{align}}\]

Here the variable \(\mu_j\) is the average price in bucket \(j\). We have two nonlinearities in this model: the objective, and the calculation of \(\mu_j\).

The objective is easily linearized:

\[\begin{align}
\min\> & \mu_{max} – \mu_{min}\\
         & \mu_{max} \ge \mu_j & \forall j\\
         & \mu_{min} \le \mu_j & \forall j
\end{align}\]

But what about the calculation of the average? We can write:

\[\mu_j \sum_i x_{i,j} = \sum_i p_i x_{i,j} \]

This does not help much. If we introduce \(z_j = \sum_i x_{i,j}\) then the left-hand side is \(\mu_j z_j\) which is still not easy to linearize. However, a better approach is to write

\[\sum_i \mu_j x_{i,j} = \sum_i p_i x_{i,j} \]

The expressions \(\mu_j x_{i,j}\) are still non-linear, but they are a multiplication of a binary variables times a continuous variable. This we know how to linearize (2).

The complete linearized model can look like:

\[\bbox[lightcyan,10px,border:3px solid darkblue]{\begin{align}
\min\> & \mu_{max} – \mu_{min}\\
         & \mu_{max} \ge \mu_j & \forall j\\
         & \mu_{min} \le \mu_j & \forall j \\
         & \sum_i y_{i,j} = \sum_i p_i x_{i,j} & \forall j\\
         & y_{i,j} \le U x_{i,j}\\
         & y_{i,j} \le \mu_j\\
         & y_{i,j} \ge \mu_j – U(1-x_{i,j})\\
         &\sum_j x_{i,j} = 1 & \forall i\\
         &y_{i,j} \in [0,U]\\
         &x_{i,j} \in \{0,1\}
\end{align}}\]

where \(U=\displaystyle \max_i p_i\) (this is data). The variable \(y_{i,j}\) can be interpreted as \(y_{i,j}=\mu_j x_{i,j}\).

In a real application we would have other restrictions as well. The most obvious is that we want to allocate at least one price to a bucket. This is simply done by requiring \(\displaystyle\sum_i x_{i,j} \ge 1\). Here is some example output while including this constraint:

----     53 SET i  prices

i1 ,    i2 ,    i3 ,    i4 ,    i5 ,    i6 ,    i7 ,    i8 ,    i9
i10


----     53 SET j  buckets

j1,    j2,    j3,    j4,    j5


----     53 PARAMETER p  price

i1  1.717,    i2  8.433,    i3  5.504,    i4  3.011,    i5  2.922
i6  2.241,    i7  3.498,    i8  8.563,    i9  0.671,    i10 5.002


----     53 VARIABLE x.L  assignment

             j1          j2          j3          j4          j5

i1            1
i2                        1
i3                                    1
i4                                                1
i5            1
i6                                    1
i7                                                            1
i8            1
i9                        1
i10                                               1


----     53 VARIABLE mu.L  average

j1 4.401,    j2 4.552,    j3 3.872,    j4 4.007,    j5 3.498


----     53 VARIABLE y.L  average or zero

             j1          j2          j3          j4          j5

i1        4.401
i2                    4.552
i3                                3.872
i4                                            4.007
i5        4.401
i6                                3.872
i7                                                        3.498
i8        4.401
i9                    4.552
i10                                           4.007


----     53 VARIABLE mumin.L               =        3.498 
            VARIABLE mumax.L               =        4.552 

Side note: What would happen if we allow zero prices to go into a bucket? The model would not restrict \(\mu_j\) for such a bucket. In extremis we can put all prices in the first bucket (with average price \(\mu_1\)), and then have all the remaining buckets empty. That would allow the model to pick \(\mu_j=\mu_1\) i.e. no spread. That is exactly what I got when I tried this (by dropping the constraint \(\displaystyle\sum_i x_{i,j} \ge 1\)). The results with the same data as before:

----     53 VARIABLE x.L  assignment

             j1

i1            1
i2            1
i3            1
i4            1
i5            1
i6            1
i7            1
i8            1
i9            1
i10           1


----     53 VARIABLE mu.L  average

j1 4.156,    j2 4.156,    j3 4.156,    j4 4.156,    j5 4.156


----     53 VARIABLE y.L  average or zero

             j1

i1        4.156
i2        4.156
i3        4.156
i4        4.156
i5        4.156
i6        4.156
i7        4.156
i8        4.156
i9        4.156
i10       4.156


----     53 VARIABLE mumin.L               =        4.156 
            VARIABLE mumax.L               =        4.156 

The same trick can be used to linearize the problem stated in (3).

References
  1. Fair allocations to buckets with constraints, http://stackoverflow.com/questions/43662264/fair-allocation-to-buckets-with-constraints
  2. Multiplication of a continuous and a binary variable, http://yetanothermathprogrammingconsultant.blogspot.com/2008/05/multiplication-of-continuous-and-binary.html 
  3. Convert nonlinear objective function to linear objective function, http://stackoverflow.com/questions/43591174/convert-nonlinear-objective-function-to-linear-objective-function


Appendix: GAMS Model

$ontext

   
Assign items to buckets (bins) such that the
   
average price is as equal as possible.


$offtext

sets
 i
'prices' /i1*i10/
 j
'buckets' /j1*j5/;


*-------------------------------------------------------
* data: random prices
*-------------------------------------------------------

* random data
parameter p(i) 'price';
p(i) = uniform(0,10);
display p;

* upperbound on avg price in a bucket
scalar U;
U =
smax(i,p(i));
display U;


*-------------------------------------------------------
* optimization model
*-------------------------------------------------------


binary variable x(i,j) 'assignment';
positive variable y(i,j) 'average or zero';
variables
   mu(j) 
'average'
   mumax 
'max of mu(j)'
   mumin 
'max of mu(j)'
   z     
'objective variable'
;

equations
  emumin(j) 
'minimum of averages'
  emumax(j) 
'maximum of averages'
  ratio(j)  
'reformulation of ratio mu(j)=sum(i,p(i)*x(i,j))/sum(i,x(i,j))'
  ydef1(i,j)
'linearization of y(i,j) = mu(j)*x(i,j)'
  ydef2(i,j)
'linearization of y(i,j) = mu(j)*x(i,j)'
  ydef3(i,j)
'linearization of y(i,j) = mu(j)*x(i,j)'
  assign1(i) 
'standard assignment constraint'
  assign2(j) 
'minimum one item per bin'
  obj       
'objective'
;


emumin(j).. mumin =l= mu(j);
emumax(j).. mumax =g= mu(j);
ratio(j)..
sum(i, y(i,j)) =e= sum(i, p(i)*x(i,j));

ydef1(i,j).. y(i,j) =l= U*x(i,j);
ydef2(i,j).. y(i,j) =l= mu(j);
ydef3(i,j).. y(i,j) =g= mu(j)-U*(1-x(i,j));

assign1(i)..
sum(j, x(i,j)) =e= 1;
assign2(j)..
sum(i, x(i,j)) =g= 1;


y.up(i,j) = U;

obj.. z =e= mumax-mumin;

model m /all/;
* global optimality
option optcr=0;
solve m  minimizing z using mip;


*-------------------------------------------------------
* print results
*-------------------------------------------------------

option x:0;
display i,j,p,x.l,mu.l,y.l,mumin.l,mumax.l;

10 comments:

  1. Thank you for a brilliant and straightforward discussion regarding how to approach this problem. I am so happy that I found your blog! As it happens, I've spent a solid 3+ days banging my head on the desk trying to figure out how to code the problem of minimizing the spread of averages across "buckets", and my approach was getting me nowhere (instead of treating the average value as a variable whose values are to be populated by the solver, I was trying to algorithmically compute the average values and then do a minimization on either the "excess" or "shortage" variables that are required for the "bucket sum" to match up to the average value. The problem I was wrangling with was how to deal with the fact that my average value computation (which used an incremental averaging/running sum technique to keep things nice and linear) was not suitable for computing the average of a set of numbers in which zeros are present and should not be allowed to "dilute" the mean value. Even that situation would have been OK, if all "buckets" were guaranteed to have the same number of nonzero values, but such was not the case for the problem I am solving. Your approach is sheer genius! I hope I can understand it for at least as long as it takes to transfer some of the fundamental methodology to my own problem!

    You very helpfully show the result for when all buckets have at least 1 price, and the other extreme when no buckets are required to be filled (which results in a "trivial" solution of only 1 bucket getting filled and fully meeting all constraints, lol!)
    *Question*: What if one were to require some *MINIMUM* number of buckets to be filled, but not necessarily *ALL* of them? So for example, require:

    \sum_i x_{i,j} + SUMMEDPRICES * ( 1 - isBucketUsed{j} ) \ge 1

    and

    \sum_i x_{i,j} - SUMMEDPRICES * isBucketUsed{j} \le 0


    where:

    SUMMEDPRICES is the sum over all prices, eg SUMMEDPRICES = \sum_i p_{i}

    isBucketUsed_{j} would be a binary variable equal to 1 when the bucket has one or more prices already assigned to it, and 0 otherwise


    And to insure that some required minimum number of buckets (minBucketsUsed) were assigned prices:

    \sum_j isBucketUsed{j} \ge minBucketsUsed

    Would this strategy work for the case that falls somewhere in between the 2 extremes that you present? Or have I just messed up the linear aspect of the problem by introducing another binary variable and additional constraints? (I have only been working with this kind of modeling for about 2 weeks now, so I am still very new at working with this kind of problem!) Thanks for any advice you are willing to give on my approach for this "in-between" situation (which is actually the situation that I am struggling with in the problem that I am trying to solve for my own application!)

    ReplyDelete
    Replies
    1. Your approach should work just fine. Best thing to convince yourself, is to try out the formulation with a small data set. Just like I did for this post. Such an experiment should not take much time.

      Delete
    2. I would write it as sum(i, x(i,j)) >= used(j), sum(j, used(j)) >= minused with used(j) a binary variable. Note that some additional used bins may have used(j)=0.

      Delete
    3. If all bins are identical, you can just use a parameter (constant) used(j) with the first minused entries equal to 1 and the others 0.

      Delete
    4. awesome! Thanks for the quick reply. I'll try to "digest" your last suggestion and incorporate into the code. I'm not quite sure what "identical" in the context of bins means, but I am sure such will become obvious once I ponder your suggestion.

      I again just want to express my gratitude for your writing this post! (back in 2017, a time when I would have had no clue what this post was talking about, lol!)

      Delete
  2. Now that I have tried programming some of the above into my own code, I realize how much of this algorithm I actually still don't fully understand! I have a couple of follow up questions, if you don't mind. My understanding starts falling apart shortly after the following equation is written, just above the blue box summarizing the complete linearized model:

    sum(i, mu(j)*x(i,j)) = sum(i, p(i)*x(i,j))

    At this point, you state that since mu(j)*x(i,j) is a product of a binary and continuous variable, we know how to linearize. At that point, I referred to your other blog: http://yetanothermathprogrammingconsultant.blogspot.com/2008/05/multiplication-of-continuous-and-binary.html

    In that other blog post, I think that what you are calling "z" is what you call "y" in this post, and the "x" referenced there is actually "mu" here, and the delta referenced there would be "x(i,j)" here. "U" in this post represents the upper bound of mu. Using that other post for guidance, we should be able to rewrite y(i,j) = mu(j)*x(i,j) as:

    (1) min{0, min_mu} <= y(i,j) <= max{0, U}
    (2) min_mu * x(i,j) <= y(i,j) <= U * x(i,j)
    (3) mu(j) - U * (1 - x(i,j)) <= y(i,j) <= mu(j) - min_mu * (1 - x(i,j))

    The above are the equations in the box at the top of your other post ("Multiplication of a continuous and a binary variable").

    But we're not talking about plain 'ole mu(j)*x(i,j) but rather the sum over this product, e.g., sum(i, mu(j)*x(i,j)) ... don't we need to sum the above inequalities over "i" to get the final constraints for the problem presented in this post? This was the first place where I got a bit confused (the complete linearized model presented in this post at the bottom in the blue box do not seem to have been summed over "i". But maybe an inequality summed over i is mathematically identical to the set of the constituent inequalities ... yeah, that's true as long as each thing being summed is equal to zero or a positive number, I think??).

    The 2nd point of confusion is in comparing the above equations I just translated from your other blog post and the blue box equations in this post. Let me list them and label them below, to be crystal clear:

    (A) y(i,j) <= U*x(i,j)
    (B) y(i,j) <= mu(j)
    (C) y(i,j) >= mu(j) - U*(1-x(i,j))

    (A) is the right hand side of inequality #2 above;
    (B) is the right hand side of inequality #3, BUT ONLY IF min_mu (lower bound of mu) is set to zero
    (C) is the left hand side of inequality #3

    My questions are:
    * Are you indeed assuming that the lower bound of mu is zero?
    * Why would the lower bound not instead be set to the lowest value within the set of prices, just like U is set by the highest value among the prices?
    * Does using zero as the lower bound of mu cause any problems and/or have advantages over using the actual lowest value in the set of prices (which would be 0.671 in your example)?

    Finally, I was wondering if you could verify that you used 8.563 as the value for U? And ... would be super-helpful if I could actually see the model/code that produced the results you show at the bottom of this post in the grey boxed. (Because I am still struggling with what is a constraint, what is something that gets minimized in the objective function ... all things that could be cleared up easily by seeing actual code). Just wondering if you would be willing to share the source code on your blog?

    Thanks!

    ReplyDelete
    Replies
    1. I have added the code (using the GAMS Modeling Language). Regarding your questions: (1) in my model mu(j) is a free variable but that can be tightened to what you suggest, (2) y(i,j) can be zero even if the lowerbound on mu(j) is non-zero. We need to be careful about this. (3) No.

      Delete
    2. Oh my gosh, thank you! Super-helpful to see the model!! (In the meantime, I think I've just about programmed my version in, but now I can check against your model to make sure that I am handling the variables correctly!!)

      Delete
  3. Hi Erwin, first of all, amazing work and thank you for sharing all this knowledge!! I have a similar problem here, where i want to minimize the average. I understood most of your article, however i have no clue how to implement it on pulp (python), are you familiar with pulp coding structure? how would it be defined the objective function input for this example of yours ?

    ReplyDelete
    Replies
    1. There is nothing complicated about that. Something like prob += z.

      Delete