Sunday, June 17, 2018

select weights to maximize count

In [1] a problem that is simple at first sight. Looking a bit further there are some interesting angles.

The example data set is:


----     27 PARAMETER a  

            j1          j2          j3

i1       0.870       0.730       0.410
i2       0.820       0.730       0.850
i3       0.820       0.370       0.850
i4       0.580       0.950       0.420
i5       1.000       1.000       0.900

The idea is that we can apply weights \(w_j\) to calculate a final score for each row:\[F_i = \sum_j w_j a_{i,j}\] Weights obey the usual constraints: \(w_j \in [0,1]\) and \(\sum_j w_j=1\). The goal is to find optimal weights such that the number of records with final score in the bucket \([0.9, 1]\) is maximized.

Looking at the data, \(w=(0,1,0)\) is a good choice. This gives us two \(F_i\)'s in our bucket,  Let's see if we can formalize this with a model.

MIP Model


Counting is done with binary variables. So let's define\[\delta_i = \begin{cases} 1 & \text{if $L\le F_i\le U$}\\ 0 & \text{otherwise}\end{cases}\] I used \(L\) and \(U\) to indicate the bucket.

A first model can look like:\[\bbox[lightcyan,10px,border:3px solid darkblue]{\begin{align} \max & \sum_i \delta_i \\  & F_i = \sum_j w_j a_{i,j}\\& \sum_i w_i = 1\\ & L - M(1-\delta_i) \le F_i \le U+M(1-\delta_i)\\ & \delta_i \in \{0,1\} \\ & w_i \in [0,1]\end{align}}\] Here \(M\) is a large enough constant.

The sandwich equation models the implication \[\delta_i=1 \Rightarrow L\le F_i \le U\] The objective will make sure that \(L\le F_i \le U \Rightarrow \delta_i=1 \) holds for the optimal solution.

This is the big picture. This model works: it finds an optimal set of weights so that as many as possible \(F_i \in [L,U]\). In the rest of this post I dive into to some nitty-gritty details. No elegant math, just largely boring stuff. However, this is typically what we need to do when working on non-trivial MIP models.

Big-M


The data seems to suggest \(0 \le a_{i,j} \le 1\). Which means \(0 \le F_i \le 1\). This follows from \[\min_j a_{i,j} \le F_i \le \min_j a_{i,j}\] We can also assume \(0 \le L \le U \le 1\). We can conclude the largest difference possible between \(F_i\) and \(L\) (and \(F_i\) and \(U\)) is one. So, in this case an obvious value for \(M\) is \(M=1\). More generally, if we first make sure \(U \le \max\{a_{i,j}\}\) and \(L \ge \min\{a_{i,j}\}\) by the preprocessing step: \[\begin{align} & U := \min\{U, \max\{a_{i,j}\}\}\\ & L := \max\{L, \min\{a_{i,j}\}\}\end{align}\] we have: \[M = \max \{a_{i,j}\} - \min \{a_{i,j}\}\] We can do even better by using an \(M_i\) for each record instead of a single, global \(M\).  We will get back to this later.

More preprocessing


We can optimize things further by observing that not always both "branches" of \[L - M(1-\delta_i) \le F_i \le U+M(1-\delta_i)\] are needed. With our small example we have \(U=1\), but we know already that \(F_i \le 1\). So in this case we only need to worry about \(L - M(1-\delta_i) \le F_i\).

We can generalize this as follows. First calculate bounds \(\ell_i \le F_i\le u_i\): \[\begin{align} & \ell_i = \min_j a_{i,j}\\ & u_i = \max_j a_{i,j}\end{align}\] Then generate constraints: \[\begin{align} & L - M(1-\delta_i) \le F_i && \forall i | L > \ell_i\\ & F_i \le U+M(1-\delta_i) && \forall i | U < u_i\end{align}\]

Even more preprocessing


The first three records in the example data set are really no candidates for having \(F_i \in [0.9, 1]\). The reason is that for those records we have \(u_i \lt L\). In general we can skip all records with \(u_i \lt L\) or \(\ell_i \gt U\). These records will never have \(\delta_i=1\).

Combining things


I extended the \(a\) matrix with following columns:

  • lo: \(\ell_i=\min_j a_{i,j}\).
  • up: \(u_i = \max_j a_{i,j}\).
  • cand: boolean, indicates if this row is a candidate. We check: \(u_i \ge L\) and \(\ell_i \le U\).
  • chkL: boolean, indicates if we need to check the left "branch". This is equal to one if \(\ell_i\lt L\).
  • chkR: boolean, indicates if we need to check the right "branch". This is equal to one if \(u_i\gt L\). No records have this equal to 1. 
The matrix now looks like:


----     41 PARAMETER a  

            j1          j2          j3          lo          up        cand        chkL

i1       0.870       0.730       0.410       0.410       0.870                   1.000
i2       0.820       0.730       0.850       0.730       0.850                   1.000
i3       0.820       0.370       0.850       0.370       0.850                   1.000
i4       0.580       0.950       0.420       0.420       0.950       1.000       1.000
i5       1.000       1.000       0.900       0.900       1.000       1.000

The last record is interesting. It has \(\mathit{chkL}=\mathit{chkR}=0\). This is correct: it will always allow \(\delta_{i5}=1\) no matter what weights we choose. Note that the constraints \(L - M(1-\delta_i) \le F_i\) are only generated in case both \(\mathit{cand}=\mathit{chkL}=1\).

For a small data set this all does not make much difference, but for large ones, we make the model much smaller.

Results


The optimal weights for this small data set are not unique. Obviously I get the same optimal number of selected records:


----     71 VARIABLE w.L  weights

j1 0.135,    j2 0.865


----     71 VARIABLE f.L  final scores

i1 0.749,    i2 0.742,    i3 0.431,    i4 0.900,    i5 1.000


----     71 VARIABLE delta.L  selected

i4 1.000,    i5 1.000


----     71 VARIABLE z.L                   =        2.000  objective

Big-M revisited


I did not pay too much attention to my big-M's. I just used the calculation \(M = \max \{a_{i,j}\} - \min \{a_{i,j}\}\), which yielded:


----     46 PARAMETER M                    =        0.630  big-M

We can use a tailored \(M^L_i, M^U_i\) for each inequality. For the lower bounds our equation looks like \[L - M^L_i(1-\delta_i) \le F_i\] This means we have \(M^L_i \ge L - F_i\). We have bounds on \(F_i\), (we stored these in \(a_{i,up}\) and \(a_{i,lo}\)).  So we can set \(M^L_i =L - a_{i,lo}\). This gives:


----     78 PARAMETER ML  

i4 0.480

Similar for the U inequality. For our small example this is not so important, but for larger instances with a wider range of data this may be essential.

Larger problem


For a larger random problem:


----     43 PARAMETER a  data

             j1          j2          j3          lo          up        cand        chkL        chkU

i1        0.737       0.866       0.893       0.737       0.893                   1.000
i2        0.434       0.654       0.606       0.434       0.654                   1.000
i3        0.721       0.987       0.648       0.648       0.987       1.000       1.000
i4        0.800       0.618       0.869       0.618       0.869                   1.000
i5        0.668       0.703       1.177       0.668       1.177       1.000       1.000       1.000
i6        0.656       0.540       0.525       0.525       0.656                   1.000
i7        0.864       1.037       0.569       0.569       1.037       1.000       1.000       1.000
i8        0.942       1.003       0.655       0.655       1.003       1.000       1.000       1.000
i9        0.600       0.801       0.601       0.600       0.801                   1.000
i10       0.619       0.933       1.114       0.619       1.114       1.000       1.000       1.000
i11       1.243       0.675       0.756       0.675       1.243       1.000       1.000       1.000
i12       0.608       0.778       0.747       0.608       0.778                   1.000
i13       0.695       0.593       1.196       0.593       1.196       1.000       1.000       1.000
i14       0.965       1.001       0.650       0.650       1.001       1.000       1.000       1.000
i15       0.951       1.040       0.969       0.951       1.040       1.000                   1.000
i16       0.514       1.220       0.935       0.514       1.220       1.000       1.000       1.000
i17       0.834       1.130       1.054       0.834       1.130       1.000       1.000       1.000
i18       1.161       0.550       0.481       0.481       1.161       1.000       1.000       1.000
i19       0.638       0.640       0.691       0.638       0.691                   1.000
i20       1.057       0.724       0.657       0.657       1.057       1.000       1.000       1.000
i21       0.814       0.775       0.448       0.448       0.814                   1.000
i22       0.800       0.737       0.741       0.737       0.800                   1.000
i23       0.573       0.555       0.789       0.555       0.789                   1.000
i24       0.829       0.679       1.059       0.679       1.059       1.000       1.000       1.000
i25       0.761       0.956       0.687       0.687       0.956       1.000       1.000
i26       0.870       0.673       0.822       0.673       0.870                   1.000
i27       0.304       0.900       0.920       0.304       0.920       1.000       1.000
i28       0.544       0.659       0.495       0.495       0.659                   1.000
i29       0.755       0.589       0.520       0.520       0.755                   1.000
i30       0.492       0.617       0.681       0.492       0.681                   1.000
i31       0.657       0.767       0.634       0.634       0.767                   1.000
i32       0.752       0.684       0.596       0.596       0.752                   1.000
i33       0.616       0.846       0.803       0.616       0.846                   1.000
i34       0.677       1.098       1.132       0.677       1.132       1.000       1.000       1.000
i35       0.454       1.037       1.204       0.454       1.204       1.000       1.000       1.000
i36       0.815       0.605       0.890       0.605       0.890                   1.000
i37       0.814       0.587       0.939       0.587       0.939       1.000       1.000
i38       1.019       0.675       0.613       0.613       1.019       1.000       1.000       1.000
i39       0.547       0.946       0.843       0.547       0.946       1.000       1.000
i40       0.724       0.571       0.757       0.571       0.757                   1.000
i41       0.611       0.916       0.891       0.611       0.916       1.000       1.000
i42       0.680       0.624       1.111       0.624       1.111       1.000       1.000       1.000
i43       1.015       0.870       0.823       0.823       1.015       1.000       1.000       1.000
i44       0.587       0.866       0.691       0.587       0.866                   1.000
i45       0.789       1.090       0.649       0.649       1.090       1.000       1.000       1.000
i46       1.436       0.747       0.805       0.747       1.436       1.000       1.000       1.000
i47       0.791       0.885       0.723       0.723       0.885                   1.000
i48       0.718       1.028       0.869       0.718       1.028       1.000       1.000       1.000
i49       0.876       0.772       0.918       0.772       0.918       1.000       1.000
i50       0.498       0.882       0.599       0.498       0.882                   1.000

we see that we can remove a significant number of records and big-M constraints. Again we want to maximize the number of  final scores \(F_i \in [0.9, 1]\). Note that now we have some records with \(a_{i,j}\gt 1\). So the column \(\mathit{chkU}\) gets properly populated.

The model solves instantaneous and shows:


----     74 VARIABLE w.L  weights

j1 0.007,    j2 0.792,    j3 0.202


----     74 VARIABLE f.L  final scores

i1  0.870,    i2  0.643,    i3  0.917,    i4  0.670,    i5  0.798,    i6  0.538,    i7  0.942,    i8  0.933
i9  0.760,    i10 0.967,    i11 0.695,    i12 0.771,    i13 0.715,    i14 0.930,    i15 1.025,    i16 1.158
i17 1.112,    i18 0.541,    i19 0.650,    i20 0.713,    i21 0.710,    i22 0.738,    i23 0.602,    i24 0.757
i25 0.900,    i26 0.705,    i27 0.900,    i28 0.625,    i29 0.576,    i30 0.629,    i31 0.739,    i32 0.667
i33 0.836,    i34 1.102,    i35 1.067,    i36 0.664,    i37 0.660,    i38 0.665,    i39 0.923,    i40 0.609
i41 0.909,    i42 0.722,    i43 0.861,    i44 0.829,    i45 0.999,    i46 0.764,    i47 0.851,    i48 0.994
i49 0.802,    i50 0.822


----     74 VARIABLE delta.L  selected

i3  1.000,    i7  1.000,    i8  1.000,    i10 1.000,    i14 1.000,    i15 1.000,    i16 1.000,    i17 1.000
i25 1.000,    i27 1.000,    i34 1.000,    i35 1.000,    i39 1.000,    i41 1.000,    i45 1.000,    i48 1.000


----     74 VARIABLE z.L                   =       16.000  objective

Conclusion


A simple model becomes not that simple once we start "optimizing" it. Unfortunately this is typical for large MIP models.

References


  1. Simulation/Optimization Package in R for tuning weights to achieve maximum allocation for groups, https://stackoverflow.com/questions/50843023/simulation-optimization-package-in-r-for-tuning-weights-to-achieve-maximum-alloc