The example data set is:
```
----     27 PARAMETER a

          j1      j2      j3

i1     0.870   0.730   0.410
i2     0.820   0.730   0.850
i3     0.820   0.370   0.850
i4     0.580   0.950   0.420
i5     1.000   1.000   0.900
```
The idea is that we can apply weights \(w_j\) to calculate a final score for each row:\[F_i = \sum_j w_j a_{i,j}\] Weights obey the usual constraints: \(w_j \in [0,1]\) and \(\sum_j w_j=1\). The goal is to find optimal weights such that the number of records with final score in the bucket \([0.9, 1]\) is maximized.
Looking at the data, \(w=(0,1,0)\) is a good choice. This gives us two \(F_i\)'s in our bucket. Let's see if we can formalize this with a model.
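As a quick sanity check, here is a minimal Python sketch (the models in this post are actually implemented in GAMS) that scores the rows with these hand-picked weights; the data are hard-coded from the table above:

```python
import numpy as np

# data hard-coded from the table above (rows i1..i5, columns j1..j3)
a = np.array([[0.87, 0.73, 0.41],
              [0.82, 0.73, 0.85],
              [0.82, 0.37, 0.85],
              [0.58, 0.95, 0.42],
              [1.00, 1.00, 0.90]])
L, U = 0.9, 1.0

w = np.array([0.0, 1.0, 0.0])            # hand-picked weights
F = a @ w                                # F_i = sum_j w_j * a_ij
print(F)                                 # [0.73 0.73 0.37 0.95 1.  ]
print(int(np.sum((F >= L) & (F <= U))))  # 2 records land in [0.9, 1]
```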
MIP Model
Counting is done with binary variables. So let's define\[\delta_i = \begin{cases} 1 & \text{if $L\le F_i\le U$}\\ 0 & \text{otherwise}\end{cases}\] I used \(L\) and \(U\) to indicate the bucket.
A first model can look like:\[\bbox[lightcyan,10px,border:3px solid darkblue]{\begin{align} \max & \sum_i \delta_i \\ & F_i = \sum_j w_j a_{i,j}\\& \sum_j w_j = 1\\ & L - M(1-\delta_i) \le F_i \le U+M(1-\delta_i)\\ & \delta_i \in \{0,1\} \\ & w_j \in [0,1]\end{align}}\] Here \(M\) is a large enough constant.
The sandwich equation models the implication \[\delta_i=1 \Rightarrow L\le F_i \le U\] The objective will make sure that \(L\le F_i \le U \Rightarrow \delta_i=1 \) holds for the optimal solution.
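To make this concrete, here is a sketch of this first model in Python using PuLP; this is my substitution for the GAMS implementation behind the listings, with \(M=1\) (justified in the next section) and default solver settings:

```python
import pulp

# data from the example (rows i1..i5, columns j1..j3)
a = [[0.87, 0.73, 0.41],
     [0.82, 0.73, 0.85],
     [0.82, 0.37, 0.85],
     [0.58, 0.95, 0.42],
     [1.00, 1.00, 0.90]]
m, n = len(a), len(a[0])
L, U, M = 0.9, 1.0, 1.0          # M=1 is big enough here since 0 <= F_i <= 1

prob = pulp.LpProblem("bucket", pulp.LpMaximize)
w = [pulp.LpVariable(f"w{j}", lowBound=0, upBound=1) for j in range(n)]
delta = [pulp.LpVariable(f"delta{i}", cat="Binary") for i in range(m)]

prob += pulp.lpSum(delta)        # objective: number of selected records
prob += pulp.lpSum(w) == 1       # weights sum to one
for i in range(m):
    F = pulp.lpSum(a[i][j] * w[j] for j in range(n))
    prob += F >= L - M * (1 - delta[i])   # delta_i = 1  =>  F_i >= L
    prob += F <= U + M * (1 - delta[i])   # delta_i = 1  =>  F_i <= U
prob.solve()

print(pulp.value(prob.objective))         # 2.0
print([pulp.value(v) for v in w])         # one optimal weight vector
```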
This is the big picture. This model works: it finds an optimal set of weights so that as many as possible \(F_i \in [L,U]\). In the rest of this post I dive into some nitty-gritty details. No elegant math, just largely boring stuff. However, this is typically what we need to do when working on non-trivial MIP models.
Big-M
The data seems to suggest \(0 \le a_{i,j} \le 1\), which means \(0 \le F_i \le 1\). This follows from \[\min_j a_{i,j} \le F_i \le \max_j a_{i,j}\] which holds because \(F_i\) is a convex combination of the \(a_{i,j}\). We can also assume \(0 \le L \le U \le 1\). We can conclude that the largest possible difference between \(F_i\) and \(L\) (and between \(F_i\) and \(U\)) is one. So, in this case an obvious value for \(M\) is \(M=1\). More generally, if we first make sure \(U \le \max\{a_{i,j}\}\) and \(L \ge \min\{a_{i,j}\}\) by the preprocessing step: \[\begin{align} & U := \min\{U, \max\{a_{i,j}\}\}\\ & L := \max\{L, \min\{a_{i,j}\}\}\end{align}\] we have: \[M = \max \{a_{i,j}\} - \min \{a_{i,j}\}\] We can do even better by using an \(M_i\) for each record instead of a single, global \(M\). We will get back to this later.
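In code, the preprocessing step and the resulting global big-M could look as follows (a plain-Python sketch on the example data; it reproduces the value \(M=0.630\) shown in the Big-M revisited section below):

```python
a = [[0.87, 0.73, 0.41], [0.82, 0.73, 0.85], [0.82, 0.37, 0.85],
     [0.58, 0.95, 0.42], [1.00, 1.00, 0.90]]
L, U = 0.9, 1.0

amin = min(min(row) for row in a)   # smallest a_ij = 0.37
amax = max(max(row) for row in a)   # largest  a_ij = 1.00

# preprocessing: clamp the bucket to the data range
U = min(U, amax)
L = max(L, amin)

M = amax - amin                     # global big-M = 0.63
print(M)
```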
More preprocessing
We can optimize things further by observing that both "branches" of \[L - M(1-\delta_i) \le F_i \le U+M(1-\delta_i)\] are not always needed. With our small example we have \(U=1\), but we already know that \(F_i \le 1\). So in this case we only need to worry about \(L - M(1-\delta_i) \le F_i\).
We can generalize this as follows. First calculate bounds \(\ell_i \le F_i\le u_i\): \[\begin{align} & \ell_i = \min_j a_{i,j}\\ & u_i = \max_j a_{i,j}\end{align}\] Then generate constraints: \[\begin{align} & L - M(1-\delta_i) \le F_i && \forall i | L > \ell_i\\ & F_i \le U+M(1-\delta_i) && \forall i | U < u_i\end{align}\]
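A small sketch of these per-record bounds and the branch tests (the names need_left and need_right are mine):

```python
a = [[0.87, 0.73, 0.41], [0.82, 0.73, 0.85], [0.82, 0.37, 0.85],
     [0.58, 0.95, 0.42], [1.00, 1.00, 0.90]]
L, U = 0.9, 1.0

lo = [min(row) for row in a]   # per-record lower bound on F_i
up = [max(row) for row in a]   # per-record upper bound on F_i

# which branch of the sandwich constraint is actually needed?
need_left  = [L > lo[i] for i in range(len(a))]  # generate L - M(1-delta_i) <= F_i
need_right = [U < up[i] for i in range(len(a))]  # generate F_i <= U + M(1-delta_i)
print(need_left)    # [True, True, True, True, False]
print(need_right)   # all False: U = 1 is never exceeded by this data
```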
Even more preprocessing
The first three records in the example data set are really no candidates for having \(F_i \in [0.9, 1]\). The reason is that for those records we have \(u_i \lt L\). In general we can skip all records with \(u_i \lt L\) or \(\ell_i \gt U\). These records will never have \(\delta_i=1\).
Combining things
I extended the \(a\) matrix with following columns:
- lo: \(\ell_i=\min_j a_{i,j}\).
- up: \(u_i = \max_j a_{i,j}\).
- cand: boolean, indicates if this row is a candidate. We check: \(u_i \ge L\) and \(\ell_i \le U\).
- chkL: boolean, indicates if we need to check the left "branch". This is equal to one if \(\ell_i\lt L\).
- chkU: boolean, indicates if we need to check the right "branch". This is equal to one if \(u_i\gt U\). No records have this equal to 1.
```
----     41 PARAMETER a

          j1      j2      j3      lo      up    cand    chkL

i1     0.870   0.730   0.410   0.410   0.870           1.000
i2     0.820   0.730   0.850   0.730   0.850           1.000
i3     0.820   0.370   0.850   0.370   0.850           1.000
i4     0.580   0.950   0.420   0.420   0.950   1.000   1.000
i5     1.000   1.000   0.900   0.900   1.000   1.000
```
The last record is interesting. It has \(\mathit{chkL}=\mathit{chkU}=0\). This is correct: \(F_{i5}\) is always inside the bucket, so we can set \(\delta_{i5}=1\) no matter what weights we choose. Note that the constraints \(L - M(1-\delta_i) \le F_i\) are only generated in case both \(\mathit{cand}=\mathit{chkL}=1\).
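These columns are easy to reproduce; a short sketch using the definitions from the list above:

```python
a = [[0.87, 0.73, 0.41], [0.82, 0.73, 0.85], [0.82, 0.37, 0.85],
     [0.58, 0.95, 0.42], [1.00, 1.00, 0.90]]
L, U = 0.9, 1.0

for name, row in zip(["i1", "i2", "i3", "i4", "i5"], a):
    lo, up = min(row), max(row)
    cand = up >= L and lo <= U   # record can reach the bucket at all
    chkL = lo < L                # left branch can be violated
    chkU = up > U                # right branch can be violated
    # the constraint L - M(1-delta_i) <= F_i is generated only if cand and chkL
    print(f"{name}: lo={lo:.3f} up={up:.3f} cand={cand} chkL={chkL} chkU={chkU}")
```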
For a small data set all of this does not make much difference, but for large ones it makes the model much smaller.
Results
The optimal weights for this small data set are not unique, so the solver may return a different \(w\) than our hand-picked \(w=(0,1,0)\). Of course, we get the same optimal number of selected records:
```
----     71 VARIABLE w.L  weights

j1 0.135,    j2 0.865

----     71 VARIABLE f.L  final scores

i1 0.749,    i2 0.742,    i3 0.431,    i4 0.900,    i5 1.000

----     71 VARIABLE delta.L  selected

i4 1.000,    i5 1.000

----     71 VARIABLE z.L  =  2.000  objective
```
Big-M revisited
Applying the formula \(M = \max\{a_{i,j}\} - \min\{a_{i,j}\}\) to the example data gives:

```
----     46 PARAMETER M  =  0.630  big-M
```
We can use a tailored \(M^L_i, M^U_i\) for each inequality. For the lower bounds our equation looks like \[L - M^L_i(1-\delta_i) \le F_i\] This means we have \(M^L_i \ge L - F_i\). We have bounds on \(F_i\), (we stored these in \(a_{i,up}\) and \(a_{i,lo}\)). So we can set \(M^L_i =L - a_{i,lo}\). This gives:
```
----     78 PARAMETER ML

i4 0.480
```
Similarly, for the \(U\) inequality we can use \(M^U_i = a_{i,up} - U\). For our small example this is not so important, but for larger instances with a wider range of data this may be essential.
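A sketch of this per-record calculation (keys are 0-based Python row indices, so key 3 corresponds to record i4):

```python
a = [[0.87, 0.73, 0.41], [0.82, 0.73, 0.85], [0.82, 0.37, 0.85],
     [0.58, 0.95, 0.42], [1.00, 1.00, 0.90]]
L, U = 0.9, 1.0
lo = [min(row) for row in a]
up = [max(row) for row in a]
cand = [up[i] >= L and lo[i] <= U for i in range(len(a))]

# tailored big-Ms, one per generated inequality
ML = {i: L - lo[i] for i in range(len(a)) if cand[i] and lo[i] < L}
MU = {i: up[i] - U for i in range(len(a)) if cand[i] and up[i] > U}
print(ML)   # record i4: 0.9 - 0.42 = 0.48, matching the listing above
print(MU)   # {}: the U inequality is never needed for this data set
```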
Larger problem
For a larger random problem:
```
----     43 PARAMETER a  data

          j1      j2      j3      lo      up    cand    chkL    chkU

i1     0.737   0.866   0.893   0.737   0.893           1.000
i2     0.434   0.654   0.606   0.434   0.654           1.000
i3     0.721   0.987   0.648   0.648   0.987   1.000   1.000
i4     0.800   0.618   0.869   0.618   0.869           1.000
i5     0.668   0.703   1.177   0.668   1.177   1.000   1.000   1.000
i6     0.656   0.540   0.525   0.525   0.656           1.000
i7     0.864   1.037   0.569   0.569   1.037   1.000   1.000   1.000
i8     0.942   1.003   0.655   0.655   1.003   1.000   1.000   1.000
i9     0.600   0.801   0.601   0.600   0.801           1.000
i10    0.619   0.933   1.114   0.619   1.114   1.000   1.000   1.000
i11    1.243   0.675   0.756   0.675   1.243   1.000   1.000   1.000
i12    0.608   0.778   0.747   0.608   0.778           1.000
i13    0.695   0.593   1.196   0.593   1.196   1.000   1.000   1.000
i14    0.965   1.001   0.650   0.650   1.001   1.000   1.000   1.000
i15    0.951   1.040   0.969   0.951   1.040   1.000           1.000
i16    0.514   1.220   0.935   0.514   1.220   1.000   1.000   1.000
i17    0.834   1.130   1.054   0.834   1.130   1.000   1.000   1.000
i18    1.161   0.550   0.481   0.481   1.161   1.000   1.000   1.000
i19    0.638   0.640   0.691   0.638   0.691           1.000
i20    1.057   0.724   0.657   0.657   1.057   1.000   1.000   1.000
i21    0.814   0.775   0.448   0.448   0.814           1.000
i22    0.800   0.737   0.741   0.737   0.800           1.000
i23    0.573   0.555   0.789   0.555   0.789           1.000
i24    0.829   0.679   1.059   0.679   1.059   1.000   1.000   1.000
i25    0.761   0.956   0.687   0.687   0.956   1.000   1.000
i26    0.870   0.673   0.822   0.673   0.870           1.000
i27    0.304   0.900   0.920   0.304   0.920   1.000   1.000
i28    0.544   0.659   0.495   0.495   0.659           1.000
i29    0.755   0.589   0.520   0.520   0.755           1.000
i30    0.492   0.617   0.681   0.492   0.681           1.000
i31    0.657   0.767   0.634   0.634   0.767           1.000
i32    0.752   0.684   0.596   0.596   0.752           1.000
i33    0.616   0.846   0.803   0.616   0.846           1.000
i34    0.677   1.098   1.132   0.677   1.132   1.000   1.000   1.000
i35    0.454   1.037   1.204   0.454   1.204   1.000   1.000   1.000
i36    0.815   0.605   0.890   0.605   0.890           1.000
i37    0.814   0.587   0.939   0.587   0.939   1.000   1.000
i38    1.019   0.675   0.613   0.613   1.019   1.000   1.000   1.000
i39    0.547   0.946   0.843   0.547   0.946   1.000   1.000
i40    0.724   0.571   0.757   0.571   0.757           1.000
i41    0.611   0.916   0.891   0.611   0.916   1.000   1.000
i42    0.680   0.624   1.111   0.624   1.111   1.000   1.000   1.000
i43    1.015   0.870   0.823   0.823   1.015   1.000   1.000   1.000
i44    0.587   0.866   0.691   0.587   0.866           1.000
i45    0.789   1.090   0.649   0.649   1.090   1.000   1.000   1.000
i46    1.436   0.747   0.805   0.747   1.436   1.000   1.000   1.000
i47    0.791   0.885   0.723   0.723   0.885           1.000
i48    0.718   1.028   0.869   0.718   1.028   1.000   1.000   1.000
i49    0.876   0.772   0.918   0.772   0.918   1.000   1.000
i50    0.498   0.882   0.599   0.498   0.882           1.000
```
we see that we can remove a significant number of records and big-M constraints. Again we want to maximize the number of final scores \(F_i \in [0.9, 1]\). Note that now we have some records with \(a_{i,j}\gt 1\). So the column \(\mathit{chkU}\) gets properly populated.
The model solves instantaneously and shows:
```
----     74 VARIABLE w.L  weights

j1 0.007,    j2 0.792,    j3 0.202

----     74 VARIABLE f.L  final scores

i1  0.870,    i2  0.643,    i3  0.917,    i4  0.670,    i5  0.798,    i6  0.538,    i7  0.942,    i8  0.933
i9  0.760,    i10 0.967,    i11 0.695,    i12 0.771,    i13 0.715,    i14 0.930,    i15 1.025,    i16 1.158
i17 1.112,    i18 0.541,    i19 0.650,    i20 0.713,    i21 0.710,    i22 0.738,    i23 0.602,    i24 0.757
i25 0.900,    i26 0.705,    i27 0.900,    i28 0.625,    i29 0.576,    i30 0.629,    i31 0.739,    i32 0.667
i33 0.836,    i34 1.102,    i35 1.067,    i36 0.664,    i37 0.660,    i38 0.665,    i39 0.923,    i40 0.609
i41 0.909,    i42 0.722,    i43 0.861,    i44 0.829,    i45 0.999,    i46 0.764,    i47 0.851,    i48 0.994
i49 0.802,    i50 0.822

----     74 VARIABLE delta.L  selected

i3  1.000,    i7  1.000,    i8  1.000,    i10 1.000,    i14 1.000,    i15 1.000,    i16 1.000,    i17 1.000
i25 1.000,    i27 1.000,    i34 1.000,    i35 1.000,    i39 1.000,    i41 1.000,    i45 1.000,    i48 1.000

----     74 VARIABLE z.L  =  16.000  objective
```
Conclusion
A simple model becomes not so simple once we start "optimizing" it. Unfortunately, this is typical for large MIP models.
References
- Simulation/Optimization Package in R for tuning weights to achieve maximum allocation for groups, https://stackoverflow.com/questions/50843023/simulation-optimization-package-in-r-for-tuning-weights-to-achieve-maximum-alloc