Sunday, November 19, 2023

Grouping items: a difficult combinatorial problem

In [1], a simple problem is described:

  • We have \(n\) items (or orders) with a certain width. 
  • We need to combine these items in groups (called patterns) with rather tight limits on the total width. The total length of a pattern (the sum of the lengths of the items assigned to this pattern) must be between 335 and 340.
  • As a result, we may not be able to assign all items. The remaining items cannot be formed into valid patterns.
  • The objective is to try to place as many items as possible into patterns.
  • An indication of the size of the problem: \(n \approx 500\).  

Data


Instead of immediately working on a full-known \(n=500\) problem, I generated a random data set with a very manageable \(n=50\) items. The widths were drawn from a discrete uniform distribution between 30 and 300. The data looks like:


----     15 PARAMETER w  item widths

order1   76.000,    order2  258.000,    order3  179.000,    order4  111.000,    order5  109.000,    order6   90.000
order7  124.000,    order8  262.000,    order9   48.000,    order10 165.000,    order11 300.000,    order12 186.000
order13 298.000,    order14 236.000,    order15  65.000,    order16 203.000,    order17  73.000,    order18  97.000
order19 211.000,    order20 147.000,    order21 127.000,    order22 125.000,    order23  65.000,    order24  70.000
order25 189.000,    order26 255.000,    order27  92.000,    order28 210.000,    order29 240.000,    order30 112.000
order31  59.000,    order32 166.000,    order33  73.000,    order34 266.000,    order35 101.000,    order36 107.000
order37 190.000,    order38 225.000,    order39 200.000,    order40 155.000,    order41 142.000,    order42  61.000
order43 115.000,    order44  42.000,    order45 121.000,    order46  79.000,    order47 204.000,    order48 181.000
order49 238.000,    order50 110.000


I stick to the pattern limits \(\color{darkblue}L=335\) and \(\color{darkblue}U=340\).

We need some estimate of the number of patterns to use. We could just guess. But a better approach is the following. An upper bound for the number patterns can be established quite easily: \[{\mathit{maxj}} = \left\lfloor \frac{\sum_i \color{darkblue}w_i}{\color{darkblue}L}\right\rfloor\] For our data set this number is:

----     29 PARAMETER maxj                 =       22.000  max number of patterns we can fill


This means we can safely use this number as the number of bins (patterns). 

MIP Formulations


In [1], the following MIP model is proposed:

MIP Model M1: original formulation
\[\begin{align}\max&\sum_{i,j}  \color{darkred}x_{i,j} \\ &  \sum_j \color{darkred}x_{i,j} \le 1 &&\forall i  \\ & \sum_i \color{darkblue}w_i \cdot  \color{darkred}x_{i,j} \ge  \color{darkblue}L \cdot \color{darkred}p_j && \forall j\\ & \sum_i \color{darkblue}w_i \cdot  \color{darkred}x_{i,j} \le  \color{darkblue}U \cdot \color{darkred}p_j && \forall j\\  & \color{darkred}x_{i,j} \in \{0,1\}\\ & \color{darkred}p_j \in \{0,1\} \end{align}\]

Here \(\color{darkred}x_{i,j}\in \{0,1\}\) indicates if order \(i\) is assigned to pattern \(j\). The binary variable \(\color{darkred}p_j \in \{0,1\}\) tells us if pattern \(j\) is used or left empty. This model is correct, but I don't really like the duplication of expressions in the last two constraints. 

In general, I would prefer a sparse version of this model: 


MIP Model M2: sparse formulation
\[\begin{align}\max&\sum_{i,j}  \color{darkred}x_{i,j} \\ &  \sum_j \color{darkred}x_{i,j} \le 1  &&\forall i \\ & \color{darkred}{pw}_j = \sum_i \color{darkblue}w_i \cdot  \color{darkred}x_{i,j} &&\forall j \\ & \color{darkred}{pw}_j  \ge  \color{darkblue}L \cdot \color{darkred}p_j&&\forall j\\ &\color{darkred}{pw}_j  \le  \color{darkblue}U \cdot \color{darkred}p_j&&\forall j\\  & \color{darkred}x_{i,j} \in \{0,1\}\\ & \color{darkred}p_j \in \{0,1\} \\ & \color{darkred}{pw}_j  \in [0,\color{darkblue}U]\end{align}\]

This formulation will increase the number of constraints, and it adds a number of continuous variables. But in exchange, we have fewer non-zero elements in the matrix. I would hope that, as the problem is sparser, we can execute more Simplex iterations per unit of time and thus evaluate more branch & bound nodes (per unit of time).

Another formulation is using semi-continuous variables. This directly tells the solver that a variable can assume values: zero or between a lower- and upper-bound. This would eliminate the need for the binary variables \(\color{darkred}p_j\) in the previous model.


MIP Model M3: semi-continuous formulation
\[\begin{align}\max&\sum_{i,j}  \color{darkred}x_{i,j} \\ &  \sum_j \color{darkred}x_{i,j} \le 1 &&\forall i  \\ & \color{darkred}{psc}_j = \sum_i \color{darkblue}w_i \cdot  \color{darkred}x_{i,j}  && \forall j\\  & \color{darkred}x_{i,j} \in \{0,1\}\\ & \color{darkred}{psc}_j \in \{0\} \cup [\color{darkblue}L,\color{darkblue}U] \end{align}\]

Further variations are:
  • Adding a symmetry-breaking ordering constraint: \[\color{darkred}{psc}_j  \ge \color{darkred}{psc}_{j+1}\] or depending on the formulation \[\color{darkred}{pw}_j  \ge \color{darkred}{pw}_{j+1}\]This will reduce the size of the feasible region, but it also makes it more difficult to find integer feasible solutions. Sometimes, such a constraint is very beneficial (but not always). For our model, this seems to cause way more harm than good.
  • Add some branching priorities so that we branch on large items first. This is approximately how we would approach things when doing it by hand: first place the big items and then worry about the smaller ones. Usually, I am not able to beat the solver by setting my own branching order, but in this case, it seems to help.

When looking at some results, using a time limit of 1000 seconds, we see:

----    156 PARAMETER report  performance statistics

              Original      Sparse    Sp+Prior    SC+Prior    Sp+Order

Variables         1123        1145        1145        1123        1145
 discrete         1122        1122        1122        1122        1122
Equations           95         117         117          73         138
Nonzeros          4445        3411        3411        3323        3453
Status         IntFeas     IntFeas     Optimal     IntFeas     IntFeas
Objective           44          44          44          44          35
Best bound          45          48          44          46          48
Time              1000        1003          70        1007        1004
Nodes          5534952     2711568      565030     3909796      775504
Iterations   214185798   232199570    14068906   195942501    46981242


This table indicates that the Sparse+Priorities formulation is a good candidate to work with. All methods, expect one, find the optimal solution, but only one is able to prove global optimality within 1,000 seconds.

Looking at the solution for the 'Sp+Prior' column, we see that 44 of the 50 items could be assigned. The number of patterns used in that solution was 18 (our bound for this number was 22).

A little bit of a haphazard picture:

The y-axis represents patterns and unassigned orders. The x-axis shows the widths. The first 6 lines are unassigned orders. Below are the patterns that fit. The picture illustrates how narrow the band is for the total lengths of patterns. Not much wiggle room there.


The big enchilada


Now let's circle back to the original question. We want to solve this with \(n=500\) orders. Let's be brave and form a data set:

----     16 PARAMETER w  item widths

order1    76.000,    order2   258.000,    order3   179.000,    order4   111.000,    order5   109.000,    order6    90.000
order7   124.000,    order8   262.000,    order9    48.000,    order10  165.000,    order11  300.000,    order12  186.000
order13  298.000,    order14  236.000,    order15   65.000,    order16  203.000,    order17   73.000,    order18   97.000
order19  211.000,    order20  147.000,    order21  127.000,    order22  125.000,    order23   65.000,    order24   70.000
order25  189.000,    order26  255.000,    order27   92.000,    order28  210.000,    order29  240.000,    order30  112.000
order31   59.000,    order32  166.000,    order33   73.000,    order34  266.000,    order35  101.000,    order36  107.000
order37  190.000,    order38  225.000,    order39  200.000,    order40  155.000,    order41  142.000,    order42   61.000
order43  115.000,    order44   42.000,    order45  121.000,    order46   79.000,    order47  204.000,    order48  181.000
order49  238.000,    order50  110.000,    order51  209.000,    order52  234.000,    order53  200.000,    order54  106.000
order55   53.000,    order56   57.000,    order57  203.000,    order58  177.000,    order59   38.000,    order60  244.000
order61   49.000,    order62   77.000,    order63  172.000,    order64  233.000,    order65   78.000,    order66   39.000
order67  188.000,    order68  198.000,    order69  135.000,    order70  127.000,    order71   95.000,    order72   96.000
order73   65.000,    order74  282.000,    order75  132.000,    order76  242.000,    order77  111.000,    order78   64.000
order79  232.000,    order80   48.000,    order81   84.000,    order82   31.000,    order83  103.000,    order84  165.000
order85   70.000,    order86   77.000,    order87  119.000,    order88  115.000,    order89  117.000,    order90  291.000
order91  299.000,    order92  130.000,    order93  131.000,    order94  239.000,    order95  137.000,    order96  277.000
order97   62.000,    order98  229.000,    order99   45.000,    order100 186.000,    order101  43.000,    order102  31.000
order103 138.000,    order104 170.000,    order105 200.000,    order106  91.000,    order107 137.000,    order108 104.000
order109  71.000,    order110 283.000,    order111 144.000,    order112  66.000,    order113 134.000,    order114 131.000
order115 102.000,    order116 287.000,    order117  81.000,    order118 110.000,    order119  50.000,    order120 138.000
order121  57.000,    order122 134.000,    order123 117.000,    order124  82.000,    order125  60.000,    order126 191.000
order127 168.000,    order128  42.000,    order129 242.000,    order130 286.000,    order131 191.000,    order132 194.000
order133 128.000,    order134 190.000,    order135 214.000,    order136 167.000,    order137  73.000,    order138 208.000
order139 171.000,    order140  63.000,    order141 297.000,    order142  91.000,    order143 213.000,    order144 240.000
order145 282.000,    order146  84.000,    order147 110.000,    order148  83.000,    order149  96.000,    order150 205.000
order151 229.000,    order152  53.000,    order153  70.000,    order154 147.000,    order155  80.000,    order156 217.000
order157 236.000,    order158  71.000,    order159 135.000,    order160 218.000,    order161 259.000,    order162 196.000
order163 294.000,    order164  37.000,    order165  80.000,    order166  53.000,    order167 176.000,    order168  64.000
order169 228.000,    order170  60.000,    order171 162.000,    order172 245.000,    order173 163.000,    order174 174.000
order175  32.000,    order176 177.000,    order177 152.000,    order178 294.000,    order179  79.000,    order180  74.000
order181  36.000,    order182  78.000,    order183  46.000,    order184  34.000,    order185 256.000,    order186 193.000
order187  37.000,    order188  83.000,    order189 287.000,    order190 120.000,    order191 191.000,    order192 100.000
order193 203.000,    order194  72.000,    order195 154.000,    order196 136.000,    order197 248.000,    order198 176.000
order199 135.000,    order200 181.000,    order201 282.000,    order202 124.000,    order203  32.000,    order204 287.000
order205 184.000,    order206 120.000,    order207 296.000,    order208 237.000,    order209  59.000,    order210 299.000
order211 187.000,    order212  75.000,    order213 204.000,    order214 123.000,    order215 277.000,    order216 273.000
order217  34.000,    order218 129.000,    order219 210.000,    order220 190.000,    order221  39.000,    order222 258.000
order223 282.000,    order224 167.000,    order225 111.000,    order226 164.000,    order227  42.000,    order228 239.000
order229 174.000,    order230 232.000,    order231 225.000,    order232 201.000,    order233  61.000,    order234 293.000
order235 221.000,    order236 297.000,    order237 261.000,    order238 198.000,    order239 220.000,    order240 219.000
order241 244.000,    order242 195.000,    order243  44.000,    order244 161.000,    order245  44.000,    order246 219.000
order247  82.000,    order248  91.000,    order249 250.000,    order250 298.000,    order251 233.000,    order252 224.000
order253  30.000,    order254 101.000,    order255 253.000,    order256 252.000,    order257 263.000,    order258  87.000
order259 153.000,    order260  40.000,    order261 117.000,    order262 149.000,    order263 115.000,    order264  66.000
order265 249.000,    order266 142.000,    order267  68.000,    order268 156.000,    order269 106.000,    order270 272.000
order271  47.000,    order272 142.000,    order273 122.000,    order274 156.000,    order275 204.000,    order276 204.000
order277 121.000,    order278  57.000,    order279 275.000,    order280  88.000,    order281 279.000,    order282 152.000
order283  54.000,    order284 131.000,    order285 142.000,    order286 139.000,    order287  60.000,    order288 233.000
order289 247.000,    order290  36.000,    order291 160.000,    order292 105.000,    order293 274.000,    order294  34.000
order295 214.000,    order296 287.000,    order297 273.000,    order298 273.000,    order299 266.000,    order300 135.000
order301 166.000,    order302 255.000,    order303 193.000,    order304  52.000,    order305 186.000,    order306 190.000
order307 215.000,    order308  73.000,    order309 119.000,    order310 115.000,    order311 170.000,    order312 128.000
order313  75.000,    order314 215.000,    order315 166.000,    order316 186.000,    order317 225.000,    order318 215.000
order319  35.000,    order320 257.000,    order321 222.000,    order322  72.000,    order323 195.000,    order324 209.000
order325  82.000,    order326 128.000,    order327 199.000,    order328 228.000,    order329 142.000,    order330  72.000
order331  33.000,    order332  32.000,    order333 288.000,    order334 294.000,    order335 291.000,    order336 262.000
order337  68.000,    order338  43.000,    order339 179.000,    order340  79.000,    order341 299.000,    order342 249.000
order343 112.000,    order344  53.000,    order345 146.000,    order346 124.000,    order347  61.000,    order348 188.000
order349 150.000,    order350 141.000,    order351 277.000,    order352  87.000,    order353  90.000,    order354 176.000
order355 201.000,    order356 118.000,    order357  70.000,    order358 281.000,    order359  98.000,    order360  46.000
order361 114.000,    order362  40.000,    order363 252.000,    order364  92.000,    order365 141.000,    order366 111.000
order367 150.000,    order368 224.000,    order369 190.000,    order370  65.000,    order371  73.000,    order372 115.000
order373 185.000,    order374 102.000,    order375  39.000,    order376 216.000,    order377 212.000,    order378 120.000
order379 235.000,    order380  77.000,    order381 214.000,    order382 212.000,    order383 255.000,    order384 169.000
order385 106.000,    order386 180.000,    order387 142.000,    order388  49.000,    order389 248.000,    order390 120.000
order391  52.000,    order392 185.000,    order393  35.000,    order394 231.000,    order395 275.000,    order396 181.000
order397 158.000,    order398 224.000,    order399 169.000,    order400 270.000,    order401 239.000,    order402  67.000
order403 101.000,    order404 214.000,    order405 151.000,    order406 291.000,    order407 289.000,    order408 273.000
order409 118.000,    order410 153.000,    order411 191.000,    order412 268.000,    order413  76.000,    order414 201.000
order415 239.000,    order416 184.000,    order417  37.000,    order418 249.000,    order419 105.000,    order420 147.000
order421 121.000,    order422 189.000,    order423 185.000,    order424 177.000,    order425 186.000,    order426 294.000
order427 117.000,    order428 236.000,    order429 290.000,    order430 287.000,    order431  99.000,    order432 118.000
order433  88.000,    order434  77.000,    order435 228.000,    order436 103.000,    order437 235.000,    order438 197.000
order439 108.000,    order440 230.000,    order441  32.000,    order442 264.000,    order443  34.000,    order444 146.000
order445 127.000,    order446 221.000,    order447 142.000,    order448 178.000,    order449 123.000,    order450 219.000
order451 282.000,    order452 157.000,    order453  87.000,    order454 168.000,    order455 129.000,    order456 283.000
order457  48.000,    order458 166.000,    order459 136.000,    order460  85.000,    order461 173.000,    order462 189.000
order463 123.000,    order464  98.000,    order465 178.000,    order466 178.000,    order467  45.000,    order468 132.000
order469 293.000,    order470 132.000,    order471  72.000,    order472 157.000,    order473 137.000,    order474  85.000
order475 200.000,    order476  30.000,    order477 166.000,    order478  30.000,    order479 171.000,    order480 256.000
order481  49.000,    order482 236.000,    order483 108.000,    order484  96.000,    order485 148.000,    order486 129.000
order487 179.000,    order488  50.000,    order489 276.000,    order490  43.000,    order491 252.000,    order492 244.000
order493 208.000,    order494 136.000,    order495 141.000,    order496 264.000,    order497 294.000,    order498 185.000
order499 115.000,    order500 153.000


----     30 PARAMETER maxj                 =      233.000  max number of patterns we can fill


This leads to a very big model:

MODEL STATISTICS

BLOCKS OF EQUATIONS           5     SINGLE EQUATIONS        1,200
BLOCKS OF VARIABLES           4     SINGLE VARIABLES      116,967
NON ZERO ELEMENTS       350,666     DISCRETE VARIABLES    116,733


But, low and behold, we can solve this model! The model can assign all 500 orders. The number of patterns used is 231: two unused patterns from our estimate. The time to solve this was about an hour (actually my time limit was 3600 seconds).

Obligatory picture:

Another data set


Here is a strange result with another data set:


----    160 PARAMETER report  performance statistics

              Original      Sparse    Sp+Prior    SC+Prior    Sp+Order

Variables         2223        2245        2245        2223        2245
 discrete         2222        2222        2222        2222        2222
Equations          145         167         167         123         188
Nonzeros          8845        6711        6711        6623        6753
Status         Optimal     Optimal     Optimal     Optimal     IntFeas
Objective           99          99          98          99          98
Best bound          99          99          98          99          99
Time                 5          15          19          20        1006
Nodes            30279      110096       62538       67546     2198318
Iterations      394979     1127380      903107     1129896    15112872
Maxj                22          22          22          22          22
Patterns            22          22          22          22          22


We see that the original formulation is the fastest here. But more worrisome, the solver Cplex delivers wrong results for the Sparse + Branching Priorities run. Again we see how good the maxj estimate is.
 

Conclusion


This problem of assigning orders with a given width to patterns with a short range for allowed total width is a combinatorial difficult problem, related to bin-packing. But, we actually can find optimal solutions for the large \(n=500\) problem using a standard formulation (and about an hour of time).


References


  1. MILP optimizer in Python Pyomo/PuLP not finding a feasible solution with open-source solvers, https://stackoverflow.com/questions/77492342/milp-optimizer-in-python-pyomo-pulp-not-finding-a-feasible-solution-with-open-so

4 comments:

  1. This is a really interesting problem, especially since it seems so simple at a first glance. I focused on seeing how HIGHS would solve this problem, and its not very good. I only used a problem size of 50 and after 1000 seconds the MIP gap is still around 17%. Sadly, HIGHS doesnt support branching priorities so I wasnt able to test how that would impact the solution time.
    If I find the time I may approach building some clique constraints and see how that helps (or hurts) performance.

    ReplyDelete
    Replies
    1. Adding a constraint to prevent pairwise grouping helps somewhat. Going beyond pairwise slows the solver down dramatically.

      Delete
  2. I tried solving this model using HIGHs and the results werent so great. HIGHs doesnt support using branch priorities, and only using the sparse formulation results in a gap of about 16% after 1000 seconds for the case of n=50.
    When I find the time id like to see how adding clique inequalities helps (or hurts) performance.

    ReplyDelete
  3. I was able to improve the solution for n=50 with HIGHs. Since they dont have priority branching implemented, I changed the objective to sum(w_i * x_ij) with the hope that these weights would act as pseudo branch priorities to return a good heuristic solution. This method returned the best feasible solution I could find. Very difficult problem to solve if branch priority isnt available!

    ReplyDelete