U-Haul Moving Van [link] |
This is a strange little problem [1]. We have individuals (of different categories) and regions with a given capacity. The initial set of assigned persons fits within the capacity of each region. Now a new batch of individuals arrives with their chosen zones. Unfortunately, the total demand (old + new individuals) does not fit anymore in all cases. This means we need to reallocate people. We can do this at a price (the costs are given). What is the optimal (least cost) new allocation?
Data
The data we need to consider is:
---- 35 PARAMETER popdata population data initial added final catA.zone1 115 23 138 catA.zone2 121 24 145 catA.zone3 112 22 134 catA.zone4 76 15 91 catB.zone1 70 29 99 catB.zone2 59 24 83 catB.zone3 86 35 121 catB.zone4 139 57 196 catC.zone1 142 18 160 catC.zone2 72 9 81 catC.zone3 29 4 33 catC.zone4 58 8 66 catD.zone1 22 25 47 catD.zone2 23 26 49 catD.zone3 16 18 34 catD.zone4 45 51 96 ---- 44 PARAMETER capacity total capacity per zone zone1 465, zone2 393, zone3 500, zone4 331 ---- 66 PARAMETER cost cost moving to target zone zone1 zone2 zone3 zone4 catA 0.1 0.1 0.1 1.3 catB 16.2 38.1 1.5 0.1 catC 0.1 12.7 97.7 46.3 catD 25.3 7.7 67.3 0.1 ---- 80 PARAMETER counts summary data zone1 zone2 zone3 zone4 total initial 349 275 243 318 1185 final 444 358 322 449 1573 capacity 465 393 500 331 1689
Model
LP model |
---|
\[\begin{align}\min&\sum_{c,z,z'} \color{darkblue}{\mathit{cost}}_{c,z'}\cdot\color{darkred}{\mathit{move}}_{c,z,z'} \\ & \color{darkred}{\mathit{alloc}}_{c,z} = \color{darkblue}{\mathit{demand}}_{c,z} + \sum_{z'}\color{darkred}{\mathit{move}}_{c,z',z} - \sum_{z'}\color{darkred}{\mathit{move}}_{c,z,z'} && \forall c,z \\ & \sum_c \color{darkred}{\mathit{alloc}}_{c,z} \le \color{darkblue}{\mathit{capacity}}_{z} && \forall z \\ & \color{darkred}{\mathit{move}}_{c,z,z}= 0 && \forall c,z \\ & \color{darkred}{\mathit{move}}_{c,z',z} \ge 0 \\ &\color{darkred}{\mathit{alloc}}_{c,z} \ge 0 \end{align}\] |
Notes:
- The first constraint says: \[\text{allocation$=$demand $+$ inflow $-$ outflow}\] This is just proper bookkeeping.
- The model seems to deliver integer solutions, which makes it easier to interpret results. Not sure if this is guaranteed.
- We cannot substitute out the variable \(\color{darkred}{\mathit{alloc}}_{c,z}\) as it is non-negative. I'll show below what happens when we do that anyway.
- We can drop the constraint \(\color{darkred}{\mathit{move}}_{c,z,z}= 0\). This will automatically hold in the optimal solution as we are minimizing cost. If we choose to keep it, we can use fixing (setting the upper bound to zero) instead of writing a full-blown equation. For large models, it may be a good strategy to drop \(\color{darkred}{\mathit{move}}_{c,z,z} \) from the model altogether.
In any case, a good presolver will remove \(\color{darkred}{\mathit{move}}_{c,z,z} \) from the model. These variables cancel out in the first constraint, so they only appear in the objective. - Here we assumed that all people can be moved. Later on, we see how things change when we only allow the new, additional individuals are allowed to be reassigned.
---- 122 VARIABLE move.L moves needed to meet capacity zone1 zone2 zone3 catA.zone1 6 catA.zone4 29 62 catC.zone4 27 ---- 122 VARIABLE alloc.L new allocation zone1 zone2 zone3 zone4 catA 132 180 196 catB 99 83 121 196 catC 187 81 33 39 catD 47 49 34 96 ---- 122 VARIABLE totcost.L = 12.400 total cost ---- 126 PARAMETER counts summary data zone1 zone2 zone3 zone4 total initial 349 275 243 318 1185 final 444 358 322 449 1573 capacity 465 393 500 331 1689 newalloc 465 393 384 331 1573
This is an interesting solution. Obviously, we need to move people away from zone 4. The model first moves a few persons from zone 1 to zone 2 to make room in zone 1. I.e., we actually move two people here to move 1 person away from zone 4. Of course, this highly depends on the cost structure.
Alternative 1: substitute out the variable \(\color{darkred}{\mathit{alloc}}_{c,z}\)
This makes the model smaller. So what is not to like. Here are the results:
---- 144 VARIABLE move.L moves needed to meet capacity zone2 zone3 catA.zone4 35 83 ---- 144 VARIABLE totcost.L = 11.800 total cost ---- 144 PARAMETER alloc2 recalculate allocations zone1 zone2 zone3 zone4 catA 138.000 180.000 217.000 -27.000 catB 99.000 83.000 121.000 196.000 catC 160.000 81.000 33.000 66.000 catD 47.000 49.000 34.000 96.000
The objective improves (from 12.4 to 11.8)! Without further analysis, it is not immediately clear why this happens. So I added a post solution calculation of the new allocations: \[\color{darkblue}{\mathit{alloc2}}_{c,z} := \color{darkblue}{\mathit{demand}}_{c,z} + \sum_{z'}\color{darkred}{\mathit{move}}^*_{c,z',z} - \sum_{z'}\color{darkred}{\mathit{move}}^*_{c,z,z'}\] This shows we have a negative allocation for category 4 and zone 4. The reason is of course that with substituting out \(\color{darkred}{\mathit{alloc}}_{c,z}\) we also removed the non-negativity restriction \(\color{darkred}{\mathit{alloc}}_{c,z} \ge 0\). As a result we get an illegal solution.
Conclusion: we cannot do this.
Alternative 2: only move new people
--- 158 VARIABLE move.L moves needed to meet capacity zone1 zone2 zone3 catA.zone2 3 catA.zone4 13 2 catB.zone4 57 catC.zone4 8 catD.zone4 38 ---- 158 VARIABLE alloc.L new allocation zone1 zone2 zone3 zone4 catA 36 21 27 catB 29 24 92 catC 26 9 4 catD 25 64 18 13 ---- 158 VARIABLE totcost.L = 380.700 total cost ---- 162 PARAMETER counts summary data zone1 zone2 zone3 zone4 total initial 349 275 243 318 1185 final 444 358 322 449 1573 capacity 465 393 500 331 1689 newalloc 465 393 384 331 1573
Indeed the cost of moving just from the pool of added persons is much higher.
Network formulation
In the comments, Rob Pratt uses a network interpretation to argue that the model should indeed deliver integer solutions. A second comment further helped in simplifying the network model.
This is a good example to show how to set up a network LP in GAMS. The idea is to form a network as follows:
Network representation |
The dashed arrows indicate fixed, exogenous in- or outflow. The inflow into the first layer is demand. The second layer contains the zones with the arcs indicating the new allocation after moving people. The final layer collects the total number of persons. The main use for that last layer is to provide capacitated arcs and to have a single outflow. The costs are placed on the arcs between the first and second layer.
To implement this in GAMS, I first create a 2-dimensional set with nodes. The dimensions are (category, zone). I use a 'all' when there is no category, and 'sink' for the aggregated zones. So the sink is identified by ('all','sink'). This means, our arcs have 4 dimensions. These two sets can look like:
---- 98 SET n nodes zone1 zone2 zone3 zone4 sink catA YES YES YES YES catB YES YES YES YES catC YES YES YES YES catD YES YES YES YES all YES YES YES YES YES ---- 98 SET a arcs all.zone1 all.zone2 all.zone3 all.zone4 all.sink catA.zone1 YES YES YES YES catA.zone2 YES YES YES YES catA.zone3 YES YES YES YES catA.zone4 YES YES YES YES catB.zone1 YES YES YES YES catB.zone2 YES YES YES YES catB.zone3 YES YES YES YES catB.zone4 YES YES YES YES catC.zone1 YES YES YES YES catC.zone2 YES YES YES YES catC.zone3 YES YES YES YES catC.zone4 YES YES YES YES catD.zone1 YES YES YES YES catD.zone2 YES YES YES YES catD.zone3 YES YES YES YES catD.zone4 YES YES YES YES all .zone1 YES all .zone2 YES all .zone3 YES all .zone4 YES
We also have some capacities, costs, and exogenous inflows. They are:
---- 112 PARAMETER acost cost of an arc all.zone1 all.zone2 all.zone3 all.zone4 catA.zone1 0.1 0.1 1.3 catA.zone2 0.1 0.1 1.3 catA.zone3 0.1 0.1 1.3 catA.zone4 0.1 0.1 0.1 catB.zone1 38.1 1.5 0.1 catB.zone2 16.2 1.5 0.1 catB.zone3 16.2 38.1 0.1 catB.zone4 16.2 38.1 1.5 catC.zone1 12.7 97.7 46.3 catC.zone2 0.1 97.7 46.3 catC.zone3 0.1 12.7 46.3 catC.zone4 0.1 12.7 97.7 catD.zone1 7.7 67.3 0.1 catD.zone2 25.3 67.3 0.1 catD.zone3 25.3 7.7 0.1 catD.zone4 25.3 7.7 67.3 ---- 112 PARAMETER inflow exogenous in- or outflow zone1 zone2 zone3 zone4 sink catA 138 145 134 91 catB 99 83 121 196 catC 160 81 33 66 catD 47 49 34 96 all -1573 ---- 112 PARAMETER cap capacity of an arc all.sink all.zone1 465 all.zone2 393 all.zone3 500 all.zone4 331
With this we can form a standard min-cost-flow LP model:
Min-Cost-Flow Network LP Model |
---|
\[\begin{align}\min&\sum_{a} \color{darkblue}c_{a} \cdot \color{darkred}f_{a} \\ & \sum_{a(j,i)} \color{darkred}f_{j,i} + \color{darkblue}{\mathit{inflow}}_i = \sum_{a(i,j)} \color{darkred}f_{i,j} && \forall i \\ & \color{darkred}f_a \in [0,\color{darkblue}{\mathit{cap}}_a] \end{align}\] |
---- 141 VARIABLE flow.L flows along arcs all.zone1 all.zone2 all.zone3 all.zone4 all.sink catA.zone1 132 6 catA.zone2 145 catA.zone3 134 catA.zone4 29 62 catB.zone1 99 catB.zone2 83 catB.zone3 121 catB.zone4 196 catC.zone1 160 catC.zone2 81 catC.zone3 33 catC.zone4 27 39 catD.zone1 47 catD.zone2 49 catD.zone3 34 catD.zone4 96 all .zone1 465 all .zone2 393 all .zone3 384 all .zone4 331 ---- 141 VARIABLE totcost.L = 12.400 total cost ---- 146 PARAMETER moves computed from flows zone1 zone2 zone3 catA.zone1 6 catA.zone4 29 62 catC.zone4 27
Notes:
- Although we formulate the problem as a network, I still solve it as an LP. For very large problems it is better to use a specialized network solver.
- The arc costs need a bit of attention. It is important to give the arcs from layer 'demand' to 'zone' a zero cost when the zone doesn't change. In the original LP model we did not care, as we modeled this slightly differently.
- When we want to move only newly added persons, we can preprocess the data, as was demonstrated for the LP formulation.
- My first attempt had an extra layer in the graph. As a commenter noted, this could be simplified. My description and the network topology reflect this simplification.
Conclusion
This is a small but interesting model. In [1] an R/LPSolve implementation is given. When comparing this with my GAMS model (reproduced in the appendix below), I have to conclude that using R/LpSolve is slightly masochistic. The code is unwieldy and not very close to the original problem. This makes reasoning about the problem much more difficult.
Setting up the alternative network model is a very different exercise than the earlier LP models. The min-cost-flow model is quite standard, so we don't have to think long about the model equations. But we have to pay attention to the network topology: that is where the complexity is. It is interesting to compare the two modeling approaches.
References
- Minimise cost of reallocating individuals, https://stackoverflow.com/questions/69046256/minimise-cost-of-reallocating-individuals
Appendix A: GAMS LP model
set |
Appendix B: GAMS Network LP model
$ontext |
This model is equivalent to minimum-cost network flow with 16 supply nodes, 16 transshipment nodes, and 4 demand nodes, so integer optimal solutions are expected.
ReplyDeleteI have to think a bit about this. Is the LP model not too far away from a network approach so that we can guarantee integrality? A pure network formulation would look a bit differently.
DeleteCould we not link the first layer to the third directly ?
ReplyDeleteYou are right! Thanks.
Delete