Friday, September 17, 2021

Reallocate people: a very small but interesting LP

This is a strange little problem [1]. We have individuals (of different categories) and regions with a given capacity. The initial set of assigned persons fits within the capacity of each region. Now a new batch of individuals arrives with their chosen zones. Unfortunately, the total demand (old + new individuals) does not fit anymore in all cases. This means we need to reallocate people. We can do this at a price (the costs are given). What is the optimal (least cost) new allocation?

Data

The data we need to consider is:

----     35 PARAMETER popdata  population data

catA.zone1         115          23         138
catA.zone2         121          24         145
catA.zone3         112          22         134
catA.zone4          76          15          91
catB.zone1          70          29          99
catB.zone2          59          24          83
catB.zone3          86          35         121
catB.zone4         139          57         196
catC.zone1         142          18         160
catC.zone2          72           9          81
catC.zone3          29           4          33
catC.zone4          58           8          66
catD.zone1          22          25          47
catD.zone2          23          26          49
catD.zone3          16          18          34
catD.zone4          45          51          96

----     44 PARAMETER capacity  total capacity per zone

zone1 465,    zone2 393,    zone3 500,    zone4 331

----     66 PARAMETER cost  cost moving to target zone

zone1       zone2       zone3       zone4

catA         0.1         0.1         0.1         1.3
catB        16.2        38.1         1.5         0.1
catC         0.1        12.7        97.7        46.3
catD        25.3         7.7        67.3         0.1

----     80 PARAMETER counts  summary data

zone1       zone2       zone3       zone4       total

initial          349         275         243         318        1185
final            444         358         322         449        1573
capacity         465         393         500         331        1689


We can see our initial set of people fitted within the capacity of each zone. However, the final set does not (zone 4 is out of room). We can assume that the total capacity is enough to accommodate all people. The optimization problem is: how to move people around at minimum cost, such that all capacity constraints are met.

Model

The LP model I suggest has two variables: \begin{align}\color{darkred}{\mathit{move}}_{c,z,z'}&:\>\text{number of people of category c moved from zone z to z'}\\ \color{darkred}{\mathit{alloc}}_{c,z}&:\>\text{new allocation of people of category c to zone z} \end{align} With these, the LP model can look like:

LP model
\begin{align}\min&\sum_{c,z,z'} \color{darkblue}{\mathit{cost}}_{c,z'}\cdot\color{darkred}{\mathit{move}}_{c,z,z'} \\ & \color{darkred}{\mathit{alloc}}_{c,z} = \color{darkblue}{\mathit{demand}}_{c,z} + \sum_{z'}\color{darkred}{\mathit{move}}_{c,z',z} - \sum_{z'}\color{darkred}{\mathit{move}}_{c,z,z'} && \forall c,z \\ & \sum_c \color{darkred}{\mathit{alloc}}_{c,z} \le \color{darkblue}{\mathit{capacity}}_{z} && \forall z \\ & \color{darkred}{\mathit{move}}_{c,z,z}= 0 && \forall c,z \\ & \color{darkred}{\mathit{move}}_{c,z',z} \ge 0 \\ &\color{darkred}{\mathit{alloc}}_{c,z} \ge 0 \end{align}

Notes:
• The first constraint says: $\text{allocation=demand + inflow - outflow}$ This is just proper bookkeeping.
• The model seems to deliver integer solutions, which makes it easier to interpret results. Not sure if this is guaranteed.
• We cannot substitute out the variable $$\color{darkred}{\mathit{alloc}}_{c,z}$$ as it is non-negative. I'll show below what happens when we do that anyway.
• We can drop the constraint $$\color{darkred}{\mathit{move}}_{c,z,z}= 0$$. This will automatically hold in the optimal solution as we are minimizing cost. If we choose to keep it, we can use fixing (setting the upper bound to zero) instead of writing a full-blown equation. For large models, it may be a good strategy to drop $$\color{darkred}{\mathit{move}}_{c,z,z}$$ from the model altogether.
In any case, a good presolver will remove $$\color{darkred}{\mathit{move}}_{c,z,z}$$ from the model. These variables cancel out in the first constraint, so they only appear in the objective.
• Here we assumed that all people can be moved. Later on, we see how things change when we only allow the new, additional individuals are allowed to be reassigned.

The solution looks like:

----    122 VARIABLE move.L  moves needed to meet capacity

zone1       zone2       zone3

catA.zone1                       6
catA.zone4                      29          62
catC.zone4          27

----    122 VARIABLE alloc.L  new allocation

zone1       zone2       zone3       zone4

catA         132         180         196
catB          99          83         121         196
catC         187          81          33          39
catD          47          49          34          96

----    122 VARIABLE totcost.L             =       12.400  total cost

----    126 PARAMETER counts  summary data

zone1       zone2       zone3       zone4       total

initial          349         275         243         318        1185
final            444         358         322         449        1573
capacity         465         393         500         331        1689
newalloc         465         393         384         331        1573


This is an interesting solution. Obviously, we need to move people away from zone 4. The model first moves a few persons from zone 1 to zone 2 to make room in zone 1. I.e., we actually move two people here to move 1 person away from zone 4. Of course, this highly depends on the cost structure.

Alternative 1: substitute out the variable $$\color{darkred}{\mathit{alloc}}_{c,z}$$

It may be tempting to get rid of the variable $$\color{darkred}{\mathit{alloc}}_{c,z}$$ and write the model as \begin{align}\min&\sum_{c,z,z'} \color{darkblue}{\mathit{cost}}_{c,z'}\cdot\color{darkred}{\mathit{move}}_{c,z,z'} \\ & \sum_c \left[\color{darkblue}{\mathit{demand}}_{c,z} + \sum_{z'}\color{darkred}{\mathit{move}}_{c,z',z} - \sum_{z'}\color{darkred}{\mathit{move}}_{c,z,z'} \right] \le \color{darkblue}{\mathit{capacity}}_{z} &&\forall z \end{align}

This makes the model smaller. So what is not to like. Here are the results:

----    144 VARIABLE move.L  moves needed to meet capacity

zone2       zone3

catA.zone4          35          83

----    144 VARIABLE totcost.L             =       11.800  total cost

----    144 PARAMETER alloc2  recalculate allocations

zone1       zone2       zone3       zone4

catA     138.000     180.000     217.000     -27.000
catB      99.000      83.000     121.000     196.000
catC     160.000      81.000      33.000      66.000
catD      47.000      49.000      34.000      96.000


The objective improves (from 12.4 to 11.8)! Without further analysis, it is not immediately clear why this happens. So I added a post solution calculation of the new allocations: $\color{darkblue}{\mathit{alloc2}}_{c,z} := \color{darkblue}{\mathit{demand}}_{c,z} + \sum_{z'}\color{darkred}{\mathit{move}}^*_{c,z',z} - \sum_{z'}\color{darkred}{\mathit{move}}^*_{c,z,z'}$ This shows we have a negative allocation for category 4 and zone 4. The reason is of course that with substituting out $$\color{darkred}{\mathit{alloc}}_{c,z}$$ we also removed the non-negativity restriction $$\color{darkred}{\mathit{alloc}}_{c,z} \ge 0$$. As a result we get an illegal solution.

Conclusion: we cannot do this.

Alternative 2: only move new people

In the first model, we allowed both initial and added persons to be moved. We can easily change this to: allow only new persons to be moved and keep the initial persons at their location. We do this by: \begin{align} & \color{darkblue}{\mathit{demand}}_{c,z}:=\color{darkblue}{\mathit{popdata}}_{c,z,'\color{darkgreen}{\mathit{Added}}\:'} \\ & \color{darkblue}{\mathit{capacity}}_{z} := \color{darkblue}{\mathit{capacity}}_{z} - \color{darkblue}{\mathit{popdata}}_{c,z,'\color{darkgreen}{\mathit{Initial}}\:'}\end{align} and then just run the model again. We would expect a larger cost because we removed some flexibility from the model.

---    158 VARIABLE move.L  moves needed to meet capacity

zone1       zone2       zone3

catA.zone2                                   3
catA.zone4          13                       2
catB.zone4                                  57
catC.zone4           8
catD.zone4                      38

----    158 VARIABLE alloc.L  new allocation

zone1       zone2       zone3       zone4

catA          36          21          27
catB          29          24          92
catC          26           9           4
catD          25          64          18          13

----    158 VARIABLE totcost.L             =      380.700  total cost

----    162 PARAMETER counts  summary data

zone1       zone2       zone3       zone4       total

initial          349         275         243         318        1185
final            444         358         322         449        1573
capacity         465         393         500         331        1689
newalloc         465         393         384         331        1573


Indeed the cost of moving just from the pool of added persons is much higher.

Network formulation

In the comments, Rob Pratt uses a network interpretation to argue that the model should indeed deliver integer solutions. A second comment further helped in simplifying the network model.

This is a good example to show how to set up a network LP in GAMS. The idea is to form a network as follows:

 Network representation

The dashed arrows indicate fixed, exogenous in- or outflow. The inflow into the first layer is demand. The second layer contains the zones with the arcs indicating the new allocation after moving people. The final layer collects the total number of persons. The main use for that last layer is to provide capacitated arcs and to have a single outflow. The costs are placed on the arcs between the first and second layer.

To implement this in GAMS, I first create a 2-dimensional set with nodes. The dimensions are (category, zone). I use a 'all' when there is no category, and 'sink' for the aggregated zones. So the sink is identified by ('all','sink'). This means, our arcs have 4 dimensions. These two sets can look like:

----     98 SET n  nodes

zone1       zone2       zone3       zone4        sink

catA         YES         YES         YES         YES
catB         YES         YES         YES         YES
catC         YES         YES         YES         YES
catD         YES         YES         YES         YES
all          YES         YES         YES         YES         YES

----     98 SET a  arcs

all.zone1   all.zone2   all.zone3   all.zone4    all.sink

catA.zone1         YES         YES         YES         YES
catA.zone2         YES         YES         YES         YES
catA.zone3         YES         YES         YES         YES
catA.zone4         YES         YES         YES         YES
catB.zone1         YES         YES         YES         YES
catB.zone2         YES         YES         YES         YES
catB.zone3         YES         YES         YES         YES
catB.zone4         YES         YES         YES         YES
catC.zone1         YES         YES         YES         YES
catC.zone2         YES         YES         YES         YES
catC.zone3         YES         YES         YES         YES
catC.zone4         YES         YES         YES         YES
catD.zone1         YES         YES         YES         YES
catD.zone2         YES         YES         YES         YES
catD.zone3         YES         YES         YES         YES
catD.zone4         YES         YES         YES         YES
all .zone1                                                         YES
all .zone2                                                         YES
all .zone3                                                         YES
all .zone4                                                         YES


We also have some capacities, costs, and exogenous inflows. They are:

----    112 PARAMETER acost  cost of an arc

all.zone1   all.zone2   all.zone3   all.zone4

catA.zone1                     0.1         0.1         1.3
catA.zone2         0.1                     0.1         1.3
catA.zone3         0.1         0.1                     1.3
catA.zone4         0.1         0.1         0.1
catB.zone1                    38.1         1.5         0.1
catB.zone2        16.2                     1.5         0.1
catB.zone3        16.2        38.1                     0.1
catB.zone4        16.2        38.1         1.5
catC.zone1                    12.7        97.7        46.3
catC.zone2         0.1                    97.7        46.3
catC.zone3         0.1        12.7                    46.3
catC.zone4         0.1        12.7        97.7
catD.zone1                     7.7        67.3         0.1
catD.zone2        25.3                    67.3         0.1
catD.zone3        25.3         7.7                     0.1
catD.zone4        25.3         7.7        67.3

----    112 PARAMETER inflow  exogenous in- or outflow

zone1       zone2       zone3       zone4        sink

catA         138         145         134          91
catB          99          83         121         196
catC         160          81          33          66
catD          47          49          34          96
all                                                        -1573

----    112 PARAMETER cap  capacity of an arc

all.sink

all.zone1         465
all.zone2         393
all.zone3         500
all.zone4         331


With this we can form a standard min-cost-flow LP model:

Min-Cost-Flow Network LP Model
\begin{align}\min&\sum_{a} \color{darkblue}c_{a} \cdot \color{darkred}f_{a} \\ & \sum_{a(j,i)} \color{darkred}f_{j,i} + \color{darkblue}{\mathit{inflow}}_i = \sum_{a(i,j)} \color{darkred}f_{i,j} && \forall i \\ & \color{darkred}f_a \in [0,\color{darkblue}{\mathit{cap}}_a] \end{align}

The constraint is just flow-conservation: what goes into node $$i$$ should also go out. When I solve this, I get the same solution as before:

----    141 VARIABLE flow.L  flows along arcs

all.zone1   all.zone2   all.zone3   all.zone4    all.sink

catA.zone1         132           6
catA.zone2                     145
catA.zone3                                 134
catA.zone4                      29          62
catB.zone1          99
catB.zone2                      83
catB.zone3                                 121
catB.zone4                                             196
catC.zone1         160
catC.zone2                      81
catC.zone3                                  33
catC.zone4          27                                  39
catD.zone1          47
catD.zone2                      49
catD.zone3                                  34
catD.zone4                                              96
all .zone1                                                         465
all .zone2                                                         393
all .zone3                                                         384
all .zone4                                                         331

----    141 VARIABLE totcost.L             =       12.400  total cost

----    146 PARAMETER moves  computed from flows

zone1       zone2       zone3

catA.zone1                       6
catA.zone4                      29          62
catC.zone4          27


The GAMS code is shown below in appendix B.

Notes:

• Although we formulate the problem as a network, I still solve it as an LP. For very large problems it is better to use a specialized network solver.
• The arc costs need a bit of attention. It is important to give the arcs from layer 'demand' to 'zone' a zero cost when the zone doesn't change. In the original LP model we did not care, as we modeled this slightly differently.
• When we want to move only newly added persons, we can preprocess the data, as was demonstrated for the LP formulation.
• My first attempt had an extra layer in the graph. As a commenter noted, this could be simplified. My description and the network topology reflect this simplification.

Conclusion

This is a small but interesting model. In [1] an R/LPSolve implementation is given. When comparing this with my GAMS model (reproduced in the appendix below), I have to conclude that using R/LpSolve is slightly masochistic. The code is unwieldy and not very close to the original problem. This makes reasoning about the problem much more difficult.

Setting up the alternative network model is a very different exercise than the earlier LP models. The min-cost-flow model is quite standard, so we don't have to think long about the model equations. But we have to pay attention to the network topology: that is where the complexity is. It is interesting to compare the two modeling approaches.

References

1. Minimise cost of reallocating individuals, https://stackoverflow.com/questions/69046256/minimise-cost-of-reallocating-individuals

Appendix A: GAMS LP model

 set    dummy 'for ordering (printing)' /initial,added,final/    c 'category' /catA*catD/    z 'zone'     /zone1*zone4/ ; *---------------------------------------------------------------- * data *---------------------------------------------------------------- table popdata(c,z,*) 'population data'                     initial   final     catA.zone1         115     138     catA.zone2         121     145     catA.zone3         112     134     catA.zone4          76      91     catB.zone1          70      99     catB.zone2          59      83     catB.zone3          86     121     catB.zone4         139     196     catC.zone1         142     160     catC.zone2          72      81     catC.zone3          29      33     catC.zone4          58      66     catD.zone1          22      47     catD.zone2          23      49     catD.zone3          16      34     catD.zone4          45      96 ; popdata(c,z,'added') = popdata(c,z,'final')-popdata(c,z,'initial'); option popdata:0; display popdata; parameter capacity(z) 'total capacity per zone' /     zone1   465     zone2   393     zone3   500     zone4   331 /; option capacity:0; display capacity; parameter cost(c,z) 'cost moving to target zone' /         catA.zone1   0.1         catA.zone2   0.1         catA.zone3   0.1         catA.zone4   1.3         catB.zone1  16.2         catB.zone2  38.1         catB.zone3   1.5         catB.zone4   0.1         catC.zone1   0.1         catC.zone2  12.7         catC.zone3  97.7         catC.zone4  46.3         catD.zone1  25.3         catD.zone2   7.7         catD.zone3  67.3         catD.zone4   0.1 /; option cost:1; display cost; *---------------------------------------------------------------- * summary data *---------------------------------------------------------------- parameter counts(*,*) 'summary data'; counts('initial',z) = sum(c,popdata(c,z,'initial')); counts('initial','total') = sum(z,counts('initial',z)); counts('final',z) = sum(c,popdata(c,z,'final')); counts('final','total') = sum(z,counts('final',z)); counts('capacity',z) = capacity(z); counts('capacity','total') = sum(z,capacity(z)); option counts:0; display counts; *---------------------------------------------------------------- * optimal reallocation *---------------------------------------------------------------- alias(z,z1,z2); parameter    demand(c,z) 'current demand'    cap(z)      'capacity (net)' ; demand(c,z) = popdata(c,z,'final'); cap(z) = capacity(z); positive variable move(c,z1,z2) 'moves needed to meet capacity'; positive variable alloc(c,z)  'new allocation'; variable totcost 'total cost'; equations  eobj             'objective'  ealloc(c,z)      'allocation bookkeeping'  ecap             'capacity limits' ; eobj.. totcost =e= sum((c,z1,z2),move(c,z1,z2)*cost(c,z2)); ealloc(c,z).. alloc(c,z) =e= demand(c,z) + sum(z1,move(c,z1,z)) - sum(z2,move(c,z,z2)); ecap(z).. sum(c,alloc(c,z)) =l= cap(z); * not needed move.fx(c,z,z) = 0; model m /all/; solve m minimizing totcost using lp; option move:0, alloc:0; display move.l,alloc.l,totcost.l; counts('newalloc',z) = sum(c,alloc.l(c,z)); counts('newalloc','total') = sum(z,counts('newalloc',z)); display counts; *---------------------------------------------------------------- * this is wrong: substitute out alloc *---------------------------------------------------------------- equations   ecap2(z)      'capacity constraint (WRONG!)' ; ecap2(z).. sum(c, demand(c,z) + sum(z1,move(c,z1,z)) - sum(z2,move(c,z,z2))) =l= cap(z); model m2 /eobj,ecap2/; solve m2 minimizing totcost using lp; parameter alloc2(c,z) 'recalculate allocations'; alloc2(c,z) = demand(c,z) + sum(z1,move.l(c,z1,z)) - sum(z2,move.l(c,z,z2)) display move.l,totcost.l,alloc2; *---------------------------------------------------------------- * optimal reallocation while keeping initial population where * they were. *---------------------------------------------------------------- demand(c,z) = popdata(c,z,'added'); cap(z) = capacity(z) - sum(c,popdata(c,z,'initial')); solve m minimizing totcost using lp; option move:0, alloc:0; display move.l,alloc.l,totcost.l; counts('newalloc',z) = counts('initial',z) + sum(c,alloc.l(c,z)); counts('newalloc','total') = sum(z,counts('newalloc',z)); display counts;

Appendix B: GAMS Network LP model

 $ontext Small LP about reallocating persons Formulated as a network model https://yetanothermathprogrammingconsultant.blogspot.com/2021/09/reallocate-people-very-small-but.html erwin@amsterdamoptimization.com$offtext set    dummy  'for ordering (printing)' /initial,added,final/    c0     'category superset'       /catA*catD, all/    c(c0)  'category'                /catA*catD/    z0     'zone superset'           /zone1*zone4, sink/    z(z0)  'zone'                    /zone1*zone4/ ; *---------------------------------------------------------------- * data *---------------------------------------------------------------- table popdata(c,z,*) 'population data'                     initial   final     catA.zone1         115     138     catA.zone2         121     145     catA.zone3         112     134     catA.zone4          76      91     catB.zone1          70      99     catB.zone2          59      83     catB.zone3          86     121     catB.zone4         139     196     catC.zone1         142     160     catC.zone2          72      81     catC.zone3          29      33     catC.zone4          58      66     catD.zone1          22      47     catD.zone2          23      49     catD.zone3          16      34     catD.zone4          45      96 ; popdata(c,z,'added') = popdata(c,z,'final')-popdata(c,z,'initial'); option popdata:0; display popdata; parameter capacity(z) 'total capacity per zone' /     zone1   465     zone2   393     zone3   500     zone4   331 /; option capacity:0; display capacity; parameter cost(c,z) 'cost moving to target zone' /         catA.zone1   0.1         catA.zone2   0.1         catA.zone3   0.1         catA.zone4   1.3         catB.zone1  16.2         catB.zone2  38.1         catB.zone3   1.5         catB.zone4   0.1         catC.zone1   0.1         catC.zone2  12.7         catC.zone3  97.7         catC.zone4  46.3         catD.zone1  25.3         catD.zone2   7.7         catD.zone3  67.3         catD.zone4   0.1 /; option cost:1; display cost; *---------------------------------------------------------------- * network topology *---------------------------------------------------------------- alias (z,z1,z2),(c,c1,c2); set   n(c0,z0) 'nodes' /        (catA*catD,all).(zone1*zone4)        all.sink    /    a(c0,z0,c0,z0) 'arcs' ; a(c,z1,'all',z2) = yes; a('all',z,'all','sink') = yes; option a:0:2:2; display n,a; parameter    acost(c0,z0,c0,z0)     'cost of an arc'    inflow(c0,z0)          'exogenous in- or outflow'    cap(c0,z0,c0,z0)       'capacity of an arc' ; acost(c,z1,'all',z2)$(not sameas(z1,z2)) = cost(c,z2); inflow(c,z) = popdata(c,z,'final'); inflow('all','sink') = -sum((c,z),popdata(c,z,'final')); cap('all',z,'all','sink') = capacity(z); option acost:1:2:2,inflow:0,cap:0:2:2; display acost,inflow,cap; *---------------------------------------------------------------- * min cost flow network model *---------------------------------------------------------------- positive variable flow(c0,z0,c0,z0) 'flows along arcs'; variable totcost 'total cost'; equations nodbal(c0,z0) 'node balance' objective ; alias(n,n1,n2); objective.. totcost =e= sum(a,acost(a)*flow(a)); nodbal(n1).. sum(a(n2,n1),flow(a)) + inflow(n1) =e= sum(a(n1,n2),flow(a)); flow.up(a)$cap(a) = cap(a); model mincostflow /all/; solve mincostflow minimizing totcost using lp; option flow:0:2:2; display flow.l, totcost.l; parameter moves(c,z,z) 'computed from flows'; moves(c,z1,z2)\$(not sameas(z1,z2)) = flow.l(c,z1,'all',z2); option moves:0; display moves;