#### Problem

The problem is as follows [1]:

- We have \(N\) persons.
- They have to be assigned to \(M\) groups.
- Each group has a certain, given, size.
- Each person has certain preferences to be in the same group with some other persons. For this, each person provides a list of persons (s)he would like or dislike to be in the same group with.
- Design an assignment plan that takes these preferences into account.

If we don't allow empty spots in groups, we have the following condition on the data: \[\sum_g \mathit{size}_g = N\]

#### Data

The data is provided as a fancy spreadsheet:

We see we have \(N=28\) persons.

The matrix with preferences has three different colored symbols:

- Yellow exclamation mark for 0
- Green tick mark for +1
- and a red x for -1

The matrix is not symmetric. E.g. there are some columns with all zero entries. Such rows do not exist. We need to read the matrix row wise: each row represents the preferences of a person. It is noted that the diagonal has all zeros, which is what I would expect.

For the problem, I want to have the total group capacity equal to \(N=28\), so I consider:

- 4 groups of 6
- 1 group of 4

#### High-level Model

A first attempt to model this problem is as follows. We use: \[x_{p,g} = \begin{cases} 1 & \text{if person $p$ is assigned to group $g$} \\ 0 & \text{otherwise}\end{cases} \] as our decision variable. The objective is to maximize the total number of preferences \(+1\) or \(-1\) that are honored. With this we can write:

Quadratic Model |
---|

\[\begin{align}\max & \sum_{p_1,p_2,g} \color{darkblue}{\mathit{Pref}}_{p_1,p_2} \color{darkred} x_{p_1,g} \color{darkred} x_{p_2,g} \\ & \sum_g \color{darkred}x_{p,g}=1 & \forall p \\ & \sum_p \color{darkred}x_{p,g} = \color{darkblue}{\mathit{Size}}_{g} &\forall g \\ & \color{darkred} x_{p,g} \in \{0,1\}\end{align}\] |

We can feed this into an MIQP solver. The solution looks like:

---- 80 PARAMETER solution using MIQP model group1 group2 group3 group4 group5 aimee 1 amber-la 1 amber-le 1 andrina 1 catelyn-t 1 charlie 1 charlotte 1 cory 1 daniel 1 ellie 1 ellis 1 eve 1 grace-c 1 grace-g 1 holly 1 jack 1 jade 1 james 1 kadie 1 kieran 1 kristiana 1 lily 1 luke 1 naz 1 nibah 1 niko 1 wiki 1 zeina 1 COUNT 6 6 6 6 4

#### Linearization

We can linearize this model in different ways:

- Let the solver reformulate. Cplex, for instance, will automatically reformulate this problem and will actually solve this as a linear MIP.
- DIY. We can linearize the products: \[y_{p_1,p_2,g} = x_{p_1,g}\cdot x_{p_2,g}\] using \[\begin{align} & y_{p_1,p_2,g} \le x_{p_1,g} \\& y_{p_1,p_2,g} \le x_{p_2,g} \\ & y_{p_1,p_2,g} \ge x_{p_1,g}+x_{p_2,g}-1 \end{align} \] We can save a bit by only using \(y_{p_1,p_2,g}\) for \(p_1 \le p_2\). This requires a bit of careful modeling. Let's give this a try.

Linear Model I |
---|

\[\begin{align}\max & \sum_{p_1\lt p_2,g} (\color{darkblue}{\mathit{Pref}}_{p_1,p_2}+\color{darkblue}{\mathit{Pref}}_{p_2,p_1}) \color{darkred} y_{p_1,p_2,g} \\ & \sum_g \color{darkred}x_{p,g}=1 && \forall p \\ & \sum_p \color{darkred}x_{p,g} = \color{darkblue}{\mathit{Size}}_{g} && \forall g \\ & \color{darkred}y_{p_1,p_2,g} \le \color{darkred}x_{p_1,g} && \forall p_1\lt p_2, g \\ & \color{darkred}y_{p_1,p_2,g} \le \color{darkred}x_{p_2,g} && \forall p_1\lt p_2, g \\ &\color{darkred}y_{p_1,p_2,g} \ge \color{darkred}x_{p_1,g}+\color{darkred}x_{p_2,g} -1&& \forall p_1\lt p_2, g\\ & \color{darkred} x_{p,g} \in \{0,1\} \\&\color{darkred}y_{p_1,p_2,g} \in [0,1] && \forall p_1\lt p_2, g \end{align}\] |

There is one extra optimization, we can do. If the objective coefficient is positive, we can drop the constraint \(y_{p_1,p_2,g} \ge x_{p_1,g}+x_{p_2,g}-1\). Similarly, if the objective coefficient is negative, we can drop the constraints \(y_{p_1,p_2,g} \le x_{p_1,g}\) and \(y_{p_1,p_2,g} \le x_{p_1,g}\). This leads to:

Linear Model II |
---|

\[\begin{align}\max & \sum_{p_1\lt p_2,g} (\color{darkblue}{\mathit{Pref}}_{p_1,p_2}+\color{darkblue}{\mathit{Pref}}_{p_2,p_1}) \color{darkred} y_{p_1,p_2,g} \\ & \sum_g \color{darkred}x_{p,g}=1 && \forall p \\ & \sum_p \color{darkred}x_{p,g} = \color{darkblue}{\mathit{Size}}_{g} && \forall g \\ & \color{darkred}y_{p_1,p_2,g} \le \color{darkred}x_{p_1,g} && \forall p_1, p_2, g | p_1\lt p_2, \color{darkblue}{\mathit{Pref}}_{p_1,p_2}+\color{darkblue}{\mathit{Pref}}_{p_2,p_1} \gt 0 \\ & \color{darkred}y_{p_1,p_2,g} \le \color{darkred}x_{p_2,g} && \forall p_1, p_2, g | p_1\lt p_2, \color{darkblue}{\mathit{Pref}}_{p_1,p_2}+\color{darkblue}{\mathit{Pref}}_{p_2,p_1} \gt 0\\ &\color{darkred}y_{p_1,p_2,g} \ge \color{darkred}x_{p_1,g}+\color{darkred}x_{p_2,g} -1&& \forall p_1,p_2,g | p_1\lt p_2, \color{darkblue}{\mathit{Pref}}_{p_1,p_2}+\color{darkblue}{\mathit{Pref}}_{p_2,p_1} \lt 0\\ & \color{darkred} x_{p,g} \in \{0,1\} \\&\color{darkred}y_{p_1,p_2,g} \in [0,1] && \forall p_1\lt p_2, g \end{align}\] |

Some computational results:

model | MIQP | MIP I | MIP II |
---|---|---|---|

objective | 84 | 84 | 84 |

iterations | 41841 | 67026 | 40262 |

nodes | 822 | 308 | 594 |

time (seconds) | 6 | 15 | 6 |

Looks like MIP II is closer to what Cplex is doing.

#### Equality

In this model we optimize the sum of the achieved preferences for all persons. This may actually hurt some individuals. Let's look at the number of +1 and -1 preferences in the data and how many were honored in the solution:

---- 96 PARAMETER count preferences stated and achieved +1 prefs -1 prefs +/-1 prefs +1 ok -1 ok +/-1 ok aimee 9 3 12 4 3 7 amber-la 3 2 5 3 2 5 amber-le 2 6 8 1 6 7 andrina 5 5 4 4 catelyn-t 5 5 4 4 charlie 6 4 10 4 4 charlotte 1 1 1 1 cory 4 4 8 4 4 8 daniel 4 3 7 4 3 7 ellie 6 6 4 4 ellis 11 1 12 3 1 4 eve 7 7 4 4 grace-c 2 2 2 2 grace-g 4 2 6 4 2 6 holly 3 2 5 3 2 5 jack 3 8 11 3 7 10 jade 1 6 7 1 6 7 james 6 4 10 5 4 9 kadie 6 1 7 4 1 5 kieran 3 3 2 2 kristiana 4 2 6 4 2 6 lily 2 7 9 7 7 luke 5 3 8 5 3 8 naz 6 2 8 4 2 6 nibah 6 6 4 4 niko 4 4 8 4 4 8 wiki 9 9 4 4 zeina 3 2 5 3 2 5 sum 127 69 196 86 67 153

The left three columns summarize the input data. The last three columns is what has materialized in the solution. There is some inequality here. E.g. Charlie gets only 4 of his 10 wishes, but Jack gets 10 of his 11 preferences obeyed. It would be interesting if we can find a more equitable solution that still has a good overall performance.

Looking at the above table we can see that 86 of 127 +1 preferences were taken care of (68%) while 67 of 69 -1 preferences were obeyed (97%). There are more seats not at the table than at the table, so that asymmetry makes sense.

Notes:

- the objective value (84) can be recovered from: \[86 - (69-67) = 84\]
- to increase our objective, we can increase the size of the groups (and having fewer groups). For instance if we use groups of sizes 8,8,6,6, I achieve an optimal objective of 99.

A possible quality measure for each person is: \[\frac{\text{Number of preferences honored}}{\text{Number of preferences specified}}\] If we look at this, we see that

**ellis**and

**charlie**are doing poorly with 33% and 40%. To help prevent these bad cases, we can require that at least 50% (say) of preferences are met. For this data that would decrease the objective from 84 to 83. Actually this 50% is the best we can do.

#### References

- Algorithms for optimal student seating arrangements, https://stackoverflow.com/questions/59599718/algorithms-for-optimal-student-seating-arrangements

Hi, I'm interested to solve this by myself in cvxpy. Where did you get the dataset from? Or how did you generate the dataset?

ReplyDeleteThanks!

Check the comments in the reference [1]. There is a dropbox link with the spreadsheet data.

DeleteGot it. Thanks!

Delete