Thursday, January 9, 2020

Preferences in assignments

Problem


The problem is as follows [1]:

  • We have \(N\) persons.
  • They have to be assigned to \(M\) groups.
  • Each group has a certain, given, size.
  • Each person has certain preferences to be in the same group with some other persons. For this, each person provides a list of persons (s)he would like or dislike to be in the same group with.
  • Design an assignment plan that takes these preferences into account.
If we don't allow empty spots in groups, we have the following condition on the data: \[\sum_g \mathit{size}_g = N\]



Data


The data is provided as a fancy spreadsheet:



We see we have \(N=28\) persons.

The matrix with preferences has three different colored symbols:

  • Yellow exclamation mark for 0
  • Green tick mark for +1
  • and a red x for -1

The matrix is not symmetric. E.g. there are some columns with all zero entries. Such rows do not exist. We need to read the matrix row wise: each row represents the preferences of a person. It is noted that the diagonal has all zeros, which is what I would expect.

For the problem, I want to have the total group capacity equal to \(N=28\), so I consider:

  • 4 groups of 6
  • 1 group of 4  


High-level Model


A first attempt to model this problem is as follows. We use: \[x_{p,g} = \begin{cases} 1 & \text{if person $p$ is assigned to group $g$} \\  0 & \text{otherwise}\end{cases} \] as our decision variable. The objective is to maximize the total number of preferences \(+1\) or \(-1\) that are honored. With this we can write:

Quadratic Model
\[\begin{align}\max & \sum_{p_1,p_2,g} \color{darkblue}{\mathit{Pref}}_{p_1,p_2} \color{darkred} x_{p_1,g} \color{darkred} x_{p_2,g} \\ & \sum_g \color{darkred}x_{p,g}=1 & \forall p \\ & \sum_p \color{darkred}x_{p,g} = \color{darkblue}{\mathit{Size}}_{g} &\forall g \\ & \color{darkred} x_{p,g} \in \{0,1\}\end{align}\]

We can feed this into an MIQP solver. The solution looks like:


----     80 PARAMETER solution  using MIQP model

               group1      group2      group3      group4      group5

aimee               1
amber-la                                                1
amber-le                                                            1
andrina             1
catelyn-t                                   1
charlie                                                 1
charlotte                                   1
cory                            1
daniel                          1
ellie               1
ellis               1
eve                                         1
grace-c                                                 1
grace-g                                                 1
holly                                                   1
jack                            1
jade                                                                1
james                           1
kadie                                       1
kieran                                                              1
kristiana                                   1
lily                                                                1
luke                            1
naz                 1
nibah                                       1
niko                            1
wiki                1
zeina                                                   1
COUNT               6           6           6           6           4

Linearization


We can linearize this model in different ways:
  • Let the solver reformulate. Cplex, for instance, will automatically reformulate this problem and will actually solve this as a linear MIP.
  • DIY. We can linearize the products: \[y_{p_1,p_2,g} = x_{p_1,g}\cdot x_{p_2,g}\] using \[\begin{align} & y_{p_1,p_2,g} \le x_{p_1,g} \\& y_{p_1,p_2,g} \le x_{p_2,g} \\ & y_{p_1,p_2,g} \ge x_{p_1,g}+x_{p_2,g}-1 \end{align} \]  We can save a bit by only using \(y_{p_1,p_2,g}\) for \(p_1 \le p_2\). This requires a bit of careful modeling. Let's give this a try.

Linear Model I
\[\begin{align}\max & \sum_{p_1\lt p_2,g} (\color{darkblue}{\mathit{Pref}}_{p_1,p_2}+\color{darkblue}{\mathit{Pref}}_{p_2,p_1}) \color{darkred} y_{p_1,p_2,g}  \\ & \sum_g \color{darkred}x_{p,g}=1 && \forall p \\ & \sum_p \color{darkred}x_{p,g} = \color{darkblue}{\mathit{Size}}_{g} && \forall g \\ & \color{darkred}y_{p_1,p_2,g} \le \color{darkred}x_{p_1,g} && \forall p_1\lt p_2, g \\ & \color{darkred}y_{p_1,p_2,g} \le \color{darkred}x_{p_2,g} && \forall p_1\lt p_2, g \\ &\color{darkred}y_{p_1,p_2,g} \ge \color{darkred}x_{p_1,g}+\color{darkred}x_{p_2,g} -1&&   \forall p_1\lt p_2, g\\ & \color{darkred} x_{p,g} \in \{0,1\} \\&\color{darkred}y_{p_1,p_2,g} \in [0,1] && \forall p_1\lt p_2, g \end{align}\]


There is one extra optimization, we can do. If the objective coefficient is positive, we can drop the constraint \(y_{p_1,p_2,g}  \ge x_{p_1,g}+x_{p_2,g}-1\). Similarly, if the objective coefficient is negative, we can drop the constraints \(y_{p_1,p_2,g} \le x_{p_1,g}\) and \(y_{p_1,p_2,g} \le x_{p_1,g}\). This leads to:

Linear Model II
\[\begin{align}\max & \sum_{p_1\lt p_2,g} (\color{darkblue}{\mathit{Pref}}_{p_1,p_2}+\color{darkblue}{\mathit{Pref}}_{p_2,p_1}) \color{darkred} y_{p_1,p_2,g}  \\ & \sum_g \color{darkred}x_{p,g}=1 && \forall p \\ & \sum_p \color{darkred}x_{p,g} = \color{darkblue}{\mathit{Size}}_{g} && \forall g \\ & \color{darkred}y_{p_1,p_2,g} \le \color{darkred}x_{p_1,g} && \forall p_1, p_2, g | p_1\lt p_2, \color{darkblue}{\mathit{Pref}}_{p_1,p_2}+\color{darkblue}{\mathit{Pref}}_{p_2,p_1} \gt 0 \\ & \color{darkred}y_{p_1,p_2,g} \le \color{darkred}x_{p_2,g} && \forall p_1, p_2, g | p_1\lt p_2, \color{darkblue}{\mathit{Pref}}_{p_1,p_2}+\color{darkblue}{\mathit{Pref}}_{p_2,p_1} \gt 0\\ &\color{darkred}y_{p_1,p_2,g} \ge \color{darkred}x_{p_1,g}+\color{darkred}x_{p_2,g} -1&&   \forall p_1,p_2,g | p_1\lt p_2, \color{darkblue}{\mathit{Pref}}_{p_1,p_2}+\color{darkblue}{\mathit{Pref}}_{p_2,p_1} \lt 0\\ & \color{darkred} x_{p,g} \in \{0,1\} \\&\color{darkred}y_{p_1,p_2,g} \in [0,1] && \forall p_1\lt p_2, g \end{align}\]


Some computational results:

modelMIQPMIP IMIP II
objective848484
iterations418416702640262
nodes822308594
time (seconds)6156

Looks like MIP II is closer to what Cplex is doing.

Equality


In this model we optimize the sum of the achieved preferences for all persons. This may actually hurt some individuals.  Let's look at the number of +1 and -1 preferences in the data and how many were honored in the solution:


----     96 PARAMETER count  preferences stated and achieved

             +1 prefs    -1 prefs  +/-1 prefs       +1 ok       -1 ok     +/-1 ok

aimee               9           3          12           4           3           7
amber-la            3           2           5           3           2           5
amber-le            2           6           8           1           6           7
andrina             5                       5           4                       4
catelyn-t           5                       5           4                       4
charlie             6           4          10                       4           4
charlotte           1                       1           1                       1
cory                4           4           8           4           4           8
daniel              4           3           7           4           3           7
ellie               6                       6           4                       4
ellis              11           1          12           3           1           4
eve                 7                       7           4                       4
grace-c             2                       2           2                       2
grace-g             4           2           6           4           2           6
holly               3           2           5           3           2           5
jack                3           8          11           3           7          10
jade                1           6           7           1           6           7
james               6           4          10           5           4           9
kadie               6           1           7           4           1           5
kieran                          3           3                       2           2
kristiana           4           2           6           4           2           6
lily                2           7           9                       7           7
luke                5           3           8           5           3           8
naz                 6           2           8           4           2           6
nibah               6                       6           4                       4
niko                4           4           8           4           4           8
wiki                9                       9           4                       4
zeina               3           2           5           3           2           5
sum               127          69         196          86          67         153

The left three columns summarize the input data. The last three columns is what has materialized in the solution.  There is some inequality here. E.g. Charlie gets only 4 of his 10 wishes, but Jack gets 10 of his 11 preferences obeyed.  It would be interesting if we can find a more equitable solution that still has a good overall performance.

Looking at the above table we can see that 86 of 127 +1 preferences were taken care of (68%) while 67 of 69 -1 preferences were obeyed (97%). There are more seats not at the table than at the table, so that asymmetry makes sense.

Notes:

  • the objective value (84) can be recovered from: \[86 - (69-67) = 84\]
  • to increase our objective, we can increase the size of the groups (and having fewer groups). For instance if we use groups of sizes 8,8,6,6, I achieve an optimal objective of 99.


A possible quality measure for each person is: \[\frac{\text{Number of preferences honored}}{\text{Number of preferences specified}}\] If we look at this, we see that ellis and charlie are doing poorly with 33% and 40%. To help prevent these bad cases, we can require that at least 50% (say) of preferences are met. For this data that would decrease the objective from 84 to 83. Actually this 50% is the best we can do.

References


  1. Algorithms for optimal student seating arrangements, https://stackoverflow.com/questions/59599718/algorithms-for-optimal-student-seating-arrangements

3 comments:

  1. Hi, I'm interested to solve this by myself in cvxpy. Where did you get the dataset from? Or how did you generate the dataset?
    Thanks!

    ReplyDelete
    Replies
    1. Check the comments in the reference [1]. There is a dropbox link with the spreadsheet data.

      Delete