Sunday, November 11, 2018

Selecting Chess Players



In [1] a simple problem was posted:
  • We have a population of \(N=24\) players, each with an ELO rating \(r_i\)
  • We need to select \(2 \times 6\) players for 2 teams (each team has \(n=6\) members).
  • We want to minimize the difference in average ELO ratings of the teams.
The poster asked for an algorithm. But, of course, this looks like a problem we can solve as a mathematical programming model. 

First I generated some random data:


----     11 PARAMETER r  ELO rating

player1  1275,    player2  1531,    player3  1585,    player4   668,    player5  1107,    player6  1011
player7  1242,    player8  1774,    player9  1096,    player10 1400,    player11 1036,    player12 1538
player13 1135,    player14 1206,    player15 2153,    player16 1112,    player17  880,    player18  850
player19 1528,    player20 1875,    player21  939,    player22 1684,    player23 1807,    player24 1110

I have no idea if these numbers are realistic or not.

It make sense to look at this from an assignment problem point of view. It is amazing how often this concept is encountered in modeling. So we define:\[x_{i,j}=\begin{cases} 1 & \text{if player $i$ is assigned to team $j$}\\ 0 & \text{otherwise}\end{cases}\]

A high-level model can look like:


High-level Model
\[\begin{align}\min\>& | \color{DarkRed}{\mathit{avg}}_2 - \color{DarkRed}{\mathit{avg}}_1 | \\ & \sum_j \color{DarkRed} x_{i,j} \le 1 && \forall i\\ & \sum_i \color{DarkRed} x_{i,j} = \color{DarkBlue} n && \forall j \\ & \color{DarkRed}{\mathit{avg}}_j = \frac{\displaystyle \sum_i \color{DarkBlue} r_i \color{DarkRed} x_{i,j}}{\color{DarkBlue} n}  \\ & \color{DarkRed}x_{i,j} \in \{0,1\}\end{align} \]

As you can see, this model is largely an assignment problem plus some average calculations.

Notes:

  • The absolute value is easily linearized in different ways:
    • Bounding \[\begin{align} \min\> & z \\ & -z \le \mathit{avg}_2-\mathit{avg}_1\le z\end{align}\]
    • Variable splitting: \[\begin{align} \min\> & z^{+}+z^{-} \\ & z^{+}-z^{-} = \mathit{avg}_2-\mathit{avg}_1 \\ & z^{+},z^{-}\ge 0 \end{align}\]
    • Removing symmetry. We can require that the average rating of team 1 is not lower than of team 2. Now we just can minimize difference: \[\begin{align} \min\> &  \mathit{avg}_1 - \mathit{avg}_2\\ &  \mathit{avg}_1 \ge \mathit{avg}_2 \end{align}\]
  • We can use the sum instead of the average

Surprisingly, the solution looks like:


----     43 VARIABLE x.L  assignment

               team1       team2

player1        1.000
player2                    1.000
player4        1.000
player5                    1.000
player6                    1.000
player7        1.000
player8        1.000
player9        1.000
player10                   1.000
player11                   1.000
player17       1.000
player18                   1.000


----     43 VARIABLE avg.L  average rating of team

team1 1155.833,    team2 1155.833


----     43 PARAMETER report  solution report

               team1       team2

player1     1275.000
player2                 1531.000
player4      668.000
player5                 1107.000
player6                 1011.000
player7     1242.000
player8     1774.000
player9     1096.000
player10                1400.000
player11                1036.000
player17     880.000
player18                 850.000
sum         6935.000    6935.000
avg         1155.833    1155.833

We achieved a perfect match!

The probability of this must be somewhat low. Well, no. If we try this 10 times with different random \(r_i\), we get 10 times a perfect match.


----     50 PARAMETER trace  results for each solve

           avg1        avg2

k1     1155.833    1155.833
k2     1583.333    1583.333
k3     1385.333    1385.333
k4     1258.500    1258.500
k5     1423.167    1423.167
k6     1491.833    1491.833
k7     1262.167    1262.167
k8     1736.167    1736.167
k9     1514.167    1514.167
k10    1483.667    1483.667


Using a solve loop of length 100 gives the same result. It looks that we almost always can form two teams with equal average ELO rating. I suspect it will be very difficult to derive anything analytically about the probability of not being able to find a perfect match. I was surprised by these results.

Discussion


Often, people with a Computer Science/programming background immediately think about algorithms instead of mathematical models. They miss out on a paradigm that can be very powerful and efficient  (in terms of time needed to design, implement and maintain an algorithm or model). This model was thought out, implemented and tested in less than 20 minutes (including finding out what this ELO is). I am sure developing some special purpose algorithm will take much more time. In addition, for this model, the solver will find proven optimal solutions while developing an algorithm yourself will likely result in some heuristic without any concept of optimality.

References