## Monday, July 5, 2021

### Another sports schedule problem

 Wimbledon 2010 Mixed Doubles (source)

In [1] a question is asked about designing a sports schedule as follows:

• There are 16 players, 7 rounds, and 4 games per round.
• A game consists of 2 teams with 2 players per team.
• Each player has a rating. A team rating is the sum of the ratings of the two members.
• A player plays exactly one game in each round.
• A player can be partnered with another player only once.
• A player can play against another player only twice.
• Design a schedule that has the minimum difference in ratings between opposing teams.

I'll formulate two different models. One with a small number of binary variables. And one with a large number of binary variables. As we shall see the model with the larger number of variables will win.

#### Data

Here is a recapitulation of the data.

----     40 SET p  player

Joe   ,    Tom   ,    Sam   ,    Bill  ,    Fred  ,    John  ,    Wex   ,    Chip  ,    Mike  ,    Jeff  ,    Steve
Kumar ,    Connor,    Matt  ,    Peter ,    Cindy

----     40 SET r  rounds

round1,    round2,    round3,    round4,    round5,    round6,    round7

----     40 SET g  game

game1,    game2,    game3,    game4

----     40 SET t  2 teams per game

team1,    team2

----     40 PARAMETER rating

Joe    5.000,    Tom    4.200,    Sam    4.300,    Bill   5.100,    Fred   4.400,    John   3.700,    Wex    3.800
Chip   4.600,    Mike   3.200,    Jeff   3.600,    Steve  3.800,    Kumar  4.700,    Connor 4.300,    Matt   4.600
Peter  4.200,    Cindy  3.400


#### Model 1

Here we model directly the assignment of players to teams.

MIP Model 1
Variables
\begin{align}&\color{darkred}x_{p,r,g,t} \in \{0,1\} && \text{Assignment of player p to team t in round r, game g}\\ &\color{darkred}{\mathit{isPartner}}_{p,p',r,g,t} \in [0,1] && \text{This is 1 if p,p' are playing in the same team}\\ &\color{darkred}{\mathit{isOpponent}}_{p,p',r,g,t} \in [0,1] && \text{This is 1 if p,p' are playing in the opposite team}\\ &\color{darkred}{\mathit{diff}}_{r,g} \ge 0 && \text{Rating difference between teams in game}\end{align}
Equations
\begin{align}\min&\sum_{r,g}\color{darkred}{\mathit{diff}}_{r,g} && && \mbox{Minimize the rating differences between teams}\\ &\sum_{g,t} \color{darkred}x_{p,r,g,t} = 1 && \forall p,r && \text{Player p is scheduled in each round r} \\ &\sum_p \color{darkred}x_{p,r,g,t} = 2 && \forall r,g,t && \text{Teams consist of two persons}\\ &\color{darkred}{\mathit{isPartner}}_{p,p',r,g,t} \ge \color{darkred}x_{p,r,g,t}+\color{darkred}x_{p',r,g,t}-1 && \forall p\le p',r,g,t && \text{p and p' are partnering at r,g,t}\\ &\sum_{r,g,t}\color{darkred}{\mathit{isPartner}}_{p,p',r,g,t} \le 1 && \forall p\le p' && \text{Limit number of times p,p' can partner} \\ &\color{darkred}{\mathit{isOpponent}}_{p,p',r,g,t} \ge \color{darkred}x_{p,r,g,t}+\color{darkred}x_{p',r,g,t++1}-1 && \forall p\le p',r,g,t && \text{p and p' are opponents at r,g,t} \\ &\sum_{r,g,t} \color{darkred}{\mathit{isOpponent}}_{p,p',r,g,t} \le 2 && \forall p\le p' && \text{Limit numbers of times p,p' can be opponents} \\ &\color{darkred}{\mathit{diff}}_{r,g} = \sum_p\color{darkred}x_{p,r,g,\color{darkgreen}{\mathit{team2}}}\cdot\color{darkblue}{\mathit{rating}}_p - \sum_p\color{darkred}x_{p,r,g,\color{darkgreen}{\mathit{team1}}}\cdot\color{darkblue}{\mathit{rating}}_p && \forall r,g && \text{Force team 2 to be the higher rated team} \end{align}

There are some interesting wrinkles here.
• We have some linearizations. I wanted to use $\color{darkred}{\mathit{isPartner}}_{p,p',r,g,t} = \color{darkred}x_{p,r,g,t} \cdot \color{darkred}x_{p',r,g,t}$ A  multiplication with binary variables $\color{darkred}z=\color{darkred}x\cdot \color{darkred}y$ can be linearized as \begin{align} & \color{darkred}z \le \color{darkred}x \\ &\color{darkred}z \le \color{darkred}y \\ & \color{darkred}z\ge \color{darkred}x+\color{darkred}y-1\end{align} In our case we can drop the first two inequalities as we are only interesting in setting some upper bound on $$\color{darkred}{\mathit{isPartner}}$$, as can be seen from $\sum_{r,g,t}\color{darkred}{\mathit{isPartner}}_{p,p',r,g,t} \le 1$ So we can replace the nonlinear constraint by: $\color{darkred}{\mathit{isPartner}}_{p,p',r,g,t} \ge \color{darkred}x_{p,r,g,t}+\color{darkred}x_{p',r,g,t}-1$
• The other linearization contains a funny operator: $\color{darkred}{\mathit{isOpponent}}_{p,p',r,g,t} \ge \color{darkred}x_{p,r,g,t}+\color{darkred}x_{p',r,g,t++1}-1$ The expression $$t\text{++}1$$ means a circular lead. $$t$$ stands for teams. We have $$t = \{\color{darkgreen}{\mathit{team1}},\color{darkgreen}{\mathit{team2}}\}$$. In this case $$t\text{++}1$$ means: "the other team". I.e. if $$t=\color{darkgreen}{\mathit{team1}}$$ then $$t\text{++}1$$ is $$\color{darkgreen}{\mathit{team2}}$$. And vice versa. Note that I could have used $$t\text{--}1$$ as well. A circular lag is identical to a circular lead when there are just two elements in the set.
• One would expect to minimize the sum of the absolute differences between the team ratings. Here I did not do this. As $$\color{darkred}{\mathit{diff}}_{r,g}$$ is a non-negative variable, we enforce that team 2 has an equal or better rating than team 1. This is not an issue. Actually, the idea is to reduce symmetry in the problem to help it solve faster.
• The number of binary variables in this model is 896. Note that the variables $$\color{darkred}{\mathit{isPartner}}_{p,p',r,g,t}$$ and $$\color{darkred}{\mathit{isOpponent}}_{p,p',r,g,t}$$ can be either binary or continuous between zero and one. I chose to make them continous.
• This model is slow as molasses. See the results below. So we need to look at another formulation.

#### Model 2

In this model, we try to enumerate the possible games. First, we can enumerate the combinations of four players. As we have $$n=16$$ players, the number of combinations of four players is ${16 \choose 4}=1820$ Four players A,B,C,D can play in three different ways: $\begin{array}{cc} \text{team 1} & \text{team 2}\\ \hline A,B & C,D \\ A,C & B,D \\ A,D & B,C\end{array}$ This means, we have 5460 possible games. I generated them all. This looks like:

----    109 SET game  possible game configurations

game1   .team1.Joe   ,    game1   .team1.Tom   ,    game1   .team2.Sam   ,    game1   .team2.Bill
game2   .team1.Joe   ,    game2   .team1.Sam   ,    game2   .team2.Tom   ,    game2   .team2.Bill
game3   .team1.Joe   ,    game3   .team1.Bill  ,    game3   .team2.Tom   ,    game3   .team2.Sam
game4   .team1.Joe   ,    game4   .team1.Tom   ,    game4   .team2.Sam   ,    game4   .team2.Fred
game5   .team1.Joe   ,    game5   .team1.Sam   ,    game5   .team2.Tom   ,    game5   .team2.Fred
game6   .team1.Joe   ,    game6   .team1.Fred  ,    game6   .team2.Tom   ,    game6   .team2.Sam
game7   .team1.Joe   ,    game7   .team1.Tom   ,    game7   .team2.Sam   ,    game7   .team2.John
game8   .team1.Joe   ,    game8   .team1.Sam   ,    game8   .team2.Tom   ,    game8   .team2.John
game9   .team1.Joe   ,    game9   .team1.John  ,    game9   .team2.Tom   ,    game9   .team2.Sam
game10  .team1.Joe   ,    game10  .team1.Tom   ,    game10  .team2.Sam   ,    game10  .team2.Wex
. . .
game5455.team1.Kumar ,    game5455.team1.Matt  ,    game5455.team2.Peter ,    game5455.team2.Cindy
game5456.team1.Kumar ,    game5456.team1.Peter ,    game5456.team2.Matt  ,    game5456.team2.Cindy
game5457.team1.Kumar ,    game5457.team1.Cindy ,    game5457.team2.Matt  ,    game5457.team2.Peter
game5458.team1.Connor,    game5458.team1.Matt  ,    game5458.team2.Peter ,    game5458.team2.Cindy
game5459.team1.Connor,    game5459.team1.Peter ,    game5459.team2.Matt  ,    game5459.team2.Cindy
game5460.team1.Connor,    game5460.team1.Cindy ,    game5460.team2.Matt  ,    game5460.team2.Peter


This means we "just" have to select four different games from this set for each round (subject to some constraints) and we are done. As we have all this precalculated data, we can also precalculate the rating difference in each game. In the model below this is stored as $$\color{darkblue}{\mathit{ratingDiff}}_g$$. Similarly we can precalculate the sets $$\color{darkblue}{\mathit{partner}}_{g,p,p'}$$ and $$\color{darkblue}{\mathit{opponent}}_{g,p,p'}$$ to indicate if $$p,p'$$ are partners or opponents in game $$g$$.

MIP Model 2
Variables
\begin{align}&\color{darkred}x_{r,g} \in \{0,1\} && \text{Selection of games} \end{align}
Equations
\begin{align}\min&\sum_{r,g}\color{darkblue}{\mathit{ratingDiff}}_g \cdot \color{darkred}x_{r,g} && && \mbox{Minimize the rating differences between teams}\\ &\sum_g \color{darkred}x_{r,g} = 4 && \forall r && \text{Select 4 games in each round r} \\ &\sum_{g,t|\color{darkblue}{\mathit{game}}(g,t,p)} \color{darkred}x_{r,g} = 1 && \forall p,r && \text{Each person plays once in each round}\\ &\sum_{r,g|\color{darkblue}{\mathit{partner}}(g,p,p')}\color{darkred}x_{r,g} \le 1 && \forall p\le p' && \text{Limit number of times p,p' can partner} \\ &\sum_{r,g|\color{darkblue}{\mathit{opponent}}(g,p,p')}\color{darkred}x_{r,g} \le 2 && \forall p\le p' && \text{Limit numbers of times p,p' can be opponents} \end{align}

Notes:
• This model is much larger in terms of binary variables: we have 38,220 of them!
• However, it solves much faster. Showing once more that the number of discrete variables is not always a good measure of complexity.
• The generation of the combinations and games was done in Python. The rest of the model was formulated in GAMS.
• The solution is depicted below.

#### Results

The solution looks like:

The red and blue cells show the team number. Each row represents an individual game, so it will have two red and two blue cells. The optimizer did a good job: many games have a rating difference of zero.

The differences in performance of the models:

Performance Statistics
Model 1Model 2
Equations13,877360
Variables14,36538,221
Discrete Variables89638,220
Objective10.50.7
Time100066
StatusTime LimitOptimal
Gap93%

Obviously, the second model is doing much better. The author of [1] did a pretty good job.

#### Conclusions

In this post, I implemented two different models for the scheduling problem. The first is smaller in terms of the number of binary variables. The second one requires us to do some work upfront: generating all possible game configurations. Once we have done that, the resulting model is very large but rather simple and efficient to solve. This is similar to a set-covering model approach where we see a likewise behavior: the resulting models are large but easy to solve. This is a useful approach that can be quite helpful in certain cases.

The GAMS implemention of the second model is also interesting. We can use embedded Python code (with the wonderful itertools package) to generate the possible games.

#### Appendix A: Model 1

Notes:
• The 2d set lt(p,pp) implements the conditions $$p \lt p'$$. It is used throughout the models to prevent double-checking pairs of players.
• A circular lead ++ is used in this model. This can be replaced by a circular lag --, having the same result. It is not so often we can use this operator in a model, so worth pointing out.

 $ontext There are 16 players and 7 rounds. Each round consists of 4 games. Each game is two players vs. two players. Each player has a rating. The objective is to minimize the absolute rating difference between the teams. The constraints are that every player: -must play 7 rounds -can only play 1 game per round -can only be partnered with another player 0 or 1 times -can only be opponent of another player 0, 1, or 2 times From: https://stackoverflow.com/questions/67604569/pulp-scheduling-program-needs-more-efficient-problem-definition This model does not work very well. It is not able to find a good schedule within a reasonable time.$offtext *------------------------------------------------------ * data *------------------------------------------------------ set   p  'player'      /Joe, Tom, Sam, Bill, Fred, John, Wex, Chip,        Mike, Jeff, Steve, Kumar, Connor, Matt, Peter, Cindy/   r 'rounds' /round1*round7/   g 'game' /game1*game4/   t '2 teams per game' /team1,team2/ ; alias(p,pp); parameter rating(p)  /    Joe    5.0,  Tom  4.2,  Sam   4.3,  Bill  5.1,    Fred   4.4,  John 3.7,  Wex   3.8,  Chip  4.6,    Mike   3.2,  Jeff 3.6,  Steve 3.8,  Kumar 4.7,    Connor 4.3,  Matt 4.6,  Peter 4.2,  Cindy 3.4/ ; display p,r,g,t,rating; set lt(p,pp) 'combinations of players with p

#### Appendix B: Model 2

Notes:
• Embedded Python is used to populate the set game
• A note about domain errors. By default, moving data from embedded Python to its GAMS environment is unsafe: domain errors are silently ignored. (Obviously, that default is wrong: the default should be safety first). I should have used $OffFiltered. That will trigger errors when a domain error occurs. You can see the difference in behavior by mangling one of the "team1" strings inside the Python code. In general, except for very special cases, like we should use$loaddc instead of $load, we should use$OffFiltered if embedded Python code is used.

 $ontext There are 16 players and 7 rounds. Each round consists of 4 games. Each game is two players vs. two players. Each player has a rating. The objective is to minimize the absolute rating difference between the teams. The constraints are that every player: -must play 7 rounds -can only play 1 game per round -can only be partnered with another player 0 or 1 times -can only be opponent of another player 0, 1, or 2 times From: https://stackoverflow.com/questions/67604569/pulp-scheduling-program-needs-more-efficient-problem-definition This is version 2 of the model. We generate possible game configurations in advance.$offtext *------------------------------------------------------ * base data *------------------------------------------------------ sets   p  'player'      /Joe, Tom, Sam, Bill, Fred, John, Wex, Chip,        Mike, Jeff, Steve, Kumar, Connor, Matt, Peter, Cindy/   r 'rounds' /round1*round7/   t 'teams' /team1,team2/   g 'all possible games'   game(g<,t,p) 'possible game configurations' ; parameter rating(p)  /    Joe    5.0,  Tom  4.2,  Sam   4.3,  Bill  5.1,    Fred   4.4,  John 3.7,  Wex   3.8,  Chip  4.6,    Mike   3.2,  Jeff 3.6,  Steve 3.8,  Kumar 4.7,    Connor 4.3,  Matt 4.6,  Peter 4.2,  Cindy 3.4/ ; *------------------------------------------------------ * generate all game configurations *------------------------------------------------------   $onEmbeddedCode Python: from itertools import combinations print("") k = 4 # k is number to select # n is number of players players = [p for p in gams.get("p")] n = len(players) print(f"number of players={n}") comb = list(combinations(range(n),k)) ncomb = len(comb) print(f"number of combinations={ncomb}") # for each combination A,B,C,D we have 3 different team arrangements: # team 1 team 2 # 1 A,B C,D # 2 A,C B,D # 3 A,D B,C ngame = ncomb*3 print(f"number of possible games={ngame}") game = [] nn = 0 # counter for c in comb: # configuration 1 nn = nn + 1 snn = f"game{nn}" game.append((snn,'team1',players[c[0]])) game.append((snn,'team1',players[c[1]])) game.append((snn,'team2',players[c[2]])) game.append((snn,'team2',players[c[3]])) # configuration 2 nn = nn + 1 snn = f"game{nn}" game.append((snn,'team1',players[c[0]])) game.append((snn,'team1',players[c[2]])) game.append((snn,'team2',players[c[1]])) game.append((snn,'team2',players[c[3]])) # configuration 3 nn = nn + 1 snn = f"game{nn}" game.append((snn,'team1',players[c[0]])) game.append((snn,'team1',players[c[3]])) game.append((snn,'team2',players[c[1]])) game.append((snn,'team2',players[c[2]])) # export to gams gams.set("game",game)$offEmbeddedCode game option game:0:0:4; display game; *------------------------------------------------------ * derived data to make the model simpler *------------------------------------------------------ alias (p,pp); set    lt(p,pp)  'p