Yet Another Math Programming Consultant: Another sports schedule problem

Wimbledon 2010 Mixed Doubles (source)

In [1] a question is asked about designing a sports schedule as follows:

There are 16 players, 7 rounds, and 4 games per round.
A game consists of 2 teams with 2 players per team.
Each player has a rating. A team rating is the sum of the ratings of the two members.
A player plays exactly one game in each round.
A player can be partnered with another player only once.
A player can play against another player only twice.
Design a schedule that has the minimum difference in ratings between opposing teams.

I'll formulate two different models. One with a small number of binary variables. And one with a large number of binary variables. As we shall see the model with the larger number of variables will win.

Data

Here is a recapitulation of the data.

----     40 SET p  player

Joe   ,    Tom   ,    Sam   ,    Bill  ,    Fred  ,    John  ,    Wex   ,    Chip  ,    Mike  ,    Jeff  ,    Steve 
Kumar ,    Connor,    Matt  ,    Peter ,    Cindy 


----     40 SET r  rounds

round1,    round2,    round3,    round4,    round5,    round6,    round7


----     40 SET g  game

game1,    game2,    game3,    game4


----     40 SET t  2 teams per game

team1,    team2


----     40 PARAMETER rating  

Joe    5.000,    Tom    4.200,    Sam    4.300,    Bill   5.100,    Fred   4.400,    John   3.700,    Wex    3.800
Chip   4.600,    Mike   3.200,    Jeff   3.600,    Steve  3.800,    Kumar  4.700,    Connor 4.300,    Matt   4.600
Peter  4.200,    Cindy  3.400

Model 1

Here we model directly the assignment of players to teams.

MIP Model 1
Variables
\[\begin{align}&\color{darkred}x_{p,r,g,t} \in \{0,1\} && \text{Assignment of player $p$ to team $t$ in round $r$, game $g$}\\ &\color{darkred}{\mathit{isPartner}}_{p,p',r,g,t} \in [0,1] && \text{This is 1 if $p,p'$ are playing in the same team}\\ &\color{darkred}{\mathit{isOpponent}}_{p,p',r,g,t} \in [0,1] && \text{This is 1 if $p,p'$ are playing in the opposite team}\\ &\color{darkred}{\mathit{diff}}_{r,g} \ge 0 && \text{Rating difference between teams in game}\end{align}\]
Equations
\[\begin{align}\min&\sum_{r,g}\color{darkred}{\mathit{diff}}_{r,g} && && \mbox{Minimize the rating differences between teams}\\ &\sum_{g,t} \color{darkred}x_{p,r,g,t} = 1 && \forall p,r && \text{Player $p$ is scheduled in each round $r$} \\ &\sum_p \color{darkred}x_{p,r,g,t} = 2 && \forall r,g,t && \text{Teams consist of two persons}\\ &\color{darkred}{\mathit{isPartner}}_{p,p',r,g,t} \ge \color{darkred}x_{p,r,g,t}+\color{darkred}x_{p',r,g,t}-1 && \forall p\le p',r,g,t && \text{$p$ and $p'$ are partnering at $r,g,t$}\\ &\sum_{r,g,t}\color{darkred}{\mathit{isPartner}}_{p,p',r,g,t} \le 1 && \forall p\le p' && \text{Limit number of times $p,p'$ can partner} \\ &\color{darkred}{\mathit{isOpponent}}_{p,p',r,g,t} \ge \color{darkred}x_{p,r,g,t}+\color{darkred}x_{p',r,g,t++1}-1 && \forall p\le p',r,g,t && \text{$p$ and $p'$ are opponents at $r,g,t$} \\ &\sum_{r,g,t} \color{darkred}{\mathit{isOpponent}}_{p,p',r,g,t} \le 2 && \forall p\le p' && \text{Limit numbers of times $p,p'$ can be opponents} \\ &\color{darkred}{\mathit{diff}}_{r,g} = \sum_p\color{darkred}x_{p,r,g,\color{darkgreen}{\mathit{team2}}}\cdot\color{darkblue}{\mathit{rating}}_p - \sum_p\color{darkred}x_{p,r,g,\color{darkgreen}{\mathit{team1}}}\cdot\color{darkblue}{\mathit{rating}}_p && \forall r,g && \text{Force team 2 to be the higher rated team} \end{align}\]

There are some interesting wrinkles here.

We have some linearizations. I wanted to use \[\color{darkred}{\mathit{isPartner}}_{p,p',r,g,t} = \color{darkred}x_{p,r,g,t} \cdot \color{darkred}x_{p',r,g,t}\] A multiplication with binary variables \[\color{darkred}z=\color{darkred}x\cdot \color{darkred}y\] can be linearized as \[\begin{align} & \color{darkred}z \le \color{darkred}x \\ &\color{darkred}z \le \color{darkred}y \\ & \color{darkred}z\ge \color{darkred}x+\color{darkred}y-1\end{align}\] In our case we can drop the first two inequalities as we are only interesting in setting some upper bound on $\color{darkred}{\mathit{isPartner}}$, as can be seen from \[\sum_{r,g,t}\color{darkred}{\mathit{isPartner}}_{p,p',r,g,t} \le 1\] So we can replace the nonlinear constraint by: \[\color{darkred}{\mathit{isPartner}}_{p,p',r,g,t} \ge \color{darkred}x_{p,r,g,t}+\color{darkred}x_{p',r,g,t}-1\]
The other linearization contains a funny operator: \[\color{darkred}{\mathit{isOpponent}}_{p,p',r,g,t} \ge \color{darkred}x_{p,r,g,t}+\color{darkred}x_{p',r,g,t++1}-1\] The expression $t\text{++}1$ means a circular lead. $t$ stands for teams. We have $t = \{\color{darkgreen}{\mathit{team1}},\color{darkgreen}{\mathit{team2}}\}$. In this case $t\text{++}1$ means: "the other team". I.e. if $t=\color{darkgreen}{\mathit{team1}}$ then $t\text{++}1$ is $\color{darkgreen}{\mathit{team2}}$. And vice versa. Note that I could have used $t\text{--}1$ as well. A circular lag is identical to a circular lead when there are just two elements in the set.
One would expect to minimize the sum of the absolute differences between the team ratings. Here I did not do this. As $\color{darkred}{\mathit{diff}}_{r,g}$ is a non-negative variable, we enforce that team 2 has an equal or better rating than team 1. This is not an issue. Actually, the idea is to reduce symmetry in the problem to help it solve faster.
The number of binary variables in this model is 896. Note that the variables $\color{darkred}{\mathit{isPartner}}_{p,p',r,g,t}$ and $\color{darkred}{\mathit{isOpponent}}_{p,p',r,g,t}$ can be either binary or continuous between zero and one. I chose to make them continous.
This model is slow as molasses. See the results below. So we need to look at another formulation.

Model 2

In this model, we try to enumerate the possible games. First, we can enumerate the combinations of four players. As we have $n=16$ players, the number of combinations of four players is \[{16 \choose 4}=1820\] Four players A,B,C,D can play in three different ways: \[\begin{array}{cc} \text{team 1} & \text{team 2}\\ \hline A,B & C,D \\ A,C & B,D \\ A,D & B,C\end{array}\] This means, we have 5460 possible games. I generated them all. This looks like:

----    109 SET game  possible game configurations

game1   .team1.Joe   ,    game1   .team1.Tom   ,    game1   .team2.Sam   ,    game1   .team2.Bill  
game2   .team1.Joe   ,    game2   .team1.Sam   ,    game2   .team2.Tom   ,    game2   .team2.Bill  
game3   .team1.Joe   ,    game3   .team1.Bill  ,    game3   .team2.Tom   ,    game3   .team2.Sam   
game4   .team1.Joe   ,    game4   .team1.Tom   ,    game4   .team2.Sam   ,    game4   .team2.Fred  
game5   .team1.Joe   ,    game5   .team1.Sam   ,    game5   .team2.Tom   ,    game5   .team2.Fred  
game6   .team1.Joe   ,    game6   .team1.Fred  ,    game6   .team2.Tom   ,    game6   .team2.Sam   
game7   .team1.Joe   ,    game7   .team1.Tom   ,    game7   .team2.Sam   ,    game7   .team2.John  
game8   .team1.Joe   ,    game8   .team1.Sam   ,    game8   .team2.Tom   ,    game8   .team2.John  
game9   .team1.Joe   ,    game9   .team1.John  ,    game9   .team2.Tom   ,    game9   .team2.Sam   
game10  .team1.Joe   ,    game10  .team1.Tom   ,    game10  .team2.Sam   ,    game10  .team2.Wex   
  . . .
game5455.team1.Kumar ,    game5455.team1.Matt  ,    game5455.team2.Peter ,    game5455.team2.Cindy 
game5456.team1.Kumar ,    game5456.team1.Peter ,    game5456.team2.Matt  ,    game5456.team2.Cindy 
game5457.team1.Kumar ,    game5457.team1.Cindy ,    game5457.team2.Matt  ,    game5457.team2.Peter 
game5458.team1.Connor,    game5458.team1.Matt  ,    game5458.team2.Peter ,    game5458.team2.Cindy 
game5459.team1.Connor,    game5459.team1.Peter ,    game5459.team2.Matt  ,    game5459.team2.Cindy 
game5460.team1.Connor,    game5460.team1.Cindy ,    game5460.team2.Matt  ,    game5460.team2.Peter

This means we "just" have to select four different games from this set for each round (subject to some constraints) and we are done. As we have all this precalculated data, we can also precalculate the rating difference in each game. In the model below this is stored as $\color{darkblue}{\mathit{ratingDiff}}_g$. Similarly we can precalculate the sets $\color{darkblue}{\mathit{partner}}_{g,p,p'}$ and $\color{darkblue}{\mathit{opponent}}_{g,p,p'}$ to indicate if $p,p'$ are partners or opponents in game $g$.

MIP Model 2
Variables
\[\begin{align}&\color{darkred}x_{r,g} \in \{0,1\} && \text{Selection of games} \end{align}\]
Equations
\[\begin{align}\min&\sum_{r,g}\color{darkblue}{\mathit{ratingDiff}}_g \cdot \color{darkred}x_{r,g} && && \mbox{Minimize the rating differences between teams}\\ &\sum_g \color{darkred}x_{r,g} = 4 && \forall r && \text{Select 4 games in each round $r$} \\ &\sum_{g,t\|\color{darkblue}{\mathit{game}}(g,t,p)} \color{darkred}x_{r,g} = 1 && \forall p,r && \text{Each person plays once in each round}\\ &\sum_{r,g\|\color{darkblue}{\mathit{partner}}(g,p,p')}\color{darkred}x_{r,g} \le 1 && \forall p\le p' && \text{Limit number of times $p,p'$ can partner} \\ &\sum_{r,g\|\color{darkblue}{\mathit{opponent}}(g,p,p')}\color{darkred}x_{r,g} \le 2 && \forall p\le p' && \text{Limit numbers of times $p,p'$ can be opponents} \end{align}\]

Notes:

This model is much larger in terms of binary variables: we have 38,220 of them!
However, it solves much faster. Showing once more that the number of discrete variables is not always a good measure of complexity.
The generation of the combinations and games was done in Python. The rest of the model was formulated in GAMS.
The solution is depicted below.

Results

The solution looks like:

The red and blue cells show the team number. Each row represents an individual game, so it will have two red and two blue cells. The optimizer did a good job: many games have a rating difference of zero.

The differences in performance of the models:

Performance Statistics
	Model 1	Model 2
Equations	13,877	360
Variables	14,365	38,221
Discrete Variables	896	38,220
Objective	10.5	0.7
Time	1000	66
Status	Time Limit	Optimal
Gap	93%

Obviously, the second model is doing much better. The author of [1] did a pretty good job.

Conclusions

In this post, I implemented two different models for the scheduling problem. The first is smaller in terms of the number of binary variables. The second one requires us to do some work upfront: generating all possible game configurations. Once we have done that, the resulting model is very large but rather simple and efficient to solve. This is similar to a set-covering model approach where we see a likewise behavior: the resulting models are large but easy to solve. This is a useful approach that can be quite helpful in certain cases.

The GAMS implemention of the second model is also interesting. We can use embedded Python code (with the wonderful itertools package) to generate the possible games.

References

pulp scheduling program needs more efficient problem definition, https://stackoverflow.com/questions/67604569/pulp-scheduling-program-needs-more-efficient-problem-definition

Appendix A: Model 1

Notes:

The 2d set lt(p,pp) implements the conditions $p \lt p'$. It is used throughout the models to prevent double-checking pairs of players.
A circular lead ++ is used in this model. This can be replaced by a circular lag --, having the same result. It is not so often we can use this operator in a model, so worth pointing out.

$ontext

There are 16 players and 7 rounds. Each round consists of 4 games.
Each game is two players vs. two players. Each player has a rating.
The objective is to minimize the absolute rating difference between
the teams. The constraints are that every player:

-must play 7 rounds
-can only play 1 game per round
-can only be partnered with another player 0 or 1 times
-can only be opponent of another player 0, 1, or 2 times

From:
https://stackoverflow.com/questions/67604569/pulp-scheduling-program-needs-more-efficient-problem-definition

This model does not work very well. It is not able to find a good schedule
within a reasonable time.

$offtext

*------------------------------------------------------
* data
*------------------------------------------------------

set
p 'player'
     /Joe, Tom, Sam, Bill, Fred, John, Wex, Chip,
       Mike, Jeff, Steve, Kumar, Connor, Matt, Peter, Cindy/
r 'rounds' /round1*round7/
g 'game' /game1*game4/
t '2 teams per game' /team1,team2/
;
alias(p,pp);

parameter rating(p) /
   Joe    5.0, Tom 4.2, Sam   4.3, Bill 5.1,
   Fred   4.4, John 3.7, Wex   3.8, Chip 4.6,
   Mike   3.2, Jeff 3.6, Steve 3.8, Kumar 4.7,
   Connor 4.3, Matt 4.6, Peter 4.2, Cindy 3.4/
;

display p,r,g,t,rating;

set lt(p,pp) 'combinations of players with p<pp';
lt(p,pp) = ord(p)<ord(pp);

*------------------------------------------------------
* optimization model
*------------------------------------------------------

variables
   x(p,r,g,t)              'schedule'
   isPartner(p,pp,r,g,t)   '1 if p and pp are partners'
   isOpponent(p,pp,r,g,t) '1 if p and pp are opponents'
   diff(r,g)               'rating difference'
   z                       'objective'
;
binary variable x;
positive variable isPartner,isOpponent,diff;

isPartner.up(lt(p,pp),r,g,t) = 1;
isOpponent.up(lt(p,pp),r,g,t) = 1;

equations
   play(p,r)           'player is scheduled in each round'
   team(r,g,t)         'team consist of 2 persons'
   partner(p,pp,r,g,t) 'p,pp are partners'
   oppose(p,pp,r,g,t) 'p,pp are in opposing teams'
   maxPartner(p,pp)    '0 or 1 times partnered with same person'
   maxOppose(p,pp)     '0,1 or 2 times opposed to the same person'
   ediff(r,g)          'difference in rating'
   objective           'sum of absolute values'
;

* player p is scheduled in each round r
play(p,r)..
   sum((g,t),x(p,r,g,t)) =e= 1;

* a team consist of 2 persons
team(r,g,t)..
   sum(p,x(p,r,g,t)) =e= 2;

* p,pp are partners at r,g,t
partner(lt(p,pp),r,g,t)..
   isPartner(p,pp,r,g,t) =g= x(p,r,g,t)+x(pp,r,g,t)-1;

* limit numbers of times p,pp can be partners
maxPartner(lt(p,pp))..
   sum((r,g,t),isPartner(p,pp,r,g,t)) =l= 1;

* p,pp are opponents at r,g,t (pp will be at opposite team of t)
oppose(lt(p,pp),r,g,t)..
   isOpponent(p,pp,r,g,t) =g= x(p,r,g,t)+x(pp,r,g,t++1)-1;

* limit numbers of times p,pp can be opponents
maxOppose(lt(p,pp))..
   sum((r,g,t),isOpponent(p,pp,r,g,t)) =l= 2;

* we force team 2 to be the higher-rated team
* that is removing some symmetry in the model
ediff(r,g)..
   diff(r,g) =e= sum(p,x(p,r,g,'team2')*rating(p)) - sum(p,x(p,r,g,'team1')*rating(p));

* minimize the difference between team ratings
objective..
   z =e= sum((r,g),diff(r,g));

*------------------------------------------------------
* solve
*------------------------------------------------------

model sched /all/;
option optcr=0, threads=8, reslim=1000;
solve sched minimizing z using mip;

Appendix B: Model 2

Notes:

Embedded Python is used to populate the set game.
A note about domain errors. By default, moving data from embedded Python to its GAMS environment is unsafe: domain errors are silently ignored. (Obviously, that default is wrong: the default should be safety first). I should have used $OffFiltered. That will trigger errors when a domain error occurs. You can see the difference in behavior by mangling one of the "team1" strings inside the Python code. In general, except for very special cases, like we should use $loaddc instead of $load, we should use $OffFiltered if embedded Python code is used.

$onEmbeddedCode Python:

from itertools import combinations
print("")

k = 4 # k is number to select

# n is number of players
players = [p for p in gams.get("p")]
n = len(players)
print(f"number of players={n}")

comb = list(combinations(range(n),k))
ncomb = len(comb)
print(f"number of combinations={ncomb}")

# for each combination A,B,C,D we have 3 different team arrangements:
#       team 1    team 2
# 1     A,B        C,D
# 2     A,C        B,D
# 3     A,D        B,C

ngame = ncomb*3
print(f"number of possible games={ngame}")

game = []
nn = 0 # counter
for c in comb:

   # configuration 1
   nn = nn + 1
   snn = f"game{nn}"
   game.append((snn,'team1',players[c[0]]))
   game.append((snn,'team1',players[c[1]]))
   game.append((snn,'team2',players[c[2]]))
   game.append((snn,'team2',players[c[3]]))

   # configuration 2
   nn = nn + 1
   snn = f"game{nn}"
   game.append((snn,'team1',players[c[0]]))
   game.append((snn,'team1',players[c[2]]))
   game.append((snn,'team2',players[c[1]]))
   game.append((snn,'team2',players[c[3]]))

   # configuration 3
   nn = nn + 1
   snn = f"game{nn}"
   game.append((snn,'team1',players[c[0]]))
   game.append((snn,'team1',players[c[3]]))
   game.append((snn,'team2',players[c[1]]))
   game.append((snn,'team2',players[c[2]]))

# export to gams
gams.set("game",game)

$offEmbeddedCode game

option game:0:0:4;
display game;

*------------------------------------------------------
* derived data to make the model simpler
*------------------------------------------------------

alias (p,pp);

set
lt(p,pp) 'p<pp: used to limit comparisons'
   partner(g,p,pp) 'in game g, do p,pp play in same team'
   opponent(g,p,pp) 'in game g, do p,pp play in other team'
;
lt(p,pp) = ord(p)<ord(pp);
partner(g,lt(p,pp)) = sum(t$(game(g,t,p) and game(g,t,pp)),1);
opponent(g,lt(p,pp)) = sum(t$(game(g,t,p) and game(g,t++1,pp)),1);

parameter ratingDiff(g) 'difference in rating for game g';
ratingDiff(g) = abs(sum(game(g,'team1',p),rating(p)) - sum(game(g,'team2',p),rating(p)));

*------------------------------------------------------
* optimization model
*------------------------------------------------------

variables
   x(r,g)   'schedule'
   z        'objective'
;
binary variable x;

equations
   configuration(r)    'exactly 4 games for each round'
   play(p,r)           'player is scheduled exactly once in each round'
   epartner(p,pp)       'limit number of times p,pp are member of same team'
   eoppose(p,pp)        'limit number of times p,pp are member of opposing team'
   objective
;

* exactly four games for each round
configuration(r)..
   sum(g,x(r,g)) =e= 4;

* player p is scheduled in each round r exactly once
play(p,r)..
   sum(game(g,t,p), x(r,g)) =e= 1;

* limit number of times p,pp are member of same team
epartner(lt(p,pp))..
   sum((r,partner(g,p,pp)),x(r,g)) =l= 1;

* limit number of times p,pp are member of opposing team
eoppose(lt(p,pp))..
   sum((r,opponent(g,p,pp)),x(r,g)) =l= 2;

* minimize rate difference
objective..
   z =e= sum((r,g),x(r,g)*ratingDiff(g));

*------------------------------------------------------
* solve
*------------------------------------------------------

model sched /all/;
option optcr=0, threads=8, reslim=1000;
solve sched minimizing z using mip;

*------------------------------------------------------
* reporting
*------------------------------------------------------

x.l(r,g) = round(x.l(r,g));

parameter res(*,*,*) 'Results. Teams=1/2';
res(r,g,p)$x.l(r,g) = sum(game(g,t,p),ord(t));
res(r,g,'diff')$x.l(r,g) = x.l(r,g)*ratingDiff(g);

option res:1;
display res;

Yet Another Math Programming Consultant

Monday, July 5, 2021

Another sports schedule problem

Data

Model 1

Model 2

Results

Conclusions

References

Appendix A: Model 1

Appendix B: Model 2

2 comments: