Wimbledon 2010 Mixed Doubles (source) |
In [1] a question is asked about designing a sports schedule as follows:
- There are 16 players, 7 rounds, and 4 games per round.
- A game consists of 2 teams with 2 players per team.
- Each player has a rating. A team rating is the sum of the ratings of the two members.
- A player plays exactly one game in each round.
- A player can be partnered with another player only once.
- A player can play against another player only twice.
- Design a schedule that has the minimum difference in ratings between opposing teams.
Data
---- 40 SET p player Joe , Tom , Sam , Bill , Fred , John , Wex , Chip , Mike , Jeff , Steve Kumar , Connor, Matt , Peter , Cindy ---- 40 SET r rounds round1, round2, round3, round4, round5, round6, round7 ---- 40 SET g game game1, game2, game3, game4 ---- 40 SET t 2 teams per game team1, team2 ---- 40 PARAMETER rating Joe 5.000, Tom 4.200, Sam 4.300, Bill 5.100, Fred 4.400, John 3.700, Wex 3.800 Chip 4.600, Mike 3.200, Jeff 3.600, Steve 3.800, Kumar 4.700, Connor 4.300, Matt 4.600 Peter 4.200, Cindy 3.400
Model 1
MIP Model 1 | |
---|---|
Variables | |
\[\begin{align}&\color{darkred}x_{p,r,g,t} \in \{0,1\} && \text{Assignment of player $p$ to team $t$ in round $r$, game $g$}\\ &\color{darkred}{\mathit{isPartner}}_{p,p',r,g,t} \in [0,1] && \text{This is 1 if $p,p'$ are playing in the same team}\\ &\color{darkred}{\mathit{isOpponent}}_{p,p',r,g,t} \in [0,1] && \text{This is 1 if $p,p'$ are playing in the opposite team}\\ &\color{darkred}{\mathit{diff}}_{r,g} \ge 0 && \text{Rating difference between teams in game}\end{align}\] | |
Equations | |
\[\begin{align}\min&\sum_{r,g}\color{darkred}{\mathit{diff}}_{r,g} && && \mbox{Minimize the rating differences between teams}\\ &\sum_{g,t} \color{darkred}x_{p,r,g,t} = 1 && \forall p,r && \text{Player $p$ is scheduled in each round $r$} \\ &\sum_p \color{darkred}x_{p,r,g,t} = 2 && \forall r,g,t && \text{Teams consist of two persons}\\ &\color{darkred}{\mathit{isPartner}}_{p,p',r,g,t} \ge \color{darkred}x_{p,r,g,t}+\color{darkred}x_{p',r,g,t}-1 && \forall p\le p',r,g,t && \text{$p$ and $p'$ are partnering at $r,g,t$}\\ &\sum_{r,g,t}\color{darkred}{\mathit{isPartner}}_{p,p',r,g,t} \le 1 && \forall p\le p' && \text{Limit number of times $p,p'$ can partner} \\ &\color{darkred}{\mathit{isOpponent}}_{p,p',r,g,t} \ge \color{darkred}x_{p,r,g,t}+\color{darkred}x_{p',r,g,t++1}-1 && \forall p\le p',r,g,t && \text{$p$ and $p'$ are opponents at $r,g,t$} \\ &\sum_{r,g,t} \color{darkred}{\mathit{isOpponent}}_{p,p',r,g,t} \le 2 && \forall p\le p' && \text{Limit numbers of times $p,p'$ can be opponents} \\ &\color{darkred}{\mathit{diff}}_{r,g} = \sum_p\color{darkred}x_{p,r,g,\color{darkgreen}{\mathit{team2}}}\cdot\color{darkblue}{\mathit{rating}}_p - \sum_p\color{darkred}x_{p,r,g,\color{darkgreen}{\mathit{team1}}}\cdot\color{darkblue}{\mathit{rating}}_p && \forall r,g && \text{Force team 2 to be the higher rated team} \end{align}\] |
- We have some linearizations. I wanted to use \[\color{darkred}{\mathit{isPartner}}_{p,p',r,g,t} = \color{darkred}x_{p,r,g,t} \cdot \color{darkred}x_{p',r,g,t}\] A multiplication with binary variables \[\color{darkred}z=\color{darkred}x\cdot \color{darkred}y\] can be linearized as \[\begin{align} & \color{darkred}z \le \color{darkred}x \\ &\color{darkred}z \le \color{darkred}y \\ & \color{darkred}z\ge \color{darkred}x+\color{darkred}y-1\end{align}\] In our case we can drop the first two inequalities as we are only interesting in setting some upper bound on \(\color{darkred}{\mathit{isPartner}}\), as can be seen from \[\sum_{r,g,t}\color{darkred}{\mathit{isPartner}}_{p,p',r,g,t} \le 1\] So we can replace the nonlinear constraint by: \[\color{darkred}{\mathit{isPartner}}_{p,p',r,g,t} \ge \color{darkred}x_{p,r,g,t}+\color{darkred}x_{p',r,g,t}-1\]
- The other linearization contains a funny operator: \[\color{darkred}{\mathit{isOpponent}}_{p,p',r,g,t} \ge \color{darkred}x_{p,r,g,t}+\color{darkred}x_{p',r,g,t++1}-1\] The expression \(t\text{++}1\) means a circular lead. \(t\) stands for teams. We have \(t = \{\color{darkgreen}{\mathit{team1}},\color{darkgreen}{\mathit{team2}}\}\). In this case \(t\text{++}1\) means: "the other team". I.e. if \(t=\color{darkgreen}{\mathit{team1}}\) then \(t\text{++}1\) is \(\color{darkgreen}{\mathit{team2}}\). And vice versa. Note that I could have used \(t\text{--}1\) as well. A circular lag is identical to a circular lead when there are just two elements in the set.
- One would expect to minimize the sum of the absolute differences between the team ratings. Here I did not do this. As \(\color{darkred}{\mathit{diff}}_{r,g}\) is a non-negative variable, we enforce that team 2 has an equal or better rating than team 1. This is not an issue. Actually, the idea is to reduce symmetry in the problem to help it solve faster.
- The number of binary variables in this model is 896. Note that the variables \(\color{darkred}{\mathit{isPartner}}_{p,p',r,g,t}\) and \(\color{darkred}{\mathit{isOpponent}}_{p,p',r,g,t}\) can be either binary or continuous between zero and one. I chose to make them continous.
- This model is slow as molasses. See the results below. So we need to look at another formulation.
Model 2
---- 109 SET game possible game configurations
game1 .team1.Joe , game1 .team1.Tom , game1 .team2.Sam , game1 .team2.Bill
game2 .team1.Joe , game2 .team1.Sam , game2 .team2.Tom , game2 .team2.Bill
game3 .team1.Joe , game3 .team1.Bill , game3 .team2.Tom , game3 .team2.Sam
game4 .team1.Joe , game4 .team1.Tom , game4 .team2.Sam , game4 .team2.Fred
game5 .team1.Joe , game5 .team1.Sam , game5 .team2.Tom , game5 .team2.Fred
game6 .team1.Joe , game6 .team1.Fred , game6 .team2.Tom , game6 .team2.Sam
game7 .team1.Joe , game7 .team1.Tom , game7 .team2.Sam , game7 .team2.John
game8 .team1.Joe , game8 .team1.Sam , game8 .team2.Tom , game8 .team2.John
game9 .team1.Joe , game9 .team1.John , game9 .team2.Tom , game9 .team2.Sam
game10 .team1.Joe , game10 .team1.Tom , game10 .team2.Sam , game10 .team2.Wex
. . .
game5455.team1.Kumar , game5455.team1.Matt , game5455.team2.Peter , game5455.team2.Cindy
game5456.team1.Kumar , game5456.team1.Peter , game5456.team2.Matt , game5456.team2.Cindy
game5457.team1.Kumar , game5457.team1.Cindy , game5457.team2.Matt , game5457.team2.Peter
game5458.team1.Connor, game5458.team1.Matt , game5458.team2.Peter , game5458.team2.Cindy
game5459.team1.Connor, game5459.team1.Peter , game5459.team2.Matt , game5459.team2.Cindy
game5460.team1.Connor, game5460.team1.Cindy , game5460.team2.Matt , game5460.team2.Peter
MIP Model 2 |
---|
Variables |
\[\begin{align}&\color{darkred}x_{r,g} \in \{0,1\} && \text{Selection of games} \end{align}\] |
Equations |
\[\begin{align}\min&\sum_{r,g}\color{darkblue}{\mathit{ratingDiff}}_g \cdot \color{darkred}x_{r,g} && && \mbox{Minimize the rating differences between teams}\\ &\sum_g \color{darkred}x_{r,g} = 4 && \forall r && \text{Select 4 games in each round $r$} \\ &\sum_{g,t|\color{darkblue}{\mathit{game}}(g,t,p)} \color{darkred}x_{r,g} = 1 && \forall p,r && \text{Each person plays once in each round}\\ &\sum_{r,g|\color{darkblue}{\mathit{partner}}(g,p,p')}\color{darkred}x_{r,g} \le 1 && \forall p\le p' && \text{Limit number of times $p,p'$ can partner} \\ &\sum_{r,g|\color{darkblue}{\mathit{opponent}}(g,p,p')}\color{darkred}x_{r,g} \le 2 && \forall p\le p' && \text{Limit numbers of times $p,p'$ can be opponents} \end{align}\] |
- This model is much larger in terms of binary variables: we have 38,220 of them!
- However, it solves much faster. Showing once more that the number of discrete variables is not always a good measure of complexity.
- The generation of the combinations and games was done in Python. The rest of the model was formulated in GAMS.
- The solution is depicted below.
Results
Performance Statistics | ||
---|---|---|
Model 1 | Model 2 | |
Equations | 13,877 | 360 |
Variables | 14,365 | 38,221 |
Discrete Variables | 896 | 38,220 |
Objective | 10.5 | 0.7 |
Time | 1000 | 66 |
Status | Time Limit | Optimal |
Gap | 93% |
Obviously, the second model is doing much better. The author of [1] did a pretty good job.
Conclusions
References
- pulp scheduling program needs more efficient problem definition, https://stackoverflow.com/questions/67604569/pulp-scheduling-program-needs-more-efficient-problem-definition
Appendix A: Model 1
- The 2d set lt(p,pp) implements the conditions \(p \lt p'\). It is used throughout the models to prevent double-checking pairs of players.
- A circular lead ++ is used in this model. This can be replaced by a circular lag --, having the same result. It is not so often we can use this operator in a model, so worth pointing out.
|
Appendix B: Model 2
- Embedded Python is used to populate the set game.
- A note about domain errors. By default, moving data from embedded Python to its GAMS environment is unsafe: domain errors are silently ignored. (Obviously, that default is wrong: the default should be safety first). I should have used $OffFiltered. That will trigger errors when a domain error occurs. You can see the difference in behavior by mangling one of the "team1" strings inside the Python code. In general, except for very special cases, like we should use $loaddc instead of $load, we should use $OffFiltered if embedded Python code is used.
$ontext $onEmbeddedCode Python: from itertools import combinations $offEmbeddedCode game |
Nice article. There might be a market for scheduling in Switzerland . Btw: we tennis players play matches of which games are a part.
ReplyDeleteI don't really know if it was about tennis... I just picked the first picture I saw.
Delete