- We have a number of legs with their lengths (in miles). E.g.
legs = [3.3, 4.2, 5.2, 3, 2.7, 4, 5.3, 4.5, 3, 5.8, 3.3, 4.9, 3.1, 3.2, 4, 3.5, 4.9, 2.3, 3.2, 4.6, 4.5, 4, 5.3, 5.9, 2.8, 1.9, 2.1, 3, 2.5, 5.6, 1.3, 4.6, 1.5, 1.2, 4.1, 8.1]
- We also have a number of runs performed by runners. Again, we have distances in miles:
runs = [3.2, 12.3, 5.2, 2.9, 2.9, 5.5]
A run can be allocated to different legs, as long a the sum of these legs is less than the length of the run. - We want to maximize the number of legs covered by the runs, but with a wrinkle: the legs should be consecutive: 1,2,3,... (we can't skip a leg).
- A solution for this data set can look like:
---- 60 PARAMETER report results run1 run2 run3 run4 run5 run6 leg1 3.3 leg2 4.2 leg3 5.2 leg4 3.0 leg5 2.7 leg6 4.0 leg7 5.3 total 3.0 11.5 5.2 2.7 5.3 runLen 3.2 12.3 5.2 2.9 2.9 5.5
It is noted that solutions are not necessarily unique. In this case, we could have swapped run 4 for run 5 to cover leg 5.
I am not sure what a Ragnar relay [2] is, but I suspect this has to do with how these legs are assigned to different runners.
The text in the original question [1] was not immediately clear to me. English prose is a poor vehicle to succinctly describe a problem. Mathematics is much better for this. It is (or should be) compact, precise and unambiguous. My point is that a mathematical model, besides the input for a solver, is also an excellent communication tool.
Mathematical model
We define two sets of binary variables: \[\begin{align}&\color{darkred}{\mathit{assign}}_{i,j} = \begin{cases}1 & \text{if leg $i$ is assigned to run $j$} \\ 0 & \text{otherwise} \end{cases} \\ & \color{darkred}{\mathit{covered}}_i = \begin{cases}1 & \text{if leg $i$ is covered by a run} \\ 0& \text{otherwise}\end{cases}\end{align}\] With this we can write:
MIP Model |
---|
\[\begin{align}\max&\sum_i \color{darkred}{\mathit{covered}}_i \\ & \sum_i \color{darkblue}{\mathit{legs}}_i \cdot \color{darkred}{\mathit{assign}}_{i,j} \le \color{darkblue}{\mathit{runs}}_j && \forall j \\ & \color{darkred}{\mathit{covered}}_i \le \sum_j \color{darkred}{\mathit{assign}}_{i,j} && \forall i \\ & \color{darkred}{\mathit{covered}}_{i-1} \ge \color{darkred}{\mathit{covered}}_i && \forall i \gt 1 \\ & \color{darkred}{\mathit{covered}}_i \in \{0,1\} \\& \color{darkred}{\mathit{assign}}_{i,j} \in \{0,1\} \end{align}\] |
The GAMS model is reproduced in the appendix below. It follows this model quite literally.
It may be tempting just to print the optimal values for the variable \(\color{darkred}{\mathit{assign}}_{i,j}\) as solution report. However, there may be some spurious assignments that are not part of the covered legs. Here is an example:
---- 56 VARIABLE assign.L assign leg to runner run1 run2 run3 run4 run5 run6 leg1 1 leg2 1 leg3 1 leg4 1 leg5 1 leg6 1 leg7 1 leg18 1 ---- 56 VARIABLE covered.L leg is covered leg1 1, leg2 1, leg3 1, leg4 1, leg5 1, leg6 1, leg7 1 ---- 56 VARIABLE numLegs.L = 7 number of covered legs
The assignment to leg 18 should not be part of the optimal solution presented to the user. For that reason the parameter report was calculated as: \[\color{darkblue}{\mathit{report}}_{i,j} := \color{darkred}{\mathit{}assign}^*_{i,j}\cdot\color{darkred}{\mathit{covered}}^*_i \cdot\color{darkblue}{\mathit{legs}}_i\]
CVXPY
Model in Matrix Notation |
---|
\[\begin{align}\max\>&\color{darkblue}e^T\cdot\color{darkred}{\mathit{covered}}\\ & \color{darkred}{\mathit{assign}}^T\cdot\color{darkblue}{\mathit{legs}} \le \color{darkblue}{\mathit{runs}} \\ & \color{darkred}{\mathit{covered}} \le \color{darkred}{\mathit{assign}} \cdot\color{darkblue}e \\& \color{darkred}{\mathit{covered}}[1:n-1]\ge\color{darkred}{\mathit{covered}}[2:n]\\& \color{darkred}{\mathit{assign}},\color{darkred}{\mathit{covered}} \in \{0,1\} \end{align}\] |
Here \(\color{darkblue}e\) is a (column) vector of ones of appropriate size, The notation \(a[1:n]\) indicates taking the first \(n\) numbers from vector \(a\). I personally find the indexed model more intuitive and easier to read and write. Fortunately, CVXPY has a few convenience functions to calculate sums, so we can write:
assign = cp.Variable((n,m),boolean=True) covered = cp.Variable(n,boolean=True) prob = cp.Problem(cp.Maximize(cp.sum(covered)), [ assign.T@legs <= runs, covered <= cp.sum(assign,axis=1), covered[1:n]<=covered[0:(n-1)] ]) prob.solve()
Sidenote: I am not the only one confused by row and column sums. This is from the R documentation:
rowsum: Give Column Sums of a Matrix or Data Frame, Based on a Grouping Variable
The complete CVXPY model is reproduced in appendix B. The results look like:
Elapsed 0.033 sec. Results: 0 1 2 3 4 5 0 0.0 3.3 0.0 0.0 0.0 0.0 1 0.0 4.2 0.0 0.0 0.0 0.0 2 0.0 0.0 5.2 0.0 0.0 0.0 3 3.0 0.0 0.0 0.0 0.0 0.0 4 0.0 0.0 0.0 0.0 2.7 0.0 5 0.0 4.0 0.0 0.0 0.0 0.0 6 0.0 0.0 0.0 0.0 0.0 5.3
runs = [20.0, 5.4, 3.3, 26.4, 2.4, 8.6, 14.6, 20]
is much more difficult for GLPK:
Elapsed 3.2e+03 sec. Results: 0 1 2 3 4 5 6 7 0 0.0 0.0 3.3 0.0 0.0 0.0 0.0 0.0 1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 4.2 2 0.0 0.0 0.0 0.0 0.0 0.0 5.2 0.0 3 0.0 0.0 0.0 3.0 0.0 0.0 0.0 0.0 4 2.7 0.0 0.0 0.0 0.0 0.0 0.0 0.0 5 4.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 6 0.0 5.3 0.0 0.0 0.0 0.0 0.0 0.0 7 4.5 0.0 0.0 0.0 0.0 0.0 0.0 0.0 8 3.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 9 5.8 0.0 0.0 0.0 0.0 0.0 0.0 0.0 10 0.0 0.0 0.0 0.0 0.0 3.3 0.0 0.0 11 0.0 0.0 0.0 0.0 0.0 0.0 0.0 4.9 12 0.0 0.0 0.0 0.0 0.0 0.0 0.0 3.1 13 0.0 0.0 0.0 3.2 0.0 0.0 0.0 0.0 14 0.0 0.0 0.0 4.0 0.0 0.0 0.0 0.0 15 0.0 0.0 0.0 0.0 0.0 0.0 3.5 0.0 16 0.0 0.0 0.0 4.9 0.0 0.0 0.0 0.0 17 0.0 0.0 0.0 0.0 2.3 0.0 0.0 0.0 18 0.0 0.0 0.0 0.0 0.0 0.0 0.0 3.2 19 0.0 0.0 0.0 0.0 0.0 0.0 0.0 4.6 20 0.0 0.0 0.0 4.5 0.0 0.0 0.0 0.0 21 0.0 0.0 0.0 4.0 0.0 0.0 0.0 0.0 22 0.0 0.0 0.0 0.0 0.0 5.3 0.0 0.0 23 0.0 0.0 0.0 0.0 0.0 0.0 5.9 0.0 24 0.0 0.0 0.0 2.8 0.0 0.0 0.0 0.0
Of course, Cplex has no problem: less than a second. Again, the solution is not unique:
---- 60 PARAMETER report results run1 run2 run3 run4 run5 run6 run7 run8 leg1 3.3 leg2 4.2 leg3 5.2 leg4 3.0 leg5 2.7 leg6 4.0 leg7 5.3 leg8 4.5 leg9 3.0 leg10 5.8 leg11 3.3 leg12 4.9 leg13 3.1 leg14 3.2 leg15 4.0 leg16 3.5 leg17 4.9 leg18 2.3 leg19 3.2 leg20 4.6 leg21 4.5 leg22 4.0 leg23 5.3 leg24 5.9 leg25 2.8 total 20.0 5.3 3.3 26.4 2.3 8.6 14.6 20.0 runLen 20.0 5.4 3.3 26.4 2.4 8.6 14.6 20.0
Can we help glpk a bit? Yes, we can replace the constraint \[\color{darkred}{\mathit{covered}}_i \le \sum_j \color{darkred}{\mathit{assign}}_{i,j}\] by \[\color{darkred}{\mathit{covered}}_i = \sum_j \color{darkred}{\mathit{assign}}_{i,j}\] This will reduce the glpk solution time from 3200 seconds to 340 seconds. The new constraint reduces the size of the feasible region in two ways. First, we allow only one run \(j\) to cover a leg \(i\). Allow more than one run does not help us finding the maximum coverage of legs. Also, if there is an \(\color{darkred}{\mathit{assign}}_{i,j}=1\) then immediately we force the corresponding \(\color{darkred}{\mathit{covered}}_i=1\) instead of relying on the objective. This reformulation is slightly diverging from the description in [1], but still calculates the largest set of consecutive legs that we can cover.
Another idea is to do a bit of manual presolve: don't generate the variable \(\color{darkred}{\mathit{assign}}_{i,j}\) when \(\color{darkblue}{\mathit{legs}}_i \gt \color{darkblue}{\mathit{runs}}_j\). In GAMS this is easy to do, but I am not sure how to approach this in CVXPY. Without trying, I am not sure if this has any effect on the performance of GLPK. It depends a bit on the quality of the presolver in GLPK.
Conclusions
- CVXPY models can look very clean and compact. But there are some problems:
- We can't see what CVXPY generates. How to debug CVXPY models?
- Jupyter notebooks may not show solvers logs. This is very unfortunate, as they produce valuable feedback on the solver's behavior.
- Some important modeling constructs are difficult or impossible to express in CVXPY.
- Matrix notation is not always easy to read and write and is limited to 0, 1, and 2-dimensional constructs.
- All indexing is positional.
- GAMS uses indexed models. This provides great flexibility and readability.
- Indexing by strings and referential integrity (domain checking) is a boon for large models.
- Printing of the results in GAMS is cleaner and the output is easier to read.
References
- Algorithm on how to put as many numbers of a list into different capacity of buckets, https://stackoverflow.com/questions/69273836/algorithm-on-how-to-put-as-many-numbers-of-a-list-into-different-capacity-of-buc
- Ragnar Relay Series, https://en.wikipedia.org/wiki/Ragnar_Relay_Series
Appendix A: GAMS code
$ontext |
- The notation order(i-1) in the equation definition means we generate the equation for all but the first element of set \(i\).
- The parameter report is declared as report(*,*). This will disable domain checking so we can add totals to this parameter.
Appendix B: CVXPY model
import cvxpy as cp import pandas as pd import time legs = [3.3, 4.2, 5.2, 3, 2.7, 4, 5.3, 4.5, 3, 5.8, 3.3, 4.9, 3.1, 3.2, 4, 3.5, 4.9, 2.3, 3.2, 4.6, 4.5, 4, 5.3, 5.9, 2.8, 1.9, 2.1, 3, 2.5, 5.6, 1.3, 4.6, 1.5, 1.2, 4.1, 8.1] runs = [3.2, 12.3, 5.2, 2.9, 2.9, 5.5] #runs = [20.0, 5.4, 3.3, 26.4, 2.4, 8.6, 14.6, 20] n = len(legs) m = len(runs) # # model # assign = cp.Variable((n,m),boolean=True) covered = cp.Variable(n,boolean=True) prob = cp.Problem(cp.Maximize(cp.sum(covered)), [ assign.T@legs <= runs, covered <= cp.sum(assign,axis=1), covered[1:n]<=covered[0:(n-1)] ]) # # solve # t0 = time.time() prob.solve(solver=cp.GLPK_MI,verbose=True) #prob.solve(solver=cp.CPLEX,verbose=True) print() # print(f"Solve time: {prob.solver_stats.solve_time}") # does not seem to work print(f"Elapsed {time.time()-t0:{0}.{2}} sec.") # # reporting # L = round(prob.value) result = assign.value[0:L,] for i in range(L): for j in range(m): result[i,j] *= covered.value[i]*legs[i] print("Results:") print(pd.DataFrame(result))
Notes:
- When running under Jupyter (browser-based), we don't see the solver log for glpk (but cplex shows the log).
- The solve_time attribute does not seem to work.
- Cplex can be used after running: pip install cplex. This model (with both data sets) fits in the community edition of Cplex.
No comments:
Post a Comment