I am again looking at simple models based on the transportation model:
Dense Transportation Model |
---|
\[\begin{align}\min&\sum_{i,j}\color{darkblue}c_{i,j}\cdot\color{darkred}x_{i,j}\\ & \sum_j \color{darkred}x_{i,j} \le \color{darkblue}a_i && \forall i \\& \sum_i \color{darkred}x_{i,j} \ge \color{darkblue}b_j && \forall j \\ & \color{darkred}x_{i,j} \ge 0 \end{align}\] |
This is a good time to discuss what happens when some links can not be used. When a significant number of links cannot be used, the underlying bipartite graph is called sparse. Let's discuss some of the approaches often used to handle this.
- Use a dense formulation with a large cost coefficient \(\color{darkblue}c_{i,j}=99999\) if the link \(i\rightarrow j\) should not be used. This is my least favorite approach. We need to invent a large penalty cost (should we use 99999 or 999999 or something even more extreme?). This may make the problem poorly scaled.
- Use a dense formulation, but fix \(\color{darkblue}x_{i,j}=0\) for the non-existing links. The solver will remove these fixed variables in the presolve phase before starting to iterate.
- In GAMS we can add to this model.holdfixed=1;. This will remove all fixed variables from the model even before passing on the model to the solver.
- Use a sparse formulation, where don't even generate the non-existing links. I.e. only generate \(\color{darkblue}x_{i,j}\) for existing links. This is my favorite approach. It is the most work, but it is also the most explicit.
- During model generation
- The solver receives a larger model
- Hopefully, it can presolve things away
- The solver has to re-insert solution values during postsolve
- The modeling system receives a larger solution
- 500 supply nodes
- 500 demand nodes
- 80% of the arcs are removed
---- 132 PARAMETER results performance comparison c x.fx holdfixed sparse Variables 250001.000 250001.000 49667.000 49667.000 Equations 1001.000 1001.000 1001.000 1001.000 Nonzero elem. 750001.000 549667.000 148999.000 148999.000 Modelstat Optimal Optimal Optimal Optimal Objective 12015.218 12015.218 12015.218 12015.218 Generation time 0.242 0.301 0.158 0.096 Solver time 0.468 0.331 0.107 0.112 Iterations 967.000 1351.000 1351.000 1351.000
Unexpectedly the first method needs the fewest iterations. The models holdfixed and sparse are almost identical (which makes sense) and are the winners. In general, however, timings are just quite good for this model.
Python: PuLP with CBC
Key Large Cost Fixed Sparse 0 Variables 250000.000000 250000.000000 49666.000000 1 Constraints 1000.000000 1000.000000 1000.000000 2 Nonzeros NaN NaN NaN 3 Status 1.000000 1.000000 1.000000 4 Objective 12015.218130 12015.218130 12015.218130 5 Generation Time 3.626614 3.538979 1.394390 6 Solver Time 3.924816 3.292323 0.963178 7 Iterations NaN NaN NaN
We see a similar pattern, just about 10 times as slow (in generation time and in solver time).
R: OMPR with GLPK
These tools are much slower on these models:
row_name largecost fixed sparse 1 variables 250000 250000 49666 2 constraints 1000 1000 1000 3 status success success success 4 objective 12015.2181250246 12015.2181250246 12015.2181250246 5 total time 3946.04000000001 245.820000000007 160.080000000016
Here it is more important to use the sparse representation. Nevertheless, OMPR/GLPK is way slower than PULP/CBC (roughly a factor 100) and GAMS/Cplex (roughly a factor 1,000). I am not sure if this is due to OMPR or GLPK (I don't know how to get timings for the individual steps).
Appendix: GAMS Model
$onText |
Appendix 2: Python/PuLP code
import pulp as lp import numpy as np import pickle import pandas as pd from pytictoc import TicToc t = TicToc() # # get the GAMS data # stored in a pickle file # datafile = r"c:\tmp\assignment.pkl" data = pickle.load(open(datafile,'rb')) capacity = data['capacity'] demand = data['demand'] cost = data['cost'] plink = data['plink'] # # we should have a balanced problem # print(f"CHECK: sum capacity = {sum(capacity)}, sum demand = {sum(demand)}") # # large cost for problem 1 # largecost = cost+(1-plink)*1.0e6 # # sources and destinations # SRC = range(len(capacity)) DEST = range(len(demand)) # # for reporting # results = pd.DataFrame({'Key': ['Variables','Constraints','Nonzeros','Status', 'Objective','Generation Time','Solver Time','Iterations']}) # # Transportation Models # --------------------- # # problem 1: dense with large cost entries # t.tic() prob1 = lp.LpProblem("Assignment1", lp.LpMinimize) x = lp.LpVariable.dicts("x", (SRC, DEST), lowBound=0.0) prob1 += lp.lpSum([largecost[i][j]*x[i][j] for i in SRC for j in DEST]) for i in SRC: prob1 += lp.lpSum([x[i][j] for j in DEST]) == capacity[i] for j in DEST: prob1 += lp.lpSum([x[i][j] for i in SRC]) == demand[j] prob1.solve() time = t.tocvalue() print("Status:", lp.LpStatus[prob1.status]) print(f"Solution time: {prob1.solutionTime}") print(f"Objective: {prob1.objective.value()}") print(f"Elapsed: {time}") res = [prob1.numVariables(),prob1.numConstraints(),float("nan"),prob1.status, prob1.objective.value(),time-prob1.solutionTime,prob1.solutionTime,float("nan")] results['Large Cost'] = res print(results) # # problem 1: fixed variables # t.tic() prob2 = lp.LpProblem("Assignment2", lp.LpMinimize) x = lp.LpVariable.dicts("x", (SRC, DEST), lowBound=0.0) prob2 += lp.lpSum([cost[i][j]*x[i][j] for i in SRC for j in DEST]) for i in SRC: prob2 += lp.lpSum([x[i][j] for j in DEST]) == capacity[i] for j in DEST: prob2 += lp.lpSum([x[i][j] for i in SRC]) == demand[j] for i in SRC: for j in DEST: if plink[i][j]==0: x[i][j].upBound = 0 prob2.solve() time = t.tocvalue() print("Status:", lp.LpStatus[prob2.status]) print(f"Solution time: {prob2.solutionTime}") print(f"Objective: {prob2.objective.value()}") print(f"Elapsed: {time}") res = [prob2.numVariables(),prob2.numConstraints(),float("nan"),prob2.status, prob2.objective.value(),time-prob2.solutionTime,prob2.solutionTime,float("nan")] results['Fixed'] = res print(results) # # problem 3: sparse network # t.tic() prob3 = lp.LpProblem("Assignment3", lp.LpMinimize) x = lp.LpVariable.dicts("x", (SRC, DEST), lowBound=0.0) prob3 += lp.lpSum([cost[i][j]*x[i][j] for i in SRC for j in DEST if plink[i][j]==1]) for i in SRC: prob3 += lp.lpSum([x[i][j] for j in DEST if plink[i][j]==1]) == capacity[i] for j in DEST: prob3 += lp.lpSum([x[i][j] for i in SRC if plink[i][j]==1]) == demand[j] prob3.solve() time = t.tocvalue() print("Status:", lp.LpStatus[prob3.status]) print(f"Solution time: {prob3.solutionTime}") print(f"Objective: {prob3.objective.value()}") print(f"Elapsed: {time}") res = [prob3.numVariables(),prob3.numConstraints(),float("nan"),prob3.status, prob3.objective.value(),time-prob3.solutionTime,prob3.solutionTime,float("nan")] results['Sparse'] = res print(results)
Appendix 3: R/OMPR code
#----------------------------------------------------------- # read GAMS data # capacity, demand, c, plink #----------------------------------------------------------- load(file="c:/tmp/gamsdata.Rdata") #----------------------------------------------------------- # convert to matrices/vectors #----------------------------------------------------------- convSup <- 1:nrow(capacity) names(convSup) <- capacity$i convDem <- 1:nrow(demand) names(convDem) <- demand$j cost<-matrix(data=0,nrow=length(convSup),ncol=length(convDem)) # loop is fast enough for (r in 1:nrow(c)) { cost[convSup[c$i[r]],convDem[c$j[r]]] = c$value[r] } islink<-matrix(data=0,nrow=length(convSup),ncol=length(convDem)) # loop is fast enough for (r in 1:nrow(plink)) { islink[convSup[plink$i[r]],convDem[plink$j[r]]] = plink$value[r] } cap <- capacity$value dem <- demand$value paste(sum(cap),sum(dem)) n <- length(cap) m <- length(dem) #----------------------------------------------------------- # OMPR models #----------------------------------------------------------- library(ROI.plugin.glpk) library(ompr) library(ompr.roi) results <- data.frame( row_name=c('variables','constraints','status','objective','total time')) #----------------------------------------------------------- # model 1: large cost #----------------------------------------------------------- largecost = cost + (1-islink)*1e6 t <- system.time({ result <- MIPModel() |> add_variable(x[i,j], lb=0, i=1:n, j=1:m) |> add_constraint(sum_over(x[i,j], j=1:m)==cap[i], i=1:n) |> add_constraint(sum_over(x[i,j], i=1:m)==dem[j], j=1:n) |> set_objective(sum_over(largecost[i,j] * x[i,j], i = 1:m, j = 1:n),sense = "min") |> solve_model(with_ROI("glpk", verbose = TRUE)) }) res <- c(nvars(result$model)$continuous, nconstraints(result$model), result$status, result$objective_value, t[['elapsed']] ) results['largecost'] <- res #----------------------------------------------------------- # model 2: fixed variables #----------------------------------------------------------- t <- system.time({ result <- MIPModel() |> add_variable(x[i,j], lb=0, i=1:n, j=1:m) |> add_constraint(sum_over(x[i,j], j=1:m)==cap[i], i=1:n) |> add_constraint(sum_over(x[i,j], i=1:m)==dem[j], j=1:n) |> set_objective(sum_over(cost[i,j] * x[i,j], i = 1:m, j = 1:n),sense = "min") |> set_bounds(x[i,j], ub=0, i=1:m, j=1:n, islink[i,j]==0) |> solve_model(with_ROI("glpk", verbose = TRUE)) }) res <- c(nvars(result$model)$continuous, nconstraints(result$model), result$status, result$objective_value, t[['elapsed']] ) results['fixed'] <- res #----------------------------------------------------------- # model 3: sparse model #----------------------------------------------------------- t <- system.time({ result <- MIPModel() |> add_variable(x[i,j], lb=0, i=1:n, j=1:m, islink[i,j]==1) |> add_constraint(sum_over(x[i,j], j=1:m, islink[i,j]==1)==cap[i], i=1:n) |> add_constraint(sum_over(x[i,j], i=1:m, islink[i,j]==1)==dem[j], j=1:n) |> set_objective(sum_over(cost[i,j] * x[i,j], i = 1:m, j = 1:n, islink[i,j]==1),sense = "min") |> solve_model(with_ROI("glpk", verbose = TRUE)) }) res <- c(nvars(result$model)$continuous, nconstraints(result$model), result$status, result$objective_value, t[['elapsed']] ) results['sparse'] <- res print(results)
No comments:
Post a Comment