## Monday, July 11, 2022

### Transportation model with some non-existing links

I am again looking at simple models based on the transportation model:

Dense Transportation Model
\begin{align}\min&\sum_{i,j}\color{darkblue}c_{i,j}\cdot\color{darkred}x_{i,j}\\ & \sum_j \color{darkred}x_{i,j} \le \color{darkblue}a_i && \forall i \\& \sum_i \color{darkred}x_{i,j} \ge \color{darkblue}b_j && \forall j \\ & \color{darkred}x_{i,j} \ge 0 \end{align}

This is a good time to discuss what happens when some links can not be used. When a significant number of links cannot be used, the underlying bipartite graph is called sparse. Let's discuss some of the approaches often used to handle this.

• Use a dense formulation with a large cost coefficient $$\color{darkblue}c_{i,j}=99999$$ if the link $$i\rightarrow j$$ should not be used. This is my least favorite approach. We need to invent a large penalty cost (should we use 99999 or 999999 or something even more extreme?). This may make the problem poorly scaled.
• Use a dense formulation, but fix $$\color{darkblue}x_{i,j}=0$$ for the non-existing links. The solver will remove these fixed variables in the presolve phase before starting to iterate.
• In GAMS we can add to this model.holdfixed=1;. This will remove all fixed variables from the model even before passing on the model to the solver.
• Use a sparse formulation, where don't even generate the non-existing links. I.e. only generate $$\color{darkblue}x_{i,j}$$ for existing links. This is my favorite approach. It is the most work, but it is also the most explicit.

When creating a larger dense problem, we stress the system at different points:
• During model generation
• The solver receives a larger model
• Hopefully, it can presolve things away
• The solver has to re-insert solution values during postsolve
• The modeling system receives a larger solution
Should we really worry about this, or should we pick the simplest approach? One way to weigh these things is to look at the performance. Warning: this is an extremely simple and easy-to-solve LP, and the situation may be dramatically different for a complex, difficult MIP model.

I set up the following scenario:
• 500 supply nodes
• 500 demand nodes
• 80% of the arcs are removed

The GAMS/CPLEX results are:

----    132 PARAMETER results  performance comparison

c        x.fx   holdfixed      sparse

Variables        250001.000  250001.000   49667.000   49667.000
Equations          1001.000    1001.000    1001.000    1001.000
Nonzero elem.    750001.000  549667.000  148999.000  148999.000
Modelstat           Optimal     Optimal     Optimal     Optimal
Objective         12015.218   12015.218   12015.218   12015.218
Generation time       0.242       0.301       0.158       0.096
Solver time           0.468       0.331       0.107       0.112
Iterations          967.000    1351.000    1351.000    1351.000


Unexpectedly the first method needs the fewest iterations. The models holdfixed and sparse are almost identical (which makes sense) and are the winners. In general, however, timings are just quite good for this model.

### Python: PuLP with CBC

Out of curiosity, let's also try a Python implementation with PuLP and the open source solver CBC. Here are the results:

               Key     Large Cost          Fixed        Sparse
0        Variables  250000.000000  250000.000000  49666.000000
1      Constraints    1000.000000    1000.000000   1000.000000
2         Nonzeros            NaN            NaN           NaN
3           Status       1.000000       1.000000      1.000000
4        Objective   12015.218130   12015.218130  12015.218130
5  Generation Time       3.626614       3.538979      1.394390
6      Solver Time       3.924816       3.292323      0.963178
7       Iterations            NaN            NaN           NaN


We see a similar pattern, just about 10 times as slow (in generation time and in solver time).

### R: OMPR with GLPK

These tools are much slower on these models:

     row_name        largecost            fixed           sparse
1   variables           250000           250000            49666
2 constraints             1000             1000             1000
3      status          success          success          success
4   objective 12015.2181250246 12015.2181250246 12015.2181250246
5  total time 3946.04000000001 245.820000000007 160.080000000016


Here it is more important to use the sparse representation. Nevertheless, OMPR/GLPK is way slower than PULP/CBC (roughly a factor 100) and GAMS/Cplex (roughly a factor 1,000). I am not sure if this is due to OMPR or GLPK (I don't know how to get timings for the individual steps).

### Appendix: GAMS Model

 $onText Three different ways to implement a sparse bipartite graph (transportation problem). 1. Dense with large cost on non-existsing links 2. Dense with fixing x.fx(i,j)=0 when link (i,j) does not exist 2a. Id with model.holdfixed. 3. Don't even generate the variables x(i,j) when i->j is missing. (my preferred way)$offtext set    i 'supply nodes' /sup1*sup500/    j 'demand nodes' /dem1*dem500/    link(i,j)  'allowed links' ; alias(i,ii),(j,jj); * sparse network * increase fraction if infeasible link(i,j) = uniform(0,1)<0.2; scalar totsupdem 'total supply and demand' /10000/; parameter    capacity(i)    demand(j) ; * we try to generate a balanced transportation problem * where sum(demand) = sum(supply) = 10000 capacity(i) = uniform(0,1); capacity(i) = totsupdem*capacity(i)/sum(ii,capacity(ii)); demand(j) = uniform(0,1); demand(j) = totsupdem*demand(j)/sum(jj,demand(jj)); parameter c(i,j) 'unit transportation cost'; c(i,j) = 1e6; c(link) = uniform(1,10); *------------------------------------------------------------------------------- * reporting macro *------------------------------------------------------------------------------- acronym Optimal; parameter results(*,*) 'performance comparison'; $macro collect(m,id) \ results('Variables',id) = m.numvar; \ results('Equations',id) = m.numequ; \ results('Nonzero elem.',id) = m.numnz; \ results('Modelstat',id) = m.modelstat; \ results('Modelstat',id)$(m.modelstat=1) = Optimal; \      results('Objective',id) = m.objval; \      results('Generation time',id) = m.etSolve-m.etSolver; \      results('Solver time',id) = m.etSolver; \      results('Iterations',id) = m.iterusd; *------------------------------------------------------------------------------- * * (1) c(i,j)=99999 for links that don't exist * *------------------------------------------------------------------------------- variable z 'total cost'; positive variable x(i,j) 'shipment from i->j'; equations     objective     eDemand(j)   'meet demand'     eCapacity(i) "don't exceed capacity" ; objective..     z =e= sum((i,j), c(i,j)*x(i,j)); eDemand(j)..    sum(i, x(i,j)) =e= demand(j); eCapacity(i)..  sum(j, x(i,j)) =e= capacity(i); model m1 /all/; * no printing to listing file m1.solprint = %solprint.Silent%; solve m1 minimizing z using lp; collect(m1,'c') *------------------------------------------------------------------------------- * * (2) x.fx(i,j)=0 if link does not exist * *------------------------------------------------------------------------------- x.fx(i,j)$(not link(i,j)) = 0; c(i,j)$(not link(i,j))= 0; option bratio=1; solve m1 minimizing z using lp; collect(m1,'x.fx') *------------------------------------------------------------------------------- * * (2a)holdfixed * *------------------------------------------------------------------------------- m1.holdfixed=1; solve m1 minimizing z using lp; collect(m1,'holdfixed') *------------------------------------------------------------------------------- * * (3) sparse * *------------------------------------------------------------------------------- equations     objective2     eDemand2(j)   'meet demand'     eCapacity2(i) "don't exceed capacity" ; objective2..     z =e= sum(link, c(link)*x(link)); eDemand2(j)..    sum(link(i,j), x(link)) =e= demand(j); eCapacity2(i)..  sum(link(i,j), x(link)) =e= capacity(i); model m2 /all-m1/; m2.solprint = %solprint.Silent%; solve m2 minimizing z using lp; collect(m2,'sparse') display results;

### Appendix 2: Python/PuLP code

import pulp as lp
import numpy as np
import pickle
import pandas as pd
from pytictoc import TicToc

t = TicToc()

#
# get the GAMS data
# stored in a pickle file
#

datafile = r"c:\tmp\assignment.pkl"
capacity = data['capacity']
demand = data['demand']
cost = data['cost']

#
# we should have a balanced problem
#
print(f"CHECK: sum capacity = {sum(capacity)}, sum demand = {sum(demand)}")

#
# large cost for problem 1
#

#
#  sources and destinations
#
SRC = range(len(capacity))
DEST = range(len(demand))

#
# for reporting
#
results = pd.DataFrame({'Key': ['Variables','Constraints','Nonzeros','Status',
'Objective','Generation Time','Solver Time','Iterations']})

#
# Transportation Models
# ---------------------

#
#  problem 1: dense with large cost entries
#

t.tic()
prob1 = lp.LpProblem("Assignment1", lp.LpMinimize)

x = lp.LpVariable.dicts("x", (SRC, DEST), lowBound=0.0)

prob1 += lp.lpSum([largecost[i][j]*x[i][j] for i in SRC for j in DEST])
for i in SRC:
prob1 += lp.lpSum([x[i][j] for j in DEST]) == capacity[i]
for j in DEST:
prob1 += lp.lpSum([x[i][j] for i in SRC]) == demand[j]

prob1.solve()

time = t.tocvalue()

print("Status:", lp.LpStatus[prob1.status])
print(f"Solution time: {prob1.solutionTime}")
print(f"Objective: {prob1.objective.value()}")
print(f"Elapsed: {time}")
res = [prob1.numVariables(),prob1.numConstraints(),float("nan"),prob1.status,
prob1.objective.value(),time-prob1.solutionTime,prob1.solutionTime,float("nan")]
results['Large Cost'] = res
print(results)

#
#  problem 1: fixed variables
#

t.tic()
prob2 = lp.LpProblem("Assignment2", lp.LpMinimize)

x = lp.LpVariable.dicts("x", (SRC, DEST), lowBound=0.0)

prob2 += lp.lpSum([cost[i][j]*x[i][j] for i in SRC for j in DEST])
for i in SRC:
prob2 += lp.lpSum([x[i][j] for j in DEST]) == capacity[i]
for j in DEST:
prob2 += lp.lpSum([x[i][j] for i in SRC]) == demand[j]

for i in SRC:
for j in DEST:
x[i][j].upBound = 0

prob2.solve()

time = t.tocvalue()

print("Status:", lp.LpStatus[prob2.status])
print(f"Solution time: {prob2.solutionTime}")
print(f"Objective: {prob2.objective.value()}")
print(f"Elapsed: {time}")
res = [prob2.numVariables(),prob2.numConstraints(),float("nan"),prob2.status,
prob2.objective.value(),time-prob2.solutionTime,prob2.solutionTime,float("nan")]
results['Fixed'] = res
print(results)

#
#  problem 3: sparse network
#

t.tic()
prob3 = lp.LpProblem("Assignment3", lp.LpMinimize)

x = lp.LpVariable.dicts("x", (SRC, DEST), lowBound=0.0)

prob3 += lp.lpSum([cost[i][j]*x[i][j] for i in SRC for j in DEST if plink[i][j]==1])
for i in SRC:
prob3 += lp.lpSum([x[i][j] for j in DEST if plink[i][j]==1]) == capacity[i]
for j in DEST:
prob3 += lp.lpSum([x[i][j] for i in SRC if plink[i][j]==1]) == demand[j]

prob3.solve()

time = t.tocvalue()

print("Status:", lp.LpStatus[prob3.status])
print(f"Solution time: {prob3.solutionTime}")
print(f"Objective: {prob3.objective.value()}")
print(f"Elapsed: {time}")
res = [prob3.numVariables(),prob3.numConstraints(),float("nan"),prob3.status,
prob3.objective.value(),time-prob3.solutionTime,prob3.solutionTime,float("nan")]
results['Sparse'] = res
print(results)


### Appendix 3: R/OMPR code

#-----------------------------------------------------------
#-----------------------------------------------------------

#-----------------------------------------------------------
# convert to matrices/vectors
#-----------------------------------------------------------

convSup <- 1:nrow(capacity)
names(convSup) <- capacity$i convDem <- 1:nrow(demand) names(convDem) <- demand$j

cost<-matrix(data=0,nrow=length(convSup),ncol=length(convDem))
# loop is fast enough
for (r in 1:nrow(c)) {
cost[convSup[c$i[r]],convDem[c$j[r]]] = c$value[r] } islink<-matrix(data=0,nrow=length(convSup),ncol=length(convDem)) # loop is fast enough for (r in 1:nrow(plink)) { islink[convSup[plink$i[r]],convDem[plink$j[r]]] = plink$value[r]
}

cap <- capacity$value dem <- demand$value

paste(sum(cap),sum(dem))

n <- length(cap)
m <- length(dem)

#-----------------------------------------------------------
# OMPR models
#-----------------------------------------------------------

library(ROI.plugin.glpk)
library(ompr)
library(ompr.roi)

results <- data.frame(
row_name=c('variables','constraints','status','objective','total time'))

#-----------------------------------------------------------
# model 1: large cost
#-----------------------------------------------------------

t <- system.time({
result <- MIPModel() |>
set_objective(sum_over(largecost[i,j] * x[i,j], i = 1:m, j = 1:n),sense = "min") |>
solve_model(with_ROI("glpk", verbose = TRUE))
})
res <- c(nvars(result$model)$continuous,
nconstraints(result$model), result$status,
result$objective_value, t[['elapsed']] ) results['largecost'] <- res #----------------------------------------------------------- # model 2: fixed variables #----------------------------------------------------------- t <- system.time({ result <- MIPModel() |> add_variable(x[i,j], lb=0, i=1:n, j=1:m) |> add_constraint(sum_over(x[i,j], j=1:m)==cap[i], i=1:n) |> add_constraint(sum_over(x[i,j], i=1:m)==dem[j], j=1:n) |> set_objective(sum_over(cost[i,j] * x[i,j], i = 1:m, j = 1:n),sense = "min") |> set_bounds(x[i,j], ub=0, i=1:m, j=1:n, islink[i,j]==0) |> solve_model(with_ROI("glpk", verbose = TRUE)) }) res <- c(nvars(result$model)$continuous, nconstraints(result$model),
result$status, result$objective_value,
t[['elapsed']]
)
results['fixed'] <- res

#-----------------------------------------------------------
# model 3: sparse model
#-----------------------------------------------------------

t <- system.time({
result <- MIPModel() |>
set_objective(sum_over(cost[i,j] * x[i,j], i = 1:m, j = 1:n, islink[i,j]==1),sense = "min") |>
solve_model(with_ROI("glpk", verbose = TRUE))
})

res <- c(nvars(result$model)$continuous,
nconstraints(result$model), result$status,
result\$objective_value,
t[['elapsed']]
)
results['sparse'] <- res
print(results)