Thursday, July 12, 2018

Network models as building blocks for LPs

Many LP problems have some form of embedded networks. Examples are economic trade networks and supply chain networks. We can solve network models with specialized solvers. I typically just use LP solvers. There are some reasons for this:

  • Many of my models are not pure networks (i.e. an LP with an embedded network or a network with side constraints - these are essentially the same, the difference is a matter of gradation),
  • LP solvers are actually quite good in solving network models,
  • Some models can be be converted to pure networks, but this can make the model less straightforward and more removed from the original problem 
To make it interesting, let's consider a large, sparse network. This means not all nodes have a direct link between them. Also assume a directed graph. Directed graphs are more prevalent in practical situation, and undirected graphs are not very easy to handle. A minimum cost flow problem can be stated as the following:\[ \begin{align} \min & \sum_{(i,j)\in A} c_{i,j} x_{i,j}\\ & \sum_{(j,i)\in A} x_{j,i} + \mathit{supply}_i = \sum_{(i,j)\in A} x_{i,j} + \mathit{demand}_i & \forall i \\ & x_{i,j} \in [0,u_{i,j}] & \forall (i,j) \in A \end{align}\] Here \(x_{i,j}\) is the flow from \(i \rightarrow j\). The constants \(supply_i\) and \(demand_i\) are exogenous in- and out-flow. If we want to push a flow of one from node 1 to node 2 then this would look like: \[\mathit{supply}_i  = \begin{cases} 1 & \text{if $i=1$}\\ 0 & \text{otherwise}\end{cases}\] and \[\mathit{demand}_i  = \begin{cases} 1 & \text{if $i=2$}\\ 0 & \text{otherwise}\end{cases}\] The LP model is sometimes respresented as \[ \begin{align} \min & \sum_{(i,j)\in A} c_{i,j} x_{i,j}\\ & \sum_{(j,i)\in A} x_{j,i} -  \sum_{(i,j)\in A} x_{i,j} = b_i & \forall i \\ & x_{i,j} \in [0,u_{i,j}] & \forall (i,j) \in A \end{align}\] where \(b_i = \mathit{demand}_i - \mathit{supply}_i\). 
  

GAMS model


A complete GAMS model can look like:

set i 'nodes'  /n1*n5000/;
alias(i,j);

*
* network topology (directed arcs)
* approx 5% density
*
set a(i,j) 'arcs i -> j';
a(i,j) = uniform(0,1)<0.05
;
a(i,i) =
no;

*
* data
*
parameters
  c(i,j)    
'cost coefficients (over arcs)'
  cap(i,j)  
'arc capacities'
  supply(i) 
'exogenous inflow (sources)'
      
/ (n1,n2) 1 /
  demand(i) 
'exogenous outflow (sinks)'
      
/ (n3,n4) 1 /
;
c(a) = uniform(1,10);
cap(a) = uniform(0,0.5);

positive variable x(i,j) 'flow';
variable z 'objective variable';

* arc capacities
x.up(a) = cap(a);

equations
   flowbal
'inflow = outflow'
   obj    
'objective'
;

obj.. z =e=
sum(a,c(a)*x(a));

flowbal(i)..
  
sum(a(j,i),x(a)) + supply(i) =e=
  
sum(a(i,j),x(a)) + demand(i);

* reduce printing
option limrow=0,limcol=0;

model m /all/;
m.solprint=0;
solve m minimizing z using lp;
display x.l;

We have \(n=5,000\) nodes. Arcs are modeled by a two-dimensional subset a(i,j). This set is sparse: only relatively few arcs exist. The step to generate approximately \(0.05 n^2\) takes a few seconds: we draw and compare \(n^2\) random numbers. We actually have 1,250,314 entries afterwards. This is indeed sparse:

Partial view of two dimensional set a(i,j)

We have two source nodes (\n1,n2\) and two sink nodes \(n3,n4\). The matrices cap and c and the vectors demand and supply are also sparse, The flow balance equation is heavily dependent on the sparse set a. Given a node \(i\) the first summation finds all incoming arcs, i.e. nodes \(j\) such that \(j \rightarrow i\) is an existing arc. The second summation finds all leaving arcs: nodes \(j\) such that \(i \rightarrow j\) is an existing arc. This equation is quite close to what we have in the mathematical model.

The model is not small. We have:


MODEL STATISTICS

BLOCKS OF EQUATIONS           2     SINGLE EQUATIONS        5,001
BLOCKS OF VARIABLES           2     SINGLE VARIABLES    1,250,315
NON ZERO ELEMENTS     3,750,943

As the capacities are somewhat small, the solution is somewhat complex:


----     49 VARIABLE x.L  flow

               n3          n4        n102        n490        n635        n875       n1072       n1209       n1405

n1                                  0.049       0.199                                           0.014
n2                                                                                                          0.359
n490                    0.199
n635        0.139
n875        0.048
n1405                   0.359
n1518       0.028
n1719                   0.097                               0.139                   0.158
n1873                   0.007
n1888       0.164
n2466       0.018
n2490                                                                   0.048
n2619       0.158
n3599       0.128
n3631       0.318
n4012                   0.337

    +       n1518       n1719       n1873       n1888       n1894       n1896       n2412       n2466       n2490

n1                      0.394       0.007                               0.246
n2                                                                                  0.113                   0.048
n102                                            0.049
n1894       0.028
n3169                                                       0.028
n4524                                                                                           0.018
n4558                                           0.114

    +       n2619       n3169       n3599       n3619       n3631       n4012       n4489       n4524       n4558

n1                                              0.072                                           0.018
n2                      0.028                                           0.337                               0.114
n1072       0.158
n1209                                                                               0.014
n2412                               0.113
n3619                                                       0.072
n4489                               0.014
n4636                                                       0.246

    +       n4636

n1896       0.246

Some timings from my Windows laptop vs NEOS:

laptopNEOS Server
Operating SystemWindowsLinux
GAMS Generation Time26.40612.816
Cplex Iterations319319
Cplex Objective8.37638.3763
Cplex Time9.39035.117

Somehow NEOS is relatively slow when solving. I think timing programs on NEOS is rather difficult with all that is happening on their machines.

It is noted that a pure network like this solves quickly. If I use COIN CBC instead of Cplex, we see excellent performance:


COIN-OR Branch and Cut (CBC Library 2.9)
written by J. Forrest

Calling CBC main solution routine...
0  Obj 0 Primal inf 3.9999996 (4)
175  Obj 8.0839851 Primal inf 1.4230331 (22)
318  Obj 8.3762647
Optimal - objective value 8.3762647
Optimal objective 8.376264693 - 318 iterations time 4.372

Solved to optimality.


AMPL Models


AMPL has two ways to express network models. The first is equation based and is basically the same as the GAMS model above.


param n := 5000;
set I := 1..n;
set A := setof{i in I, j in I: Uniform(0,1)<0.05 and i<>j} (i,j);

param c{A} = Uniform(1,10);
param cap{A} = Uniform(0,0.5);
param supply{I} default 0;
param demand{I} default 0;
let supply[1] := 1;
let supply[2] := 1;
let demand[3] := 1;
let demand[4] := 1;


var x{(i,j) in A} >= 0, <= cap[i,j];

minimize Cost: sum{(i,j) in A} c[i,j]*x[i,j];

flowbal{i in I}: supply[i] + sum{(j,i) in A} x[j,i] = demand[i] + sum{(i,j) in A} x[i,j];

solve;

display {(i,j) in A: x[i,j] > 0} (x[i,j]);

The model looks quite nice,

I had some problems printing the results. In some cases funny results were printed with set elements I did not recognize.. A fragment is:


:    993 994 995 996 997 998 999 1000 1001 1002 $11 $12 $13 $14 $15 $16 $17 $18 :=

# $4 = 1014
# $5 = 1015
# $6 = 1016
# $7 = 1017

This is not normal output and looks like a bug when printing with options to suppress empty rows. In the end I worked around the problem and just printed the nonzero values using an expression.

AMPL also has special facilities to express networks. Let's see how that works with this example.


param n := 5000;
set I := 1..n;
set A := setof{i in I, j in I: Uniform(0,1)<0.05 and i<>j} (i,j);

param c{A} = Uniform(1,10);
param cap{A} = Uniform(0,0.5);
param supply{I} default 0;
param demand{I} default 0;
let supply[1] := 1;
let supply[2] := 1;
let demand[3] := 1;
let demand[4] := 1;

minimize Cost;

node flowbal {i in I}: net_in = demand[i] - supply[i];

arc x{(i,j) in A} >= 0, <= cap[i,j],
from flowbal[i], to flowbal[j], obj Cost c[i,j];

solve;

display {(i,j) in A: x[i,j] > 0} (x[i,j]);

I actually prefer the equation based version: it conveys better what we are modeling. For more information about implementing network models in AMPL see [1].

GLPK


GLPK is an open source LP/MIP solver which includes a modeling tool called MathProg based on a subset of AMPL. Mathprog is different enough from AMPL such that the AMPL model did not work out of the box. I applied some edits things to make things work again:


param n := 5000;
set I := 1..n;
set A := setof{i in I, j in I: Uniform(0,1)<0.05 and i<>j} (i,j);

set src := 1..2;
set dest := 3..4;

param c{A} := Uniform(1,10);  
param cap{A} := Uniform(0,0.5);
param supply{i in I} := if i in src then 1 else 0; 
param demand{i in I} := if i in dest then 1 else 0; 

var x{(i,j) in A} >= 0, <= cap[i,j];

minimize Cost: sum{(i,j) in A} c[i,j]*x[i,j];

flowbal{i in I}: supply[i] + sum{(j,i) in A} x[j,i] = demand[i] + sum{(i,j) in A} x[i,j];

solve;

display {(i,j) in A: x[i,j] > 0} (x[i,j]);

end;

It is interesting to see how a free solver/modeling system would stack up against commercial ones. Unfortunately generating and solving this model took a very long time. Instead of less than a minute we deal with more than an hour:


D:\projects\blog\winglpk-4.65\glpk-4.65\w64>TimeMem-1.0.exe glpsol.exe --math network3.mod
GLPSOL: GLPK LP/MIP Solver, v4.65
Parameter(s) specified in the command line:
 --math network3.mod
Reading model section from network3.mod...
23 lines were read
Generating Cost...
Generating flowbal...
Model has been successfully generated
GLPK Simplex Optimizer, v4.65
5001 rows, 1250470 columns, 3751410 non-zeros
Preprocessing...
5000 rows, 1250470 columns, 2500940 non-zeros
Scaling...
 A: min|aij| =  1.000e+00  max|aij| =  1.000e+00  ratio =  1.000e+00
Problem data seem to be well scaled
Constructing initial basis...
Size of triangular part is 4999
      0: obj =   6.939485101e-01 inf =   1.352e+01 (11)
    120: obj =   2.066091319e+00 inf =   9.366e+00 (10)
    239: obj =   5.857094094e+00 inf =   5.028e+00 (8)
    360: obj =   5.857094094e+00 inf =   5.028e+00 (8)
Perturbing LP to avoid stalling [403]...
    479: obj =   5.857094094e+00 inf =   5.028e+00 (8)
    597: obj =   5.532619386e+00 inf =   4.645e+00 (8)
    716: obj =   5.532619386e+00 inf =   4.645e+00 (8)
    835: obj =   1.004237246e+01 inf =   2.921e+00 (8)
    953: obj =   1.025203418e+01 inf =   2.840e+00 (7)
   1067: obj =   1.581285250e+01 inf =   1.285e+00 (5)
   1192: obj =   1.805115142e+01 inf =   1.188e+00 (2)
   1340: obj =   1.805115142e+01 inf =   1.188e+00 (2)
   1503: obj =   2.212195621e+01 inf =   9.765e-01 (2)
   1699: obj =   3.789698157e+01 inf =   0.000e+00 (0)
*  1827: obj =   3.789698157e+01 inf =   0.000e+00 (473859)
*  1955: obj =   3.789698157e+01 inf =   0.000e+00 (470365)
*  2088: obj =   3.789698157e+01 inf =   0.000e+00 (459322)
*  2221: obj =   3.789698157e+01 inf =   0.000e+00 (468268)
*  2356: obj =   3.713937625e+01 inf =   0.000e+00 (456803)
*  2488: obj =   3.713937625e+01 inf =   0.000e+00 (465825)
*  2619: obj =   3.713937625e+01 inf =   0.000e+00 (475413) 1
*  2748: obj =   3.629538161e+01 inf =   0.000e+00 (429758)
*  2879: obj =   3.586530520e+01 inf =   0.000e+00 (457580)
*  3011: obj =   3.586530520e+01 inf =   0.000e+00 (466653)
*  3150: obj =   3.586530520e+01 inf =   0.000e+00 (467626)
*  3286: obj =   3.586530520e+01 inf =   0.000e+00 (469059)
*  3423: obj =   3.586530520e+01 inf =   0.000e+00 (464494)
*  3554: obj =   3.586530520e+01 inf =   0.000e+00 (470673)
*  3683: obj =   3.586530520e+01 inf =   0.000e+00 (468476)
*  3808: obj =   3.586530520e+01 inf =   0.000e+00 (458177)
*  3934: obj =   3.568951686e+01 inf =   0.000e+00 (380721) 1
*  4065: obj =   3.568951686e+01 inf =   0.000e+00 (373842)
*  4198: obj =   3.407645932e+01 inf =   0.000e+00 (454855)
*  4331: obj =   3.407645932e+01 inf =   0.000e+00 (489226)
*  4450: obj =   3.407645932e+01 inf =   0.000e+00 (478258) 1
*  4565: obj =   3.388831868e+01 inf =   0.000e+00 (463926)
*  4687: obj =   3.200699481e+01 inf =   0.000e+00 (386710)
*  4818: obj =   3.129121358e+01 inf =   0.000e+00 (431844)
*  4947: obj =   3.120661156e+01 inf =   0.000e+00 (425111) 1
*  5081: obj =   3.120661156e+01 inf =   0.000e+00 (418646)
*  5212: obj =   3.046484643e+01 inf =   0.000e+00 (449751)
*  5337: obj =   2.989829501e+01 inf =   0.000e+00 (476458)
*  5461: obj =   2.848322634e+01 inf =   0.000e+00 (474040) 1
*  5588: obj =   2.651301575e+01 inf =   0.000e+00 (363776)
*  5717: obj =   2.576492847e+01 inf =   0.000e+00 (408484)
*  5843: obj =   2.253496704e+01 inf =   0.000e+00 (423080) 1
*  5969: obj =   2.253496704e+01 inf =   0.000e+00 (403170)
*  6099: obj =   2.215910899e+01 inf =   0.000e+00 (387070)
*  6231: obj =   2.146461527e+01 inf =   0.000e+00 (401893) 1
*  6360: obj =   2.140440543e+01 inf =   0.000e+00 (420520)
*  6485: obj =   2.140440543e+01 inf =   0.000e+00 (359102)
*  6611: obj =   1.925278568e+01 inf =   0.000e+00 (417554) 1
*  6737: obj =   1.841509841e+01 inf =   0.000e+00 (426050)
*  6860: obj =   1.581217012e+01 inf =   0.000e+00 (384451) 1
*  6990: obj =   1.406658394e+01 inf =   0.000e+00 (304416)
*  7132: obj =   1.406658394e+01 inf =   0.000e+00 (312140)
*  7269: obj =   1.406658394e+01 inf =   0.000e+00 (330591)
*  7400: obj =   1.294736282e+01 inf =   0.000e+00 (335060) 1
*  7530: obj =   1.294736282e+01 inf =   0.000e+00 (361433)
*  7667: obj =   1.256488191e+01 inf =   0.000e+00 (238709) 1
*  7802: obj =   1.256488191e+01 inf =   0.000e+00 (361552) 1
*  7938: obj =   1.256488191e+01 inf =   0.000e+00 (283082)
*  8067: obj =   1.256488191e+01 inf =   0.000e+00 (339700) 1
*  8176: obj =   1.252840374e+01 inf =   0.000e+00 (248354)
*  8318: obj =   1.252840374e+01 inf =   0.000e+00 (272086) 1
*  8458: obj =   1.252840374e+01 inf =   0.000e+00 (240051)
*  8601: obj =   1.252840374e+01 inf =   0.000e+00 (203217) 1
*  8768: obj =   1.206036592e+01 inf =   0.000e+00 (68736)
*  8936: obj =   1.194914307e+01 inf =   0.000e+00 (115418) 1
*  9092: obj =   1.186788602e+01 inf =   0.000e+00 (257936) 1
*  9257: obj =   1.186788602e+01 inf =   0.000e+00 (293133)
*  9411: obj =   1.186788602e+01 inf =   0.000e+00 (190910) 1
*  9554: obj =   1.186788602e+01 inf =   0.000e+00 (232493)
*  9703: obj =   1.162047323e+01 inf =   0.000e+00 (151447) 1
*  9852: obj =   1.162047323e+01 inf =   0.000e+00 (164831)
* 10013: obj =   1.162047323e+01 inf =   0.000e+00 (160575) 1
* 10166: obj =   1.162047323e+01 inf =   0.000e+00 (277300) 1
* 10307: obj =   1.162047323e+01 inf =   0.000e+00 (236831)
* 10454: obj =   1.162047323e+01 inf =   0.000e+00 (212227) 1
* 10604: obj =   1.162047323e+01 inf =   0.000e+00 (140137)
* 10769: obj =   1.142233709e+01 inf =   0.000e+00 (98909) 1
* 10926: obj =   1.142233709e+01 inf =   0.000e+00 (241045) 1
* 11084: obj =   1.142233709e+01 inf =   0.000e+00 (181677) 1
* 11221: obj =   1.142233709e+01 inf =   0.000e+00 (230863)
* 11373: obj =   1.142233709e+01 inf =   0.000e+00 (110799) 1
* 11542: obj =   1.142233709e+01 inf =   0.000e+00 (125143) 1
* 11710: obj =   1.142233709e+01 inf =   0.000e+00 (119827) 1
* 11870: obj =   1.142233709e+01 inf =   0.000e+00 (119131)
* 12034: obj =   1.142233709e+01 inf =   0.000e+00 (109375) 1
* 12200: obj =   1.142233709e+01 inf =   0.000e+00 (101337) 1
* 12371: obj =   1.142233709e+01 inf =   0.000e+00 (107241)
* 12542: obj =   1.142233709e+01 inf =   0.000e+00 (91039) 1
* 12719: obj =   1.142233709e+01 inf =   0.000e+00 (50343) 1
* 12875: obj =   1.142233709e+01 inf =   0.000e+00 (45627)
* 13043: obj =   1.142233709e+01 inf =   0.000e+00 (61821) 1
* 13220: obj =   1.142233709e+01 inf =   0.000e+00 (56508) 1
* 13398: obj =   1.142233709e+01 inf =   0.000e+00 (63235) 1
* 13556: obj =   1.142233709e+01 inf =   0.000e+00 (171802) 1
* 13711: obj =   1.142233709e+01 inf =   0.000e+00 (178764)
* 13870: obj =   1.142233709e+01 inf =   0.000e+00 (56535) 1
* 14046: obj =   1.142233709e+01 inf =   0.000e+00 (72873) 1
* 14219: obj =   1.142233709e+01 inf =   0.000e+00 (84913) 1
* 14385: obj =   1.142233709e+01 inf =   0.000e+00 (77343)
* 14558: obj =   1.142233709e+01 inf =   0.000e+00 (62914) 1
* 14727: obj =   1.142233709e+01 inf =   0.000e+00 (48561) 1
* 14902: obj =   1.132754349e+01 inf =   0.000e+00 (84008)
* 15054: obj =   1.132754349e+01 inf =   0.000e+00 (79216) 1
* 15216: obj =   1.132754349e+01 inf =   0.000e+00 (68852) 1
* 15386: obj =   1.119166284e+01 inf =   0.000e+00 (57191) 1
* 15556: obj =   1.119166284e+01 inf =   0.000e+00 (57834)
* 15727: obj =   1.119166284e+01 inf =   0.000e+00 (75029) 1
* 15902: obj =   1.090002160e+01 inf =   0.000e+00 (58584) 1
* 16078: obj =   1.082030323e+01 inf =   0.000e+00 (48927) 1
* 16260: obj =   1.082030323e+01 inf =   0.000e+00 (38824)
* 16439: obj =   1.080362034e+01 inf =   0.000e+00 (36321) 1
* 16626: obj =   1.042732846e+01 inf =   0.000e+00 (19686) 1
* 16812: obj =   1.042474771e+01 inf =   0.000e+00 (22113) 1
* 16996: obj =   1.042474771e+01 inf =   0.000e+00 (23719)
* 17180: obj =   1.042474771e+01 inf =   0.000e+00 (24825) 1
* 17364: obj =   1.042474771e+01 inf =   0.000e+00 (20385) 1
* 17552: obj =   1.042474771e+01 inf =   0.000e+00 (19242) 1
* 17744: obj =   1.026837798e+01 inf =   0.000e+00 (18388)
* 17930: obj =   1.026837798e+01 inf =   0.000e+00 (22530) 1
* 18120: obj =   1.025595159e+01 inf =   0.000e+00 (9025) 1
* 18296: obj =   1.020465344e+01 inf =   0.000e+00 (9133) 1
* 18463: obj =   1.020465344e+01 inf =   0.000e+00 (10438) 1
* 18621: obj =   1.015646186e+01 inf =   0.000e+00 (13266)
* 18802: obj =   1.014150750e+01 inf =   0.000e+00 (19057) 1
* 18986: obj =   9.954708943e+00 inf =   0.000e+00 (7220) 1
* 19167: obj =   9.925813604e+00 inf =   0.000e+00 (7537)
* 19351: obj =   9.900116820e+00 inf =   0.000e+00 (15139) 1
* 19538: obj =   9.863875257e+00 inf =   0.000e+00 (7051) 1
* 19727: obj =   9.861544899e+00 inf =   0.000e+00 (7586) 1
* 19921: obj =   9.861544899e+00 inf =   0.000e+00 (6291)
* 20111: obj =   9.861544899e+00 inf =   0.000e+00 (5661) 1
* 20324: obj =   9.754583642e+00 inf =   0.000e+00 (3955) 1
* 20517: obj =   9.707349345e+00 inf =   0.000e+00 (1889) 1
* 20705: obj =   9.659064105e+00 inf =   0.000e+00 (1759) 1
* 20895: obj =   9.653194379e+00 inf =   0.000e+00 (1472) 1
* 21077: obj =   9.652094815e+00 inf =   0.000e+00 (1474)
* 21271: obj =   9.650142953e+00 inf =   0.000e+00 (1149) 1
* 21465: obj =   9.630282332e+00 inf =   0.000e+00 (908) 1
* 21657: obj =   9.617880384e+00 inf =   0.000e+00 (448) 1
* 21846: obj =   9.606680468e+00 inf =   0.000e+00 (268)
* 22042: obj =   9.606316377e+00 inf =   0.000e+00 (10) 1
Removing LP perturbation [22052]...
* 22052: obj =   9.606316377e+00 inf =   0.000e+00 (0)
OPTIMAL LP SOLUTION FOUND
Time used:   728.3 secs
Memory used: 2433.8 Mb (2552071521 bytes)
Display statement at line 21
x[1,252].val = 0.149840161204338
x[1,379].val = 0.074746768688783
x[1,550].val = 0.0453869814518839
x[1,1084].val = 0.0445991435553879
x[1,1540].val = 0.0551238842308521
x[1,2345].val = 0.0476873791776597
x[1,2475].val = 0.00135596189647913
x[1,2749].val = 0.13588639209047
x[1,2884].val = 0.0230762232095003
x[1,3002].val = 0.00382565474137664
x[1,3188].val = 0.323973921360448
x[1,3645].val = 0.0407408073078841
x[1,4228].val = 0.0537567210849375
x[2,341].val = 0.386876111850142
x[2,847].val = 0.0894879563711584
x[2,1225].val = 0.0989777501672506
x[2,1259].val = 0.0705436542630196
x[2,1572].val = 0.0149974161759019
x[2,2406].val = 0.0154880676418543
x[2,2449].val = 0.00948114111088216
x[2,3169].val = 0.0226166676729918
x[2,3568].val = 0.0758920384105295
x[2,3570].val = 0.0437631316017359
x[2,3575].val = 0.020591959124431
x[2,4377].val = 0.00492054037749767
x[2,4548].val = 0.146363565232605
x[76,3].val = 0.149840161204338
x[170,4].val = 0.0154880676418543
x[252,76].val = 0.149840161204338
x[341,493].val = 0.31903103669174
x[341,1498].val = 0.0137350093573332
x[341,4015].val = 0.0541100658010691
x[379,1633].val = 0.0551570120733231
x[379,4157].val = 0.0195897566154599
x[493,2287].val = 0.31903103669174
x[502,1162].val = 0.020591959124431
x[550,2869].val = 0.0453869814518839
x[692,4].val = 0.0551570120733231
x[847,3240].val = 0.0894879563711584
x[917,3].val = 0.151284105610102
x[1007,3].val = 0.0541100658010691
x[1084,2428].val = 0.0425937338732183
x[1084,2673].val = 0.00200540968216956
x[1162,3].val = 0.020591959124431
x[1202,2963].val = 0.0551238842308521
x[1225,4900].val = 0.0989777501672506
x[1253,917].val = 0.00492054037749767
x[1259,2480].val = 0.0705436542630196
x[1344,3].val = 0.0537567210849375
x[1493,4].val = 0.165379994781688
x[1498,4].val = 0.0137350093573332
x[1540,1202].val = 0.0551238842308521
x[1572,2287].val = 0.0149974161759019
x[1633,692].val = 0.0551570120733231
x[1951,4].val = 0.121594417840242
x[2142,4].val = 0.157906056847423
x[2155,4].val = 0.0135977419558913
x[2158,1493].val = 0.0758920384105295
x[2271,4].val = 0.0230762232095003
x[2287,4].val = 0.334028452867642
x[2345,2142].val = 0.0220196647569537
x[2345,3479].val = 0.025667714420706
x[2373,3].val = 0.0437631316017359
x[2406,170].val = 0.0154880676418543
x[2428,3].val = 0.0425937338732183
x[2449,4892].val = 0.00948114111088216
x[2475,2793].val = 0.00135596189647913
x[2480,2675].val = 0.0705436542630196
x[2673,3].val = 0.00200540968216956
x[2675,4].val = 0.0705436542630196
x[2749,4340].val = 0.13588639209047
x[2793,3].val = 0.00135596189647913
x[2869,3].val = 0.0453869814518839
x[2884,2271].val = 0.0230762232095003
x[2963,3].val = 0.0551238842308521
x[3002,3772].val = 0.00382565474137664
x[3083,3].val = 0.0467328219674528
x[3169,1951].val = 0.0226166676729918
x[3188,3585].val = 0.323973921360448
x[3240,1493].val = 0.0894879563711584
x[3479,4].val = 0.025667714420706
x[3568,2158].val = 0.0758920384105295
x[3570,2373].val = 0.0437631316017359
x[3575,502].val = 0.020591959124431
x[3585,3].val = 0.323973921360448
x[3645,4183].val = 0.0135977419558913
x[3645,4619].val = 0.0271430653519928
x[3772,4].val = 0.00382565474137664
x[4015,1007].val = 0.0541100658010691
x[4157,3083].val = 0.0195897566154599
x[4183,2155].val = 0.0135977419558913
x[4228,1344].val = 0.0537567210849375
x[4340,2142].val = 0.13588639209047
x[4377,1253].val = 0.00492054037749767
x[4548,917].val = 0.146363565232605
x[4619,3083].val = 0.0271430653519928
x[4892,3].val = 0.00948114111088216
x[4900,1951].val = 0.0989777501672506
Model has been successfully processed
Exit code      : 0
Elapsed time   : 4343.45
Kernel time    : 11.16 (0.3%)
User time      : 4053.41 (93.3%)
page fault #   : 724488
Working set    : 2579972 KB
Paged pool     : 86 KB
Non-paged pool : 29 KB
Page file size : 2665148 KB

D:\projects\blog\winglpk-4.65\glpk-4.65\w64>

The solver takes about 730 seconds, and the modeling tool needs the rest of the 4340 seconds (i.e. about an hour). I suspect running through the sparse two-dimensional set A in the flowbal equation is the killer.

References


  1. Chapter 15, Network Linear Programs, AMPL book, https://ampl.com/BOOK/CHAPTERS/18-network.pdf

Wednesday, July 4, 2018

LP problems and the number of nonzero elements

I was reviewing an LP model and I was able to reduce the number of nonzero elements in the LP matrix by a large amount. Many new LP modelers focus too much on the number of variables and the number of equations as a measure of size. The number of nonzero elements is a much more important quantity to watch.

To illustrate, I generated a random LP where I could fiddle with the density: the number of nonzero elements \(nz\) as fraction of \(n \times m\) (\(n\) = number of variables, \(m\) is numbers of equations). To be precise: \[\mathit{density}=\frac{nz}{n \times m} 100\%\] In my case I fixed: \(m=2,000\) and \(n=5,000\) and varied the density from 100% to 10% to 1%.

The results look like:

densitynzSimplex
iterations
Simplex
seconds
Barrier
iterations
Barrier
seconds
100%10,000,000953229515484
10%1,004,8635297401320
1%104,799575113138

These timings were with Cplex but you will see the same behavior using your favorite LP solver.

All the problems have the same "size" in terms of rows and columns. Clearly the number of nonzeros is an important factor. We see the same pattern when we use the (dual) Simplex method, or an interior point algorithm: the time per iteration goes up significantly if the problems are denser. LP solvers are designed for large, sparse problems, so expect increased solution times when the problem becomes dense.

Luckily, practical LP models (implemented correctly) have a tendency to be very sparse. Typically a column has fewer than 5-10 nonzeros. In practice that means, that the density for most LP models is much less than 1%.

It is my belief that issues like this do not get enough emphasis in linear programming classes. May be too much time is devoted to learning full-tableau simplex methods (in my opinion that is like spending the first 5 chapters of the aeronautics book on the steam engine).

Friday, June 22, 2018

NEOS and ROI: R Optimization Infrastructure Package

ROI [1] is an R package to express optimization problems (in a rather low-level format). Its main advantage is that you can use the same model against a number of solvers. Linear models are expressed in a matrix notation. I am not a big fan of this: it restricts the applicability to small, simple models only. For larger, more complex models the matrix approach leads quickly to unreadable and unmaintainable models.

Solver interfaces are implemented as "plug-ins". A new plugin [2] allows to run solvers on NEOS [3]. This gives us access to a number of powerful solvers, without the need to install things locally.

Linear models


To illustrate how we can express a simple transportation model in ROI, let's try a very simple transportation model [4]: \[\bbox[lightcyan,10px,border:3px solid darkblue]{\begin{align}\min\>&\sum_{i,j} c_{i,j} x_{i,j}\\&\sum_j x_{i,j} \le a_i\>\>\forall i\\&\sum_i x_{i,j} \ge b_j\>\>\forall j\\&x_{i,j}\ge 0\end{align}}\]
The GAMS representation is:


Set
   i 'canning plants' / seattle,  san-diego /
   j 'markets'        / new-york, chicago, topeka /;

Parameter
   a(i) 'capacity of plant i in cases'
        / seattle     350
          san-diego   600  /

   b(j) 'demand at market j in cases'
        / new-york    325
          chicago     300
          topeka      275  /;

Table d(i,j) 'distance in thousands of miles'
               new-york       chicago        topeka
   seattle          2.5           1.7           1.8
   san-diego        2.5           1.8           1.4;

Scalar f 'freight in dollars per case per thousand miles' / 90 /;

Parameter c(i,j) 'transport cost in thousands of dollars per case';
c(i,j) = f * d(i,j) / 1000;

Variable
   x(i,j) 'shipment quantities in cases'
   z      'total transportation costs in thousands of dollars';

Positive Variable x;

Equation
   cost      'define objective function'
   supply(i) 'observe supply limit at plant i'
   demand(j) 'satisfy demand at market j';

cost..        z  =e=  sum((i,j), c(i,j)*x(i,j));

supply(i)..   sum(j, x(i,j)) =l= a(i);

demand(j)..   sum(i, x(i,j)) =g= b(j);

Model transport / all /;

solve transport using lp minimizing z;

display x.l, x.m;

Note that all vectors and matrices have strings as indices. The ordering does not matter (like in a relational database table). Now let's do the same for ROI.


#
# data
#
n <- 2  # number of plants (sources)
m <- 3  # number of markets (distinations)
si <- c('seattle', 'san-diego')
sj <- c('new-york', 'chicago', 'topeka')
a <- c(350, 600)      # capacity
b <- c(325, 300, 275) # demand
# distances
d <- matrix(c(2.5, 1.7, 1.8, 2.5, 1.8, 1.4), nrow=n, ncol=m, byrow=TRUE)
f <- 90   # unit freight cost
c <- d*f/1000  # unit transportation cost

# flatten c row wise
cflat <- as.vector(t(c))
# create constraint matrix .... somewhat torturous code
nvar <- n * m;
A <- numeric(0)
varnames <- character(0);
for (i in 1:n) {
   row <- rep(0,nvar)
   for (j in 1:m)  {
     row[(i-1)*m+j] <- 1
     varnames <- c(varnames,paste(si[i],sj[j],sep=":"))
   }
   A<-rbind(A,row)
}
for (j in 1:m) {
  row <- rep(0,nvar)
  for (i in 1:n) 
    row[(i-1)*m+j] <- 1
  A<-rbind(A,row)
}
rhs <- c(a,b)
dir <- c(rep('<=',n),rep('>=',m))

LP <- OP(L_objective(cflat,names=varnames),
         L_constraint(A,dir,rhs),
         max=FALSE)

(sol <- ROI_solve(LP, solver = "glpk"))

I just used vectors with integer indices: i.e. the position determines the meaning of a number (in other words: the ordering is important). We could have used vectors/matrices with row and column names, although I am not sure if this would make things much more readable.

The results look like:


> (sol <- ROI_solve(LP, solver = "glpk"))
Optimal solution found.
The objective value is: 1.536750e+02
> solution(sol)
  seattle:new-york    seattle:chicago     seattle:topeka san-diego:new-york  san-diego:chicago   san-diego:topeka 
                50                300                  0                275                  0                275 


When reading the GAMS code, you can follow this is about a transportation model. The R code is much more obscure, it is close to write-only code. I don't understand how you can implement large and/or complex models in this way.

The R code builds up a "large" matrix \(A\). For this tiny example it is small, but in general it is very large: in practice even way too large to handle even medium sized models. A model with 10,000 equations and 10,000 variables (not very large by today's standards) leads to a 10,000 x 10,000 matrix with 100 million entries! This type of matrix based interface essentially brings us back to the days of matrix generators [7] (although they paid attention to sparsity). This technology is largely obsolete.

NEOS


To use a NEOS solver we just need to change the last line:


> (sol <- ROI_solve(LP, solver = "neos", method = "mosek"))
Optimal solution found.
The objective value is: 1.536750e+02

For some solvers you need to provide an e-mail address:


In this case, add email="...".

Behind the scenes


Interestingly, under the hood, a GAMS model is generated and sent to NEOS. It looks like:


> model_call <- ROI_solve(LP, solver = "neos", method = "mosek",
+                         dry_run = TRUE)
> cat(as.list(model_call)$xmlstring)
<?xml version="1.0" encoding="UTF-8"?>
<document>
  <category>milp</category>
  <solver>MOSEK</solver>
  <inputMethod>GAMS</inputMethod>
  <model><![CDATA[Option IntVarUp = 0;

Set i / R1*R5 / ;
Set ileq(i) / R1, R2 / ;
Set igeq(i) / R3, R4, R5 / ;
Set j / C1*C6 / ;

Parameter objL(j)
/C1 0.225
C2 0.153
C3 0.162
C4 0.225
C5 0.162
C6 0.126/ ;

Parameter rhs(i)
/R1 350
R2 600
R3 325
R4 300
R5 275/ ;

Parameter A
/R1.C1 1
R3.C1 1
R1.C2 1
R4.C2 1
R1.C3 1
R5.C3 1
R2.C4 1
R3.C4 1
R2.C5 1
R4.C5 1
R2.C6 1
R5.C6 1/;

Variables obj;
Positive Variables x(j);


Equations
    ObjSum
    LinLeq(ileq)
    LinGeq(igeq);

ObjSum .. obj =e= sum(j, x(j) * objL(j)) ;
LinLeq(ileq) .. sum(j, A(ileq, j) * x(j)) =l= rhs(ileq) ;
LinGeq(igeq) .. sum(j, A(igeq, j) * x(j)) =g= rhs(igeq) ;

Model LinearProblem /all/ ;

Solve LinearProblem using LP minimizing obj ;

option decimals = 8;

display '---BEGIN.SOLUTION---', x.l, '---END.SOLUTION---';


file results /results.txt/;
results.nw = 0;
results.nd = 15;
results.nr = 2;
results.nz = 0;
put results;
put 'solution:'/;
loop(j, put, x.l(j)/);
put 'objval:'/;
put LinearProblem.objval/;
put 'solver_status:'/;
put LinearProblem.solvestat/;
put 'model_status:'/;
put LinearProblem.modelstat/;]]></model>
  <options><![CDATA[]]></options>
  <gdx><![CDATA[]]></gdx>
  <wantgdx><![CDATA[]]></wantgdx>
  <wantlog><![CDATA[]]></wantlog>
  <comments><![CDATA[]]></comments>
</document>
> 

All structure is lost: we are just modeling:\[\begin{align} \min \>& c^Tx \\ & A_1 x  \le b_1 \\&  A_2 x \ge b_2 \end{align}\]

Quadratic models


This plugin also allows quadratic models (but not general nonlinear models). In [2] a trivial non-convex QP model is presented: \[\begin{align}\max\>&F(P_F - 150) + C(P_C - 100) \\ & 2 F + C \le 500\\ & 2F + 3C \le 800 \\ & F \le 490 - P_F \\ & C \le 640 - 2P_C\\ & F, C, P_F, P_C\ge 0\end{align}\] ROI does not understand mathematics, so we have to pass this on in terms of matrices:


Q <- simple_triplet_matrix(i = 1:4, j = c(2, 1, 4, 3), rep(1, 4))
as.matrix(Q)

var_names <- c("full", "price_full", "compact", "price_compact")
o <- OP(
  Q_objective(Q = Q, L = c(-150, 0, -100, 0), names = var_names),
  L_constraint(rbind(c(2, 0, 1, 0), c(2, 0, 3, 0),
                     c(1, 1, 0, 0), c(0, 0, 1, 2)),
               dir = leq(4), rhs = c(500, 800, 490, 640)),
  maximum = TRUE)

(sol <- ROI_solve(o, solver = "neos", method = "BARON"))


Unfortunately, this is not really close to the mathematical model and makes the problem setup difficult to read and understand.

In [2] a table is presented with some of the supported solvers:


Linear models: OMPR


For linear models, we can use the OMPR package [5,6] on top of ROI. This will give us more readable models. Here we implement the above transportation model. The model can look like:


library(ompr)
library(ompr.roi)
library(dplyr)

#
# data
#
n <- 2  # number of plants (sources)
m <- 3  # number of markets (distinations)
si <- c('seattle', 'san-diego')
sj <- c('new-york', 'chicago', 'topeka')
a <- c(350, 600)      # capacity
b <- c(325, 300, 275) # demand
# distances
d <- matrix(c(2.5, 1.7, 1.8, 2.5, 1.8, 1.4), nrow=n, ncol=m, byrow=TRUE)
f <- 90   # unit freight cost
c <- d*f/1000  # unit transportation cost

#
# model + solve
#
(sol <- MIPModel() %>%
  add_variable(x[i,j],i=1:n,j=1:m,type = "continuous", lb = 0) %>%
  set_objective(sum_expr(c[i,j]*x[i,j],i=1:n,j=1:m), "min")  %>%
  add_constraint(sum_expr(x[i,j],j=1:m)<=a[i], i=1:n)  %>%
  add_constraint(sum_expr(x[i,j],i=1:n)>=b[j], j=1:m) %>%
  solve_model(with_ROI(solver = "neos", method="mosek")))  

#  
# solution
#
sol %>% 
  get_solution(x[i, j]) %>%
  filter(value > 0) 

This code at least has some resemblance to the mathematical model. The solution looks like:


Status: optimal
Objective value: 153.675

  variable i j value
1        x 1 1    50
2        x 2 1   275
3        x 1 2   300
4        x 2 3   275

We can clean up the solution a bit, so we get a more meaningful solution report:

> sol %>% 
+   get_solution(x[i, j]) %>%
+   filter(value > 0) %>%
+   mutate(Plant = si[i],Market = sj[j]) %>%
+   select(Plant,Market,Shipment=value)
      Plant   Market Shipment
1   seattle new-york       50
2 san-diego new-york      275
3   seattle  chicago      300
4 san-diego   topeka      275

But..


OMPR is much better than matrix oriented tools to express models. That does not mean OMPR is suited for all LP/MIP models. For one thing it can be slow, there is no good support for large,sparse structures and there are some bugs. Here is a recent example of a bug I encountered.

In many cases we may need to restrict the domain of variables. E.g. instead of all x[i,j], only use a subset. An example is a network of nodes and arcs. The arcs can be represented by x[i,j] indicating the flow from node i to node j. Of course, if we have a sparse network only a few of all (i,j) combinations are allowed.

OMPR allows conditions on the domain of variables. E.g. for a one-dimensional variable x[i], we can do:

> n <- 3
> ok <- c(1,0,1)
> ok
[1] 1 0 1
> (sol <- MIPModel() %>%
+     add_variable(x[i], i=1:n, ok[i]==1)
+ )
Mixed integer linear optimization problem
Variables:
  Continuous: 2
  Integer: 0
  Binary: 0
No objective function.
Constraints: 0


This looks great. We see x[2] is not included in the variables. Now let's do the same thing for a two dimensional variable x[i,j]:

> n <- 3
> ok <- matrix(c(0,1,1, 0,0,1, 0,0,0),nrow=3,ncol=3,byrow=T)
> ok
     [,1] [,2] [,3]
[1,]    0    1    1
[2,]    0    0    1
[3,]    0    0    0
> (sol <- MIPModel() %>%
+     add_variable(x[i,j], i=1:n, j=1:n, ok[i,j]==1) 
+ )
Error in validate_quantifier_candidates(candidates, zero_vars_msg) : 
  only_integer_candidates are not all TRUE

Something goes wrong here. The error message is not very informative: I have no clue what it means. I also have no idea how to fix this.

References


  1. Many Solvers, One Interface, ROI, the R Optimization Infrastructure Package, https://www.r-project.org/conferences/useR-2010/slides/Theussl+Hornik+Meyer.pdf
  2. Ronald Hochreiter and Florian Schwendinger, ROI Plug-in NEOS, https://cran.r-project.org/web/packages/ROI.plugin.neos/vignettes/ROI.plugin.neos_Introduction.pdf
  3. NEOS Server: State-of-the-Art Solvers for Numerical Optimization, https://neos-server.org/neos/
  4. Matlab vs GAMS: linear programming, http://yetanothermathprogrammingconsultant.blogspot.com/2016/09/matlab-vs-gams-linear-programming.html
  5. Mixed Integer Linear Programming in R, https://dirkschumacher.github.io/ompr/
  6. MIP model in R using the OMPR package, http://yetanothermathprogrammingconsultant.blogspot.com/2017/04/mip-model-in-r-using-ompr-package.html
  7. Robert Fourer, Modeling Languages versus Matrix Generators for Linear Programming, ACM Transactions on Mathematical Software, vol. 9, pp. 143–183, 1983.