Saturday, June 6, 2009

Large easy LP’s: are modeling systems fast enough?

> I am planning to solve some fairly large network-like LP problems. I assume it is not a good idea to use a modeling language: it will slow things down too much.

Well, it depends. Of course specialized software will be much faster than a modeling language with an LP solver. But you should not underestimate what can be solved these days. Here is an (artificial) example of a pure network model. I took the “trnsport” model from the GAMS model library and scaled it up by duplicating it 200,000 times. That will give us a (quite dense) transportation problem with 1 million rows and 1.2 million columns.

Sets
     k  
duplicate        / instance1*instance200000 /
     i  
canning plants   / seattle, san-diego /
     j  
markets          / new-york, chicago, topeka / ;

Parameters

     a(i) 
capacity of plant i in cases
      
/    seattle     350
           
san-diego   600  /

     b(j) 
demand at market j in cases
      
/    new-york    325
           
chicago     300
           
topeka      275  / ;

Table d(i,j)  distance in thousands of miles
                 
new-york       chicago      topeka
   
seattle          2.5           1.7          1.8
   
san-diego        2.5           1.8          1.4  ;

Scalar f  freight in dollars per case per thousand miles  /90/ ;

Parameter c(i,j)  transport cost in thousands of dollars per case ;

          c(i,j) = f * d(i,j) / 1000 ;

Variables
     x(k,i,j) 
shipment quantities in cases
     z        
total transportation costs in thousands of dollars ;

Positive Variable x ;

Equations
     cost         
define objective function
     supply(k,i)  
observe supply limit at plant i
     demand(k,j)  
satisfy demand at market j ;

cost ..        z  =e= 
sum((k,i,j), c(i,j)*x(k,i,j)) ;

supply(k,i) ..  
sum(j, x(k,i,j))  =l=  a(i) ;

demand(k,j) ..  
sum(i, x(k,i,j))  =g=  b(j) ;

Model transport /all/ ;

option iterlim=1000000;
Solve transport using lp minimizing z ;

Running this with a good LP solver with all default settings gives the following results:

Solver Cplex Gurobi
GAMS generation time (seconds) 10.4 10.4
Solver time (seconds) 78.6 48.4
total turnaround time (mm:ss) 1:38 1:20

We see that the LP solver still uses most of the total time.

Now lets see what happens if we fine-tune a bit. Let’s turn off some of the massive listing output GAMS is generating, and let it dump the solution in a binary GDX data-file. Also let’s see what happens if we use Cplex’s network and barrier solvers. We can do this with:

Model transport /all/ ;

option limrow=0;
option limcol=0;
option iterlim=1000000;
transport.optfile=1;
transport.solprint=0;
Solve transport using lp minimizing z ;

execute_unload 'sol.gdx',z,x;

$onecho > cplex.opt
lpmethod 3
$offecho

The results are:

Solver Cplex
Network
Cplex
Barrier
Id.
No Crossover
GAMS generation time (seconds) 6.4 6.4 6.4
Solver time (seconds) 22.2 19.1 13.3
total turnaround time (mm:ss) 0:31 0:28 0:25

With the fastest solver settings we see that the solver uses about half of the total turn-around time.

For this model GAMS is holding up quite nicely. Conclusion: easy, large LP’s do not immediately imply that a modeling system should not be used because of performance reasons. I conjecture that the reason for this is as follows. If all is OK, GAMS should be able to generate models in roughly linear time w.r.t. the number of non-zeroes of the model. Solvers have worse run-time complexity, so they will start to get slower in relative terms when the problem is getting really big. For difficult LP’s and for MIP models especially, the solver will use in general (much) more time than GAMS. An exception is when LP restarts are used: sometimes LP solvers can very efficiently use an advanced basis to find the optimal solution of a related model. GAMS is not very intelligent in generating such restart models (it will generate each model essentially from scratch). Some other modeling systems such as AIMMS have special facilities to do this faster. For surprisingly many models the GAMS approach actually works just fine.

Of course there may be many other issues that can influence your decision: software acquisition cost, model development cost, model maintenance issues, your current skills and familiarity with modeling software, preferences (“I like to program in C”) etc.

2 comments:

  1. Erwin: In my experience, pure network flow problems are extremely rare. Most network flow problems are really multicommodity flow problems or network flow problems with side constraints. And here's the rub: a network simplex algorithm only solves pure network flow problems. The minute you have side constraints - even something as simple as capacity limits on multiple commodities - it forces you back to a traditional simplex or barrier algorithm. Now, in the case of CPLEX, it is possible to use its network simplex method to solve a network flow problem with side constraints: it first optimizes the pure network flow, then switches to traditional simplex to solve the side constraints. But here's the counter-intuitive part: in my experience, it is usually slower to use CPLEX's network simplex method if you have side constraints. Instead, treat the problem as a general LP and solve via the simplex method or barrier method.

    ReplyDelete
  2. Agreed. In this example the barrier is even faster on a pure network. Note that this was just an experiment to show that a very easy but large LP does not per se preclude the use of a modeling system. A pure network is probably where a modeling system is slowest compared to a solver (apart from restarts using an advanced basis), so I used that as base for comparison. I do not want to claim that many LP's are pure networks.

    ReplyDelete