Sunday, April 15, 2012

Parallel GAMS jobs (2)

In http://yetanothermathprogrammingconsultant.blogspot.com/2012/04/parallel-gams-jobs.html I described a simple approach I suggested to a client allowing to run multiple scenarios in parallel.

For a different client we needed to run a randomized algorithm that solves many small MIP models. They are so small that using multiple threads inside the MIP solver does not give much performance boost (much of the time is spent outside the pure Branch & Bound part – such as preprocessing etc.). However as the MIP problems are independent of each other we could generate all the necessary data in advance and then call the scenario solver (http://www.gams.com/modlib/adddocs/gusspaper.pdf). This will keep the generated problem in memory, and does in-core updates, so we don’t regenerate the model all the time.

When running the algorithm with n=100,000 MIP models we see the following performance. Note that besides the MIP models there is also a substantial piece of GAMS code that implements other parts of the algorithm.

Implementation number of MIP models solve time rest of algorithm total time
Traditional GAMS loop (call solver as DLL) 100,000 1068 sec 169 sec 1237 sec
Scenario Solver 100,000 293 sec 166 sec 459 sec

To get more performance I tried to run the scenario solver in parallel. That is not completely trivial as the solver has a number glitches (e.g. scratch files with fixed, hard coded names). I also run parts of the GAMS algorithm in parallel, but some parts had to be done in the master model after merging the results.

Implementation number of MIP models Worker threads parallel sub-problem time rest of algorithm (serial) total time
Parallel + Scenario Solver 100,000 4 116 sec 67 sec 183 sec

The implementation does not win the beauty contest, but it could be developed quickly. For these larger problems you always have to watch out for performance bottlenecks. One wrong line in the GAMS model and the performance drops to hours of computation time. In addition the GAMS tools here are not very refined, but if implemented correctly quite effective. As an example consider how I merge the results of the sub jobs. Each sub job writes a results file (GDX file) into its own directory and the master model will read those and merge them together so we have a single solution set for further processing:

*
* read and merge the GDX files
*

loop(job,
  
put_utility 'gdxin' / 'jobdir',(jobinfo(job,"jobno")):0:0,"\results.gdx"
/;
  
execute_loadpoint
solpoollevel,solpoolobj,pp;

)
;

This is not exactly readable and intuitive code (caused by lack of design when these features were added to the GAMS language in a hurry), but it is quite fast.