Monday, February 10, 2014

Parallel R for making many maps

When making many maps (http://yetanothermathprogrammingconsultant.blogspot.com/2014/01/maps-from-gams.html) it may make sense to exploit multiple cores. My laptop has 4 cores. It came with hyper-threading turned on, so the machine appears to have 8 CPUs:

 image
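The difference between physical cores and hyper-threaded logical CPUs can also be checked from within R. The following is a minimal sketch using the parallel package that ships with R; the variable names are only illustrative, and on some platforms the physical count may come back as NA.

```r
library(parallel)  # part of the standard R distribution

# Logical CPUs include hyper-threads; physical cores do not.
logical_cpus   <- detectCores(logical = TRUE)
physical_cores <- detectCores(logical = FALSE)  # may be NA on some platforms

cat("Logical CPUs  :", logical_cpus, "\n")
cat("Physical cores:", physical_cores, "\n")
```

On a machine like the one described here, one would expect this to report 8 logical CPUs and 4 physical cores.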

In R there is a nice parallel foreach construct (see: http://cran.r-project.org/web/packages/doParallel/vignettes/gettingstartedParallel.pdf). As the maps are independent of each other, this offers an obvious way of parallelizing the generation of many maps. The results are quite good:

Serial   : Elapsed time: 271.74 seconds for 84 maps
2 threads: Elapsed time: 136.17 seconds for 84 maps
4 threads: Elapsed time:  86.63 seconds for 84 maps
8 threads: Elapsed time:  71.23 seconds for 84 maps

Since we are in R anyway, let’s make a quick plot:

image
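A plot along these lines can be produced with a few lines of base R. This is only a sketch and not necessarily the code behind the figure: the timings are copied from the runs above, and the speedup and efficiency columns are an addition to make the scaling easier to read.

```r
# Timings copied from the runs reported above (84 maps each).
threads <- c(1, 2, 4, 8)
elapsed <- c(271.74, 136.17, 86.63, 71.23)   # seconds

speedup    <- elapsed[1] / elapsed           # relative to the serial run
efficiency <- speedup / threads              # fraction of ideal linear scaling

print(data.frame(threads, elapsed,
                 speedup    = round(speedup, 2),
                 efficiency = round(efficiency, 2)))

# Elapsed time against number of threads.
# When run through Rscript, the plot goes to the default PDF device.
plot(threads, elapsed, type = "b",
     xlab = "threads", ylab = "elapsed time (seconds)",
     main = "Generating 84 maps")
```

The speedups work out to roughly 2.0, 3.1 and 3.8 for 2, 4 and 8 threads: scaling is close to linear up to 2 threads and flattens out after that.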

Notice that even going from 4 to 8 threads helps. I did not expect that; my conjecture is that the extra threads can do useful work while others are waiting on disk I/O. In the 8-thread case we have a bunch of Rscript.exe processes running:

image

We can keep all cores quite busy:

image

Implementation details

There are several ways to implement the generation of many maps. A serial approach could be:

GAMS
loop(maps,
*   extract data for single map
    execute_unload "singlemapdata.gdx";
    execute "Rscript.exe singlemapscript.R";
);

Instead I used:

GAMS
execute_unload "allmapdata.gdx";
execute "Rscript.exe allmapscript.R";

In general it is better to call expensive external programs once instead of inside a loop. Of course this moves some complexity from GAMS to the R script. Inside the R script we do the following:

allmapscript.R
1. read all data
2. read shape files for the maps
3. for(map in 1:nrow(maps)) {
       extract data for single map
       merge single map data with shape file
       plot single map
   }

When implementing a parallel version I could have used a parallel construct in GAMS (see: http://yetanothermathprogrammingconsultant.blogspot.com/2012/04/parallel-gams-jobs.html). However, it was much easier to use the parallel foreach loop in R. This only required:

parallelallmapscript.R
1. read all data
2. read shape files for the maps
3. set up a cluster with workers
4. foreach(map = 1:nrow(maps)) %dopar% {
       extract data for single map
       merge single map data with shape file
       plot single map
   }
5. close down parallel cluster
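Steps 3 to 5 of the outline above can be sketched with the doParallel package as follows. This is a minimal illustration under stated assumptions, not the actual script: the `maps` data frame, the output directory and the `plot_one_map()` helper are hypothetical stand-ins for the real data and for the extract/merge/plot steps, and the "map" drawn here is just a placeholder scatter plot.

```r
library(doParallel)   # also attaches foreach and parallel

# Hypothetical stand-ins: 'maps' would come from the GDX data, and
# plot_one_map() would do the real extract/merge/plot work.
maps    <- data.frame(id = 1:8)
out_dir <- tempdir()

plot_one_map <- function(i, out_dir) {
  f <- file.path(out_dir, sprintf("map%03d.pdf", i))
  pdf(f)                                   # one output file per map
  plot(runif(10), main = paste("map", i))  # placeholder for the real map
  dev.off()
  f                                        # return the file name
}

n_workers <- 2                             # e.g. 2, 4 or 8
cl <- makeCluster(n_workers)               # step 3: set up the workers
registerDoParallel(cl)

timing <- system.time(
  files <- foreach(i = 1:nrow(maps), .combine = c) %dopar% {
    plot_one_map(i, out_dir)               # step 4: independent iterations
  }
)

stopCluster(cl)                            # step 5: shut the workers down
cat("Elapsed time:", timing[["elapsed"]], "seconds for", nrow(maps), "maps\n")
```

Each worker started by `makeCluster()` is a separate R process, which is consistent with the multiple Rscript.exe processes seen in the 8-thread run. Because every iteration writes to its own file, the workers do not need to share any state.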