## Monday, February 10, 2014

### Parallel R for making many maps

When making many maps (see http://yetanothermathprogrammingconsultant.blogspot.com/2014/01/maps-from-gams.html) it may make sense to exploit multiple cores. My laptop has 4 physical cores. It came with hyper-threading turned on, so the machine appears to have 8 CPUs.
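From within R, both numbers can be queried with `detectCores()` from the `parallel` package that ships with R. The sketch below is illustrative only. R's documentation notes that the `logical` argument is honoured only on some platforms (Windows among them), so elsewhere both calls may return the same number, or `NA`.

```r
library(parallel)

# Logical CPUs: with hyper-threading enabled each physical core counts twice
n_logical <- detectCores(logical = TRUE)

# Physical cores: only honoured on some platforms; elsewhere this may
# equal n_logical or be NA
n_physical <- detectCores(logical = FALSE)

cat("logical CPUs:  ", n_logical, "\n")
cat("physical cores:", n_physical, "\n")
```

On a 4-core laptop with hyper-threading, Windows would be expected to report 8 and 4 here.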

In R there is a nice parallel foreach construct (see: http://cran.r-project.org/web/packages/doParallel/vignettes/gettingstartedParallel.pdf). Because the maps are independent of each other, this offers an obvious way to parallelize the generation of many maps. The results are quite good:

| Run       | Elapsed time (seconds) | Maps |
|-----------|-----------------------:|-----:|
| Serial    | 271.74                 | 84   |
| 2 threads | 136.17                 | 84   |
| 4 threads | 86.63                  | 84   |
| 8 threads | 71.23                  | 84   |
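The speedup is the serial time divided by the parallel time, and the efficiency is the speedup divided by the number of threads. A small R snippet (variable names are illustrative) works these out from the timings above: roughly 2.0x with 2 threads, 3.1x with 4 and 3.8x with 8. The 8-thread run therefore goes beyond the 4-thread result even though there are only 4 physical cores.

```r
threads <- c(1, 2, 4, 8)
elapsed <- c(271.74, 136.17, 86.63, 71.23)   # seconds for 84 maps

speedup    <- elapsed[1] / elapsed           # relative to the serial run
efficiency <- speedup / threads              # fraction of ideal linear speedup

print(data.frame(threads, elapsed,
                 speedup    = round(speedup, 2),
                 efficiency = round(efficiency, 2)))
```

The rounded speedups come out as 1.00, 2.00, 3.14 and 3.81, with efficiencies of 1.00, 1.00, 0.78 and 0.48.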

Since we are in R anyway, let's make a quick plot of these timings.
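A chart of this kind takes only a few lines of base R graphics. The sketch below is not the author's plotting code. It plots elapsed time against the number of threads on a logarithmic x axis and adds the ideal linear-scaling curve for comparison. The output file name is hypothetical.

```r
threads <- c(1, 2, 4, 8)
elapsed <- c(271.74, 136.17, 86.63, 71.23)     # seconds for 84 maps

out <- file.path(tempdir(), "timings.pdf")     # hypothetical output file
pdf(out, width = 6, height = 4)
plot(threads, elapsed, type = "b", log = "x", xaxt = "n",
     ylim = c(0, max(elapsed)),
     xlab = "threads", ylab = "elapsed time (seconds)",
     main = "Time to generate 84 maps")
axis(1, at = threads)                          # tick marks at 1, 2, 4, 8 only
lines(threads, elapsed[1] / threads, lty = 2)  # ideal linear speedup
legend("topright", legend = c("measured", "ideal"),
       lty = c(1, 2), pch = c(1, NA))
invisible(dev.off())
```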

Notice that going from 4 to 8 threads still helps: the elapsed time drops from 86.63 to 71.23 seconds (about 18%), even though the laptop has only 4 physical cores. I did not expect that. My conjecture is that the extra threads can do useful work while others are waiting on disk I/O. In the 8-thread case a bunch of Rscript.exe processes are running.
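Each worker in a cluster created with `makeCluster()` (default type PSOCK) is a separate R process launched through Rscript, which presumably accounts for the Rscript.exe processes. The sketch below uses only the base `parallel` package and an arbitrary cluster size of 2. It asks each worker for its process ID to show that the workers are distinct from the master session.

```r
library(parallel)

cl <- makeCluster(2)   # PSOCK cluster: each worker is a separate R process
# Evaluate Sys.getpid() once on every worker and collect the results
worker_pids <- unlist(clusterEvalQ(cl, Sys.getpid()))
stopCluster(cl)        # shut the worker processes down again

cat("master PID: ", Sys.getpid(), "\n")
cat("worker PIDs:", worker_pids, "\n")
```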

We can keep all cores quite busy.

#### Implementation details

There are several ways to implement this. A straightforward serial approach is to call R once per map from inside a GAMS loop:

```gams
loop(maps,
*   extract data for single map
    execute_unload "singlemapdata.gdx";
    execute "Rscript.exe singlemapscript.R";
);
```

Alternatively, export the data for all maps at once and call R a single time:

```gams
execute_unload "allmapdata.gdx";
execute "Rscript.exe allmapscript.R";
```

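In general it is better to call an expensive external program once rather than inside a loop. Each call to Rscript.exe starts a fresh R session, which, for example, has to read the shape files again. Of course, this moves some complexity from GAMS into the R script. Inside the R script we do the following: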

```
# allmapscript.R
1. read all data
2. read shape files for the maps
3. for (map in 1:nrow(maps)) {
       extract data for single map
       merge single map data with shape file
       plot single map
   }
```
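The steps above can be turned into a runnable serial sketch under clearly stated assumptions. The GDX data and the shape files are replaced by two small hypothetical data frames, `all_data` and `regions`. Each "map" is a bar chart written to a PDF file. `make_map` is a hypothetical helper, not the author's code.

```r
# Hypothetical stand-in for the data read from allmapdata.gdx:
# one value per (map, region) pair
all_data <- data.frame(
  map    = rep(c("map1", "map2", "map3"), each = 4),
  region = rep(c("A", "B", "C", "D"), times = 3),
  value  = c(1, 3, 2, 5,  4, 1, 1, 2,  2, 2, 6, 3)
)
# Hypothetical stand-in for the attribute table of the shape file
regions <- data.frame(region = c("A", "B", "C", "D"),
                      name   = c("North", "East", "South", "West"))
outdir <- tempdir()

# Build one map; all inputs are passed in as arguments
make_map <- function(m, all_data, regions, outdir) {
  d <- all_data[all_data$map == m, ]          # extract data for single map
  d <- merge(regions, d, by = "region")       # merge with "shape file" data
  f <- file.path(outdir, paste0(m, ".pdf"))
  grDevices::pdf(f, width = 5, height = 4)    # plot single map (bar chart)
  graphics::barplot(d$value, names.arg = d$name, main = m)
  grDevices::dev.off()
  f                                           # return the file name
}

files <- character(0)
for (m in unique(all_data$map)) {
  files <- c(files, make_map(m, all_data, regions, outdir))
}
print(files)
```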

For the parallel version I could have used GAMS's own facilities for running jobs in parallel (see: http://yetanothermathprogrammingconsultant.blogspot.com/2012/04/parallel-gams-jobs.html). However, it was much easier to use the parallel foreach loop in R. This required only a few extra steps:

```
# parallelallmapscript.R
1. read all data
2. read shape files for the maps
3. set up a cluster with workers
4. foreach(map = 1:nrow(maps)) %dopar% {
       extract data for single map
       merge single map data with shape file
       plot single map
   }
5. close down parallel cluster
```
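Here is a parallel counterpart of the steps above, sketched with `foreach` and `doParallel`. It makes the same assumptions: hypothetical stand-in data frames instead of GDX data and shape files, a bar chart per "map", and a hypothetical `make_map` helper. The cluster size of 2 is arbitrary; `detectCores()` is a common choice in practice.

```r
library(foreach)
library(doParallel)   # also attaches 'parallel' (makeCluster, stopCluster)

# 1./2. hypothetical stand-ins for the GDX data and the shape file attributes
all_data <- data.frame(
  map    = rep(c("map1", "map2", "map3"), each = 4),
  region = rep(c("A", "B", "C", "D"), times = 3),
  value  = c(1, 3, 2, 5,  4, 1, 1, 2,  2, 2, 6, 3)
)
regions <- data.frame(region = c("A", "B", "C", "D"),
                      name   = c("North", "East", "South", "West"))
outdir <- tempdir()

make_map <- function(m, all_data, regions, outdir) {
  d <- all_data[all_data$map == m, ]          # extract data for single map
  d <- merge(regions, d, by = "region")       # merge with "shape file" data
  f <- file.path(outdir, paste0(m, ".pdf"))
  grDevices::pdf(f, width = 5, height = 4)    # plot single map (bar chart)
  graphics::barplot(d$value, names.arg = d$name, main = m)
  grDevices::dev.off()
  f
}

cl <- makeCluster(2)      # 3. set up a cluster with workers
registerDoParallel(cl)    #    and register it as the %dopar% backend

# 4. one independent task per map; objects referenced in the loop body
#    (make_map, all_data, regions, outdir) are exported to the workers
files <- foreach(m = unique(all_data$map), .combine = c) %dopar% {
  make_map(m, all_data, regions, outdir)
}

stopCluster(cl)           # 5. close down the parallel cluster
print(files)
```

Passing every input to `make_map` as an argument means all the objects the workers need appear directly in the loop body, where foreach's automatic export finds them. With real shape files, the spatial packages would also have to be loaded on the workers, for example through foreach's `.packages` argument.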