Tuesday, February 18, 2014

Terrible gaps in MIP model

I am doing something really wrong here:

    Nodes    |    Current Node    |     Objective Bounds      |     Work
Expl Unexpl |  Obj  Depth IntInf | Incumbent    BestBd   Gap | It/Node Time

     0     0 26550.5890    0 1950   -0.00000 26550.5890     -      -    4s
     0     0 18754.1028    0 1854   -0.00000 18754.1028     -      -   12s
H    0     0                     239.5974000 18754.1028  7727%     -   13s
     0     0 17391.4529    0 1920  239.59740 17391.4529  7159%     -   16s
H    0     0                     244.2684000 17391.4529  7020%     -   16s
     0     0 17153.8979    0 1875  244.26840 17153.8979  6923%     -   18s
H    0     0                     244.8774000 17153.8979  6905%     -   18s
     0     0 17011.2374    0 1906  244.87740 17011.2374  6847%     -   19s
     0     0 16943.7668    0 1913  244.87740 16943.7668  6819%     -   20s
H    0     0                     245.8274000 16943.7668  6793%     -   21s
     0     0 16890.7751    0 1874  245.82740 16890.7751  6771%     -   21s
     0     0 16857.7210    0 1923  245.82740 16857.7210  6758%     -   22s
     0     0 16833.9578    0 1932  245.82740 16833.9578  6748%     -   23s
     0     0 16815.9366    0 1930  245.82740 16815.9366  6741%     -   24s
     0     0 16806.3253    0 1941  245.82740 16806.3253  6737%     -   25s
     0     0 16806.3253    0 1941  245.82740 16806.3253  6737%     -   26s
H    0     0                     248.0054000 16806.3253  6677%     -   37s
     0     0 16806.3253    0 1889  248.00540 16806.3253  6677%     -   43s
     0     0 16732.2340    0 1768  248.00540 16732.2340  6647%     -   56s
H    0     0                     249.7224000 16732.2340  6600%     -   57s
     0     0 16612.6719    0 1891  249.72240 16612.6719  6552%     -   58s
     0     0 16584.7054    0 1876  249.72240 16584.7054  6541%     -   59s

The gaps between the MIP bounds are larger than I have ever seen and don’t seem to get close anytime soon.

Monday, February 17, 2014

Coloring maps

I am trying to make it easy to make high-quality maps for presenting model results. See: http://yetanothermathprogrammingconsultant.blogspot.com/2014/01/maps-from-gams.html. One rule of thumb: don't use a rainbow color scheme when trying to display numeric data. Here is an example where they actually seem to use random colors. Note that these colors do not help you to quickly grasp where high and low GDP growth occurs.
image
From: http://www.theguardian.com/business/interactive/2014/feb/14/eurozone-economic-growth-gdp-map.

Friday, February 14, 2014

Excel modeling

When translating an Excel spreadsheet model using Solver into a GAMS model, I came across a construct like:
image
This can be modeled as a linear condition:
image
One disadvantage of using Excel as a modeling vehicle (a major one in my opinion) is that it does not really allow you to think about the model formulation. The formulas are hidden behind cells and the model structure is largely lost. Even easy model reformulations will escape attention.

Monday, February 10, 2014

Parallel R for making many maps

When making many maps (http://yetanothermathprogrammingconsultant.blogspot.com/2014/01/maps-from-gams.html) it may make sense to try to exploit multiple cores. I have 4 on my laptop. It came with hyper-threading turned on so it looks like the machine has 8 cpus:

 image

In R there is a nice parallel foreach construct (see: http://cran.r-project.org/web/packages/doParallel/vignettes/gettingstartedParallel.pdf). As each map is independent of each other, this offers an obvious way of parallelizing the generation of many maps. The results are quite good:

Serial   : Elapsed time: 271.74   seconds for  84  maps
2 threads: Elapsed time: 136.17   seconds for  84  maps
4 threads: Elapsed time: 86.63   seconds for  84  maps
8 threads: Elapsed time: 71.23   seconds for  84  maps

Now we are in R anyway, let’s make a quick plot:

image

Notice that even going from 4 to 8 threads helps (I did not expect that; conjecture: this may be related to being able to do other useful work while doing disk I/O). In the 8 thread case we have a bunch of Rscript.exe processes running:

image

We can keep all cores quite busy:

image

Implementation details

There are several ways to implement a thing like this. A serial approach could be:

GAMS
loop(maps,
    extract data for single map
    execute_unload “singlemapdata.gdx”;
    execute “Rscript.exe singlemapscript.R”
);

Instead I used:

GAMS
execute_unload “allmapdata.gdx”; 
execute “Rscript.exe allmapscript.R”

In general it is better to call expensive external programs once instead of inside a loop. Of course this moves some complexity from GAMS to the R script. Inside the R script we do the following:

allmapscript.R
1. read all data
2. read shape files for the maps
3. for(map in 1:nrow(maps)) {
       extract data for single map
       merge single map data with shape file
       plot single map
   }

When implementing a parallel version I could have used a parallel construct in GAMS (see: http://yetanothermathprogrammingconsultant.blogspot.com/2012/04/parallel-gams-jobs.html). However it was just much easier to use the parallel foreach loop in R. This just required:

parallelallmapscript.R
1. read all data
2. read shape files for the maps
3. set up a cluster with workers
4. foreach(map = 1:nrow(maps)) %dopar% {
       extract data for single map
       merge single map data with shape file
       plot single map
   }
5. close down parallel cluster

Tuesday, February 4, 2014

GAMS–>R: gdxrrw vs gdx2sqlite

There are at least two ways to get data from GAMS into R. Use the tool gdxrrw (http://www.gams.com/presentations/informs2012_gdxrrw.pdf) which allows reading and writing GDX files from within R or use a database representation in between. Here we use gdx2sqlite and the SQLite database (http://yetanothermathprogrammingconsultant.blogspot.com/2013/10/gdx2sqlite.html). Here are some differences:

gdxrrw gdx2sqlite
GAMS code GAMS code
sets
  j
/j1*j3/
  k
/k1*k2/
;
parameter a(j,k);
a(j,k) = uniform(0,1);


execute_unload "data.gdx";
sets
  j
/j1*j3/
  k
/k1*k2/
;
parameter a(j,k);
a(j,k) = uniform(0,1);


execute_unload "data.gdx"
;
execute "gdx2sqlite -i data.gdx -o data.db";
R Code R Code

if (!require(reshape2)) {
   install.packages("reshape2", repos="
http://cran.r-project.org")
   library(reshape2)
}

if (!require(gdxrrw)) {
   download.file("
http://support.gams.com/lib/exe/fetch.php?media=gdxrrw:gdxrrw_0.4.0.zip","gdxrrw_0.4.0.zip")
   install.packages("gdxrrw_0.4.0.zip",repos=NULL)
   library(gdxrrw)
}

igdx("")

j<-rgdx.set("data.gdx","j")
j

a<-rgdx.param("data.gdx","a")
a

a$j

if (!require(RSQLite)) {
   install.packages("RSQLite", repos="
http://cran.r-project.org")
   library(RSQLite)
}

sqlite<-dbDriver("SQLite")
db <- dbConnect(sqlite,"data.db")

j<-dbGetQuery(db,"select * from j")
j

a<-dbGetQuery(db,"select * from a")
a

a$j

Results Results

> j
   i
1 j1
2 j2
3 j3
> a
   i  j     value
1 j1 k1 0.1717471
2 j1 k2 0.8432667
3 j2 k1 0.5503754
4 j2 k2 0.3011379
5 j3 k1 0.2922121
6 j3 k2 0.2240529
> a$j
[1] k1 k2 k1 k2 k1 k2
Levels: k1 k2

> j
   j
1 j1
2 j2
3 j3
> a
   j  k     value
1 j1 k1 0.1717471
2 j1 k2 0.8432667
3 j2 k1 0.5503754
4 j2 k2 0.3011379
5 j3 k1 0.2922121
6 j3 k2 0.2240529
> a$j
[1] "j1" "j1" "j2" "j2" "j3" "j3"

The database does a much better job of getting the correct column names into R. It is possible to retrieve the correct column names with gdxrrw by inspecting the attributes of the data frame. So we could have used the following assignment to repair the column names:

names(a)<-append(attributes(a)$domains,"value")

Another difference is that columns related to GAMS indices have a different type (factor v.s. character vector). The database interface always returns character columns as such, without translation into factors (even if getOption("stringsAsFactors") is true, which is actually the default).

R allows downloading and installing packages without specifying the version number: you get automatically the latest one. Unfortunately for gdxrrw you need to specify a complete version.

Update: looks like gdxrrw 0.4.0’s behavior has changed. I no longer see i,j,k,… as default column names but i1,i2,i3,…