Tuesday, February 18, 2014

Terrible gaps in MIP model

I am doing something really wrong here:

    Nodes    |    Current Node    |     Objective Bounds      |     Work
Expl Unexpl |  Obj  Depth IntInf | Incumbent    BestBd   Gap | It/Node Time

     0     0 26550.5890    0 1950   -0.00000 26550.5890     -      -    4s
     0     0 18754.1028    0 1854   -0.00000 18754.1028     -      -   12s
H    0     0                     239.5974000 18754.1028  7727%     -   13s
     0     0 17391.4529    0 1920  239.59740 17391.4529  7159%     -   16s
H    0     0                     244.2684000 17391.4529  7020%     -   16s
     0     0 17153.8979    0 1875  244.26840 17153.8979  6923%     -   18s
H    0     0                     244.8774000 17153.8979  6905%     -   18s
     0     0 17011.2374    0 1906  244.87740 17011.2374  6847%     -   19s
     0     0 16943.7668    0 1913  244.87740 16943.7668  6819%     -   20s
H    0     0                     245.8274000 16943.7668  6793%     -   21s
     0     0 16890.7751    0 1874  245.82740 16890.7751  6771%     -   21s
     0     0 16857.7210    0 1923  245.82740 16857.7210  6758%     -   22s
     0     0 16833.9578    0 1932  245.82740 16833.9578  6748%     -   23s
     0     0 16815.9366    0 1930  245.82740 16815.9366  6741%     -   24s
     0     0 16806.3253    0 1941  245.82740 16806.3253  6737%     -   25s
     0     0 16806.3253    0 1941  245.82740 16806.3253  6737%     -   26s
H    0     0                     248.0054000 16806.3253  6677%     -   37s
     0     0 16806.3253    0 1889  248.00540 16806.3253  6677%     -   43s
     0     0 16732.2340    0 1768  248.00540 16732.2340  6647%     -   56s
H    0     0                     249.7224000 16732.2340  6600%     -   57s
     0     0 16612.6719    0 1891  249.72240 16612.6719  6552%     -   58s
     0     0 16584.7054    0 1876  249.72240 16584.7054  6541%     -   59s

The gaps between the MIP bounds are larger than I have ever seen and don’t seem to get close anytime soon.

Monday, February 17, 2014

Coloring maps

I am trying to make it easy to make high-quality maps for presenting model results. See: http://yetanothermathprogrammingconsultant.blogspot.com/2014/01/maps-from-gams.html. One rule of thumb: don't use a rainbow color scheme when trying to display numeric data. Here is an example where they actually seem to use random colors. Note that these colors do not help you to quickly grasp where high and low GDP growth occurs.
image
From: http://www.theguardian.com/business/interactive/2014/feb/14/eurozone-economic-growth-gdp-map.

Friday, February 14, 2014

Excel modeling

When translating an Excel spreadsheet model using Solver into a GAMS model, I came across a construct like:
image
This can be modeled as a linear condition:
image
One disadvantage of using Excel as a modeling vehicle (a major one in my opinion) is that it does not really allow you to think about the model formulation. The formulas are hidden behind cells and the model structure is largely lost. Even easy model reformulations will escape attention.

Monday, February 10, 2014

Parallel R for making many maps

When making many maps (http://yetanothermathprogrammingconsultant.blogspot.com/2014/01/maps-from-gams.html) it may make sense to try to exploit multiple cores. I have 4 on my laptop. It came with hyper-threading turned on so it looks like the machine has 8 cpus:

 image

In R there is a nice parallel foreach construct (see: http://cran.r-project.org/web/packages/doParallel/vignettes/gettingstartedParallel.pdf). As each map is independent of each other, this offers an obvious way of parallelizing the generation of many maps. The results are quite good:

Serial   : Elapsed time: 271.74   seconds for  84  maps
2 threads: Elapsed time: 136.17   seconds for  84  maps
4 threads: Elapsed time: 86.63   seconds for  84  maps
8 threads: Elapsed time: 71.23   seconds for  84  maps

Now we are in R anyway, let’s make a quick plot:

image

Notice that even going from 4 to 8 threads helps (I did not expect that; conjecture: this may be related to being able to do other useful work while doing disk I/O). In the 8 thread case we have a bunch of Rscript.exe processes running:

image

We can keep all cores quite busy:

image

Implementation details

There are several ways to implement a thing like this. A serial approach could be:

GAMS
loop(maps,
    extract data for single map
    execute_unload “singlemapdata.gdx”;
    execute “Rscript.exe singlemapscript.R”
);

Instead I used:

GAMS
execute_unload “allmapdata.gdx”; 
execute “Rscript.exe allmapscript.R”

In general it is better to call expensive external programs once instead of inside a loop. Of course this moves some complexity from GAMS to the R script. Inside the R script we do the following:

allmapscript.R
1. read all data
2. read shape files for the maps
3. for(map in 1:nrow(maps)) {
       extract data for single map
       merge single map data with shape file
       plot single map
   }

When implementing a parallel version I could have used a parallel construct in GAMS (see: http://yetanothermathprogrammingconsultant.blogspot.com/2012/04/parallel-gams-jobs.html). However it was just much easier to use the parallel foreach loop in R. This just required:

parallelallmapscript.R
1. read all data
2. read shape files for the maps
3. set up a cluster with workers
4. foreach(map = 1:nrow(maps)) %dopar% {
       extract data for single map
       merge single map data with shape file
       plot single map
   }
5. close down parallel cluster

Tuesday, February 4, 2014

GAMS–>R: gdxrrw vs gdx2sqlite

There are at least two ways to get data from GAMS into R. Use the tool gdxrrw (http://www.gams.com/presentations/informs2012_gdxrrw.pdf) which allows reading and writing GDX files from within R or use a database representation in between. Here we use gdx2sqlite and the SQLite database (http://yetanothermathprogrammingconsultant.blogspot.com/2013/10/gdx2sqlite.html). Here are some differences:

gdxrrw gdx2sqlite
GAMS code GAMS code
sets
  j
/j1*j3/
  k
/k1*k2/
;
parameter a(j,k);
a(j,k) = uniform(0,1);


execute_unload "data.gdx";
sets
  j
/j1*j3/
  k
/k1*k2/
;
parameter a(j,k);
a(j,k) = uniform(0,1);


execute_unload "data.gdx"
;
execute "gdx2sqlite -i data.gdx -o data.db";
R Code R Code

if (!require(reshape2)) {
   install.packages("reshape2", repos="
http://cran.r-project.org")
   library(reshape2)
}

if (!require(gdxrrw)) {
   download.file("
http://support.gams.com/lib/exe/fetch.php?media=gdxrrw:gdxrrw_0.4.0.zip","gdxrrw_0.4.0.zip")
   install.packages("gdxrrw_0.4.0.zip",repos=NULL)
   library(gdxrrw)
}

igdx("")

j<-rgdx.set("data.gdx","j")
j

a<-rgdx.param("data.gdx","a")
a

a$j

if (!require(RSQLite)) {
   install.packages("RSQLite", repos="
http://cran.r-project.org")
   library(RSQLite)
}

sqlite<-dbDriver("SQLite")
db <- dbConnect(sqlite,"data.db")

j<-dbGetQuery(db,"select * from j")
j

a<-dbGetQuery(db,"select * from a")
a

a$j

Results Results

> j
   i
1 j1
2 j2
3 j3
> a
   i  j     value
1 j1 k1 0.1717471
2 j1 k2 0.8432667
3 j2 k1 0.5503754
4 j2 k2 0.3011379
5 j3 k1 0.2922121
6 j3 k2 0.2240529
> a$j
[1] k1 k2 k1 k2 k1 k2
Levels: k1 k2

> j
   j
1 j1
2 j2
3 j3
> a
   j  k     value
1 j1 k1 0.1717471
2 j1 k2 0.8432667
3 j2 k1 0.5503754
4 j2 k2 0.3011379
5 j3 k1 0.2922121
6 j3 k2 0.2240529
> a$j
[1] "j1" "j1" "j2" "j2" "j3" "j3"

The database does a much better job of getting the correct column names into R. It is possible to retrieve the correct column names with gdxrrw by inspecting the attributes of the data frame. So we could have used the following assignment to repair the column names:

names(a)<-append(attributes(a)$domains,"value")

Another difference is that columns related to GAMS indices have a different type (factor v.s. character vector). The database interface always returns character columns as such, without translation into factors (even if getOption("stringsAsFactors") is true, which is actually the default).

R allows downloading and installing packages without specifying the version number: you get automatically the latest one. Unfortunately for gdxrrw you need to specify a complete version.

Update: looks like gdxrrw 0.4.0’s behavior has changed. I no longer see i,j,k,… as default column names but i1,i2,i3,…

Wednesday, January 22, 2014

Maps from GAMS

Creating maps from GAMS via R. The maps are at country and fpu (basin) level. R can export to different bitmap and vector formats.


Sunday, January 12, 2014

Watson Related Revenue: 100 million vs 10 billion

http://www.businessweek.com/articles/2014-01-10/ibms-artificial-intelligence-problem-or-why-watson-cant-get-a-job

Watson realized $100 million in revenue vs. a predicted 10 billion. That is a factor 100 off. Such a forecast error is not really a good showcase for "predictive analytics." Of course these kind of predictions are often more goal setting or inspirational rather than serious forecasts. (For me $100 million is not something to sneeze at).

Thursday, December 5, 2013

Populating a spreadsheet through VBA

Populating a large spreadsheet table can be slow if we push data cell-by-cell. Here is a small example.

Naïve approach

We start with creating a 1000x1000 table by directly accessing individual cells.

Private Sub run1()
  
Dim r As
Range
  
Set r = Range("Sheet1!B2"
)

   r.Resize(m +
1, n + 1
).ClearContents

  
Dim t0 As
Single
   t0 = Timer

  
Dim i As
Integer
  
Dim j As
Integer

  
For i = 1 To
m
      r(
1 + i, 1) = "row"
& i
  
Next
i
  
For j = 1 To
n
      r(
1, 1 + j) = "col"
& j
  
Next
j

  
For i = 1 To
m
     
For j = 1 To
n
         r(
1 + i, 1 + j) = 3.14

     
Next j
  
Next
i

  
Dim t1 As
Single
   t1 = Timer
   r(
0, 1) = "Time : " & Format(t1 - t0, "fixed"
)
End
Sub

This approach takes about 240 seconds on my machine.

A refinement is to turn off screen updating: Application.ScreenUpdating = False. This did not make a difference for me. Excel seems to turn off screen updating automatically after a little while.

Fast approach

A better approach is to create a VBA array and to write this array in one swoop to the spreadsheet. We use here a Variant array as it contains both strings and numbers. The code looks like:

Private Sub run3()
  
Dim r As
Range
  
Set r = Range("Sheet3!B2"
)

   r.Resize(m +
1, n + 1
).ClearContents

  
Dim t0 As
Single
   t0 = Timer

  
Dim i As
Integer
  
Dim j As
Integer

  
Dim a(1 To m + 1, 1 To n + 1
)
  
For i = 1 To
m
      a(
1 + i, 1) = "row"
& i
  
Next
i
  
For j = 1 To
n
      a(
1, 1 + j) = "col"
& j
  
Next
j

  
For i = 1 To
m
     
For j = 1 To
n
         a(
1 + i, 1 + j) = 3.14

     
Next j
  
Next
i

   r.Resize(m +
1, n + 1
).Value = a

  
Dim t1 As
Single
   t1 = Timer
   r(
0, 1) = "Time : " & Format(t1 - t0, "fixed"
)

End Sub

This code takes 0.46 seconds!

image

This strategy is also useful when calling Excel from other languages such as Delphi or C#.

Friday, November 15, 2013

Converting USDA PSD data to GAMS GDX

USDA has some good agricultural production and trade data available at http://www.fas.usda.gov/psdonline/psdHome.aspx. Here we try to read the CSV file psd_alldata.csv and convert it to a usable GAMS GDX file.

$ontext

  
Convert psd_alldata.csv to a GDX file

  
Source:
  
http://www.fas.usda.gov/psdonline/psdDownload.aspx

  
Sorry for the obscure syntax, I am sure this will confuse users

  
Erwin Kalvelagen, 2013

$offtext


$set csv psd_alldata.csv
$set gdx psd_alldata.gdx

$version 241

$call csv2gdx %csv% id=alldata useheader=y index=2,4,5,9,11 value=12


set year 'superset' /1900*2050/;
alias
(commodity,country,attribute,unit,*);
parameter
alldata(commodity,country,year,attribute,unit);

$gdxin %gdx%
$load alldata


sets

  commodity2(commodity)
  country2(country)
  year2(year)   
'market year'
  attribute2(attribute)
  unit2(unit)
;



option
  commodity2 < alldata
  country2 < alldata
  year2 < alldata
  attribute2 < alldata
  unit2 < alldata
;


execute_unload '%gdx%', alldata, commodity2=commodity, country2=country,
                        year2=year, attribute2=attribute, unit2=unit;

The final GDX file looks like:

image

This file can be read as:

$ontext

  
Example: Read PSD data

$offtext


$set gdx psd_alldata.gdx


sets
  commodity
  country
  year       
'market year'
  attribute
  unit
;


parameter alldata(commodity,country,year,attribute,unit);

$gdxin  %gdx%
$load   commodity country year attribute unit
$loaddc alldata

Notes:

  • We use the tool csv2gdx to read the csv and produce a raw gdx file (the gdx file will be overwritten with the final version after adding the supporting sets).
  • The projection option is used to extract the sets from the parameter alldata. This is non-intuitive (i.e. ugly) syntax but it is fast.
  • The set year is explicitly declared to make sure the years are ordered. In the csv file the (marketing) years are not ordered. Refinement: in order to make sure the superset of years is not chosen too small, we probably should have used a $loaddc alldata instead of $load alldata.
  • The declaration of the sets look a bit funny in the gdx viewer:
    image
    This is due to the way we had to declare the sets for use with the projection option.

Monday, October 28, 2013

Turbo Pascal Compiler in JavaScript: http://www.teamten.com/lawrence/projects/turbo_pascal_compiler/.


There must be more of these interesting efforts. I remember a Basic interpreter in TeX (http://texcatalogue.sarovar.org/entries/basix.html).