Yet Another Math Programming Consultant

Tuesday, February 18, 2014

Terrible gaps in MIP model

I am doing something really wrong here:

    Nodes    |    Current Node    |     Objective Bounds      |     Work
Expl Unexpl | Obj Depth IntInf | Incumbent    BestBd   Gap | It/Node Time

     0     0 26550.5890    0 1950   -0.00000 26550.5890     -      -    4s
     0     0 18754.1028    0 1854   -0.00000 18754.1028     -      -   12s
H    0     0                     239.5974000 18754.1028 7727%     -   13s
     0     0 17391.4529    0 1920 239.59740 17391.4529 7159%     -   16s
H    0     0                     244.2684000 17391.4529 7020%     -   16s
     0     0 17153.8979    0 1875 244.26840 17153.8979 6923%     -   18s
H    0     0                     244.8774000 17153.8979 6905%     -   18s
     0     0 17011.2374    0 1906 244.87740 17011.2374 6847%     -   19s
     0     0 16943.7668    0 1913 244.87740 16943.7668 6819%     -   20s
H    0     0                     245.8274000 16943.7668 6793%     -   21s
     0     0 16890.7751    0 1874 245.82740 16890.7751 6771%     -   21s
     0     0 16857.7210    0 1923 245.82740 16857.7210 6758%     -   22s
     0     0 16833.9578    0 1932 245.82740 16833.9578 6748%     -   23s
     0     0 16815.9366    0 1930 245.82740 16815.9366 6741%     -   24s
     0     0 16806.3253    0 1941 245.82740 16806.3253 6737%     -   25s
     0     0 16806.3253    0 1941 245.82740 16806.3253 6737%     -   26s
H    0     0                     248.0054000 16806.3253 6677%     -   37s
     0     0 16806.3253    0 1889 248.00540 16806.3253 6677%     -   43s
     0     0 16732.2340    0 1768 248.00540 16732.2340 6647%     -   56s
H    0     0                     249.7224000 16732.2340 6600%     -   57s
     0     0 16612.6719    0 1891 249.72240 16612.6719 6552%     -   58s
     0     0 16584.7054    0 1876 249.72240 16584.7054 6541%     -   59s

The gaps between the MIP bounds are larger than I have ever seen and don’t seem to get close anytime soon.

Monday, February 17, 2014

I am trying to make it easy to make high-quality maps for presenting model results. See: http://yetanothermathprogrammingconsultant.blogspot.com/2014/01/maps-from-gams.html. One rule of thumb: don't use a rainbow color scheme when trying to display numeric data. Here is an example where they actually seem to use random colors. Note that these colors do not help you to quickly grasp where high and low GDP growth occurs.

From: http://www.theguardian.com/business/interactive/2014/feb/14/eurozone-economic-growth-gdp-map.

Friday, February 14, 2014

Excel modeling

When translating an Excel spreadsheet model using Solver into a GAMS model, I came across a construct like:

This can be modeled as a linear condition:

One disadvantage of using Excel as a modeling vehicle (a major one in my opinion) is that it does not really allow you to think about the model formulation. The formulas are hidden behind cells and the model structure is largely lost. Even easy model reformulations will escape attention.

Monday, February 10, 2014

Parallel R for making many maps

When making many maps (http://yetanothermathprogrammingconsultant.blogspot.com/2014/01/maps-from-gams.html) it may make sense to try to exploit multiple cores. I have 4 on my laptop. It came with hyper-threading turned on so it looks like the machine has 8 cpus:

In R there is a nice parallel foreach construct (see: http://cran.r-project.org/web/packages/doParallel/vignettes/gettingstartedParallel.pdf). As each map is independent of each other, this offers an obvious way of parallelizing the generation of many maps. The results are quite good:

Serial   : Elapsed time: 271.74   seconds for 84 maps
2 threads: Elapsed time: 136.17   seconds for 84 maps
4 threads: Elapsed time: 86.63   seconds for 84 maps
8 threads: Elapsed time: 71.23   seconds for 84 maps

Now we are in R anyway, let’s make a quick plot:

Notice that even going from 4 to 8 threads helps (I did not expect that; conjecture: this may be related to being able to do other useful work while doing disk I/O). In the 8 thread case we have a bunch of Rscript.exe processes running:

We can keep all cores quite busy:

Implementation details

There are several ways to implement a thing like this. A serial approach could be:

GAMS
loop(maps,
    extract data for single map
    execute_unload “singlemapdata.gdx”;
    execute “Rscript.exe singlemapscript.R”
);

Instead I used:

GAMS
execute_unload “allmapdata.gdx”;
execute “Rscript.exe allmapscript.R”

In general it is better to call expensive external programs once instead of inside a loop. Of course this moves some complexity from GAMS to the R script. Inside the R script we do the following:

allmapscript.R
1. read all data
2. read shape files for the maps
3. for(map in 1:nrow(maps)) {
       extract data for single map
       merge single map data with shape file
       plot single map
   }

When implementing a parallel version I could have used a parallel construct in GAMS (see: http://yetanothermathprogrammingconsultant.blogspot.com/2012/04/parallel-gams-jobs.html). However it was just much easier to use the parallel foreach loop in R. This just required:

parallelallmapscript.R
1. read all data
2. read shape files for the maps
3. set up a cluster with workers
4. foreach(map = 1:nrow(maps)) %dopar% {
       extract data for single map
       merge single map data with shape file
       plot single map
   }
5. close down parallel cluster

Tuesday, February 4, 2014

GAMS–>R: gdxrrw vs gdx2sqlite

There are at least two ways to get data from GAMS into R. Use the tool gdxrrw (http://www.gams.com/presentations/informs2012_gdxrrw.pdf) which allows reading and writing GDX files from within R or use a database representation in between. Here we use gdx2sqlite and the SQLite database (http://yetanothermathprogrammingconsultant.blogspot.com/2013/10/gdx2sqlite.html). Here are some differences:

gdxrrw	gdx2sqlite
GAMS code	GAMS code
sets j /j1j3/ k /k1k2/ ; parameter a(j,k); a(j,k) = uniform(0,1); execute_unload "data.gdx";	sets j /j1j3/ k /k1k2/ ; parameter a(j,k); a(j,k) = uniform(0,1); execute_unload "data.gdx"; execute "gdx2sqlite -i data.gdx -o data.db";
R Code	R Code
if (!require(reshape2)) { install.packages("reshape2", repos="http://cran.r-project.org") library(reshape2) } if (!require(gdxrrw)) { download.file("http://support.gams.com/lib/exe/fetch.php?media=gdxrrw:gdxrrw_0.4.0.zip","gdxrrw_0.4.0.zip") install.packages("gdxrrw_0.4.0.zip",repos=NULL) library(gdxrrw) } igdx("") j<-rgdx.set("data.gdx","j") j a<-rgdx.param("data.gdx","a") a a$j	if (!require(RSQLite)) { install.packages("RSQLite", repos="http://cran.r-project.org") library(RSQLite) } sqlite<-dbDriver("SQLite") db <- dbConnect(sqlite,"data.db") j<-dbGetQuery(db,"select * from j") j a<-dbGetQuery(db,"select * from a") a a$j
Results	Results
> j i 1 j1 2 j2 3 j3 > a i j value 1 j1 k1 0.1717471 2 j1 k2 0.8432667 3 j2 k1 0.5503754 4 j2 k2 0.3011379 5 j3 k1 0.2922121 6 j3 k2 0.2240529 > a$j [1] k1 k2 k1 k2 k1 k2 Levels: k1 k2	> j j 1 j1 2 j2 3 j3 > a j k value 1 j1 k1 0.1717471 2 j1 k2 0.8432667 3 j2 k1 0.5503754 4 j2 k2 0.3011379 5 j3 k1 0.2922121 6 j3 k2 0.2240529 > a$j [1] "j1" "j1" "j2" "j2" "j3" "j3"

The database does a much better job of getting the correct column names into R. It is possible to retrieve the correct column names with gdxrrw by inspecting the attributes of the data frame. So we could have used the following assignment to repair the column names:

names(a)<-append(attributes(a)$domains,"value")

Another difference is that columns related to GAMS indices have a different type (factor v.s. character vector). The database interface always returns character columns as such, without translation into factors (even if getOption("stringsAsFactors") is true, which is actually the default).

R allows downloading and installing packages without specifying the version number: you get automatically the latest one. Unfortunately for gdxrrw you need to specify a complete version.

Update: looks like gdxrrw 0.4.0’s behavior has changed. I no longer see i,j,k,… as default column names but i1,i2,i3,…

Wednesday, January 22, 2014

Maps from GAMS

Creating maps from GAMS via R. The maps are at country and fpu (basin) level. R can export to different bitmap and vector formats.

Sunday, January 12, 2014

Watson Related Revenue: 100 million vs 10 billion

http://www.businessweek.com/articles/2014-01-10/ibms-artificial-intelligence-problem-or-why-watson-cant-get-a-job

Watson realized $100 million in revenue vs. a predicted 10 billion. That is a factor 100 off. Such a forecast error is not really a good showcase for "predictive analytics." Of course these kind of predictions are often more goal setting or inspirational rather than serious forecasts. (For me $100 million is not something to sneeze at).

Thursday, December 5, 2013

Populating a spreadsheet through VBA

Populating a large spreadsheet table can be slow if we push data cell-by-cell. Here is a small example.

Naïve approach

We start with creating a 1000x1000 table by directly accessing individual cells.

Private Sub run1()
   Dim r As Range
   Set r = Range("Sheet1!B2")

   r.Resize(m + 1, n + 1).ClearContents

   Dim t0 As Single
   t0 = Timer

   Dim i As Integer
   Dim j As Integer

   For i = 1 To m
      r(1 + i, 1) = "row" & i
   Next i
   For j = 1 To n
      r(1, 1 + j) = "col" & j
   Next j

   For i = 1 To m
      For j = 1 To n
         r(1 + i, 1 + j) = 3.14
      Next j
   Next i

   Dim t1 As Single
   t1 = Timer
   r(0, 1) = "Time : " & Format(t1 - t0, "fixed")
End Sub

This approach takes about 240 seconds on my machine.

A refinement is to turn off screen updating: Application.ScreenUpdating = False. This did not make a difference for me. Excel seems to turn off screen updating automatically after a little while.

Fast approach

A better approach is to create a VBA array and to write this array in one swoop to the spreadsheet. We use here a Variant array as it contains both strings and numbers. The code looks like:

Private Sub run3()
   Dim r As Range
   Set r = Range("Sheet3!B2")

   r.Resize(m + 1, n + 1).ClearContents

   Dim t0 As Single
   t0 = Timer

   Dim i As Integer
   Dim j As Integer

   Dim a(1 To m + 1, 1 To n + 1)
   For i = 1 To m
      a(1 + i, 1) = "row" & i
   Next i
   For j = 1 To n
      a(1, 1 + j) = "col" & j
   Next j

   For i = 1 To m
      For j = 1 To n
         a(1 + i, 1 + j) = 3.14
      Next j
   Next i

   r.Resize(m + 1, n + 1).Value = a

   Dim t1 As Single
   t1 = Timer
   r(0, 1) = "Time : " & Format(t1 - t0, "fixed")

End Sub

This code takes 0.46 seconds!

This strategy is also useful when calling Excel from other languages such as Delphi or C#.

Friday, November 15, 2013

Converting USDA PSD data to GAMS GDX

USDA has some good agricultural production and trade data available at http://www.fas.usda.gov/psdonline/psdHome.aspx. Here we try to read the CSV file psd_alldata.csv and convert it to a usable GAMS GDX file.

$ontext

   Convert psd_alldata.csv to a GDX file

   Source:
   http://www.fas.usda.gov/psdonline/psdDownload.aspx

   Sorry for the obscure syntax, I am sure this will confuse users

   Erwin Kalvelagen, 2013

$offtext

$set csv psd_alldata.csv
$set gdx psd_alldata.gdx

$version 241

$call csv2gdx %csv% id=alldata useheader=y index=2,4,5,9,11 value=12

set year 'superset' /1900*2050/;
alias (commodity,country,attribute,unit,*);
parameter alldata(commodity,country,year,attribute,unit);

$gdxin %gdx%
$load alldata

sets
commodity2(commodity)
country2(country)
year2(year)    'market year'
attribute2(attribute)
unit2(unit)
;

option
commodity2 < alldata
country2 < alldata
year2 < alldata
attribute2 < alldata
unit2 < alldata
;

execute_unload '%gdx%', alldata, commodity2=commodity, country2=country,
                        year2=year, attribute2=attribute, unit2=unit;

The final GDX file looks like:

This file can be read as:

$ontext

   Example: Read PSD data

$offtext

$set gdx psd_alldata.gdx

sets
commodity
country
year        'market year'
attribute
unit
;

parameter alldata(commodity,country,year,attribute,unit);

$gdxin %gdx%
$load   commodity country year attribute unit
$loaddc alldata

Notes:

We use the tool csv2gdx to read the csv and produce a raw gdx file (the gdx file will be overwritten with the final version after adding the supporting sets).
The projection option is used to extract the sets from the parameter alldata. This is non-intuitive (i.e. ugly) syntax but it is fast.
The set year is explicitly declared to make sure the years are ordered. In the csv file the (marketing) years are not ordered. Refinement: in order to make sure the superset of years is not chosen too small, we probably should have used a $loaddc alldata instead of $load alldata.
The declaration of the sets look a bit funny in the gdx viewer:

This is due to the way we had to declare the sets for use with the projection option.

Monday, October 28, 2013

Turbo Pascal Compiler in JavaScript: http://www.teamten.com/lawrence/projects/turbo_pascal_compiler/.

There must be more of these interesting efforts. I remember a Basic interpreter in TeX (http://texcatalogue.sarovar.org/entries/basix.html).