Saturday, February 6, 2016

R: The RData File Format (2)

Now that we understand the .RData file format (as shown here), we can start writing code to actually generate .RData files. The tool gdx2r takes a GAMS GDX file and dumps its contents into an .RData file. E.g., in a GAMS model we can do:

execute_unload "results.gdx",i,f,c,x;
execute "gdx2r -i results.gdx -o results.rdata";

The first command exports some GAMS symbols to a GDX file. The call to gdx2r converts that GDX file to an .RData file, which we can then load in R:

> setwd("c:/tmp")
> load("results.rdata",verbose = T)
Loading objects:
  i
  f
  c
  x
> i
[1] "seattle"   "san-diego"
> f
[1] 90
> c
          i        j value
1   seattle new-york 0.225
2   seattle  chicago 0.153
3   seattle   topeka 0.162
4 san-diego new-york 0.225
5 san-diego  chicago 0.162
6 san-diego   topeka 0.126
> x
          i        j level marginal lower upper
1   seattle new-york    50    0.000     0   Inf
2   seattle  chicago   300    0.000     0   Inf
3   seattle   topeka     0    0.036     0   Inf
4 san-diego new-york   275    0.000     0   Inf
5 san-diego  chicago     0    0.009     0   Inf
6 san-diego   topeka   275    0.000     0   Inf
>

The symbols shown here come from the results of the small transportation model in the GAMS model library:

  • i is a one-dimensional set, so it is exported as a string vector.
  • f is a scalar. In R, scalars arrive as a real vector of length one.
  • c is a two-dimensional parameter. It is represented as a data frame. The i and j columns are represented as string vectors. I am thinking about changing this to a factor (maybe using an option setting). When using a factor we store a vector of unique strings (the levels) and then use an integer vector to index into that string vector. In R, factors are often used to handle categorical variables.
  • Finally, x is a two-dimensional variable. It is also exported as a data frame.
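
The factor idea mentioned above can be sketched in plain R. This only illustrates how R itself stores a factor; it is not gdx2r output. The data frame c_df is rebuilt by hand from the values in the listing, and its name is an assumption made for this example.

```r
# Rebuild the parameter c by hand (values copied from the listing above).
# The name c_df is used only in this sketch, to avoid masking base::c().
c_df <- data.frame(
  i = rep(c("seattle", "san-diego"), each = 3),
  j = rep(c("new-york", "chicago", "topeka"), times = 2),
  value = c(0.225, 0.153, 0.162, 0.225, 0.162, 0.126),
  stringsAsFactors = FALSE
)

# Convert the string column to a factor, keeping the order of first appearance.
fi <- factor(c_df$i, levels = unique(c_df$i))

levels(fi)      # the vector of unique strings (the levels)
as.integer(fi)  # integer codes that index into the levels

# The original strings are recovered by indexing the levels with the codes.
identical(levels(fi)[as.integer(fi)], c_df$i)
```

With many rows and few distinct labels, the factor stores each label once plus one integer per row, which is the appeal of this representation for large data.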

The file results.rdata is a compressed binary file, so there is no good way to look at it directly:

image
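
The compression can also be confirmed from within R by looking at the first few bytes of a file. The sketch below does not use gdx2r: as a stand-in it writes a small file with R's own save(), which compresses with gzip by default, into a temporary location. The file and object names are made up for the example.

```r
# Stand-in data written with R's save(); gdx2r is not involved here.
i <- c("seattle", "san-diego")
f <- 90
path <- tempfile(fileext = ".rdata")
save(i, f, file = path)

# Raw bytes on disk: a gzip stream starts with the magic bytes 1f 8b.
head_raw <- readBin(path, what = "raw", n = 2)
print(head_raw)

# Reading through gzfile() decompresses on the fly and exposes the
# text header of the save format ("RDX" followed by a version digit).
con <- gzfile(path, open = "rb")
head_txt <- rawToChar(readBin(con, what = "raw", n = 5))
close(con)
print(head_txt)
```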

For debugging I added some options to gdx2r. With the –ascii flag we can generate an ASCII version of this file. This looks like:

image

We can also turn off compression using the flags –unbuffered or –buffered. That generates a binary file we can at least make some sense of:

image 

All these versions can be read by R using the load command.
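
R's own save() has comparable switches, which makes it easy to check that load() accepts each variant. In the sketch below, save(ascii = TRUE) and save(compress = FALSE) are stand-ins for the gdx2r options (gdx2r itself is not called), and the temporary file names and the helper read_all are assumptions made for the example.

```r
f <- 90
i <- c("seattle", "san-diego")

p_gz    <- tempfile(fileext = ".rdata")  # default: compressed binary
p_ascii <- tempfile(fileext = ".rdata")  # ASCII representation
p_plain <- tempfile(fileext = ".rdata")  # binary, no compression

save(i, f, file = p_gz)
save(i, f, file = p_ascii, ascii = TRUE)
save(i, f, file = p_plain, compress = FALSE)

# Hypothetical helper: load a file into its own environment and
# return the two objects as a named list.
read_all <- function(path) {
  env <- new.env()
  load(path, envir = env)
  mget(c("i", "f"), envir = env)
}

results <- lapply(c(p_gz, p_ascii, p_plain), read_all)

# All three variants yield the same objects.
identical(results[[1]], results[[2]]) && identical(results[[1]], results[[3]])

# The ASCII file starts with a readable header line.
readLines(p_ascii, n = 1)
```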

As we write these files natively (without going through R), we would expect this to be fast. Next post: some timings on large data.