Friday, April 30, 2010

GAMS/GDX compression

In some cases GDX compression can make a lot of difference. Here are the results for a large CSV file:

File                                 Size
AccPsdData_2009-11-19Su.csv   594,332,819
AccPsdData_2009-11-19Su.gdx    24,765,003
AccPsdData_2009-11-19Su.gdx     4,318,222  (with compression turned on)

The CSV file was read using SQL2GMS:

--- Job readcsv.gms Start 04/28/10 15:50:32 WEX-WEI 23.3.3 x86_64/MS Windows    
GAMS Rev 233  Copyright (C) 1987-2009 GAMS Development. All rights reserved
Licensee: Erwin Kalvelagen                               G090213/0001CV-WIN
          Amsterdam Optimization Modeling Group                      DC4572
--- Starting compilation
--- readcsv.gms(69) 2 Mb
--- call =sql2gms.exe @sql2gms.txt
GDXIO.DLL version:GDX Library      Nov  1, 2009 23.3.3 WIN 14596.15043 VIS x86/MS Windows      
ADO version: 6.0
Query: select ComNam,CtrNam,VarNam25,Year,Value from [AccPsdData_2009-11-19Su.csv]
GDX id: AccPsdData (Parameter)
Number of rows: 3230069
Elapsed time: 113.29 seconds
Done
Query: select distinct ComNam from [AccPsdData_2009-11-19Su.csv]
GDX id: Com (Set)
Number of rows: 55
Elapsed time: 23.67 seconds
Done
Query: select distinct CtrNam from [AccPsdData_2009-11-19Su.csv]
GDX id: Ctr (Set)
Number of rows: 254
Elapsed time: 24.09 seconds
Done
Query: select distinct VarNam25 from [AccPsdData_2009-11-19Su.csv]
GDX id: Var (Set)
Number of rows: 50
Elapsed time: 23.93 seconds
Done
Query: select distinct Year from [AccPsdData_2009-11-19Su.csv]
GDX id: Year (Set)
Number of rows: 51
Elapsed time: 23.76 seconds
Done
--- readcsv.gms(96) 2 Mb
--- Starting execution - empty program
--- readcsv.gms(96) 2 Mb
*** Status: Normal completion
--- Job readcsv.gms Stop 04/28/10 15:54:03 elapsed 0:03:30.982

It is a little bit slow as we make different passes over the data. Probably this would be faster if it was stored in a database.

Note: the sets were created to make it easier to read the data in a GAMS model. This way we have sets that can be read during compile time and thus can be used as domain. It is easy to calculate those sets in GAMS at run time, but then they cannot be used as domains.

The compression was turned on by the statement:

* compress gdx file
$setenv GdxCompress 1

* execute conversion
$call =sql2gms.exe @sql2gms.txt