Recently I was involved a few models where GAMS was generating a lot of results, exceeding what we could easily store in Excel or Access. Here is an alternative: store the results in SQL server.
Here is a small two-liner example script that shows how we can efficiently import a very large parameter with 8 million records residing in a GDX file into a SQL server table.
$call gdx2txt.exe in=x.gdx id=p out=p2.txt tab $call bcp testdb.dbo.result in p2.txt -c -S .\SQLEXPRESS –T -h "ROWS_PER_BATCH=8000000,TABLOCK" |
The first line exports the parameter from a GDX file to a tab delimited text file. The tool gdx2txt is a little bit more convenient than using GAMS PUT statements and it is also faster (using the same precision more than 5 times as fast).
The second line calls the BCP utility to import this text file. The hints setting the batch size and requesting a table lock increase the performance.
The log looks like:
--- Job gdx2txt3.gms Start 02/24/12 23:09:02 WEX-WEI 23.7.3 x86_64/MS Windows GAMS Rev 237 Copyright (C) 1987-2011 GAMS Development. All rights reserved Licensee: Erwin Kalvelagen G100803/0001CV-WIN Amsterdam Optimization Modeling Group DC4572 --- Starting compilation --- gdx2txt3.gms(1) 2 Mb --- call gdx2txt.exe in=x.gdx id=p out=p2.txt tab GDX2TXT:Opening "x.gdx" GDX2TXT:Symbol:p Uels:200 GDX2TXT: 100000 records (elapsed 172 millisec) GDX2TXT: 200000 records (elapsed 320 millisec) …
GDX2TXT: 7800000 records (elapsed 11201 millisec) GDX2TXT: 7900000 records (elapsed 11343 millisec) GDX2TXT: total records: 8000000 (elapsed 11484 millisec) --- gdx2txt3.gms(2) 2 Mb --- call bcp testdb.dbo.result in p2.txt -c -S .\SQLEXPRESS -T -h "ROWS_PER_BATCH=8000000,TABLOCK" Starting copy... 1000 rows sent to SQL Server. Total sent: 1000 1000 rows sent to SQL Server. Total sent: 2000 1000 rows sent to SQL Server. Total sent: 3000 … 1000 rows sent to SQL Server. Total sent: 7998000 1000 rows sent to SQL Server. Total sent: 7999000 1000 rows sent to SQL Server. Total sent: 8000000 8000000 rows copied. Network packet size (bytes): 4096 Clock Time (ms.) Total : 24055 Average : (332571.19 rows per sec.) --- Starting execution - empty program *** Status: Normal completion --- Job gdx2txt3.gms Stop 02/24/12 23:09:38 elapsed 0:00:35.736
|
The total turn around time for this test data was 35 seconds. About 11 seconds for GDX2TXT and 24 seconds for BCP. So the database import is only 2.5 times as slow compared to my time to generate the CSV file. That is very fast indeed. The table had four columns: three GAMS indices (converted to type NCHAR(4)) and a value (type REAL). No (SQL) indices or keys were used against the table so the inserts could be done at optimal speed. Also the db file sizes were extended so they did not need to grow during the inserts. As a result, in practice the BCP timings may be a bit slower than shown here.
Note: dropping all rows by “DELETE FROM testdb.dbo.result” can take a long time (1.5 minutes on my machine). Much faster is “TRUNCATE TABLE testdb.dbo.result”.
Usage:
C:\projects\gdx2txt>gdx2txt Usage: gdx2txt in=gdxfile [id=name] [out=txtfile] [csv|tab] Write identifier from GDX file to an ascii file Parameters: in=gdxfile name of GDX file, required id=name name of identifier, optional if gdx has only one id out=txtfile name of outputfile, optional, stdout is used otherwise csv write ascii file in CSV format tab write ascii file in tab delimited format Example: > gamslib trnsport > gams trnsport gdx=trnsport > gdx2txt in=trnsport.gdx id=x > gdx2txt in=trnsport.gdx id=x out=x.csv csv C:\projects\gdx2txt>bcp usage: bcp {dbtable | query} {in | out | queryout | format} datafile [-m maxerrors] [-f formatfile] [-e errfile] [-F firstrow] [-L lastrow] [-b batchsize] [-n native type] [-c character type] [-w wide character type] [-N keep non-text native] [-V file format version] [-q quoted identifier] [-C code page specifier] [-t field terminator] [-r row terminator] [-i inputfile] [-o outfile] [-a packetsize] [-S server name] [-U username] [-P password] [-T trusted connection] [-v version] [-R regional enable] [-k keep null values] [-E keep identity values] [-h "load hints"] [-x generate xml format file] [-d database name] C:\projects\gdx2txt> |
No comments:
Post a Comment