“This works though I'm worried it might look too scary to my students coming from GAMS.”
Wednesday, January 30, 2013
When receiving DBF files I usually look at the structure by loading it into Excel. In this case I received DBF files that were much too large to be handled by Excel. Before writing queries it is useful to be able to look at the column names and types. That is what this GAMS job does:
The output looks like:
I looked for some free DBF viewers, but they largely originate from obscure Russian web sites and Google warned not to open the downloaded zip files. This ADO based approach should at least create no problems like introducing viruses.
Monday, January 28, 2013
This is a difficult CSV file to read using using the GAMS tool GDXXRW:
We can however read this with SQL2GMS using the following query:
Q1=select barid,dsid,'pass1',pass1 from %csvfile% where barid is not null \
The empty lines are handled by the check for NULLs. Notice that the parameter will look like:
The advantage of SQL is that we have some flexibility in reorganizing and operating on the data before it is passed on to the GDX DLL. In some cases that can be very useful.
In virtually all of the recent projects I have been involved with, I used SQL2GMS to get (sometimes large) data into GAMS. Not all GAMS modelers are familiar with SQL but given the large amounts of data available in databases I would say that it is a worthwhile time investment to get some basic database knowledge.
I think the syntax is somewhat confusing, but this thing can be handy.
Not all databases I encounter are normalized! In fact I see many tables that resemble spreadsheets (i.e. with many columns).
Sunday, January 27, 2013
I tried a construct that looks like the following code:
Basically I take the intersection of the two sets and create a mapping between the two base sets. Unfortunately, the sets are big:
n_cell5m = card(cell5m);
n_cell5m_spam = card(cell5m_spam);
display n_cell5m, n_cell5m_spam;
---- 23 PARAMETER n_cell5m = 2294094.000
as a result expression (1) takes an extraordinary long time to execute:
|---- 26 Assignment cellmap 2436.954 2437.032 SECS 383 Mb 806839|
I believe the sameas function is just not properly implemented in GAMS. This performance is really not what one would expect.
Of course we can work around this:
intersect(u) = cell5m(u)*cell5m_spam(u);
cellmap(cell5m(intersect),cell5m_spam(intersect)) = yes;
This now executes in much less than a second:
|---- 27 Assignment intersect 0.187 0.203 SECS 333 Mb 806839 |
---- 33 Assignment cellmap 0.203 0.406 SECS 358 Mb 806839
Note that we had to change the declarations of the sets a little bit, in order to get this passed the domain checking of GAMS.
Monday, January 21, 2013
I encountered a problem when reading a scalar from a compressed GDX file:
GDXIN C:\projects\ulrike - ifpri\gdx\spam.gdx
I did not really need the rowcount scalar so for me the workaround was: just skip reading this scalar.
It is amazing that this bug was not found earlier. I guess not many users are using compressed GDX files (the default is no compression) and of those few even fewer will ever store a scalar in such a file.
Some possible workarounds:
- skip reading scalars
- store scalars as a 1 dimensional parameter
- store scalars in a separate uncompress gdx file
- read scalars using gdxdump.
As said a possible workaround could be to use gdxdump for the scalars. This means instead of using:
we should implement something like:
$set gdxfile spam.gdx
* read from compressed GDX file
$load cell5m rowcount
$set gdxfile spam.gdx
* read from compressed GDX file
$call gdxdump %gdxfile% symb=rowcount output=rowcount.inc
No doubt this will be fixed in the next release.
Friday, January 18, 2013
Excel 7 (and later) has limit of 1,048,576 rows (before version 7 this limit was 65,536 rows). When using COM automation (using Excel from a program) one usually turns off DisplayAlerts to prevent popping up messages related to saving workbooks etc. However this also means that a large CSV file will be silently truncated by the GAMS utility GDXXRW:
3000000 3000000 25888896 x.csv
C:\projects\tmp>gdxxrw i=x.csv o=x.gdx set=x rng=a1 rdim=1 cdim=0
GDXXRW Dec 18, 2012 24.0.1 WIN 37366.37409 VS8 x86/MS Windows
Input file : C:\projects\tmp\x.csv
Output file: C:\projects\tmp\x.gdx
Total time = 7847 Ms
C:\projects\tmp>gdxdump x.gdx > x.gms
Set x(*) /
We can read unlimited size CSV files using SQL2GMS (see for instance http://yetanothermathprogrammingconsultant.blogspot.com/2010/04/gamsgdx-compression.html).
I talked with the GAMS people about this. The proposed fix for the next release: check if row 108576 is not empty. If that is the case issue a warning message.
Wednesday, January 16, 2013
For a project we convert data from a humongous table with 23 million records stored in a remote SQL Server database (global spatial grid with data on agricultural production and land use from the Spatial Production Allocation Model). It takes about 1 hour to run a query and populate a GDX file. Probably the performance is related to limited bandwidth. This is truly an example of “big data”. In a subsequent model we load the large GDX file into GAMS for further processing. This loading is much faster: about 4 seconds. It takes about 750 MB RAM to load the data in GAMS.
For these type of applications the use of GDX file as a high-performance staging “database” is very convenient. It would be a nightmare to load this data directly from the SQL Server database into GAMS: every run would start with a 1 hour startup time.
Compression works quite well on this gdx file (see also: http://yetanothermathprogrammingconsultant.blogspot.com/2010/04/gamsgdx-compression.html). Here are the file sizes:
|Compression||File size in bytes|
Users are more and more throwing real large data at me. Besides the SQL Server database above, I was also delivered an 1.2 GB DBF file. DBF files are still popular with GIS applications. This file could be imported into a compressed GDX file with the following size reduction (also partly because we dropped some redundant columns but mainly because GDX stores strings more efficiently):
|compressed GDX||62 MB|
Sunday, January 13, 2013
|C based glpsol.exe|| |
GLPK Simplex Optimizer, v4.47
GLPK Simplex Optimizer, v4.47
Friday, January 11, 2013
- New Gurobi version: http://www.gurobi.com/products/gurobi-optimizer/what's-new-in-v5.1. It is amazing how much progress is still being made in making solvers faster.
- AMPL’s Constraint Programming Interface: http://zverovich.net/ics2013/2013-01-06-ampl-interface-to-constraint-programming-solvers.html
- Running modeling languages using an Excel interface: http://solverstudio.org/. Interesting approach, a little bit like Solver Foundations Excel front-end. Both seem to use VSTO:
Thursday, January 10, 2013
If you use a random numbers in GAMS, setting execseed to a clock based number can be used to make each random number sequence different. I sometimes see:
execseed = gmillisec(jnow);
The idea is to initialize the seed with the milliseconds part of the current date. This is not completely correct however. In very rare cases this will fail with the message:
**** Exec Error at line 4: Random seed has to be > 0.0 and < 10**9
A better version looks like:
execseed = 1+gmillisec(jnow);
I.e. the advice given by the GAMS people here is actually not that good: http://support.gams.com/doku.php?id=gams:random_number_generator_in_gams:
Note that runs with this clock based construct will make your runs irreproducable (and thus bugs difficult to handle).
Tuesday, January 8, 2013
Some part of a large GAMS model was running slow so we had to investigate. The first line in the box below was the culprit:
We tried two alternative formulations using implicit loops. The performance gains on this data set were enormous (now the GAMS part is no longer the bottleneck in the process, the bulk of the time has moved back to database operations). Explicit loops can be much more expensive than implicit loops, especially when dealing with big data.
Friday, January 4, 2013
Is there any software available for non-convex QCP’s? (http://www.ibm.com/developerworks/forums/thread.jspa?messageID=14925577).
You may want to try GloMIQO which targets these type of problems.
Here some agricultural production statistics for Angola (source: XLS download from FAO/CountryStat):
I am sure the harvest of 2011 was not that abundant.
Looks like an import problem. This is the XLS file. Also available is CSV data. That shows:
All other production data are coded as integers, but here we have a number with a decimal comma (and many more digits than warranted). Decimal commas/decimal points are always a point of contention.
PS: Crops are denoted in Portuguese (Horticolas=vegetables) as Angola was once a Portuguese colony.
PS2. I dropped a note to Countrystats support about this.