Friday, January 18, 2013

Excel 1 million row limit

Excel 7 (and later) has limit of 1,048,576 rows (before version 7 this limit was 65,536 rows). When using COM automation (using Excel from a program) one usually turns off DisplayAlerts to prevent popping up messages related to saving workbooks etc. However this also means that a large CSV file will be silently truncated by the GAMS utility GDXXRW:

C:\projects\tmp>tail x.csv

2999991

2999992

2999993

2999994

2999995

2999996

2999997

2999998

2999999

3000000

 

C:\projects\tmp>wc x.csv

3000000 3000000 25888896 x.csv

 

C:\projects\tmp>gdxxrw i=x.csv o=x.gdx set=x rng=a1 rdim=1 cdim=0

 

GDXXRW           Dec 18, 2012 24.0.1 WIN 37366.37409 VS8 x86/MS Windows

Input file : C:\projects\tmp\x.csv

Output file: C:\projects\tmp\x.gdx

Total time = 7847 Ms

 

C:\projects\tmp>gdxdump x.gdx > x.gms

 

C:\projects\tmp>head x.gms

$onempty

 

Set x(*) /

'1',

'2',

'3',

'4',

'5',

'6',

'7',

 

C:\projects\tmp>tail x.gms

'1048569',

'1048570',

'1048571',

'1048572',

'1048573',

'1048574',

'1048575',

'1048576' /;

 

$offempty

 

C:\projects\tmp>

We can read unlimited size CSV files using SQL2GMS (see for instance http://yetanothermathprogrammingconsultant.blogspot.com/2010/04/gamsgdx-compression.html).

I talked with the GAMS people about this. The proposed fix for the next release: check if row 108576 is not empty. If that is the case issue a warning message.