Thursday, May 12, 2016

Large tab delimited files and buffer size

A piece of software reads a GAMS GDX file and produces a tab-delimited text file to be used for bulk import into a database. It is written in Delphi (Pascal), where the default text buffer size is a skimpy 128 bytes. For my data set that means a great many Windows API calls, since the buffer is flushed each time it fills up. A larger buffer is better, but I expect the benefit of increasing the size to taper off very quickly. Indeed, a small, quick test shows this:

[image: loop time for different text buffer sizes]

Here we measure the time taken by this loop:

for i := 1 to n do begin
   read_record_from_gdx_file;
   write_record_to_text_file;
end;

Moving from the default of 128 bytes to a 1k buffer gives us most of the performance boost, while moving further to 2k or 4k helps just a tiny bit. Beyond that there is no further effect.
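For reference, here is a minimal sketch of how a larger text buffer can be installed in Delphi using the standard SetTextBuf procedure. This is not the code of the actual GDX converter: the file name out.txt, the 4k buffer and the dummy records are assumptions made purely for illustration.

```pascal
program BufDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

var
  f: TextFile;
  // 4k buffer (illustrative size). It must stay alive for as long as f is open,
  // so it is declared as a global here rather than as a local of a short-lived routine.
  buf: array[0..4095] of AnsiChar;
  i: Integer;
begin
  AssignFile(f, 'out.txt');          // hypothetical output file name
  SetTextBuf(f, buf, SizeOf(buf));   // replace the default 128-byte buffer before opening
  Rewrite(f);
  try
    // stand-in for the read-record / write-record loop: dummy tab-delimited records
    for i := 1 to 100000 do
      WriteLn(f, 'key', i, #9, i * 2);
  finally
    CloseFile(f);                    // flushes whatever is still in the buffer
  end;
end.
```

A back-of-the-envelope count fits the measurements. Assuming, hypothetically, a 100 MB output file (104,857,600 bytes), a 128-byte buffer is flushed about 819,200 times, a 1k buffer about 102,400 times and a 4k buffer about 25,600 times. The first step removes roughly 717,000 flushes, while going from 1k to 4k removes only about 77,000 more, so the gains shrink quickly.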