The bulk import facility of SQL Server allows both ascii (text) files or binary files. The documentation states this can make some differences in performance. To test this I added a binary flag (-b) to the gdx2sqlserver tool so it can create a temporary file in native binary format.
To measure the difference we first execute the following GAMS model:
set i /i1*i100/; |
I.e. create a three dimensional parameter a with 100 x 100 x 100 = 1 million entries. The values are random. Then export this as gdx file test.gdx and finally import the gdx file into SQL Server using default settings (except for a verbose –v flag). We see:
a (1000000 records) |
This was using a text file (actually a tab delimited file). Next we do the same thing but add the –b flag. The result is:
a (1000000 records) |
The difference in performance is much more pronounced than I expected. A second advantage is that the floating point numbers are not suffering from round-off during the conversion from and to ascii.
A number of optimizations has enabled to move data quickly from a GAMS GDX file into SQL server and each optimization gave us some extra performance. Here is a short review of things I tried, with timings on transferring the same large parameter ‘a’, so the numbers are comparable:
- Multiple inserts. Done reasonably smart: no auto-commit after each insert, and a prepared statement to reduce some overhead: 258.3 seconds.
- Bulk insert with ascii file: 17.8 seconds.
- Improved buffering when writing ascii file: 16.0 seconds.
- Bulk insert with buffered binary file: 10.0 seconds.
No comments:
Post a Comment