In this post we describe how a bulk insert method can help get better performance when importing large data sets into a MySQL database. The same holds for SQL Server. In the tool gdx2sqlserver we dynamically switch to using bulk inserts when the number of records to insert exceeds a threshold. With this we get good performance on a GDX file containing a 100-element set and a one-million-element parameter:
set i /i1*i100/;
alias(i,j,k);
parameter a(i,j,k);
a(i,j,k) = uniform(0,100);
execute_unload "test",i,a;
execute "gdx2sqlserver -i test.gdx -S .\SQL1 -db tmp -s tmp";
The default output is:
--- Job Untitled_1.gms Start 05/01/16 12:46:29 24.6.1 r55820 WEX-WEI x86 64bit/MS Windows
GAMS 24.6.1  Copyright (C) 1987-2016 GAMS Development. All rights reserved
Licensee: Erwin Kalvelagen                              G150803/0001CV-GEN
          Amsterdam Optimization Modeling Group                    DC10455
--- Starting compilation
--- Untitled_1.gms(6) 3 Mb
--- Starting execution: elapsed 0:00:00.010
--- Untitled_1.gms(5) 36 Mb
--- GDX File C:\Users\Erwin\Documents\Embarcadero\Studio\Projects\gdx2sqlserver\Win32\Debug\test.gdx
--- Untitled_1.gms(6) 36 Mb
GDX2SQLServer v 0.1
Copyright (c) 2015-2016 Amsterdam Optimization Modeling Group LLC
GDX Library 24.6.1 r55820 Released Jan 18, 2016 VS8 x86 32bit/MS Windows
GDX:Input file: test.gdx
GDX:Symbols: 2
GDX:Uels: 100
GDX:Loading Uels
SQL:Selected driver: ODBC Driver 11 for SQL Server
SQL:Connection string: Driver={ODBC Driver 11 for SQL Server};Server=.\SQL1;Trusted_Connection=Yes
SQL:RDBMS version: Microsoft SQL Server 2014 - 12.0.2269.0 (X64)
SQL:Database: tmp
SQL:Schema: tmp
-----------------------
   i (100 records)
   a (1000000 records)
Done (17.2 seconds)
*** Status: Normal completion
--- Job Untitled_1.gms Stop 05/01/16 12:46:47 elapsed 0:00:17.862
With the -v (verbose) flag we see that the second, larger symbol is loaded using BULK INSERT:
GDX2SQLServer v 0.1
Copyright (c) 2015-2016 Amsterdam Optimization Modeling Group LLC
GDX Library 24.6.1 r55820 Released Jan 18, 2016 VS8 x86 32bit/MS Windows
GDX:Input file: test.gdx
GDX:Symbols: 2
GDX:Uels: 100
GDX:Loading Uels
SQL:Selected driver: ODBC Driver 11 for SQL Server
SQL:Connection string: Driver={ODBC Driver 11 for SQL Server};Server=.\SQL1;Trusted_Connection=Yes
SQLInfo:[Microsoft][ODBC Driver 11 for SQL Server][SQL Server]Changed database context to 'master'.
SQLInfo:[Microsoft][ODBC Driver 11 for SQL Server][SQL Server]Changed language setting to us_english.
select @@VERSION
SQL:RDBMS version: Microsoft SQL Server 2014 - 12.0.2269.0 (X64)
    Jun 10 2015 03:35:45
    Copyright (c) Microsoft Corporation
    Developer Edition (64-bit) on Windows NT 6.3 <X64> (Build 10586: )
select db_id('tmp')
use [tmp]
SQLInfo:[Microsoft][ODBC Driver 11 for SQL Server][SQL Server]Changed database context to 'tmp'.
SQL:Database: tmp
select schema_id('tmp')
SQL:Schema: tmp
-----------------------
   i (100 records)
select object_id('[tmp].[i]','U')
drop table [tmp].[i]
create table [tmp].[i]([i] varchar(4))
insert into [tmp].[i] values (?)
sqlexecute(100 times)
commit
Time: 0.1 seconds
   a (1000000 records)
select object_id('[tmp].[a]','U')
drop table [tmp].[a]
create table [tmp].[a]([i] varchar(4),[j] varchar(4),[k] varchar(4),[value] double precision)
temp file: [C:\Users\Erwin\AppData\Local\Temp\tmp908A.tmp]
writing C:\Users\Erwin\AppData\Local\Temp\tmp908A.tmp
bulk insert [tmp].[a] from 'C:\Users\Erwin\AppData\Local\Temp\tmp908A.tmp'
rows affected: 1000000
commit
Time: 17.8 seconds
deleting [C:\Users\Erwin\AppData\Local\Temp\tmp908A.tmp]
Done (18.0 seconds)
Here symbol 'i' is deemed small (100 records), so we use a prepared insert statement. Symbol 'a' is considered large, so we use the bulk insert method. Indeed, had we used normal inserts for the large symbol 'a', we would have seen:
   a (1000000 records)
select object_id('[tmp].[a]','U')
drop table [tmp].[a]
create table [tmp].[a]([i] varchar(4),[j] varchar(4),[k] varchar(4),[value] double precision)
insert into [tmp].[a] values (?,?,?,?)
sqlexecute(1000000 times)
commit (100 times)
Time: 258.3 seconds
which is much slower.
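The dispatch logic the verbose log illustrates can be sketched in a few lines of Python. This is not the tool's actual code: the threshold value, the function name, and the tab-delimited file layout are assumptions chosen for illustration; only the shape of the two paths (a prepared per-record INSERT versus a temp data file read by a single BULK INSERT) comes from the log above.

```python
import csv
import os
import tempfile

BULK_THRESHOLD = 10_000  # hypothetical cutoff; the real tool's threshold may differ


def plan_load(table, columns, records):
    """Pick a load strategy for `records`, mimicking the threshold switch.

    Small symbols: one prepared INSERT, executed once per record.
    Large symbols: dump the records to a temporary tab-delimited file and
    load it with a single BULK INSERT statement.
    Returns (method, sql, data_file_path_or_None)."""
    if len(records) < BULK_THRESHOLD:
        placeholders = ",".join("?" * len(columns))
        return ("prepared", f"insert into {table} values ({placeholders})", None)

    # Write the intermediate data file that BULK INSERT will read.
    fd, path = tempfile.mkstemp(suffix=".tmp")
    with os.fdopen(fd, "w", newline="") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\r\n")
        writer.writerows(records)
    return ("bulk", f"bulk insert {table} from '{path}'", path)
```

On the prepared path the ODBC driver would round-trip once per record (the `sqlexecute(1000000 times)` in the slow log); on the bulk path the server reads the data file directly, which is where the large speedup comes from.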
There are still some improvements possible:
- Better buffering of the intermediate text files. This reduces the time from 17.8 to 16 seconds.
- Use a binary data file (possibly slightly faster, and it maintains full precision). This has now been implemented (see here), and reduced the time to move symbol ‘a’ into SQL Server to 10 seconds.
- Use a bulk data API. This would require us to bypass ODBC and use a SQL Server client DLL that may not be installed on the user’s machine.
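The first improvement above, better buffering of the intermediate text file, can be sketched as follows. This is an illustrative Python sketch, not the tool's code: the function name and the 1 MB buffer size are assumptions; the idea is simply that each record costs a memory copy into a large buffer rather than a separate small OS-level write.

```python
def write_data_file(path, records, buffer_size=1 << 20):
    """Write the intermediate tab-delimited file for BULK INSERT through a
    large (here 1 MB) write buffer, so records are flushed to disk in big
    chunks instead of one small write call per record."""
    with open(path, "w", buffering=buffer_size, newline="") as f:
        for rec in records:
            f.write("\t".join(map(str, rec)) + "\r\n")
```

With a million records the difference between per-record writes and large buffered flushes is exactly the kind of second-or-two saving reported above (17.8 to 16 seconds).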