This is a follow-up on this post. For large datasets using individual inserts is not as efficient as applying bulk operations. In MySQL bulk inserts can be done with the LOAD DATA LOCAL INFILE command. This statement takes a local (i.e. at the client) text file, copies it to the server and then does a bulk insert of the whole thing. This approach is built in the tool gdx2mysql. If a symbol has more than N records (N=500 by default), then we write a text file and call LOAD DATA LOCAL INFILE. If a symbol has fewer records then we just a standard prepared insert statement. A verbose log will show what happens.
set i /i1*i100/; alias(i,j,k); parameter a(i,j,k); a(i,j,k) = uniform(0,100); execute_unload "test",i,a; execute "gdx2mysql -i test.gdx -s tmp -u test -p test -v"; |
In the above test model we generate a set with 100 elements and a parameter with 1003=1,000,000 elements. The verbose log looks like
--- Job Untitled_1.gms Start 04/22/16 04:01:39 24.6.1 r55820 WEX-WEI x86 64bit/MS Windows GAMS 24.6.1 Copyright (C) 1987-2016 GAMS Development. All rights reserved Licensee: Erwin Kalvelagen G150803/0001CV-GEN Amsterdam Optimization Modeling Group DC10455 --- Starting compilation --- Untitled_1.gms(6) 3 Mb --- Starting execution: elapsed 0:00:00.007 --- Untitled_1.gms(5) 36 Mb --- GDX File C:\Users\Erwin\Documents\Embarcadero\Studio\Projects\gdx2mysql\Win32\Debug\test.gdx --- Untitled_1.gms(6) 36 Mb GDX2MySQL v 0.1 Copyright (c) 2015-2016 Amsterdam Optimization Modeling Group LLC GDX Library 24.6.1 r55820 Released Jan 18, 2016 VS8 x86 32bit/MS Windows GDX:Input file: test.gdx GDX:Symbols: 2 GDX:Uels: 100 GDX:Loading Uels SQL:Selected driver: MySQL ODBC 5.3 ANSI Driver SQL:Connection string: Driver={MySQL ODBC 5.3 ANSI Driver};Server=localhost;User=xxx;Password=xxx set autocommit=0 select @@version_comment SQL:RDBMS: MySQL Community Server (GPL) select @@version SQL:RDBMS version: 5.6.26-log select count(*) from information_schema.schemata where schema_name = 'tmp' ----------------------- i (100 records) drop table if exists `tmp`.`i` create table `tmp`.`i`(`i` varchar(4)) insert into `tmp`.`i` values (?) sqlexecute(100 times) commit Time : 0.6 a (1000000 records) drop table if exists `tmp`.`a` create table `tmp`.`a`(`i` varchar(4),`j` varchar(4),`k` varchar(4),`value` double) temp file: [C:\Users\Erwin\AppData\Local\Temp\tmpA046.tmp] writing C:\Users\Erwin\AppData\Local\Temp\tmpA046.tmp load data local infile 'C:\\Users\\Erwin\\AppData\\Local\\Temp\\tmpA046.tmp' into table `tmp`.`a` rows affected: 1000000 commit Time : 39.6 deleting [C:\Users\Erwin\AppData\Local\Temp\tmpA046.tmp] *** Status: Normal completion --- Job Untitled_1.gms Stop 04/22/16 04:02:20 elapsed 0:00:41.353 |
The smaller set i is imported using normal inserts, while the larger parameter a is imported through an intermediate text file. This is much more efficient than using the standard inserts. There is a gdx2mysql option to force larger symbols to use standard inserts, so we can compare timings:
a (1000000 records) drop table if exists `tmp`.`a` create table `tmp`.`a`(`i` varchar(4),`j` varchar(4),`k` varchar(4),`value` double) insert into `tmp`.`a` values (?,?,?,?) sqlexecute(1000000 times) commit 100 times Time : 257.5 |
So we are about 6.5 times as fast. (We can expect even larger differences in other cases).
A final way to make imports faster is to use ISAM (or rather MyISAM) tables. ISAM is an older storage format (MySQL nowadays uses the InnoDB storage engine by default). However ISAM is still faster for our simple (but large) tables, as can be seen when running with the –isam flag:
i (100 records) drop table if exists `tmp`.`i` create table `tmp`.`i`(`i` varchar(4)) engine=myisam insert into `tmp`.`i` values (?) sqlexecute(100 times) commit Time : 0.3 a (1000000 records) drop table if exists `tmp`.`a` create table `tmp`.`a`(`i` varchar(4),`j` varchar(4),`k` varchar(4),`value` double) engine=myisam temp file: [C:\Users\Erwin\AppData\Local\Temp\tmpBF31.tmp] writing C:\Users\Erwin\AppData\Local\Temp\tmpBF31.tmp load data local infile 'C:\\Users\\Erwin\\AppData\\Local\\Temp\\tmpBF31.tmp' into table `tmp`.`a` rows affected: 1000000 commit Time : 9.9 |
This again makes a substantial difference in getting data into MySQL.
Finally a test using the a model with many symbols: indus89. It is noted that Cplex solves the LP model in less than a second and we need 29 seconds to store everything. Of course in practical situations we don’t store all symbols of a model (i.e. all sets, parameters, variables, equations - both input and output and intermediate results) but only solution information that is of interest. Here we use the default settings (i.e. non-verbose output).
--- Job indus89.gms Start 04/23/16 16:28:32 24.6.1 r55820 WEX-WEI x86 64bit/MS Windows GAMS 24.6.1 Copyright (C) 1987-2016 GAMS Development. All rights reserved Licensee: Erwin Kalvelagen G150803/0001CV-GEN Amsterdam Optimization Modeling Group DC10455 --- Starting compilation --- indus89.gms(3624) 4 Mb --- Starting execution: elapsed 0:00:00.067 --- indus89.gms(3618) 6 Mb --- Generating LP model wsisn --- indus89.gms(3621) 9 Mb --- 2,726 rows 6,570 columns 39,489 non-zeroes --- Executing CPLEX: elapsed 0:00:00.459 IBM ILOG CPLEX 24.6.1 r55820 Released Jan 18, 2016 WEI x86 64bit/MS Windows --- GAMS/Cplex licensed for continuous and discrete problems. Cplex 12.6.3.0 Reading data... Starting Cplex... Space for names approximately 0.28 Mb Use option 'names no' to turn use of names off Tried aggregator 1 time. LP Presolve eliminated 280 rows and 805 columns. Aggregator did 652 substitutions. Reduced LP has 1794 rows, 5113 columns, and 33006 nonzeros. Presolve time = 0.09 sec. (6.67 ticks) Initializing dual steep norms . . . Iteration log . . . Iteration: 1 Scaled dual infeas = 2955667.467575 Iteration: 145 Scaled dual infeas = 67.402987 Iteration: 320 Scaled dual infeas = 40.456945 Iteration: 456 Scaled dual infeas = 0.000000 Iteration: 457 Dual objective = 693736.223395 Iteration: 566 Dual objective = 357373.814662 Iteration: 698 Dual objective = 323632.243202 Iteration: 841 Dual objective = 287702.310223 Iteration: 1032 Dual objective = 214777.339973 Iteration: 1186 Dual objective = 197801.054391 Iteration: 1321 Dual objective = 192342.788491 Iteration: 1493 Dual objective = 165767.510563 Iteration: 1616 Dual objective = 158191.208028 Iteration: 1778 Dual objective = 145496.672263 Iteration: 1877 Dual objective = 143045.603206 Iteration: 1989 Dual objective = 139486.873101 Iteration: 2083 Dual objective = 136581.639370 Iteration: 2207 Dual objective = 133666.833792 Iteration: 2343 Dual objective = 130050.280793 Iteration: 2466 Dual objective = 127415.162363 Iteration: 2617 Dual objective = 123160.882361 Iteration: 2725 Dual objective = 121804.057461 Iteration: 2830 Dual objective = 120796.202924 Iteration: 2954 Dual objective = 118752.921229 Iteration: 3054 Dual objective = 117126.856176 Iteration: 3148 Dual objective = 116392.467557 Iteration: 3259 Dual objective = 115780.645362 Iteration: 3351 Dual objective = 115477.692660 Iteration: 3455 Dual objective = 115194.050278 Iteration: 3543 Dual objective = 114874.122256 Removing shift (1). LP status(1): optimal Cplex Time: 0.78sec (det. 245.24 ticks) Optimal solution found. Objective : 114873.655552 --- Restarting execution --- indus89.gms(3621) 4 Mb --- Reading solution for model wsisn --- indus89.gms(3621) 4 Mb --- Executing after solve: elapsed 0:00:02.716 --- indus89.gms(3623) 5 Mb --- GDX File C:\Users\Erwin\Documents\Embarcadero\Studio\Projects\gdx2mysql\Win32\Debug\test.gdx --- indus89.gms(3624) 5 Mb GDX2MySQL v 0.1 Copyright (c) 2015-2016 Amsterdam Optimization Modeling Group LLC GDX Library 24.6.1 r55820 Released Jan 18, 2016 VS8 x86 32bit/MS Windows GDX:Input file: test.gdx GDX:Symbols: 281 GDX:Uels: 245 GDX:Loading Uels SQL:Selected driver: MySQL ODBC 5.3 ANSI Driver SQL:Connection string: Driver={MySQL ODBC 5.3 ANSI Driver};Server=localhost;User=xxx;Password=xxx SQL:RDBMS: MySQL Community Server (GPL) SQL:RDBMS version: 5.6.26-log ----------------------- z (9 records) pv (4 records) pv1 (3 records) pv2 (2 records) pvz (18 records) cq (18 records) cc (10 records) c (15 records) cf (2 records) cnf (13 records) t (2 records) s (4 records) w (4 records) g (2 records) gf (1 records) gs (1 records) t1 (3 records) r1 (14 records) dc (6 records) sa (4 records) wce (2 records) m1 (15 records) m (12 records) wcem (12 records) sea (2 records) seam (12 records) sea1 (3 records) sea1m (24 records) ci (4 records) p2 (2 records) a (3 records) ai (6 records) q (3 records) nt (2 records) is (21 records) ps (1 records) isr (1 records) land (3041 records) tech (385 records) bullock (1340 records) labor (3151 records) water (2520 records) tractor (783 records) sylds (803 records) fert (240 records) fertgr (15 records) natyield (15 records) yldprpv (41 records) yldprzs (128 records) yldprzo (14 records) growthcy (54 records) weedy (113 records) graz (16 records) yield (502 records) growthcyf (135 records) iolive (153 records) sconv (20 records) bp (12 records) cnl (43 records) pvcnl (43 records) gwfg (63 records) comdef (1032 records) subdef (63 records) zsa (63 records) gwf (38 records) carea (97 records) evap (516 records) rain (508 records) divpost (639 records) gwt (225 records) dep1 (162 records) dep2 (120 records) depth (516 records) efr (508 records) eqevap (516 records) subirr (516 records) subirrfac (9 records) n (37 records) i (9 records) nc (43 records) nn (51 records) ni (9 records) nb (36 records) ncap (27 records) lloss (19 records) lceff (19 records) cd (2 records) rivercd (48 records) riverb (1369 records) s58 (2 records) infl5080 (247 records) tri (205 records) inflow (82 records) trib (66 records) rrcap (11 records) rulelo (32 records) ruleup (60 records) revapl (45 records) pow (4 records) pn (2 records) v (27 records) powerchar (214 records) rcap (3 records) rep7 (148 records) rep8 (214 records) p3 (4 records) prices (38 records) finsdwtpr (35 records) ecnsdwtpr (35 records) p1 (7 records) p11 (2 records) pri1 (14 records) wageps (24 records) twutil (2 records) totprod (93 records) farmcons (75 records) demand (97 records) elast (13 records) growthrd (18 records) consratio (15 records) natexp (4 records) explimit (17 records) exppv (8 records) expzo (17 records) sr1 (2 records) zwt (63 records) eqevapz (108 records) subirrz (108 records) efrz (107 records) resource (136 records) cneff (43 records) wceff (516 records) tweff (516 records) cneffz (9 records) tweffz (108 records) wceffz (108 records) fleffz (9 records) canalwz (180 records) canalwrtz (180 records) gwtsa (273 records) gwt1 (83 records) ratiofs (15 records) ftt (7 records) res88 (63 records) croparea (114 records) growthres (41 records) orcharea (9 records) orchgrowth (9 records) scmillcap (9 records) cnl1 (39 records) postt (90 records) protarb (82 records) psr (1 records) psr1 (1 records) z1 (9 records) cn (13 records) ccn (11 records) qn (2 records) ncn (2 records) ce (5 records) cm (1 records) ex (15 records) techc (131 records) tec (106 records) divnwfp (12 records) rval (6 records) fsalep (16 records) misc (7 records) seedp (14 records) wage (12 records) miscct (15 records) esalep (16 records) emisc (7 records) eseedp (14 records) ewage (12 records) emiscct (15 records) importp (1 records) exportp (5 records) wnr (2502 records) beta (97 records) alpha (117 records) p (20 records) pmax (117 records) pmin (117 records) qmax (97 records) qmin (97 records) incr (97 records) ws (1940 records) rs (1940 records) qs (1940 records) endpr (2340 records) acost (15 records) ppc (30 records) x (649 records) animal (45 records) prodt (240 records) proda (221 records) import (9 records) export (45 records) consump (324 records) familyl (216 records) hiredl (180 records) itw (9 records) tw (96 records) itr (18 records) ts (180 records) f (612 records) rcont (444 records) canaldiv (516 records) cnldivsea (86 records) prsea (4 records) tcdivsea (2 records) wdivrz (180 records) slkland (180 records) slkwater (180 records) artfod (30 records) artwater (180 records) artwaternd (444 records) nat (2080 records) cost (15 records) conv (104 records) demnat (131 records) ccombal (195 records) qcombal (45 records) consbal (135 records) laborc (180 records) fodder (30 records) protein (30 records) grnfdr (30 records) bdraft (180 records) brepco (15 records) tdraft (180 records) trcapc (108 records) twcapc (96 records) landc (180 records) orchareac (9 records) waterbaln (180 records) watalcz (180 records) subirrc (84 records) nbal (432 records) watalcsea (78 records) divsea (2 records) divcnlsea (86 records) watalcpro (4 records) prseaw (4 records) nwfpalc (12 records) scalarparameters (24 records) scalarvariables (1 records) scalarequations (4 records) Done (29.0 seconds) *** Status: Normal completion --- Job indus89.gms Stop 04/23/16 16:29:04 elapsed 0:00:31.762
|
No comments:
Post a Comment