Friday, April 22, 2016

Dump solution data into a MySQL database II

This is a follow-up on this post. For large datasets using individual inserts is not as efficient as applying bulk operations. In MySQL bulk inserts can be done with the LOAD DATA LOCAL INFILE command. This statement takes a local (i.e. at the client) text file, copies it to the server and then does a bulk insert of the whole thing. This approach is built in the tool gdx2mysql. If a symbol has more than N records (N=500 by default), then we write a text file and call LOAD DATA LOCAL INFILE. If a symbol has fewer records then we just a standard prepared insert statement. A verbose log will show what happens.

set i /i1*i100/;
alias
(i,j,k);
parameter
a(i,j,k);
a(i,j,k) = uniform(0,100);
execute_unload "test"
,i,a;
execute "gdx2mysql -i test.gdx -s tmp -u test -p test -v";

In the above test model we generate a set with 100 elements and a parameter with 1003=1,000,000 elements. The verbose log looks like

--- Job Untitled_1.gms Start 04/22/16 04:01:39 24.6.1 r55820 WEX-WEI x86 64bit/MS Windows
GAMS 24.6.1   Copyright (C) 1987-2016 GAMS Development. All rights reserved
Licensee: Erwin Kalvelagen                               G150803/0001CV-GEN
          Amsterdam Optimization Modeling Group                     DC10455
--- Starting compilation
--- Untitled_1.gms(6) 3 Mb
--- Starting execution: elapsed 0:00:00.007
--- Untitled_1.gms(5) 36 Mb
--- GDX File C:\Users\Erwin\Documents\Embarcadero\Studio\Projects\gdx2mysql\Win32\Debug\test.gdx
--- Untitled_1.gms(6) 36 Mb
GDX2MySQL v 0.1
Copyright (c) 2015-2016 Amsterdam Optimization Modeling Group LLC

   GDX Library      24.6.1 r55820 Released Jan 18, 2016 VS8 x86 32bit/MS Windows
   GDX:Input file: test.gdx
   GDX:Symbols: 2
   GDX:Uels: 100
   GDX:Loading Uels
   SQL:Selected driver: MySQL ODBC 5.3 ANSI Driver
   SQL:Connection string: Driver={MySQL ODBC 5.3 ANSI Driver};Server=localhost;User=xxx;Password=xxx
      set autocommit=0
      select @@version_comment
   SQL:RDBMS: MySQL Community Server (GPL)
      select @@version
   SQL:RDBMS version: 5.6.26-log
      select count(*) from information_schema.schemata where schema_name = 'tmp'
   -----------------------
   i (100 records)
      drop table if exists `tmp`.`i`
      create table `tmp`.`i`(`i` varchar(4))
      insert into `tmp`.`i` values (?)
      sqlexecute(100 times)
      commit
      Time : 0.6
   a (1000000 records)
      drop table if exists `tmp`.`a`
      create table `tmp`.`a`(`i` varchar(4),`j` varchar(4),`k` varchar(4),`value` double)
      temp file: [C:\Users\Erwin\AppData\Local\Temp\tmpA046.tmp]
      writing C:\Users\Erwin\AppData\Local\Temp\tmpA046.tmp
      load data local infile 'C:\\Users\\Erwin\\AppData\\Local\\Temp\\tmpA046.tmp' into table `tmp`.`a`
      rows affected: 1000000
      commit
      Time : 39.6
      deleting [C:\Users\Erwin\AppData\Local\Temp\tmpA046.tmp]

*** Status: Normal completion
--- Job Untitled_1.gms Stop 04/22/16 04:02:20 elapsed 0:00:41.353

The smaller set i is imported using normal inserts, while the larger parameter a is imported through an intermediate text file. This is much more efficient than using the standard inserts. There is a gdx2mysql option to force larger symbols to use standard inserts, so we can compare timings:

a (1000000 records)
   drop table if exists `tmp`.`a`
   create table `tmp`.`a`(`i` varchar(4),`j` varchar(4),`k` varchar(4),`value` double)
   insert into `tmp`.`a` values (?,?,?,?)
   sqlexecute(1000000 times)
   commit 100 times
   Time : 257.5

So we are about 6.5 times as fast. (We can expect even larger differences in other cases).

A final way to make imports faster is to use ISAM (or rather MyISAM) tables. ISAM is an older storage format (MySQL nowadays uses the InnoDB storage engine by default). However ISAM is still faster for our simple (but large) tables, as can be seen when running with the –isam flag:

i (100 records)
   drop table if exists `tmp`.`i`
   create table `tmp`.`i`(`i` varchar(4)) engine=myisam
   insert into `tmp`.`i` values (?)
   sqlexecute(100 times)
   commit
   Time : 0.3
a (1000000 records)
   drop table if exists `tmp`.`a`
   create table `tmp`.`a`(`i` varchar(4),`j` varchar(4),`k` varchar(4),`value` double) engine=myisam
   temp file: [C:\Users\Erwin\AppData\Local\Temp\tmpBF31.tmp]
   writing C:\Users\Erwin\AppData\Local\Temp\tmpBF31.tmp
   load data local infile 'C:\\Users\\Erwin\\AppData\\Local\\Temp\\tmpBF31.tmp' into table `tmp`.`a`
   rows affected: 1000000
   commit
   Time : 9.9

This again makes a substantial difference in getting data into MySQL.

Finally a test using the a model with many symbols: indus89. It is noted that Cplex solves the LP model in less than a second and we need 29 seconds to store everything. Of course in practical situations we don’t store all symbols of a model (i.e. all sets, parameters, variables, equations - both input and output and intermediate results) but only solution information that is of interest. Here we use the default settings (i.e. non-verbose output).

--- Job indus89.gms Start 04/23/16 16:28:32 24.6.1 r55820 WEX-WEI x86 64bit/MS Windows
GAMS 24.6.1   Copyright (C) 1987-2016 GAMS Development. All rights reserved
Licensee: Erwin Kalvelagen                               G150803/0001CV-GEN
          Amsterdam Optimization Modeling Group                     DC10455
--- Starting compilation
--- indus89.gms(3624) 4 Mb
--- Starting execution: elapsed 0:00:00.067
--- indus89.gms(3618) 6 Mb
--- Generating LP model wsisn
--- indus89.gms(3621) 9 Mb
---   2,726 rows  6,570 columns  39,489 non-zeroes
--- Executing CPLEX: elapsed 0:00:00.459

IBM ILOG CPLEX   24.6.1 r55820 Released Jan 18, 2016 WEI x86 64bit/MS Windows
--- GAMS/Cplex licensed for continuous and discrete problems.
Cplex 12.6.3.0

Reading data...
Starting Cplex...
Space for names approximately 0.28 Mb
Use option 'names no' to turn use of names off
Tried aggregator 1 time.
LP Presolve eliminated 280 rows and 805 columns.
Aggregator did 652 substitutions.
Reduced LP has 1794 rows, 5113 columns, and 33006 nonzeros.
Presolve time = 0.09 sec. (6.67 ticks)
Initializing dual steep norms . . .

Iteration log . . .
Iteration:     1   Scaled dual infeas =       2955667.467575
Iteration:   145   Scaled dual infeas =            67.402987
Iteration:   320   Scaled dual infeas =            40.456945
Iteration:   456   Scaled dual infeas =             0.000000
Iteration:   457   Dual objective     =        693736.223395
Iteration:   566   Dual objective     =        357373.814662
Iteration:   698   Dual objective     =        323632.243202
Iteration:   841   Dual objective     =        287702.310223
Iteration:  1032   Dual objective     =        214777.339973
Iteration:  1186   Dual objective     =        197801.054391
Iteration:  1321   Dual objective     =        192342.788491
Iteration:  1493   Dual objective     =        165767.510563
Iteration:  1616   Dual objective     =        158191.208028
Iteration:  1778   Dual objective     =        145496.672263
Iteration:  1877   Dual objective     =        143045.603206
Iteration:  1989   Dual objective     =        139486.873101
Iteration:  2083   Dual objective     =        136581.639370
Iteration:  2207   Dual objective     =        133666.833792
Iteration:  2343   Dual objective     =        130050.280793
Iteration:  2466   Dual objective     =        127415.162363
Iteration:  2617   Dual objective     =        123160.882361
Iteration:  2725   Dual objective     =        121804.057461
Iteration:  2830   Dual objective     =        120796.202924
Iteration:  2954   Dual objective     =        118752.921229
Iteration:  3054   Dual objective     =        117126.856176
Iteration:  3148   Dual objective     =        116392.467557
Iteration:  3259   Dual objective     =        115780.645362
Iteration:  3351   Dual objective     =        115477.692660
Iteration:  3455   Dual objective     =        115194.050278
Iteration:  3543   Dual objective     =        114874.122256
Removing shift (1).
LP status(1): optimal
Cplex Time: 0.78sec (det. 245.24 ticks)

Optimal solution found.
Objective :      114873.655552

--- Restarting execution
--- indus89.gms(3621) 4 Mb
--- Reading solution for model wsisn
--- indus89.gms(3621) 4 Mb
--- Executing after solve: elapsed 0:00:02.716
--- indus89.gms(3623) 5 Mb
--- GDX File C:\Users\Erwin\Documents\Embarcadero\Studio\Projects\gdx2mysql\Win32\Debug\test.gdx
--- indus89.gms(3624) 5 Mb
GDX2MySQL v 0.1
Copyright (c) 2015-2016 Amsterdam Optimization Modeling Group LLC

   GDX Library      24.6.1 r55820 Released Jan 18, 2016 VS8 x86 32bit/MS Windows
   GDX:Input file: test.gdx
   GDX:Symbols: 281
   GDX:Uels: 245
   GDX:Loading Uels
   SQL:Selected driver: MySQL ODBC 5.3 ANSI Driver
   SQL:Connection string: Driver={MySQL ODBC 5.3 ANSI Driver};Server=localhost;User=xxx;Password=xxx
   SQL:RDBMS: MySQL Community Server (GPL)
   SQL:RDBMS version: 5.6.26-log
   -----------------------
   z (9 records)
   pv (4 records)
   pv1 (3 records)
   pv2 (2 records)
   pvz (18 records)
   cq (18 records)
   cc (10 records)
   c (15 records)
   cf (2 records)
   cnf (13 records)
   t (2 records)
   s (4 records)
   w (4 records)
   g (2 records)
   gf (1 records)
   gs (1 records)
   t1 (3 records)
   r1 (14 records)
   dc (6 records)
   sa (4 records)
   wce (2 records)
   m1 (15 records)
   m (12 records)
   wcem (12 records)
   sea (2 records)
   seam (12 records)
   sea1 (3 records)
   sea1m (24 records)
   ci (4 records)
   p2 (2 records)
   a (3 records)
   ai (6 records)
   q (3 records)
   nt (2 records)
   is (21 records)
   ps (1 records)
   isr (1 records)
   land (3041 records)
   tech (385 records)
   bullock (1340 records)
   labor (3151 records)
   water (2520 records)
   tractor (783 records)
   sylds (803 records)
   fert (240 records)
   fertgr (15 records)
   natyield (15 records)
   yldprpv (41 records)
   yldprzs (128 records)
   yldprzo (14 records)
   growthcy (54 records)
   weedy (113 records)
   graz (16 records)
   yield (502 records)
   growthcyf (135 records)
   iolive (153 records)
   sconv (20 records)
   bp (12 records)
   cnl (43 records)
   pvcnl (43 records)
   gwfg (63 records)
   comdef (1032 records)
   subdef (63 records)
   zsa (63 records)
   gwf (38 records)
   carea (97 records)
   evap (516 records)
   rain (508 records)
   divpost (639 records)
   gwt (225 records)
   dep1 (162 records)
   dep2 (120 records)
   depth (516 records)
   efr (508 records)
   eqevap (516 records)
   subirr (516 records)
   subirrfac (9 records)
   n (37 records)
   i (9 records)
   nc (43 records)
   nn (51 records)
   ni (9 records)
   nb (36 records)
   ncap (27 records)
   lloss (19 records)
   lceff (19 records)
   cd (2 records)
   rivercd (48 records)
   riverb (1369 records)
   s58 (2 records)
   infl5080 (247 records)
   tri (205 records)
   inflow (82 records)
   trib (66 records)
   rrcap (11 records)
   rulelo (32 records)
   ruleup (60 records)
   revapl (45 records)
   pow (4 records)
   pn (2 records)
   v (27 records)
   powerchar (214 records)
   rcap (3 records)
   rep7 (148 records)
   rep8 (214 records)
   p3 (4 records)
   prices (38 records)
   finsdwtpr (35 records)
   ecnsdwtpr (35 records)
   p1 (7 records)
   p11 (2 records)
   pri1 (14 records)
   wageps (24 records)
   twutil (2 records)
   totprod (93 records)
   farmcons (75 records)
   demand (97 records)
   elast (13 records)
   growthrd (18 records)
   consratio (15 records)
   natexp (4 records)
   explimit (17 records)
   exppv (8 records)
   expzo (17 records)
   sr1 (2 records)
   zwt (63 records)
   eqevapz (108 records)
   subirrz (108 records)
   efrz (107 records)
   resource (136 records)
   cneff (43 records)
   wceff (516 records)
   tweff (516 records)
   cneffz (9 records)
   tweffz (108 records)
   wceffz (108 records)
   fleffz (9 records)
   canalwz (180 records)
   canalwrtz (180 records)
   gwtsa (273 records)
   gwt1 (83 records)
   ratiofs (15 records)
   ftt (7 records)
   res88 (63 records)
   croparea (114 records)
   growthres (41 records)
   orcharea (9 records)
   orchgrowth (9 records)
   scmillcap (9 records)
   cnl1 (39 records)
   postt (90 records)
   protarb (82 records)
   psr (1 records)
   psr1 (1 records)
   z1 (9 records)
   cn (13 records)
   ccn (11 records)
   qn (2 records)
   ncn (2 records)
   ce (5 records)
   cm (1 records)
   ex (15 records)
   techc (131 records)
   tec (106 records)
   divnwfp (12 records)
   rval (6 records)
   fsalep (16 records)
   misc (7 records)
   seedp (14 records)
   wage (12 records)
   miscct (15 records)
   esalep (16 records)
   emisc (7 records)
   eseedp (14 records)
   ewage (12 records)
   emiscct (15 records)
   importp (1 records)
   exportp (5 records)
   wnr (2502 records)
   beta (97 records)
   alpha (117 records)
   p (20 records)
   pmax (117 records)
   pmin (117 records)
   qmax (97 records)
   qmin (97 records)
   incr (97 records)
   ws (1940 records)
   rs (1940 records)
   qs (1940 records)
   endpr (2340 records)
   acost (15 records)
   ppc (30 records)
   x (649 records)
   animal (45 records)
   prodt (240 records)
   proda (221 records)
   import (9 records)
   export (45 records)
   consump (324 records)
   familyl (216 records)
   hiredl (180 records)
   itw (9 records)
   tw (96 records)
   itr (18 records)
   ts (180 records)
   f (612 records)
   rcont (444 records)
   canaldiv (516 records)
   cnldivsea (86 records)
   prsea (4 records)
   tcdivsea (2 records)
   wdivrz (180 records)
   slkland (180 records)
   slkwater (180 records)
   artfod (30 records)
   artwater (180 records)
   artwaternd (444 records)
   nat (2080 records)
   cost (15 records)
   conv (104 records)
   demnat (131 records)
   ccombal (195 records)
   qcombal (45 records)
   consbal (135 records)
   laborc (180 records)
   fodder (30 records)
   protein (30 records)
   grnfdr (30 records)
   bdraft (180 records)
   brepco (15 records)
   tdraft (180 records)
   trcapc (108 records)
   twcapc (96 records)
   landc (180 records)
   orchareac (9 records)
   waterbaln (180 records)
   watalcz (180 records)
   subirrc (84 records)
   nbal (432 records)
   watalcsea (78 records)
   divsea (2 records)
   divcnlsea (86 records)
   watalcpro (4 records)
   prseaw (4 records)
   nwfpalc (12 records)
   scalarparameters (24 records)
   scalarvariables (1 records)
   scalarequations (4 records)
   Done (29.0 seconds)
*** Status: Normal completion
--- Job indus89.gms Stop 04/23/16 16:29:04 elapsed 0:00:31.762