## Tuesday, February 24, 2009

### Timing of Sparse Matrix Multiplication in Mathprog, Ampl and GAMS

> In Mathprog, can I use sparse matrix multiplication?

In GAMS, parameters are stored sparsely and operations on them are automatically executed sparsely. This can make a large difference in performance (CPU time and memory usage). Below I will investigate how GAMS, AMPL and Mathprog behave on a sparse matrix multiplication.
Here is a small example of multiplying two identity matrices in AMPL or Mathprog:
```
param N := 250;
set I := {1..N};
param A{i in I, j in I} := if (i=j) then 1;
param B{i in I, j in I} := if (i=j) then 1;
param C{i in I, j in I} := sum{k in I} A[i,k]*B[k,j];
param s := sum{i in I, j in I} C[i,j];
display s;
end;
```

Using the timeit.exe tool from the Windows Server 2003 Resource Kit, we see:

```
C:\projects\glpk\glpk\bin>timeit glpsol --math x.mod
Reading model section from x.mod...
9 lines were read
Display statement at line 7
s = 250
Model has been successfully generated
Problem has no rows
Version Number:   Windows NT 6.0 (Build 6001)
Exit Time:        4:28 pm, Tuesday, February 24 2009
Elapsed Time:     0:00:51.025
Process Time:     0:00:50.076
System Calls:     3306455
Context Switches: 841545
Page Faults:      191189
Bytes Read:       36549676
Bytes Written:    1409054
Bytes Other:      120151

C:\projects\glpk\glpk\bin>timeit \projects\ampl\amplcml\ampl.exe x.mod
s = 250
Version Number:   Windows NT 6.0 (Build 6001)
Exit Time:        4:28 pm, Tuesday, February 24 2009
Elapsed Time:     0:00:02.571
Process Time:     0:00:02.542
System Calls:     104208
Context Switches: 20998
Page Faults:      9767
Bytes Read:       59854
Bytes Written:    784
Bytes Other:      6240

C:\projects\glpk\glpk\bin>
```

I.e. Mathprog takes about 50 seconds and AMPL only about 3 seconds. This difference is largely due to a smarter implementation in AMPL, but likely not to better sparse execution: if we fill the matrices A and B completely with ones, we get similar timings (50 vs. 3 seconds).
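For reference, the dense run only needs the two identity assignments replaced by all-ones assignments; this is a sketch of the change (the rest of the MathProg/AMPL model above is assumed unchanged):

```
param A{i in I, j in I} := 1;
param B{i in I, j in I} := 1;
```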
When we translate this to GAMS:
```
set i /1*250/;
alias (i,j,k);
parameter A(i,j),B(i,j),C(i,j);
A(i,i) = 1;
B(i,i) = 1;
C(i,j) = sum(k, A(i,k)*B(k,j));
scalar s;
s = sum((i,j),C(i,j));
display s;
```

we get even better performance:
```
C:\projects\glpk\glpk\bin>timeit gams.exe x.gms
--- Job x.gms Start 02/24/09 16:33:00 WEX-WEI 23.0.2 x86_64/MS Windows
GAMS Rev 230  Copyright (C) 1987-2009 GAMS Development. All rights reserved
Licensee: Erwin Kalvelagen                               G090213/0001CV-WIN
          Amsterdam Optimization Modeling Group                      DC4572
--- Starting compilation
--- x.gms(9) 3 Mb
--- Starting execution: elapsed 0:00:00.008
--- x.gms(9) 4 Mb
*** Status: Normal completion
--- Job x.gms Stop 02/24/09 16:33:00 elapsed 0:00:00.031
Version Number:   Windows NT 6.0 (Build 6001)
Exit Time:        4:33 pm, Tuesday, February 24 2009
Elapsed Time:     0:00:00.109
Process Time:     0:00:00.015
System Calls:     7493
Context Switches: 1558
Page Faults:      3959
Bytes Read:       495203
Bytes Written:    418566
Bytes Other:      702554

C:\projects\glpk\glpk\bin>
```

I.e. only 0.1 seconds. This is indeed due to sparse processing of the matrix multiplication: if we fill A and B with ones, the GAMS timing becomes close to AMPL's, about 3 seconds. Indeed, when we look at the GAMS profiling information, we see how sparsity is exploited in the different statements:
```
**** SPARSE MODEL (Unity matrix)
----      1 ExecInit                 0.000     0.000 SECS      3 Mb
----      4 Assignment A             0.000     0.000 SECS      4 Mb    250
----      5 Assignment B             0.000     0.000 SECS      4 Mb    250
----      6 Assignment C             0.016     0.016 SECS      4 Mb    250
----      8 Assignment s             0.000     0.016 SECS      4 Mb
----      9 PARAMETER s                    =      250.000
----      9 Display                  0.000     0.016 SECS      4 Mb

**** DENSE MODEL (Matrix with ones)
----      1 ExecInit                 0.000     0.000 SECS      3 Mb
----      4 Assignment A             0.000     0.000 SECS      6 Mb  62500
----      5 Assignment B             0.015     0.015 SECS      8 Mb  62500
----      6 Assignment C             2.746     2.761 SECS     13 Mb  62500
----      8 Assignment s             0.000     2.761 SECS     13 Mb
----      9 PARAMETER s                    =  1.562500E+7
----      9 Display                  0.000     2.761 SECS     13 Mb
```
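Per-statement profiling output like the above can be obtained by enabling the GAMS profile option; a sketch of the dense variant, assuming the same x.gms model as above:

```
* enable execution profiling; per-statement timings appear in the listing file
option profile = 1;

* dense variant: fill A and B completely with ones instead of only the diagonal
A(i,j) = 1;
B(i,j) = 1;
```

The last column in the profile (250 vs. 62500) is the number of records assigned, which shows directly how much work each statement performed.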

All the action is in the assignment to C, and this is where GAMS really shows its strength. To summarize, the timings are roughly:

| Model             | Mathprog | AMPL  | GAMS    |
|-------------------|----------|-------|---------|
| sparse (identity) | 50 sec   | 3 sec | 0.1 sec |
| dense (all ones)  | 50 sec   | 3 sec | 3 sec   |

I suspect AMPL is not fully exploiting sparsity here. In general, however, AMPL does a very good job of using sparse patterns during model generation, with performance similar to GAMS. I have seen models before where Mathprog simply collapses, while the same model executes and generates very quickly under AMPL or GAMS. Nevertheless, it is quite amazing how many models generate fast with Mathprog. This is likely because in many cases, although the final LP matrix is very large and sparse, substructures in the model are often dense and relatively small. I.e. many models can be generated quite efficiently using dense technology for most operations. Only when the substructures become very large and sparse does better sparse technology, i.e. looping over the nonzero elements only, as used in GAMS (and AMPL), become necessary.
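To illustrate the "loop over the nonzero elements only" idea, here is a minimal sketch in Python (not GAMS's actual implementation): each matrix is stored as a dictionary mapping `(i, j)` index pairs to nonzero values, so the multiplication visits only stored entries. For the 250x250 identity case this touches 250 entries instead of 250^3 term evaluations.

```python
def sparse_matmul(A, B):
    """Multiply matrices stored as {(i, j): value} dicts of nonzeros."""
    # Index B by row, so each nonzero A[i,k] only visits the nonzeros B[k,j].
    B_rows = {}
    for (k, j), v in B.items():
        B_rows.setdefault(k, {})[j] = v
    C = {}
    for (i, k), a in A.items():
        for j, b in B_rows.get(k, {}).items():
            C[(i, j)] = C.get((i, j), 0) + a * b
    return C

n = 250
I_n = {(i, i): 1 for i in range(n)}  # identity stored sparsely: n entries, not n*n
C = sparse_matmul(I_n, I_n)
s = sum(C.values())
print(s)  # prints 250, matching the display s output above
```

A dense implementation would instead run three nested loops over all index values, doing the same amount of work whether the operands are identity matrices or all ones, which is consistent with the Mathprog timings above being 50 seconds in both cases.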