Tuesday, February 24, 2009

Timing of Sparse Matrix Multiplication in Mathprog, Ampl and GAMS

> In Mathprog, can I use sparse matrix multiplication?
In GAMS, parameters are stored sparsely and operations on them are executed sparsely, automatically. This can make a large difference in performance (CPU time and memory usage). Below I will investigate how GAMS, AMPL and Mathprog behave on a sparse matrix multiplication.
Here is a small example of multiplying two identity matrices in AMPL or Mathprog:
param N := 250;
set I := {1..N};
param A{i in I, j in I} := if (i=j) then 1;
param B{i in I, j in I} := if (i=j) then 1;
param C{i in I, j in I} := sum{k in I} A[i,k]*B[k,j];
param s := sum{i in I, j in I} C[i,j];
display s;
end;
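For intuition, here is a small Python sketch of what a fully dense evaluation of the statement for C amounts to (this is my own illustration, not how any of these systems is actually implemented): N^3 multiply-adds are performed even though nearly every term is 0*0, which is what makes the identity-matrix case expensive for a dense engine.

```python
N = 50  # kept small; the dense triple loop below is O(N**3)

# Dense storage: full N x N lists, zeros included
A = [[1 if i == j else 0 for j in range(N)] for i in range(N)]
B = [[1 if i == j else 0 for j in range(N)] for i in range(N)]

# Dense evaluation of C[i,j] = sum_k A[i,k]*B[k,j]:
# N**3 multiply-adds, although only N of the terms are nonzero
C = [[sum(A[i][k] * B[k][j] for k in range(N)) for j in range(N)]
     for i in range(N)]

s = sum(C[i][j] for i in range(N) for j in range(N))
print(s)  # N, since the product of two identity matrices is the identity
```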

Using the timeit.exe tool from the Windows 2003 Resource Kit, we see:
C:\projects\glpk\glpk\bin>timeit glpsol --math x.mod
Reading model section from x.mod...
9 lines were read
Display statement at line 7
s = 250
Model has been successfully generated
Problem has no rows

Version Number: Windows NT 6.0 (Build 6001)
Exit Time: 4:28 pm, Tuesday, February 24 2009
Elapsed Time: 0:00:51.025
Process Time: 0:00:50.076
System Calls: 3306455
Context Switches: 841545
Page Faults: 191189
Bytes Read: 36549676
Bytes Written: 1409054
Bytes Other: 120151

C:\projects\glpk\glpk\bin>timeit \projects\ampl\amplcml\ampl.exe x.mod
s = 250


Version Number: Windows NT 6.0 (Build 6001)
Exit Time: 4:28 pm, Tuesday, February 24 2009
Elapsed Time: 0:00:02.571
Process Time: 0:00:02.542
System Calls: 104208
Context Switches: 20998
Page Faults: 9767
Bytes Read: 59854
Bytes Written: 784
Bytes Other: 6240

C:\projects\glpk\glpk\bin>

I.e. Mathprog takes 50 seconds and AMPL only 3 seconds. This difference is largely due to a smarter implementation in AMPL, but likely not to better sparse execution: if we fill the matrices A and B with all ones, we get similar timings (50 vs. 3 seconds).
When we translate this to GAMS:
set i /1*250/;
alias (i,j,k);
parameter A(i,j),B(i,j),C(i,j);
A(i,i) = 1;
B(i,i) = 1;
C(i,j) = sum(k, A(i,k)*B(k,j));
scalar s;
s = sum((i,j),C(i,j));
display s;

we get even better performance:
C:\projects\glpk\glpk\bin>timeit gams.exe x.gms
--- Job x.gms Start 02/24/09 16:33:00 WEX-WEI 23.0.2 x86_64/MS Windows
GAMS Rev 230 Copyright (C) 1987-2009 GAMS Development. All rights reserved
Licensee: Erwin Kalvelagen G090213/0001CV-WIN
Amsterdam Optimization Modeling Group DC4572
--- Starting compilation
--- x.gms(9) 3 Mb
--- Starting execution: elapsed 0:00:00.008
--- x.gms(9) 4 Mb
*** Status: Normal completion
--- Job x.gms Stop 02/24/09 16:33:00 elapsed 0:00:00.031

Version Number: Windows NT 6.0 (Build 6001)
Exit Time: 4:33 pm, Tuesday, February 24 2009
Elapsed Time: 0:00:00.109
Process Time: 0:00:00.015
System Calls: 7493
Context Switches: 1558
Page Faults: 3959
Bytes Read: 495203
Bytes Written: 418566
Bytes Other: 702554

C:\projects\glpk\glpk\bin>


I.e. only 0.1 seconds. This is indeed due to sparse processing of the matrix multiplication. If we fill A and B with ones, the GAMS timing becomes close to AMPL's: about 3 seconds. Indeed, when we look at the GAMS profiling information, we see how sparsity is exploited in the different statements:
**** SPARSE MODEL (Unity matrix)
---- 1 ExecInit 0.000 0.000 SECS 3 Mb
---- 4 Assignment A 0.000 0.000 SECS 4 Mb 250
---- 5 Assignment B 0.000 0.000 SECS 4 Mb 250
---- 6 Assignment C 0.016 0.016 SECS 4 Mb 250
---- 8 Assignment s 0.000 0.016 SECS 4 Mb

---- 9 PARAMETER s = 250.000
---- 9 Display 0.000 0.016 SECS 4 Mb

**** DENSE MODEL (Matrix with ones)

---- 1 ExecInit 0.000 0.000 SECS 3 Mb
---- 4 Assignment A 0.000 0.000 SECS 6 Mb 62500
---- 5 Assignment B 0.015 0.015 SECS 8 Mb 62500
---- 6 Assignment C 2.746 2.761 SECS 13 Mb 62500
---- 8 Assignment s 0.000 2.761 SECS 13 Mb

---- 9 PARAMETER s = 1.562500E+7
---- 9 Display 0.000 2.761 SECS 13 Mb
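The pattern in the profile above (the assignment to C is nearly free on the identity matrices but takes seconds on the all-ones matrices) can be reproduced with a short Python sketch of sparse execution. This is my own illustration of the effect, not GAMS internals: matrices are stored as dictionaries of nonzeros, and the multiplication loops over stored entries only.

```python
import time
from collections import defaultdict

def sparse_matmul(A, B):
    """Multiply matrices stored as {(i, j): value} dicts of nonzeros.
    The work is proportional to the number of nonzero term pairs,
    not to N**3."""
    rows_of_B = defaultdict(list)
    for (k, j), v in B.items():
        rows_of_B[k].append((j, v))
    C = defaultdict(float)
    for (i, k), a in A.items():
        for j, b in rows_of_B[k]:
            C[i, j] += a * b
    return dict(C)

# N is smaller than the post's 250 so the pure-Python dense case stays quick
N = 150
identity = {(i, i): 1.0 for i in range(N)}
ones = {(i, j): 1.0 for i in range(N) for j in range(N)}

for label, M in [("identity", identity), ("all ones", ones)]:
    t0 = time.perf_counter()
    C = sparse_matmul(M, M)
    print(f"{label:9s}: {time.perf_counter() - t0:.3f} sec, {len(C)} nonzeros")
```

On the identity inputs only N entry pairs are visited; on the all-ones inputs all N^3 pairs are, so the second run is dramatically slower, just as in the GAMS profile.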

All the action is in the assignment to C, and this is where GAMS really shows its strength. To summarize, the timings are roughly:

Model               Mathprog   AMPL    GAMS
sparse (identity)   50 sec     3 sec   0.1 sec
dense (all ones)    50 sec     3 sec   3 sec

I suspect AMPL is not fully exploiting sparsity here. In general, however, AMPL does a really good job of using sparse patterns during model generation, with performance similar to GAMS. I have seen models before where Mathprog simply collapses while the same model executes and generates very quickly under AMPL or GAMS. Nevertheless, it is quite amazing how many models generate fast with Mathprog. This is likely because in many cases, although the final LP matrix is very large and sparse, substructures in the model are often dense and relatively small. I.e. many models can be generated quite efficiently using dense technology for most operations. Only when the substructures become very large and sparse does better sparse technology -- i.e. looping over the nonzero elements only -- as used in GAMS (and AMPL) become necessary.