Thursday, June 25, 2009

gsl vs lapack performance

I had some doubts about the LU routines in the gsl library (GNU Scientific Library). See http://yetanothermathprogrammingconsultant.blogspot.com/2009/06/gsl-gnu-scientific-library.html. Here I try a quick experiment by inverting a square nxn matrix. As test matrix I used the Pei matrix (http://portal.acm.org/citation.cfm?id=368975). Here are the results:

library gsl lapack
routine gsl_linalg_LU_decomp +
gsl_linalg_LU_invert
dgesv
compiler cygwin gcc Lahey lf95
n=100 0.047 seconds 0.024 seconds
n=1000 6.735 seconds 2.017 seconds
n=2000 1:01 minutes 17.257 seconds

Indeed it looks that gsl is not quite competitive with lapack for this operation (although much better than figures I displayed earlier – these were wrong). I suspect this is due to how the gsl routines access many of the matrix elements: through an inlined function with the complicated expression m->data[i * m->tda + j]. I think the fast C way of doing this is to use pointers and pointer arithmetic. Another issue is cache misses. One should run through the matrix so that memory accesses as as much localized as possible.

PS. Of course it is noted that in most cases you do not want to invert a matrix, but rather solve for a given rhs.

PS2. Sorry: previous lapack timings were wrong (the input matrix was by accident zero for off-diagonal elements, that makes the inversion very simple!).

3 comments:

  1. The expression m->data[i * m->tda + j]

    does not look like a function call. It just seems to be
    an array indexing operation. Why is this so much slower than the method lapack uses? What does lapack use anyway?

    ReplyDelete
  2. The inline function is gsl_matrix_get(). Lapack uses normal array access: a(i,j).

    ReplyDelete
  3. If the function is truly inlined by the compiler (ie- it doesn't generate a new function call on the stack), then the code you quote IS just pointer arithmetic. The [] operator is syntactic sugar for pointer arithmetic.
    Thanks for the posts, I wasn't aware of GSL. Looks like they are not using LAPACK but BLAS 1 and 2 instead, which might explain the perf difference you are seeing? http://www.gnu.org/software/gsl/manual/html_node/Linear-Algebra.html

    ReplyDelete