I had some doubts about the LU routines in the gsl library (GNU Scientific Library). See http://yetanothermathprogrammingconsultant.blogspot.com/2009/06/gsl-gnu-scientific-library.html. Here I try a quick experiment by inverting a square nxn matrix. As test matrix I used the Pei matrix (http://portal.acm.org/citation.cfm?id=368975). Here are the results:
library | gsl | lapack |
routine | gsl_linalg_LU_decomp + gsl_linalg_LU_invert | dgesv |
compiler | cygwin gcc | Lahey lf95 |
n=100 | 0.047 seconds | 0.024 seconds |
n=1000 | 6.735 seconds | 2.017 seconds |
n=2000 | 1:01 minutes | 17.257 seconds |
Indeed it looks that gsl is not quite competitive with lapack for this operation (although much better than figures I displayed earlier – these were wrong). I suspect this is due to how the gsl routines access many of the matrix elements: through an inlined function with the complicated expression m->data[i * m->tda + j]. I think the fast C way of doing this is to use pointers and pointer arithmetic. Another issue is cache misses. One should run through the matrix so that memory accesses as as much localized as possible.
PS. Of course it is noted that in most cases you do not want to invert a matrix, but rather solve for a given rhs.
PS2. Sorry: previous lapack timings were wrong (the input matrix was by accident zero for off-diagonal elements, that makes the inversion very simple!).
The expression m->data[i * m->tda + j]
ReplyDeletedoes not look like a function call. It just seems to be
an array indexing operation. Why is this so much slower than the method lapack uses? What does lapack use anyway?
The inline function is gsl_matrix_get(). Lapack uses normal array access: a(i,j).
ReplyDeleteIf the function is truly inlined by the compiler (ie- it doesn't generate a new function call on the stack), then the code you quote IS just pointer arithmetic. The [] operator is syntactic sugar for pointer arithmetic.
ReplyDeleteThanks for the posts, I wasn't aware of GSL. Looks like they are not using LAPACK but BLAS 1 and 2 instead, which might explain the perf difference you are seeing? http://www.gnu.org/software/gsl/manual/html_node/Linear-Algebra.html