« The Deepwater Horizon, in context | Main | New R User Group in Atlanta »

June 11, 2010

TrackBack

TrackBack URL for this entry:
http://www.typepad.com/services/trackback/6a010534b1db25970b013483f6f9c5970c

Listed below are links to weblogs that reference Performance benefits of linking R to multithreaded math libraries:

Comments

Feed You can follow this conversation by subscribing to the comment feed for this post.

There are two minor points that perhaps should be corrected in this posting. The crossprod function calculates A'*A not A*A', which is what is calculated by the tcrossprod function. And the lm function does use the BLAS but only level-1 BLAS (vector-vector operations) not level-2 (matrix-vector) or level-3 (matrix-matrix) BLAS. Most accelerated BLAS systems concentrate on boosting the performance of level-3 BLAS.

The reason that the lm.fit and glm.fit functions (the workhorses underneath the lm and glm functions) end up calling the older Linpack routines using level-1 BLAS instead of the newer, level-3-based Lapack routines is because of the need to handle certain rank-deficient cases cleanly. The output from example(fastLm) in the RcppArmadillo package shows such a case.

The ability to handle such a case adequately is not specifically a property of level-1 BLAS versus level-3; it requires a special form of pivoting in the QR decomposition. John Chambers modified the dqrdc Linpack subroutine for this in the original S language and Ross Ihaka did similar modifications for R back in the old days. No one has done similar modifications for Lapack's QR decomposition. Every once in a while some of us in R-core discuss alternative approaches that would allow using the Lapack, level-3 BLAS code but no one has sat down and coded them up to see if they would work.

I also commented in a thread on the R-help mailing list that a naive calculation of chol(crossprod(A)) where A is 10000 by 5000 (a combination of your first two examples) is almost certain to produce useless answers. Admittedly there are occasions where one wants the Cholesky decomposition of very large, symmetric matrices - I do this all the time in the lme4 package for mixed-effects models - but you must be very careful of the structure and there must be some form of regularization employed before the answers are of any value. Without regularization (inflating the diagonal of crossprod(A) in some way) the crossproduct matrix is virtually guaranteed to be singular. The problem with the calculation you describe is that it will complete and one could go on to calculate nonsense coefficients, etc., if you did not look closely at the result.

The calculation of singular values or principal components is a more meaningful example.

I know CRAN posts alternative RBLAS.dll files for some common Intel chips (at least for Windows). Using these libraries does speed up many calculations. I wonder how this fits in with your benchmarks above, especially on an Intel Core Duo series (which seems to make up a lot of laptops, at least).

Hi,

Thanks for an interesting post. I'm using R on Mac OS X, and this info is new (and rather surprising) to me:

"The MacOS port of R on CRAN is linked to ATLAS, a "tuned" BLAS that uses multiple cores for computations."

Never have I seen CPU usage above 100% (I have four cores), and I see that as an indication that R is *not* using multithreading. Am I missing something here?

@Doug -- many thanks for the insightful comments, I've updated the post accordingly. Good point about the slightly odd Cholesky example (a function of running code for the purposes of benchmarking than actual analysis).

@Michael -- interesting, thanks. I use Revolution R on the Mac, but I had thought that the CRAN binary for R was linked to a multithreaded library by default. This warrants more investigation.

@David -- Sorry! I just did some of the examples in your blog post, and my standard R 2.11.0 indeed *does* use multithreading. I probably haven't used large enough matrix calculation myself to notice it before.

Verify your Comment

Previewing your Comment

This is only a preview. Your comment has not yet been posted.

Working...
Your comment could not be posted. Error type:
Your comment has been posted. Post another comment

The letters and numbers you entered did not match the image. Please try again.

As a final step before posting your comment, enter the letters and numbers you see in the image below. This prevents automated programs from posting comments.

Having trouble reading this image? View an alternate.

Working...

Post a comment

Got comments or suggestions for the blog editor?
Email David Smith.
Follow revodavid on Twitter Follow David on Twitter: @revodavid

R links

Recommended Sites

Search Revolutions Blog