by Andrie de Vries
Last week we announced the availability of Revolution R Open, an enhanced distribution of R. One of the enhancements is the inclusion of high-performance linear algebra libraries, specifically the Intel MKL. This library significantly speeds up many statistical calculations, e.g. the matrix algebra that forms the basis of many statistical algorithms.
Several years ago, David Smith wrote a blog post about multithreaded R, where he explored the benefits of the MKL, in particular on Windows machines.
In this post I explore whether anything has changed.
What is the MKL?
To make the best use of the processing power in today's machines, Revolution R Open is installed by default with the Intel Math Kernel Library (MKL), which provides the BLAS and LAPACK library functions used by R. The MKL lets many common R operations use all of the available processing power.
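To make this concrete, here is a small illustrative snippet of the kind of R code whose work R hands to BLAS and LAPACK: matrix multiplication, crossprod(), solve(), chol() and determinant(). The matrix size and object names are arbitrary choices for the example. Nothing in it refers to the MKL; the same code runs on any R build, and only the library doing the arithmetic differs.

```r
# Typical matrix operations that R delegates to BLAS and LAPACK routines.
# With the MKL installed these run multithreaded; the R code is unchanged.
set.seed(42)
n <- 500
A <- matrix(rnorm(n * n), nrow = n)

AtA   <- crossprod(A)       # t(A) %*% A, computed by BLAS
S     <- AtA + diag(n)      # symmetric positive definite by construction
AA    <- A %*% A            # general matrix multiplication (BLAS)
S_inv <- solve(S)           # matrix inverse (LAPACK)
U     <- chol(S)            # upper-triangular Cholesky factor, S = t(U) %*% U (LAPACK)
logdet <- determinant(S, logarithm = TRUE)$modulus  # log-determinant via LU (LAPACK)
```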
By default, the MKL uses as many parallel threads as there are available cores. You don't need to do anything to benefit from this performance improvement: not a single change to your R script is required.
However, you can still control or restrict the number of threads using the setMKLthreads() function from the Revobase package delivered with Revolution R Open. For example, you might want to limit the number of threads to reserve some of the processing capacity for other activities, or if you’re doing explicit parallel programming with the ParallelR suite or other parallel programming tools.
You can set the maximum number of threads as follows:
setMKLthreads(<value>)
where <value> is the maximum number of parallel threads, not to exceed the number of available cores.
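A minimal sketch, assuming setMKLthreads() is exported from Revobase as described above. The helper names choose_mkl_threads() and limit_mkl_threads() are hypothetical. The first picks a thread count that leaves some cores free (never more than the available cores, never fewer than one); the second applies it only when Revobase is installed, so the same script also runs on a standard R build.

```r
# Hypothetical helper: pick a thread count that leaves `reserve` cores free.
choose_mkl_threads <- function(reserve = 1L, cores = parallel::detectCores()) {
  if (is.na(cores) || cores < 1L) cores <- 1L
  # Never go below one thread or above the number of available cores.
  as.integer(min(cores, max(1L, cores - reserve)))
}

# Hypothetical helper: apply the limit only when Revobase (shipped with RRO)
# is installed; on a standard R build it does nothing and returns FALSE.
limit_mkl_threads <- function(n) {
  if (!requireNamespace("Revobase", quietly = TRUE)) return(FALSE)
  Revobase::setMKLthreads(n)  # assumes setMKLthreads() is exported by Revobase
  TRUE
}

limit_mkl_threads(choose_mkl_threads(reserve = 1L))
```

Reserving a core this way is one option when you also run explicit parallel workers; those workers and the MKL threads would otherwise compete for the same cores.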
Testing the MKL on matrix operations
Compared with standard open source R, the MKL offers significant performance gains, particularly on Windows.
Here are the results of five tests on matrix operations, run on a Samsung laptop with a quad-core Intel i7 CPU. The graphic shows that matrix multiplication runs 27 times faster with the MKL than without, and linear discriminant analysis runs 3.6 times faster.
You can replicate the same tests by using this code:
---
---
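As a rough, hypothetical sketch (not the script behind the results above), you can time a dense matrix multiplication like this and run the same file under both R and RRO. The matrix size, the number of repetitions and the function name time_matmul() are arbitrary choices.

```r
# Rough timing of a dense matrix multiplication (BLAS dgemm under the hood).
# Run the same script under standard R and under RRO to compare libraries.
time_matmul <- function(n = 1000L, reps = 3L) {
  set.seed(1)
  A <- matrix(runif(n * n), nrow = n)
  timings <- vapply(seq_len(reps), function(i) {
    system.time(A %*% A)[["elapsed"]]   # wall-clock seconds for one product
  }, numeric(1))
  median(timings)                       # median elapsed time in seconds
}

time_matmul(n = 1000L)
```

Taking the median of a few repetitions smooths out one-off noise, such as other processes briefly competing for the CPU.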
Simon Urbanek's benchmark
Another well-known benchmark was published by Simon Urbanek, a member of R-core. You can find his code at Simon's benchmark page. His benchmark consists of three classes of tests:
- Matrix calculation
  - This includes tests for creating vectors, sorting, computing the cross product and linear regression.
  - In this category, the MKL substantially speeds up the cross product (~26x) and linear regression (~20x).
- Matrix functions
  - This category includes tests for computations that rely heavily on matrix manipulation, e.g. fast Fourier transforms, computing eigenvalues and the matrix inverse.
  - On some of these, the MKL really shines, notably for Cholesky decomposition (~16x).
- Programmation
  - The final category includes tasks such as looping and recursion.
  - These tasks make little or no use of BLAS or LAPACK routines, so the MKL makes no difference at all.
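To see why the first two categories benefit and the third does not, compare two ways of computing the same cross product: crossprod() passes the work to BLAS, while the hypothetical loop_crossprod() below runs its loops in the R interpreter, so a faster BLAS cannot help it. This is only an illustration, not part of Urbanek's benchmark.

```r
# Cross product computed with explicit R loops: interpreted code plus
# element-wise arithmetic, no BLAS call, so the MKL cannot speed it up.
loop_crossprod <- function(X) {
  p <- ncol(X)
  out <- matrix(0, p, p)
  for (i in seq_len(p)) {
    for (j in seq_len(p)) {
      out[i, j] <- sum(X[, i] * X[, j])
    }
  }
  out
}

set.seed(7)
X <- matrix(rnorm(2000 * 100), nrow = 2000)
system.time(B1 <- crossprod(X))        # BLAS-backed
system.time(B2 <- loop_crossprod(X))   # interpreted R loops
all.equal(B1, B2)                      # same result, very different code paths
```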
I compared the total execution time of the benchmark script in RRO (with MKL) and R. Using Revolution R Open, the benchmark tests completed in 47.7 seconds, compared with ~176 seconds using R-3.1.1 on the same machine.
To replicate these results, you can use the following script, which runs (sources) his code directly from the URL and captures the total execution time:
---
---
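A minimal sketch of the same idea, with a hypothetical helper time_script(): source() accepts either a file path or a URL, and system.time() reports the elapsed wall-clock time. The benchmark's URL is not reproduced here, so the demo times a small temporary script instead; pass the URL from Simon's benchmark page to time the real benchmark.

```r
# Time how long R takes to run (source) a script. `script` may be a local
# file path or a URL; source() reads from either.
time_script <- function(script) {
  env <- new.env()  # keep the script's objects out of the global environment
  system.time(source(script, local = env))[["elapsed"]]
}

# Self-contained demo with a small temporary script.
demo <- tempfile(fileext = ".R")
writeLines(c("x <- matrix(rnorm(250000), nrow = 500)",
             "y <- crossprod(x)"), demo)
time_script(demo)
```

Sourcing into a fresh environment is a design choice to avoid cluttering your workspace; if a script relies on the global environment, use source(script, local = FALSE) instead.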
Detailed results
Here is a summary of each of the individual tests (times in seconds):
| Test | R-3.1.1 (s) | RRO (s) | Performance gain |
|---|---|---|---|
| I. Matrix calculation | | | |
| Create, transpose and deform matrix | 1.01 | 1.01 | 0.0 |
| Matrix computation | 0.40 | 0.40 | 0.0 |
| Sort random values | 0.72 | 0.74 | 0.0 |
| Cross product | 11.50 | 0.42 | 26.4 |
| Linear regression | 5.56 | 0.25 | 20.9 |
| II. Matrix functions | | | |
| Fast Fourier Transform | 0.45 | 0.47 | 0.0 |
| Compute eigenvalues | 0.74 | 0.39 | 0.9 |
| Calculate determinant | 2.87 | 0.24 | 10.8 |
| Cholesky decomposition | 4.50 | 0.25 | 16.8 |
| Matrix inverse | 2.71 | 0.25 | 9.9 |
| III. Programmation | | | |
| Vector calculation | 0.67 | 0.67 | 0.0 |
| Matrix calculation | 0.26 | 0.26 | 0.0 |
| Recursion | 0.95 | 1.06 | -0.1 |
| Loops | 0.43 | 0.43 | 0.0 |
| Mixed control flow | 0.41 | 0.37 | 0.1 |
| Total test time | 165.60 | 47.72 | 2.5 |
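Judging from the numbers, the performance gain column appears to be (R-3.1.1 time / RRO time) - 1, i.e. the speedup minus one, rather than the speedup itself; this is inferred from the table, not stated by the benchmark. The sketch below recomputes both for a few rows. Small discrepancies (e.g. 21.2 vs 20.9 for linear regression) are what you would expect if the underlying times were rounded to two decimals for display.

```r
# A few rows copied from the table above (times in seconds).
bench <- data.frame(
  test     = c("Cross product", "Linear regression",
               "Cholesky decomposition", "Total test time"),
  r_time   = c(11.50, 5.56, 4.50, 165.60),
  rro_time = c(0.42, 0.25, 0.25, 47.72),
  gain     = c(26.4, 20.9, 16.8, 2.5)
)

bench$speedup    <- bench$r_time / bench$rro_time  # how many times faster RRO is
bench$recomputed <- bench$speedup - 1              # compare with the "gain" column
print(bench, digits = 3)
```

Read this way, the total test time corresponds to RRO finishing the benchmark roughly 3.5 times faster than R-3.1.1.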
Conclusion and caveats
The Intel MKL makes a notable difference for many matrix computations. When running the Urbanek benchmark using the MKL on Windows, you can expect a performance gain of ~2.5, i.e. the benchmark completes roughly 3.5 times faster (47.7 vs 165.6 seconds in the table above).
The caveat is that the standard R distribution uses different math libraries on different operating systems. For example, R on Mac OS X uses the ATLAS BLAS, which gives you performance comparable to the MKL.
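Since the answer depends on how your R was built, it can help to check which libraries a session is actually using. A minimal sketch with a hypothetical helper blas_info(): recent versions of R record the BLAS library path in sessionInfo() (older versions may not, in which case the helper returns NA), and La_version() reports the LAPACK version.

```r
# Report which BLAS library and LAPACK version this R session uses.
# sessionInfo()$BLAS may be NULL on older R versions; NA is returned then.
blas_info <- function() {
  si <- utils::sessionInfo()
  blas <- if (length(si$BLAS) > 0) si$BLAS[[1]] else NA_character_
  c(BLAS = blas, LAPACK_version = unname(La_version()))
}

blas_info()
```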
Download Revolution R Open
To find out more about Revolution R Open, go to http://mran.revolutionanalytics.com/open/
You can download RRO at http://mran.revolutionanalytics.com/download/
OpenBLAS on Linux and Windows also gives comparable performance.
Posted by: Shige Song | October 22, 2014 at 12:15
This looks very interesting but it feels a bit of a cheat to not compare it to atlas/openblas. I am unlikely to invest the time to try it out until I see benchmarks for these so it would be great if you could run these for those of us who already use optimised linear algebra libraries with R.
Thanks
Posted by: Tim Webber | October 23, 2014 at 01:10
@Tim, Domino Data Labs ran some benchmarks on Linux comparing to a different multi-threaded BLAS (though they didn't specify which).
Posted by: David Smith | November 10, 2014 at 08:47