by Lixun Zhang, Data Scientist at Microsoft
As a data scientist, I have experience with R. Naturally, when I was first exposed to Microsoft R Open (MRO, formerly Revolution R Open) and Microsoft R Server (MRS, formerly Revolution R Enterprise), I wanted to know the answers to three questions:
- What do R, MRO, and MRS have in common?
- What’s new in MRO and MRS compared with R?
- Why should I use MRO or MRS instead of R?
The publicly available information on MRS either describes it at a high level or explains specific functions and their underlying algorithms. Materials that compare R, MRO, and MRS tend to stay high level, with few details at the level of functions and packages, which is what data scientists are most familiar with. Nor do they answer the questions above comprehensively. So I designed my own tests (the code behind them is available on GitHub). Below are my answers to the three questions. MRO has an optional MKL library; unless noted otherwise, the observations hold whether or not MKL is installed on MRO.
What do R, MRO, and MRS have in common?
After installing R, MRO, and MRS, you'll notice that everything you can do in R can also be done in MRO or MRS. For example, you can use glm() to fit a logistic regression and kmeans() to carry out cluster analysis. You can also install packages from CRAN. In fact, a package installed in R can be used in MRO or MRS, and vice versa, if the package is installed in a library tree shared among them. You can use .libPaths() to get and set the library trees for R, MRO, and MRS. Finally, you can use your favorite IDE, such as RStudio or Visual Studio with R Tools for Visual Studio (RTVS), with R, MRO, or MRS. In other words, MRO and MRS are 100% compatible with R in terms of functions, packages, and IDEs.
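To illustrate what a shared library tree means in practice, here is a minimal R sketch. The directory is a hypothetical placeholder created under tempdir(); in real use you would pick a persistent path that every installation (R, MRO, MRS) can read and write.

```r
# Hypothetical shared library location; replace with a persistent path
# that R, MRO and MRS can all reach.
shared_lib <- file.path(tempdir(), "shared-R-library")
dir.create(shared_lib, recursive = TRUE, showWarnings = FALSE)

# Prepend the shared tree. .libPaths() silently drops directories that
# do not exist, which is why the directory is created first.
.libPaths(c(shared_lib, .libPaths()))

# The first entry is where install.packages() installs by default.
print(.libPaths()[1])

# A package installed here, e.g.
# install.packages("data.table", lib = shared_lib)
# can then be loaded from any installation that has the same tree on its path.
```

Setting the same path in the R_LIBS environment variable (for example in each installation's .Renviron file) makes the arrangement persist across sessions.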
What’s new in MRO and MRS compared with R?
While everything you do in R can be done in MRO and MRS, the reverse is not true, because of the additional components in MRO and MRS. MRO lets users install an optional math library, Intel's Math Kernel Library (MKL), for multithreaded performance. This library shows up as a package named "RevoUtilsMath" in MRO.
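Because the MKL add-on appears as the RevoUtilsMath package, a quick way to see whether a given MRO installation has it is to look for that package. This is a minimal sketch and a heuristic, not an official detection method; the helper name has_mkl() is hypothetical.

```r
# Hypothetical helper: TRUE if the RevoUtilsMath package (MRO's MKL add-on)
# is among the installed packages. The package list is an argument so the
# check can be exercised without MRO.
has_mkl <- function(pkgs = rownames(utils::installed.packages())) {
  "RevoUtilsMath" %in% pkgs
}

if (has_mkl()) {
  message("RevoUtilsMath found: this installation should be using MKL.")
} else {
  message("RevoUtilsMath not installed: MKL add-on not detected.")
}
```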
MRS comes with more packages and functions than R. Most of the additional packages are not on CRAN and become available only after installing MRS; one example is the RevoScaleR package. MRS also installs the MKL library by default. As for functions, MRS provides High Performance Analysis (HPA) versions of many base R functions, which are included in the RevoScaleR package. For example, the HPA version of glm() is rxGlm(), and for kmeans() it is rxKmeans(). These HPA functions can be used in much the same way as their base R counterparts, with additional options. In addition, they can work with a special data format (XDF) customized for MRS.
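The side-by-side use of base R functions and their HPA counterparts might look like the sketch below. The base R part runs anywhere; the RevoScaleR part runs only when that package (shipped with MRS) is present. The rxGlm() and rxKmeans() argument names are an assumption based on RevoScaleR's documented formula/data interface, so check ?rxGlm and ?rxKmeans on your own installation.

```r
# Base R: logistic regression (manual vs automatic transmission) and
# k-means clustering on built-in datasets.
fit_glm <- glm(am ~ wt + hp, data = mtcars, family = binomial())
set.seed(42)
fit_km <- kmeans(iris[, 1:4], centers = 3)

# HPA counterparts, only if RevoScaleR is installed (i.e. on MRS).
# Assumption: rxGlm() accepts formula, data and family like glm(), and
# rxKmeans() takes a one-sided formula plus numClusters.
if (requireNamespace("RevoScaleR", quietly = TRUE)) {
  library(RevoScaleR)
  rx_glm <- rxGlm(am ~ wt + hp, data = mtcars, family = binomial())
  rx_km <- rxKmeans(~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
                    data = iris, numClusters = 3)
}
```

The data frame inputs here are only for illustration; the capacity advantage of the HPA functions comes into play with large data, for example stored in XDF files.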
Why should I use MRO or MRS instead of R?
In a nutshell, MRS solves two problems associated with using R: capacity (handling the size of datasets and models) and speed. And MRO solves the problem associated with speed.
The following table summarizes the performance comparisons for R, MRO, and MRS. In terms of capacity, using HPA functions in MRS increases the size of data that can be analyzed. In terms of speed, certain matrix-related base R functions can run faster in MRO and MRS than in R, thanks to MKL. The HPA functions in MRS outperform their base R counterparts on large datasets. More details on this comparison can be found in the notebook on GitHub.
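A rough way to see the MKL effect yourself is to time a few dense matrix operations and run the same script under R and under MRO. This is a minimal sketch, not the benchmark used for the table; the matrix size is an arbitrary choice, and timings will vary with hardware and the number of cores.

```r
# Dense matrix algebra is where a multithreaded BLAS such as MKL is
# expected to help. Run this unchanged in R and in MRO and compare.
set.seed(1)
n <- 1000
a <- matrix(rnorm(n * n), nrow = n)

t_mult  <- system.time(b <- a %*% a)["elapsed"]        # matrix product
t_cross <- system.time(cp <- crossprod(a))["elapsed"]  # t(a) %*% a
t_solve <- system.time(inv <- solve(cp))["elapsed"]    # matrix inverse

timings <- c(multiply = unname(t_mult),
             crossprod = unname(t_cross),
             solve = unname(t_solve))
print(timings)  # elapsed seconds
```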
It should be noted that while there are packages such as “bigmemory” and “ff” that help address some of the big data problems, they were not included in the benchmark tests.
The takeaway for data scientists
For data scientists trying to decide which of these platforms to use in different scenarios, the following table can serve as a reference. Depending on the amount of data and the availability of MRS's HPA functions, it summarizes the scenarios where R, MRO, and MRS can be used. Whenever R can be used, MRO can be used too, with the added benefit of multithreaded computation for certain matrix-related operations. And MRS can be used whenever R or MRO can, with the option of using HPA functions that provide better performance in terms of both speed and capacity.
Follow the link below for my in-depth comparison of R, MRO and MRS.
Lixun Zhang: Introduction to Microsoft R Open and Microsoft R Server
The only difference between official R and MRO is Intel's MKL? Then one can simply stay with the official R implementation and use any of the optimized BLAS/LAPACK alternatives (Atlas, OpenBLAS, etc.). You can even use Intel's MKL with base R: https://software.intel.com/en-us/articles/using-intel-mkl-with-r
You should add to your table, next to "base R" (with the reference BLAS implementation), extra columns for "base R + ATLAS", "base R + OpenBLAS" or "base R + MKL".
That may be the reason why the official product descriptions tend to be "high level without many details". Seriously, guys, can't you explain more in depth the product you're distributing?
Posted by: Sergi | April 27, 2016 at 01:55
I'm an amateur with R. However, like almost everyone on the planet, I have long experience with Microsoft. That company is not innovative and has displayed no inclination to change. I think that Sergi has an extremely good point. My first response is to worry that users will start to glom onto MS products and the reservoir of innovative talent I associate with R and CRAN will shrink. I'd also like to know from all you older, wiser R'ers: is an open source alternative to MRS available?
Thanks all.
Posted by: Larry Field | April 27, 2016 at 07:45
Hi Sergi, there's lots more detail on MRO available, at its homepage of mran.microsoft.com. A good place to start is the About Microsoft R Open page.
Anyone of course is welcome to build R with MKL or any other multithreaded BLAS. If you've tried to do it though, you'll know it can be a very tricky process. One of the goals of MRO is to provide the community with an easy-to-install binary distribution of multi-threaded R.
Posted by: David Smith | April 27, 2016 at 09:32
Hi Larry, I have to respectfully disagree with your statement that Microsoft "is not innovative and has displayed no inclination to change". I've been with Microsoft for a year now, and I've been very pleasantly surprised by the innovation Microsoft has been doing with R. (See my R at Microsoft talk for some examples.) As for change, speaking with the old-timers it's very clear that Microsoft is a very different company than it used to be. The support for open source projects and practices is one very clear example.
Posted by: David Smith | April 27, 2016 at 09:45
Hi David, from the comfort of my Debian distribution, where switching between BLAS implementations is just a simple call to update-alternatives, I see your point and appreciate that Microsoft provides a working R binary distribution linked to the MKL.
Still, the comparison table under "Why should I use MRO or MRS instead of R?" is not fair and could be misleading.
From About Microsoft R Open, after clearing away all the verbosity, all I can read is that MRO is just the R implementation from the r-project plus a third-party optimized BLAS and the checkpoint package. Is that what you call innovation? Wait, the R&D guys at Oracle had the very same idea!
Anyway, it wouldn't be fair of me not to point out that other products from Microsoft may still be great.
Posted by: Sergi | April 28, 2016 at 01:41
A science scientist's perspective on MRO:
I do science with non-generic data, ranging from water quality and toxicology with below-detection-limits to very large rasters and vector spatial data. Thus, I use a large number of domain-specific packages, including a handful not on CRAN or r-forge or Bioconductor, and some computations that require cpu-weeks, or, more accurately, core-weeks.
I've run MRO 3.2.2 alongside R 3.2.3 for several months on both win7 and win10. [I haven't loaded MRO on linux because I see no need for it there, and I'm not about to try to get packages using cuda GPU cores working under MRO. I have productive work to do.]
I was skeptical, but I like MRO, and will continue to run it alongside CRAN R. No, MRO doesn't do anything I can't do by other means, but the simple binary installer with a BLAS that uses multiple cores for some computations is convenient, especially for recommending to colleagues running R on Windows who aren't going to build R from source. My preferred configurations for largish (spatial) datasets are PostgreSQL + R, or SQLite + R, but the first is prohibited on my work machine and the latter required an individual exemption from IT. IT departments in government and corporate settings are _much_ more likely to approve MRO than the alternatives.
Contra the above assertion, there are several packages that don't work with MRO. The big use-case where I work is RODBC: we're stuck with 32-bit MS office on 64-bit win7, and most colleagues can't install 64-bit ODBC drivers for Access. Yes that's a dumb constraint (and my life would be better without Access), but that's life as a government scientist.
I view MRS as a plausible replacement & upgrade for MS SQL Server Reporting Services. Our lead database contractor asserted that SSRS could do everything we need with our data. I politely gave him WQ data needing seasonal Mann-Kendall tests, bird point counts with imperfect detection needing hierarchical models of detection & occupancy, and adaptive cluster sampling data needing bootstrapping, and a couple of complex figures from ggplot2. He conceded my point, and opened remote ODBC access to the repository databases. But now, if they license MRS, the required analyses (to say nothing of solid graphs) can be done server-side when it makes sense and client-side in R when the bandwidth is there to ship the data.
So I completely agree with Sergi that MRO isn't particularly "innovative", and would add R inside PostgreSQL to Oracle, SAP/HANA and other prior equivalents to MRS. But MRO is useful. I don't see MRO ever pulling developers away from CRAN R to write MRO-specific packages (except for the Revolution folks, who have MRS-specific tools as their business model). As long as MS supports the (corporate) R Consortium, I see MRO/MRS and Microsoft's acquisition of Revolution Analytics as a (minor) positive.
My next test will be the lag between R 3.3 on CRAN (tomorrow) and an update to MRO from 3.2.2. I'm fine with not tracking the 3rd digit versions, but I will be happy if Microsoft's resources let the Revolution folks keep MRO much closer to the current R versions than they could before the MS acquisition.
Posted by: Tom 2 | May 02, 2016 at 10:49
update: I see MRO is now 3.2.4, so that's a good sign in terms of keeping relatively current and in sync with CRAN R versions, which is important when I have to install new non-CRAN packages only available for the latest version...
Posted by: Tom 2 | May 02, 2016 at 12:45