The past couple of years have seen a dramatic growth in the use of the R language in the enterprise. R has always been pervasive in academia for research and teaching in statistics and data science, and as new graduates trained in R have migrated to the workplace the demand for R in corporations has become more and more intense.
Database vendor Oracle estimates that "R has attracted over two million users since its introduction". James Kobielus, noted Forrester analyst and predictive analytics expert, recently said in PCWorld that "R has become a real ubiquitous force in advanced analytics. It's everywhere. Enterprise adoption of it has been growing steadily. When we ask our customers what they're using for statistical modeling they'll say SAS or [IBM's] SPSS, but they increasingly say R in the same breath."
With the rapid growth of R in the enterprise comes a corresponding increase in demand for enterprise support for R from its users, and demand for integration of R into corporate systems from IT (both areas in which Revolution Analytics provides software and expertise). Most large organizations have a sophisticated infrastructure devoted to data analysis, with an "analytics stack" of software to provide data warehousing and query, predictive analytics, reporting, presentation, and Business Intelligence (BI). As a result, software vendors at every layer of this stack have added functionality to integrate R, accommodating demand from both users and IT and serving the needs of data-driven business decision makers. Let's take a look at some of the applications within the analytics stack that now provide integration with R.
The Data Layer
The data layer is where the lifeblood of the analysis — the data — is stored and prepared. Especially for high-performance and big-data applications, analytics based in R can benefit from the infrastructure that the data layer provides. The IBM blog has a great post with an in-depth discussion about integrating R with the data layer.
IBM Netezza, the high-performance data-warehousing appliance, is integrated with R (in partnership with Revolution Analytics). R users can use Revolution R Enterprise to run massively-parallel computations in R within the IBM Netezza appliance and implement high-performance, big-data analytics (with high-frequency financial data, for example). [Update: a free webinar on February 29 will describe the integration between IBM Netezza and Revolution R Enterprise in detail.]
Oracle announced last year a forthcoming connection between R and Oracle, which was made available in February 2012. Oracle R Enterprise is aimed at statisticians who "don't know SQL" and are "not familiar with DBA tasks". It is available as part of the Oracle Advanced Analytics option (priced at around $23,000 per core), and provides a "transparency layer" with functions to connect to Oracle and use R functionality in the Oracle database. Oracle also maintains the open-source ROracle package, which provides similar functionality for open-source R.
Cloudera's Distribution Including Apache Hadoop provides support for R in partnership with Revolution Analytics. This connection makes it possible to manipulate Hadoop data stores in R directly from HDFS and HBase, and gives R programmers the ability to write MapReduce jobs in R using Hadoop Streaming.
IBM BigInsights, the Hadoop platform from IBM, is also integrated with R and Revolution R Enterprise. BigInsights queries can make use of the MapReduce construct while running R computations in parallel.
Teradata's Enterprise Data Warehousing platform provides in-database analytics using R via the free teradataR package. This package allows R users to connect to Teradata, create data frames linked to Teradata and to call in-database analytic functions. The Teradata Aster MapReduce Platform also provides integration with R.
Sybase RAP, the edition of the Sybase database for financial data, provides integration with R. Providing the R language alongside Sybase RAP allows for faster algorithm development and extensive backward testing on historical data. Sybase also regularly highlights R integration in its financial newsletters and webinars.
SAP HANA provides in-database analytics based on R with its SAP BusinessObjects Predictive Analysis module.
The Analytics Layer
The analytics layer is where the magic happens: statistical modeling, predictive analytics and custom data visualization. Fed with (usually) structured data sourced from the data layer, R is widely used here to categorize, predict, and generally provide insight into corporate data stores. In many organizations, older data analysis tools remain in use, and so interfaces to R have been added to support analysts and data scientists who prefer to use R, and to fill in the gaps of these legacy tools with modern, high-performance analytics.
Revolution Analytics is the leading commercial organization focused on software and support for R. Its Revolution R Enterprise software extends open-source R with productivity interfaces, high-performance statistical computing, big-data analytics, and enterprise integration of R.
SAS has been a statistical analysis workhorse since the early 70's. Now, with so many graduates in statistics trained in R instead of SAS, SAS has introduced the ability to call R from SAS/IML. (It's also possible to call R directly from base SAS thanks to a free package developed at Roche Pharmaceuticals.) SAS JMP, the point-and-click data analysis package, now also provides support for R.
IBM SPSS Statistics, the popular desktop data analysis software known simply as SPSS before being acquired by IBM in 2009, provides integration with R via the Statistics Programmability Extension module.
RStudio, an open-source software company, provides an integrated development environment for developing code in the R language.
Matlab, a numerical computing language used by engineers, also offers the ability to call R from Matlab on Windows.
Zementis software allows models created with R to be scored on massive data sets using the ADAPA Decision Engine and Revolution R Enterprise.
The Presentation Layer
Data analysis makes the most impact in the enterprise when it can be readily acted upon by decision makers: often, business executives not steeped in the arcana of data warehousing or statistical analysis. As a result, many reporting and business intelligence tools now make it possible to incorporate the results of analyses generated in R in the presentation layer, in a format tuned to the needs of a business audience.
Jaspersoft's business intelligence software makes it possible to incorporate the results of R-based analytics into BI dashboards and reports, via integration with Revolution R Enterprise.
TIBCO Spotfire's interactive business intelligence dashboards make it possible to share results and models from R.
You might think of Microsoft Excel as more of a spreadsheet than a presentation tool, but it is very widely used on the desktop as a "container" for static and interactive reports based on statistical analysis. While Excel does not have out-of-the-box integration with R, it is possible to integrate R-based computation into Excel spreadsheets via RExcel and Revolution Analytics' RevoDeployR web services API.
R: Integrated throughout the analytics stack
As you can see, for organizations that need to create advanced analytics applications, R is integrated throughout the analytics stack: for data access, for presentation of results, and of course for the statistical analysis process itself. This degree of integration by so many companies is indicative of the level of demand for R throughout the enterprise. As the leading provider of commercial software and support for R, Revolution Analytics supports R users throughout the organization and helps IT departments integrate Revolution R Enterprise throughout the analytics stack, for high-performance and big-data applications based on the R language.
[Update Mar 15 2012: Added SAP HANA to the list.]
You forgot to mention Greenplum, which can run embedded R via PL/R, just as PostgreSQL can. (See http://www.bostongis.com/PrinterFriendly.aspx?content_name=postgresql_plr_tut01 for an intro.)
Posted by: ZS | February 27, 2012 at 12:27
The SEO optimization on this page is distracting. Competitors SAS and SPSS are linked to their Wikipedia pages, but Revolution Analytics is linked to its home page. It seems www.r-project deserves a link somewhere, but even the generic "R language" links to Revolution Analytics.
Otherwise, good article, and I appreciate Revolution Analytics's investments in open source R such as doSMP, doMC, and doSNOW. I use both R and SAS, and I've published some small code to transfer neural network and decision tree models trained in R to score in SAS data steps. It's nice not to have to worry about the integration and scaling issues that these companies are working on.
Posted by: Andrew | February 27, 2012 at 14:27
Again (I really should read all the comments first), too bad you won't mention PL/R, which is everything the Oracle's $23,000 might be. PL/R will work with any extended (well, should anyway) Postgres; Netezza comes to mind.
Oh, right. PL/R isn't for sale.
Posted by: Robert Young | February 27, 2012 at 15:52
Thanks ZS and Robert. The scope of the article was commercial vendors supporting integration with R in their products, and while I'm aware of various community efforts integrating R with Greenplum, I couldn't find an official reference on Greenplum's website to refer to. (All the entries link back to the vendor's website; Andrew, in the case of SAS and SPSS they link back to the specific R links.)
I'm actually not so familiar with PL/R and how it's used; if you could point me to some references of applications I'd be glad to take a look.
Posted by: David Smith | February 27, 2012 at 16:40
Here's Joe's page: http://www.joeconway.com/plr/
Due to the nature of base Postgres, any external language which has a C API can be integrated straightaway, which is what Joe did. Unlike the Oracle bits, which I read to say that they'll only support base R (not supporting library loading?), PL/R in Postgres can load libraries.
Some additional sites:
(3 parts, and very informative)
http://www.postgresonline.com/journal/archives/188-plr_part1.html
http://www.varlena.com/GeneralBits/Tidbits/bernier/art13mar04/graphingWithR.html
And, in the spirit of self-immolation, a PoC for a possibly interesting use:
http://www.simple-talk.com/sql/learn-sql-server/going-beyond-the-relational-model-with-data/
Posted by: Robert Young | February 27, 2012 at 17:57
Take a look at 2 packages we just added to CRAN. RJMS for enterprise support for Active MQ (message queues) and RDROOLS for R integration into JBOSS Drools rules engine
http://cran.r-project.org/web/packages/Rjms/index.html
http://cran.r-project.org/web/packages/Rdrools/index.html
Posted by: zubin | February 27, 2012 at 20:55
You may want to add Rapid-I with its data mining and business analytics solutions RapidMiner and RapidAnalytics to the list:
http://www.rapid-i.com/
The RapidMiner Extension for R seamlessly integrates R scripts into RapidMiner and RapidAnalytics analysis processes:
http://www.kdnuggets.com/2010/11/rapidminer-r-extension.html
This makes it possible to combine the strengths of the two most powerful and most widely used open source data mining solutions, R and RapidMiner:
http://rapid-i.com/component/option,com_myblog/task,tag/category,R-Extension/Itemid,172/
The following tutorial video demonstrates how to integrate R and RapidMiner:
http://www.youtube.com/watch?v=utKJzXc1Cow
Best regards,
Trevor
Posted by: Trevor Kemmer | April 11, 2012 at 08:37
Jaspersoft's business intelligence software is the epitome of excellence in my book....
Posted by: Jack | January 24, 2013 at 23:54