by Uday Tennety: Director, Advanced Analytics Services at Revolution Analytics
The R ecosystem has become widely popular lately with large players such as Pivotal, Tibco, Oracle, IBM, Teradata and SAP integrating R into their product suites. All these big players are using value chain integration and platform envelopment strategies to build a network effect in order to gain maximum leverage against their competitors in the Big Data and Analytics space.
Big Data movement has gained a lot of traction in the enterprise space, and the ecosystem is rapidly evolving. The end goal for most enterprises is not to collect, store and manage data, but to obtain new business insights through predictive modeling and analytics. With this objective in mind, many enterprise software vendors have embraced R for their analytics story.
Below is my analysis on the various distributions of R provided by large Enterprise software vendors along with their integration strategies:
Oracle R Enterprise
Oracle R Distribution is Oracle's free distribution of open source R. Oracle R Enterprise integrates Oracle R Distribution/ R, the open source scripting language and environment, with Oracle Database.
Oracle R Enterprise primarily introduces a variant to many R data types by overloading them in order to integrate Oracle database with R. But, the names of the Oracle R Enterprise data types are the same as the names of corresponding R data types prefixed by "ore".
Oracle’s strategy with Oracle’s R Enterprise is to provide in-database analytics capabilities for its widely adopted enterprise RDBMS, and for its Exadata appliance.
Tibco’s TERR
Tibco with its acquisition of S+ technology from Insightful in 2008, built its own distribution of R called TERR. TERR has been built from ground up in C++, and their team has redesigned data object representation by implementing those objects using abstract C++ classes.
Also, TERR claims to provide better performance and memory management compared to open source R. According to the company sources, TERR is compatible with open source R, and runs analytics by loading data in memory. But TERR does not yet support data from disk, streaming and database sources, and the company plans to support them sometime in future.
Tibco has recently integrated TERR with their data visualization tool, Spotfire, to make it easy for enterprises choosing Spotfire to run an R based analytics tool.
PivotalR
Pivotal was officially launched on April 1, 2013, as EMC decided to group together a set of EMC, VMware and Pivotal Lab’s products to offer a differentiated Enterprise grade big data platform.
Pivotal’s strategy with R is very similar to Oracle’s strategy with their Oracle R Enterprise. PivotalR is a package that enables users of R to interact with the Pivotal (Greenplum) Database as well as Pivotal HD and HAWQ for Big Data analytics. It does so by providing an interface to the operations on tables and views in the database. PivotalR also claims to provide parallel and distributed computation ability on Pivotal for big data analytics.
It also provides a wrapper for MADlib, which is an open-source library for parallel and scalable in-database analytics.
SAP
SAP has integrated R with their in-memory database, HANA, to allow usage of R for specific statistical functions. But, SAP does not ship the R environment with SAP HANA database, nor does it provide support for R. In order to use the SAP HANA integration with R, one needs to download R from CRAN and configure it. Also, an Rserve configuration is needed for this integration to work.
SAP’s strategy for integrating HANA with R is to provide a well-known and robust environment for advanced data analysis, while providing a support mechanism in HANA for specific statistical functions.
Teradata
Teradata has partnered with Revolution Analytics to provide a platform that brings parallelized analytical algorithms to the data. Revolution R Enterprise 7 for Teradata includes a library of Parallel External Memory Algorithms (PEMAs) that run directly in parallel on the Teradata nodes.
This strategy provides a scalable solution to run analytical algorithms in-database, in parallel, by bringing analytics to the data in its true sense.
IBM
Similar to Teradata, IBM has also partnered with Revolution Analytics to provide advanced data analysis capabilities to its PureData System for Analytics platform (fka Netezza). Revolution R Enterprise for PureData System for Analytics, enables the execution of advanced R computations for rapid analysis of hundreds of petabyte-class data volumes.
Today, businesses are scrambling to build IT infrastructure to extract value from all the data available to them. They are afraid that their competitors might get there first and gain a competitive advantage. In short, enterprises are now in a Big Analytics arms race. With strong partners, a powerful community and with a promise of an easy-to-integrate solution, R is in a great position to capitalize on the Big Data and Analytics revolution.
With the latest release, IBM SPSS Modeler 16 now includes a direct integration with R meaning users don't have to choose between the flexibility that R offers and the robustness, scalability, productivity and enterprise analytic deployment options available with IBM SPSS Modeler. R code can be dropped in at any point in the normal Modeler Stream workflow as required. The Modeler-R integration can leverage the IBM PureData System (Netezza), SAP Hana and Oracle R engines, and via IBM SPSS Analytic Server MapReduce in a Hadoop Distributed file system.
IBM SPSS Statistics has been integrated with R programmability since version 16, which was released over six years ago.
Posted by: Sarah Dunworth | January 08, 2014 at 06:18