Zack Urlocker, the EVP for Products at MySQL before it was acquired by Sun (and now an executive at Sun), has written an article at InfoWorld suggesting that the recent rash of articles on how IBM's acquisition of SPSS is affecting the SAS-dominated BI market is downplaying the real agent of change in the space: R. According to Urlocker, the BI battle isn't between IBM and SAS; instead, "the little known open source project R may be the disruptor in this billion-dollar market". He goes on:
R continues to gather momentum, just as Linux, Apache, MySQL, and JBoss have in recent years. It's disrupting the market from the bottom, attracting new users who cannot afford the expensive license fees from IBM or SAS. R claims dozens of books on Amazon about the topic, 2,000 open source packages and extensions, and an estimated million users worldwide.
Recently, an open source company that provides an optimized version of R, Revolution Computing, received an injection of capital from North Bridge Ventures and Intel. Who would be equipped to lead a company competing against billion-dollar incumbents? None other than Norman Nie, founder and former CEO of SPSS. Game on.
InfoWorld: The BI battle isn't between IBM and SAS
I greatly enjoy this blog for its substantive content. It provides a great service to the R community.
But regarding R as a real competitor to SAS, color me skeptical. SAS is routinely employed in production environments, and that's why people pay for it. It's robust. It scales on current architectures. R is neither robust nor scalable. Try to compile it on AIX or MVS, let alone port its packages to these platforms and stress-test them with serious workloads. As a language SAS is obsolete, and as a platform it's ridiculously expensive. But people buying SAS or Cognos have a budget for BI and are not motivated by elegance. Apache, JBoss, and the early Python had hundreds of literate programmers working on the core technology. R's core team is made up of a dozen statisticians-turned-programmers working part-time. First-tier young computer scientists, the unsung force behind successful open source projects, don't commit patches to R. Until a company the size of HP or Intel "adopts" R the way Google adopted Python, R will not dent the segment of corporate customers. I am sure that R's user base will expand greatly, but not at the expense of SAS/IBM/Oracle etc. Besides, as a language I am afraid R is at the end of the line, but that's for another post.
Although I wish it good luck and believe it is pushing R in the right direction, I also doubt the business viability of Revolution Computing. My direct experience is that companies using R, even some very wealthy ones, don't want to pay for value-added services (e.g., specialized packages) as much as, say, Matlab users. Selling to R users is targeting people who have implicitly signalled they have a low willingness to pay for analytical software.
Posted by: gappy | December 04, 2009 at 18:47
Thank you for the very interesting blog posts.
I tend to agree with some of @gappy's observations. Some points for Revolution Computing to consider:
Libraries: Choice is good, but many packages addressing the same problem increase search costs by several orders of magnitude. Typical Q's: What package is best of breed? What packages work well together?
Coherence: Collating various implementations into a reference/best-of-breed library is a large effort. Tracking innovations that pop up from time to time, then integrating them into said reference package, is challenging.
Documentation: SAS docs are exceptional and reduce search costs. R-project could achieve this with greater consistency, by focusing on reference implementations/libraries (see above).
Language: R-project's language is great compared to previous statistical language offerings, but not great as far as computer languages go. For asynchronous network IO it is useful to write code using the 'observer pattern'... seems impossible in the R-project?
Posted by: Hedge | December 06, 2009 at 14:53
Thanks for all the insightful comments. I'll follow up in a separate blog post.
Posted by: David Smith | December 07, 2009 at 10:12