
August 01, 2013



If I may be so bold: Data Science is about calculating the parameters of census data. I mean census in the technical sense: every individual in the population is measured. Baby stat books, in the first chapters, introduce the baby stat student to Descriptive Statistics. These really aren't statistics; they are just calculated parameters.

Big Data & Data Science are not engaged in statistics, by and large. They're just census takers, trying to make some sense of the tsunami of bytes they've unleashed on themselves. Big Data & Data Science are largely engaged in finding unknown needles in multitudes of haystacks. They aren't doing statistics. And statistics departments could cover what little "theory" might be involved in a week's worth of classes.

The Big Data & Data Science zealots have gotten all dewy-eyed over the NSA vacuum cleaner, lining up to get some access to those Cray machines, using Map/Reduce to find some needle. Credit card companies, insurance companies and the like employ them to do the same: find the one-in-a-million client who might, just might, end up costing them a bundle. A complete waste of money, time and effort. Just because we can look for needles in haystacks doesn't mean that such digging is worthwhile. For the cost expended, those had better be some mighty large platinum needles.

Bah, humbug.

Thanks for the post and for highlighting the latest developments in R presented at the useR 2013 conference. One thing that might also help is making presentations from useR conferences available to those who were not able to attend. For example, for Python conferences (PyCon and SciPy), videos of several tutorials and talks are available shortly after the conference and serve as very useful resources (www.pyvideo.org). Even if videos are hard to arrange, just the presentations would be helpful (as the R in Insurance conference organizers did by posting the presentations on GitHub).

To some degree, Data Science is simply the v2.0 name for Data Mining. Data Mining has a (reasonably well-deserved) reputation for being ad hoc and over-promising, so smart practitioners had to distance themselves from it.

A lot of people are making lots of money right now on the hype of Big Data because they know the "how" (Hadoop, et al.), not necessarily the "what". As we've seen with other technologies, this kind of thing will eventually become a commodity. Hadoop makes a particular class of clustered tasks easy, and Pig, Mahout, etc., build on Hadoop. Eventually, the R 'foreach' package will build on one of these technologies, Revolution will do that and more, and the pendulum will swing back towards the "what" instead of the "how".

At which point we'll move on to name v3.0 and call the new field Massive Analytics or something like that.

The importance of R to modern statistics cannot be overstated. The R core members deserve a place in the Statistics Hall of Fame.

Not to mention that every so-called replacement for R (I mean Julia, or Python with pandas) borrowed a lot from R (DataFrames, Factors, etc.).

Nowadays, I cannot imagine statistics or data mining without R.

I also want to thank Revolution for their work promoting R everywhere, especially in industry (I hope we'll have a Linux version of Revolution pretty soon).


As a scientist with a non-statistical background, I was very thankful for all the R help I received from our department of Statistics.

They themselves did quite a bit of work in R and were happy to share code with us and, more importantly, to explain what it was doing. R helped us handle the 'big data' from our -omics studies, so being versed in R was fairly essential for getting basic things done.

Their contribution on the statistical side was something we couldn't have done, but since we used the same language we could understand what they were doing and reproduce it as script kiddies ourselves.

So, for us, R avoids a black box called statistics, and it allows us to be less dependent on the statistics department while still having access to their toolbox and expertise.

I am a statistician with an MSc and a PhD. For 20 years I have worked on real-world problems, far away from the academic world. I'm a data scientist because I work on real problems and because I am a statistician; if I had no academic background in statistics, I could not be a data scientist. I am very grateful to the developers and contributors of R. For me it was the greatest contribution to the development of data analysis, and I really cannot believe that a department of statistics would not recognize or encourage a researcher who contributes to R.

It depends on what you mean by statistician. While I call myself a data scientist, I am still a statistician. But I am very different from an ASA or university-trained statistician; so different, indeed, that it's almost like comparing a physicist and a geographer. As a result, and to avoid confusion, I have stopped publicly calling myself a statistician, though I still do privately.

What concerned me about the article was the constant mention of SAS and only a passing reference to R. I've seen the back-room, lobbyist-like behavior of MATLAB and SAS to stay ingrained in academia, but I wonder how deep SAS has reached into the ASA.

-- I wonder how deep SAS has reached into the ASA.

"The American Statistical Association (ASA), an 18,000-member scientific and educational society based in Alexandria, VA, has elected a longtime employee of business analytics leader SAS as its future president. Robert N. Rodriguez, Senior Director of R&D, will serve as the ASA’s 107th president when his term begins on Jan. 1, 2012. Three other SAS employees were also recently elected to positions within the association."

here: http://www.businesswire.com/news/home/20100607005155/en/SAS-Director-President-American-Statistical-Association

Deep enough?



Got comments or suggestions for the blog editor?
Email David Smith.
Follow David on Twitter: @revodavid