
December 06, 2013

Comments


SO questions are a poor proxy for usage. If you employ your stats skills to normalize for sample bias, I think you'll be surprised. Python is just plain easier to use without outside assistance... more questions are answered by the docs. And the code is more comprehensible and readable (since this was the goal for Python as a language, and it was not the primary goal during the design of R).

I'm a huge Python lover, but this is really just Matt talking about something he has no clue about. As I said in a tweet [1] in the Twitter thread, and I stand by this:

"Sorry to say this but this is a typical 'God knows everything and @mjasay knows everything better' ;)"

Cheers,
Michael

[1] https://twitter.com/mhausenblas/status/405941892187041793

While the statement of growth close to being exponential is true, I think it is true for only one of the above ..... I have to agree with Matt Asay, the one with exponential growth will replace the other in time .... looking around the innovation ..... unhindered by "grumpy" ..... and "unspoilt by progress" (I have stolen a catch phrase from a beer company) :)

Dear David,
Thank you for writing this VERY important post. I greatly appreciate the passion and commitment by which you help and nurture the R community.

Yours,
Tal

Interesting post, David. Thanks.

The growth in R usage also matches with what we have seen in our 2007-2013 Data Miner Surveys. We recently released the latest summary report, and we have included several pages that describe the skyrocketing growth in R usage. The highlights are here: http://www.rexeranalytics.com/Data-Miner-Survey-Results-2013.html. Anyone who wants a copy of the FREE 41-page report can contact us at DataMinerSurvey@RexerAnalytics.com.

Happy Holidays, everyone!
-- Karl

I use R, and not Python, for this type of analysis... but using support questions to proxy usage is not without problems. Critics would probably say that Python usage is underreported, since Python has "one obvious way to do things". This seems like a good first step, but a nice extension would be to find some other proxies for usage which aren't conflated with ease of use.

We can fight about R vs Python AFTER we save humanity from SAS and Matlab. For now, let's just make both amazing!

PS: I use more Python :P

Everyone thus far has been so nice about reflecting upon Asay's article. I applaud you all. My own character flaw becomes exposed when I tend to lose my poker face as I have an extraordinary distaste for baseless, FUD'ish blogging.

The issue I take with his article isn't so much the nature of his argument as how he tries to forward it. Does it not occur to him that someone who is allegedly a 'data science' mogul and makes a positional statement, yet does not provide supporting evidence via his own craft, might have a bit of a credibility problem?

Of course, what he misses is the real context in which his argument resides. From a purely fundamental statistical and more generally scientific viewpoint, one cannot compare the outcomes of an apple and an orange simply by their visual attributes (such is one particular grudge I have in the infographics world today - another kvetch for another time). Naturally he would have had to look at the *use case* as a stratum to add at least some substance to his thesis. That is, he needed to compare the intersection where Pythonistas and R users (and even dual users) converge.

Programmatically (not syntactically), R and Python have several points of congruence. They are both multi-paradigm: array (of the 2, R is more suited to vector programming), object-oriented, imperative, functional, procedural, reflective (thank you Wikipedia for that nice summary so I didn't have to recall that from dusty texts). Technically, they can be used for similar things. Nothing new there.

By contrast, R does not have as strict a typing discipline as Python, which can be both a strength and a weakness depending again on your use case.

The syntax and best coding practices are indeed different between R and Py. For those coming from purely OOP studies and experience with more base languages (C++, etc.), yes, there will be plenty of gripes. Gee, we've never seen THAT before - yet these languages/environments persist and have their places, just as R and Python do (imagine those same folks being forced to learn SAS - I imagine the suicide rate in the world would consequently increase fourfold :) ).

However, if the objective includes time-to-model development, and a more primary focus on method, then R is far more mature in this regard.

Note I didn't say better, worse, etc (my mention of SAS unequivocally being an exception :) ).

I personally use BOTH R and Py in my work, depending on the use case. I use other programming environments as well for the same reason - I don't believe that a single technology stack is a determinant of 'better or best' in DS.

Sure, it would be interesting to see an R/Py or some other hybrid to test R's mettle, where code discipline is a bit more unified and translatable, with the addition of every scientific package imaginable, with vector based programming, and better scalability. Change can be good, and is important.

But my guess is, you'd just have a mash-up where each of R and Py, or whatever else, would retain their own characteristics. Hmmm I wonder about that RPy package thingy they have out there :). Even better, ever use RevoDeployR? No, I don't see either language being supplanted by the other - or a 'better' or 'worse' overall language in general. That's very myopic thinking. I believe Ruby was one such attempt at this experiment - and it certainly has its following, but it most certainly didn't diminish the importance of any of its component language contributors.

---------------Let me diverge here ---------------

So I'm kvetching about one comparatively small issue in the universe... blame my genes for that one :). But I do believe that Asay's blog is a small contributing part to a much larger problem in the 'data science' arena. It's as if there's this rather muted 'mortal combat' between those who are good at dangling shining lights, and those who are genuinely, measurably, and meaningfully impacting their objects, and the conglomerative discipline itself.

In my mind, his blog very much resembles the article "The Death of the Statistician". You can google the title itself and find other references which seem to indistinctly draw these boundaries between 2 different disciplines trying to achieve similar final objectives.

Where does this myopia come from? That's the easy one: economic/power advantage. This is nothing new of course, either conceptually or historically. However, when this behavior is extended into the scientific research world itself, many new problems emerge - mostly in the realm of general credibility and value (consider, for example, clinical science, big pharma, and medical device histories :) ) - not a good thing.

The data science world and all related parties should be *very* concerned about this (even as small as the aforementioned blog) and should consider appropriate actions before the broad sweeping black eyes begin, affecting the credibility of the whole. We have much to do in this 'storming and norming' in each of our areas surrounding the rather infant face of data science to deal with this. Any science must maintain not only its creativity, but also its rigor, within its methods and within its ranks, if it is to be a science at all.

Many thanks to David for publishing this in a far better manner than I just did :)

Yes, using Python you can technically do anything that you could do in R. You can also do anything Python could do using C. Plus, C has more users than does Python. So, C must be best for data analysis, right?

Well, sometimes C is the right choice, depending on exactly what's being done. All of these, and more, could be used in a good data analysis workflow. Even Excel, Tableau, SPSS and Stata have their place when used properly and to their comparative advantage.


