
April 21, 2010

Comments


Damn right it's not pretty. Even I get caught out on R quirks after 20 years of using it. Compare letters[c(12,NA)] and letters[c(NA,NA)] for the most recent thing that made me bang my head against the wall.

R gets used for its functionality: the huge library of available packages, built up over its long history. R is of the same generation as XLISPSTAT, but XLISPSTAT seems to have been too ugly even for statisticians, and now seems to be a historical curiosity.

If I could transparently call all the functions in CRAN from Python I'd never write another line of S code again (Rpy is a start!).

Horrible ugly bodged-up historical mess of a language. But hey, Better Than FORTRAN!

I wish I could find the quote or remember who said it, but it goes along the lines of "Show me a language no-one has complaints about, and I'll show you a language no-one uses". Yep, R is quirky, but that's because of, not despite, the way it's used.

Isn't the fact that 'R is a free software environment' (http://www.r-project.org/) considered a good argument? I think that's one of the main reasons for using R.

Not sure I agree with your last sentence. The quirks surely come before the usage?

My example with 'letters' comes from a collision of three features - recycling of short subscripts, silent coercion of types (boolean NA to numeric NA), and the existence of five different NA values that all print the same.
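To see the five NA values in question, this short base-R snippet lists their types and shows the silent coercion that happens inside c(). It is only an illustration of the features named above, not part of the original comment.

```r
# Base R's five typed NA constants; each one prints simply as NA
nas <- list(NA, NA_integer_, NA_real_, NA_character_, NA_complex_)
sapply(nas, typeof)
# "logical" "integer" "double" "character" "complex"

# c() silently coerces a logical NA to the type of its neighbours
typeof(c(NA, NA))   # "logical"
typeof(c(1, NA))    # "double"
typeof(c(1L, NA))   # "integer"
```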

Have you read 'The Zen of Python'? (Start Python and type 'import this'.) Many things in R violate those ideas, often in the 'Simple is better than complex' department.

For example, to really understand that letters[c(1,NA)] is different from letters[c(NA,NA)] you have to see that:

* in the first case, the NA is coerced to a numeric NA because it's in a vector with a numeric '1'.
* in the first case, you are selecting elements by supplying a vector of indexes
* in the second case, your NAs are boolean (logical) NA values
* hence your subscript is a logical vector
* logical vectors are recycled
* now your subscript is a logical vector of the same length as 'letters', every element of which is NA.
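The steps above can be checked directly at the R prompt. This is a minimal base-R sketch; NA_real_ in the last line is used only to show that forcing numeric NAs brings back index-style subsetting.

```r
# Numeric subscript: c(1, NA) is a double vector, so each element is a position
letters[c(1, NA)]               # "a" NA  (length 2)

# Logical subscript: c(NA, NA) stays logical and is recycled to length(letters)
length(letters[c(NA, NA)])      # 26, every element NA

# Typed numeric NAs bring back position-style indexing
letters[c(NA_real_, NA_real_)]  # NA NA  (length 2)
```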

Zen: "Simple is better than complex". Subscript recycling looks simple, but here it shoots you in the foot. Yes:

x[c(TRUE,FALSE)]

is a simpler-looking way to get the odd elements of a vector than:

x[rep(c(TRUE,FALSE),length.out=length(x))]
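A quick base-R sketch of the two forms, using an odd-length vector. With length.out the explicit subscript matches the length of x exactly, so nothing is recycled and it does not rely on length(x) being even.

```r
x <- 1:7   # odd length on purpose

# Short logical subscript, silently recycled to T F T F T F T
x[c(TRUE, FALSE)]                               # 1 3 5 7

# Explicit full-length subscript: nothing is recycled
x[rep(c(TRUE, FALSE), length.out = length(x))]  # 1 3 5 7
```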

but the simplest of all is:

oddElements(x) # to be written
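One possible way to write the oddElements() helper (hypothetical, not part of base R) is with a full-length logical subscript built from seq_along(), so no recycling is involved and empty or odd-length inputs need no special case.

```r
# Hypothetical helper, not part of base R: elements at positions 1, 3, 5, ...
oddElements <- function(x) {
  # Full-length logical subscript, TRUE at odd positions; no recycling involved
  x[seq_along(x) %% 2L == 1L]
}

oddElements(1:7)       # 1 3 5 7
oddElements(letters)   # "a" "c" "e" ... "y"
```

Because it only uses `[`, the same helper also works on lists and keeps element names.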

Zen: "readability counts"

The R system is lovely, amazing, flourishing, but look at almost any R code and large chunks of it are probably quirk management :)

Hi Barry,
Thanks for the very educational example.

I republished your example (with a reference back to here) on my blog:
http://www.r-statistics.com/2010/04/the-difference-between-lettersc1na-and-letterscnana/

I would love to credit you with a link to your website (please contact me through the comments on that post).

Thanks again,
Tal

Thanks a lot for an interesting post. Now that I know what happens, I can avoid it.

Got comments or suggestions for the blog editor?
Email David Smith.
Follow David on Twitter: @revodavid