
July 18, 2016

Comments


David, I applaud your attempt, but I think R's handling of NA values defies explanation.

You wrote: "Now think of all of the numbers that could replace NA in the expression NA^0. Any positive number to the power zero is 1."

Allow me to change this slightly: "Now think of all of the numbers that could replace NA in the expression NA*0. Any positive number times zero is 0."

Thus, we expect NA*0 to be 0. Let's check:

R> NA * 0
[1] NA

Argh, no.

I've seen people try to explain R's handling of NA values as somehow consistent from a computer-science, language-design point of view, but as a user who writes R scripts with lots of missing data, I claim there are some inexplicable inconsistencies in how R handles NA values.

Kevin Wright

Just for further example, I can sorta, kinda, maybe, tolerate R doing this:

R> sum(NA, na.rm=TRUE)
[1] 0

But this borders on insanity for real-life analytic scripts:

R> prod(NA, na.rm=TRUE)
[1] 1
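For what it's worth, both results follow mechanically from how na.rm works: dropping the NA leaves an empty vector, and R defines the empty sum as 0 and the empty product as 1. A quick check:

```r
x <- c(NA)                 # a vector whose only value is missing
x_clean <- x[!is.na(x)]    # na.rm = TRUE effectively reduces to this

length(x_clean)            # 0 -- nothing left after removal
sum(x_clean)               # 0 -- the empty sum (additive identity)
prod(x_clean)              # 1 -- the empty product (multiplicative identity)

# hence:
sum(NA, na.rm = TRUE)      # 0
prod(NA, na.rm = TRUE)     # 1
```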

Annoying counterpoint: if we treat NA as a placeholder for any number, then the following should be TRUE instead of NA:

R> Inf >= NA

(Instead we get NA.) However, this counterpoint also undercuts the earlier comment that NA * 0 should be 0; in fact, Inf * 0 is NaN.

This also leads to a result that slightly surprised me: Inf^0 == 1 (I was expecting NaN!)
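For reference, here are those cases checked side by side in a current R session:

```r
Inf >= NA   # NA  -- a comparison with NA stays NA
Inf * 0     # NaN -- indeterminate in IEEE 754 arithmetic
Inf ^ 0     # 1   -- x^0 is 1 for every x, even Inf
NA * 0      # NA  -- since x * 0 is NaN when x is +/-Inf, 0 is not a safe answer
NA ^ 0      # 1   -- since x ^ 0 is 1 for every x, 1 is a safe answer
```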

Hello Kevin,

I might be able to explain your results:
1) Notice that Infinity * 0 is completely undefined, but Infinity^0 can still reasonably be defined as 1 - you can try this in R with Inf*0 and Inf^0

2) It's reasonable, and standard, to define the empty product as the multiplicative unit - see this: https://en.wikipedia.org/wiki/Empty_product


Thanks @R-Stats for the link to the Empty_product. This is exactly what I meant about the R language being designed to some ideal standard. But consider the following example. Is there any possible way that you would ever want Q1 sales to print as 0? Wouldn't you want it to be NA? Printing 0 is extremely misleading in my opinion.

R> dat <- data.frame(yr=c("Y1","Y1","Y1","Y1","Y2","Y2","Y2","Y2"),
+ qtr=c("Q1","Q2","Q3","Q4","Q1","Q2","Q3","Q4"),
+ sales=c(NA,5,5,6,NA,6,7,8))

R> tapply(dat$sales, dat$yr, FUN=sum, na.rm=TRUE)
Y1 Y2
16 21

R> tapply(dat$sales, dat$qtr, FUN=sum, na.rm=TRUE)
Q1 Q2 Q3 Q4
0 11 12 14
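If the goal is NA for an all-missing group but na.rm = TRUE behavior otherwise, one option is a small wrapper (sum_or_na is an ad-hoc name, not a base R function):

```r
# Return NA when every value is missing; otherwise sum the non-missing ones
sum_or_na <- function(x) {
  if (all(is.na(x))) NA else sum(x, na.rm = TRUE)
}

dat <- data.frame(yr    = c("Y1","Y1","Y1","Y1","Y2","Y2","Y2","Y2"),
                  qtr   = c("Q1","Q2","Q3","Q4","Q1","Q2","Q3","Q4"),
                  sales = c(NA, 5, 5, 6, NA, 6, 7, 8))

tapply(dat$sales, dat$qtr, FUN = sum_or_na)
# Q1 Q2 Q3 Q4
# NA 11 12 14
```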

@Kevin Wright

That makes sense. But let me present a different POV. If you're using

na.rm = TRUE,

shouldn't you be responsible for making sense of the absence of NA? If you do want to keep the NA, you can use

tapply(dat$sales, dat$qtr, FUN=sum, na.rm=FALSE)

which correctly results in

Q1 Q2 Q3 Q4
NA 11 12 14


I agree that, at the very least, the result of prod(NA,na.rm=TRUE) should be documented in the help page.

I did find this nugget at ?prod:

"For historical reasons, NULL is accepted and treated as if it were numeric(0)."

So now we can all start arguing about what NULL really is :-)
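That note is easy to verify; NULL behaves like a zero-length numeric vector in these reductions:

```r
sum(NULL)     # 0 -- treated as numeric(0), so the empty sum
prod(NULL)    # 1 -- likewise the empty product
length(NULL)  # 0
```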

While I'm at it, just for fun:

> NA/NaN
[1] NA
> NaN/NA
[1] NaN

An interesting follow-up would be to find out why R claims that 0^0, Inf^0, and 1^Inf are all equal to 1, whereas it returns NA for Inf * 0, Inf - Inf, Inf / Inf, and 0/0. It seems that R is not consistent in its treatment of indeterminate forms.

@flodel

That's not exactly an inconsistent treatment of indeterminate forms. That's the mathematical treatment.

0^0, Inf^0 and 1^Inf are all indeed equal to 1, in the mathematical sense. On the other hand, Inf*0, Inf - Inf, Inf/Inf and 0/0 are all indeterminate - again in the mathematical sense - which is exactly what R returns - it actually returns NaN, at least on my machine.
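Running all seven forms in one place (output from my session) shows the split described above:

```r
# forms R defines (following the C99 / IEEE pow() conventions):
0 ^ 0      # 1
Inf ^ 0    # 1
1 ^ Inf    # 1

# forms R leaves indeterminate:
Inf * 0    # NaN
Inf - Inf  # NaN
Inf / Inf  # NaN
0 / 0      # NaN
```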

@R-Stats, you could check http://mathworld.wolfram.com/Indeterminate.html or https://en.wikipedia.org/wiki/Indeterminate_form; both sources describe 0^0, Inf^0, 1^Inf, Inf * 0, Inf-Inf, Inf/Inf, and 0/0 as indeterminate forms.

After R made the choice that 0^0 and Inf^0 are both equal to 1, it's understandable that it claims NA^0 is 1 as well. However, apply log() to that result and you get that log(NA^0) is not equal to 0 * log(NA).

Similarly, after R made the choice that 1^Inf be 1, it is understandable that it returns 1 for 1^NA. However, take the log() and you get that log(1^NA) is not equal to NA * log(1).

With some work, one could probably come up with more examples of surprising results like the ones above, which exploit the inconsistent way R handles the indeterminate forms listed above. It makes you wonder why the R authors did not decide to return NA for all these indeterminate forms.
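The two log() failures mentioned above can be reproduced directly:

```r
log(NA ^ 0)   # 0  -- NA^0 evaluates to 1 first, then log(1) = 0
0 * log(NA)   # NA -- log(NA) is NA, and NA * 0 is NA

log(1 ^ NA)   # 0  -- 1^NA evaluates to 1 first
NA * log(1)   # NA -- NA times 0 is NA
```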

Another counterpoint is to realize that in R, NaN^0 also equals 1. Since NaN is by definition 'not a number', it can't be the case that R is using a 'placeholder for an unknown number' logic.

There seems to be a little confusion between NaN (not-a-number) and NA (R's placeholder for a missing number) in the above. R shouldn't return NA for an indeterminate form; it should (and generally does) return NaN in such cases. James Howard has a recent blog post on this topic.

I suspect the reason why R Core adopted the 0^0=1 definition is because of the binomial justification, R being a stats package after all.

I can't think of any defense for NaN^0=1 though...

1. I think the post David was trying to link to was this one: https://jameshoward.us/2016/07/18/nan-versus-na-r/

2. The defense for NaN^0 = 1 comes from the hardware: https://jameshoward.us/2016/07/25/course-nan0-1/

@Kevin and NA * 0: before drawing conclusions too quickly, note that Inf * 0 is (most of the time, at least in the double-precision standard!) defined to be 'NaN', and basic arithmetic in R does follow that. So, replacing the placeholder x = NA by Inf (or -Inf!), you have cases where x * 0 is not 0... and that was the reason NA * 0 was defined to be NA (and NaN * 0 to be NaN).

And yes, it is true, one *could have* adopted the definition that all of these, including 0^NA, should return NaN ... which corresponds to typical floating-point standards ... *BUT*, and here we are back to the original posting by David Smith, in almost all math-stat applications it is very convenient to have 0^0 = 1; this goes for the border cases of the binomial, negative binomial, and Poisson distributions and derived formulas, IIRC.
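One concrete instance of that binomial border case (my own example, not from the comment above): the density of zero successes in n trials with prob = 0 involves the term prob^x = 0^0, and with 0^0 = 1 the density comes out as 1, as it should:

```r
# dbinom(0, size = n, prob = 0) involves choose(n, 0) * 0^0 * 1^n;
# with 0^0 = 1 the density is 1, i.e. zero successes are certain
dbinom(0, size = 3, prob = 0)  # 1
0 ^ 0                          # 1
```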
