In R, the logical ||
(OR) and &&
(AND) operators are unique in that they are designed only to work with scalar arguments. Typically used in statements like
while(iter < 1000 && eps < 0.0001) continue_optimization()
the assumption is that the objects on either side (in the example above, iter
and eps
) are single values (that is, vectors of length 1) — nothing else makes sense for control flow branching like this. If either iter
or eps
above happened to be vectors with more than one value, R would silently consider only the first elements when making the AND comparison.
In an ideal world that should never happen, of course. But R programmers, like all programmers, do make errors, and using non-scalar values with ||
or &&
is a good sign that something, somewhere, has gone wrong. And with an experimental feature in the next version of R, you will be able to set an environment variable, _R_CHECK_LENGTH_1_LOGIC2_
, to signal a warning or error in this case. But why make it experimental, and not the default behavior? Surely making such a change is a no-brainer, right?
It turns out things are more difficult than they seem. The issue is the large ecosystem of R packages, which may rely (wittingly or unwittingly) on the existing behavior. Suddenly throwing an error would throw those packages out of CRAN, and all of their dependencies as well.
We can see the scale of the impact in a recent R Foundation blog post by Tomas Kalibara where he details the effort it took to introduce an even simpler change: making if(cond) { ... }
throw an error if cond
is a logical vector of length greater than one. Again, it seems like a no-brainer, but when this change was introduced experimentally in March 2017, 154 packages on CRAN started failing because of it ... and that rose to 179 packages by November 2017. The next step was to implement a new CRAN check to alert those package authors that, in the future, their package would fail. It wasn't until October 2018 that the number of affected packages had dropped to a suitable level that the error could actually be implemented in R as well.
So the lesson is this: with an ecosystem as large as CRAN, even "trivial" changes take a tremendous amount of effort and time. They involve testing the impact on CRAN packages, coding up tests and warnings for maintainers, and allowing enough time for packages to be udpated. For an in-depth perspective, read Tomas Kalibera's essay at the link below.
R Developer blog: Tomas Kalibera: Conditions of Length Greater Than One
Packages like Hadley Wickham's 'strict' could help too, though this particular case is an open issue that I raised: https://github.com/hadley/strict/issues/32
Posted by: Mark Adamson | October 17, 2018 at 12:27