
April 27, 2016



Is it reasonable to assume that the log of the number of CRAN packages is trend-stationary? Wouldn't it be more sensible to assume that the process is integrated and to model the growth rates directly? These are more likely to be level-stationary. Thus:

rpkg <- 100 * diff(log(CRANpackages$Packages)) / as.numeric(diff(CRANpackages$Date))  # percent growth per day between snapshots
plot(CRANpackages$Date[-1], rpkg, type = "l")  # each rate plotted at the end date of its interval

This also shows some decline in the growth rate, from around 0.1% to 0.06% per day, but it also exposes the large variance. Neither a linear model nor a single structural break would appear to be significant here. A higher sampling frequency might help to distinguish between these candidate data-generating processes.
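The two alternatives mentioned above (a linear trend in the growth rates versus a single structural break in their mean) can be sketched in a few lines of base R. This is a minimal illustration under stated assumptions, not the commenter's analysis: growth_per_day(), trend_pvalue() and best_break() are hypothetical helper names, the break search is a plain least-squares search for one shift in the mean, and the data are simulated rather than the real CRAN counts.

```r
# Sketch only: helper names are hypothetical, and the data below are SIMULATED.

# Percent growth per day between consecutive snapshots. Assumes a data frame
# with a Date column (class "Date") and a Packages column (positive counts),
# shaped like the CRANpackages data used above.
growth_per_day <- function(d) {
  d <- d[order(d$Date), ]
  100 * diff(log(d$Packages)) / as.numeric(diff(d$Date))
}

# p-value of the slope in a least-squares linear trend fitted to the rates.
trend_pvalue <- function(rate, day) {
  fit <- lm(rate ~ day)
  summary(fit)$coefficients["day", "Pr(>|t|)"]
}

# Least-squares estimate of a single break in the mean: the split point k
# (last observation of the first regime) that minimises the total residual
# sum of squares, trimming a fraction `trim` at each end. This locates a
# break but does not test for one; a sup-F test (for example from the
# strucchange package) would be needed for significance.
best_break <- function(rate, trim = 0.15) {
  n <- length(rate)
  h <- max(2, floor(trim * n))
  cand <- h:(n - h)
  rss <- sapply(cand, function(k) {
    a <- rate[1:k]
    b <- rate[(k + 1):n]
    sum((a - mean(a))^2) + sum((b - mean(b))^2)
  })
  cand[which.min(rss)]
}

# Simulated, irregularly spaced snapshots (NOT the real CRAN counts):
# growth drops from 0.10 to 0.06 percent per day halfway through, plus noise.
set.seed(1)
dates <- sort(as.Date("2010-01-01") + sample(1:2000, 60))
gaps  <- as.numeric(diff(dates))
rates <- c(rep(0.10, 30), rep(0.06, 29)) + rnorm(59, sd = 0.03)
snap  <- data.frame(Date = dates,
                    Packages = exp(log(2000) + cumsum(c(0, rates * gaps / 100))))

r       <- growth_per_day(snap)
day     <- as.numeric(snap$Date[-1])
p_trend <- trend_pvalue(r, day)
k_break <- best_break(r)
cat("trend slope p-value:", signif(p_trend, 3), "\n")
cat("estimated break after growth-rate observation", k_break,
    "(interval ending", format(snap$Date[-1][k_break]), ")\n")
```

With the real data one would call growth_per_day(CRANpackages), assuming its Date column has class Date. Note that best_break() only locates the most plausible single shift; judging whether that shift is significant requires a proper structural-change test rather than the raw residual sum of squares.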

But there are 10,000 R packages on GitHub, so growth is probably only increasing.

@Alex Nice find, but it lists fewer than 50% of my packages on GitHub, so I don't think the 10K figure is very accurate. A decentralized index of R packages would be a nice solution to that.
