<p>Comments on 'A segmented model of CRAN package growth' (https://blog.revolutionanalytics.com/2016/04/a-segmented-model-of-cran-package-growth/), 2016-04-27T16:10:17Z</p>
<p>jangorecki (https://jangorecki.github.io) commented on 2016-05-05T21:03:24Z:</p>
<p>@Alex Nice finding, but it lists fewer than 50% of my packages on GitHub, so I don't think 10K is very accurate. A decentralized index for R packages would be a nice solution for that.</p>
<p>Alex commented on 2016-05-03T00:24:05Z:</p>
<p>But there are 10,000 R packages on GitHub:</p>
<p>http://rpkg.gepuro.net/</p>
<p>So growth is probably only increasing.</p>
<p>Achim Zeileis (http://eeecon.uibk.ac.at/~zeileis/) commented on 2016-04-28T21:31:26Z:</p>
<p>Is it reasonable to assume that the log-number of CRAN packages is trend stationary? Wouldn't it be more sensible to assume that the process is integrated and model the growth rates directly? These are more likely to be level stationary. Thus:</p>
<pre>
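# Daily growth rate in percent: differenced log counts divided by days elapsed
# (CRANpackages is assumed to be the data frame used in the post, e.g. from Ecdat)
rpkg <- 100 * diff(log(CRANpackages$Packages)) / as.numeric(diff(CRANpackages$Date))
plot(CRANpackages$Date[-1], rpkg, type = "l")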
</pre>
<p>This also shows some decline in the growth rate, from around 0.1% to 0.06% per day, but it also exposes the large variance. Neither a linear model nor a single structural break would appear to be significant here. A higher sampling frequency might help to distinguish between these possible data-generating processes.</p>
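<p>A minimal sketch of how the "no significant linear trend" check could be run on growth rates, using base R only. The helper name <code>daily_growth</code> and the simulated series are assumptions made for illustration; the numbers are synthetic and are not the CRAN data, and a structural-break test would need an additional package.</p>

```r
# Percent growth per day between successive observations.
# Same formula as the snippet above, wrapped in a hypothetical helper.
daily_growth <- function(dates, counts) {
  100 * diff(log(counts)) / as.numeric(diff(dates))
}

# Synthetic, irregularly spaced series (NOT the CRAN data):
# the log-level is integrated, with a constant drift of 0.08% per day.
set.seed(1)
dates <- as.Date("2010-01-01") + cumsum(sample(20:60, 40, replace = TRUE))
gaps <- as.numeric(diff(dates))
increments <- 0.0008 * gaps + rnorm(length(gaps), sd = 0.01)
counts <- exp(log(2000) + c(0, cumsum(increments)))

rpkg <- daily_growth(dates, counts)
tt <- as.numeric(dates[-1] - dates[1])  # days since the first observation

# Compare a constant growth rate against a linear trend in the growth rate.
fit_level <- lm(rpkg ~ 1)
fit_trend <- lm(rpkg ~ tt)
trend_test <- anova(fit_level, fit_trend)
print(trend_test)
```

<p>With the real data, <code>dates</code> and <code>counts</code> would be replaced by <code>CRANpackages$Date</code> and <code>CRANpackages$Packages</code>. The F test only compares a constant level with a linear trend; it says nothing about a break at an unknown date.</p>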