By Andrie de Vries
At around 5pm (British Summer Time) on July 22, 2015, the total number of questions tagged [r] on StackOverflow passed the significant milestone of 100,000.
I was curious to know more about how the number of questions has grown over time. Fortunately, StackOverflow itself makes it easy to get usage data: you can write a SQL query on the StackExchange Data Explorer.
My query, R Trends (# Questions per Tag per Month), creates a table and a plot of questions in the tags "[r]" (blue) and "[python]" (orange). Whereas people ask about 4,000 R questions every month, Python attracts about three times as many, i.e. more than 12,000 questions every month.
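To make the "questions per tag per month" aggregation concrete, here is a minimal R sketch of the counting involved, assuming you already have each question's creation date and one tag per question. It is not the Data Explorer query itself (that runs as SQL on the server), and the function name count_per_tag_month is hypothetical.

```r
# Count questions per tag per calendar month (illustrative sketch only).
# Assumes `created` holds creation dates (Date or "YYYY-MM-DD" strings)
# and `tag` holds one tag per question, in the same order.
count_per_tag_month <- function(created, tag) {
  month <- format(as.Date(created), "%Y-%m")
  counts <- as.data.frame(table(Month = month, Tag = tag),
                          stringsAsFactors = FALSE)
  # table() also lists month/tag combinations with zero questions; drop them
  counts[counts$Freq > 0, ]
}

# Example with four made-up questions
count_per_tag_month(
  created = c("2008-09-01", "2008-09-15", "2008-10-02", "2008-09-20"),
  tag     = c("r", "r", "r", "python")
)
```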
Replicating and analysing the data in R
The StackExchange Data Explorer also allows you to download the data as a CSV file. This means we can start to analyse it using R itself.
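A minimal sketch of loading such an export into a data frame, assuming the CSV has columns named Month, TagName and Count. The real column names depend on how your query labels its output, so adjust them; the helper read_trends is hypothetical.

```r
# Read a downloaded Data Explorer CSV and prepare it for analysis.
# Assumed columns: Month (a date or datetime string), TagName, Count.
read_trends <- function(path) {
  dat <- read.csv(path, stringsAsFactors = FALSE)
  # as.Date() keeps only the date part of strings like "2008-09-01 00:00:00"
  dat$Month <- as.Date(dat$Month)
  # Sort by tag, then chronologically within each tag
  dat[order(dat$TagName, dat$Month), ]
}
```

Calling read_trends("QueryResults.csv") (substitute the name of your downloaded file) returns a data frame with a proper Date column, ready for plotting or time-series work.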
First, use ggplot2 to replicate the basic plot (all the code is at the bottom of this blog post). Note that the first questions were asked in August 2008 (python) and September 2008 (R).
It seems as if the rate of questions asked might be growing exponentially, so the next plot uses a logarithmic y-scale and adds a smoother.
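The two plots described above might be sketched in ggplot2 as follows. The data frame is synthetic, made up purely so that the code runs; it is not the StackOverflow data, and the column names Month, Tag and Count are assumptions. The colours follow the post (blue for [r], orange for [python]); the loess smoother is one reasonable choice, not necessarily the one used for the original plot.

```r
library(ggplot2)

# Synthetic monthly counts for illustration only (NOT StackOverflow data)
months <- seq(as.Date("2009-01-01"), by = "month", length.out = 60)
questions <- data.frame(
  Month = rep(months, times = 2),
  Tag   = rep(c("r", "python"), each = length(months)),
  Count = c(200 * 1.05^(0:59), 600 * 1.05^(0:59))
)

# Basic plot: one line per tag, blue for [r] and orange for [python]
p <- ggplot(questions, aes(x = Month, y = Count, colour = Tag)) +
  geom_line() +
  scale_colour_manual(values = c(python = "orange", r = "blue")) +
  labs(x = NULL, y = "Questions per month")

# Same plot on a log10 y-scale, with a loess smoother for each tag
p_log <- p +
  scale_y_log10() +
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE)
```

Print p or p_log to draw them. On the log scale, exponential growth shows up as a straight line, which is what makes the smoother easy to read.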
From this plot, it seems as if there has been reasonably stable exponential growth since around 2012.
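One way to put a number on "stable exponential growth" is to regress the logarithm of the monthly count on time: if counts grow by a constant fraction g per month, log(count) is linear in time with slope log(1 + g). A minimal base-R sketch, where the function monthly_growth_rate is hypothetical and the example series is synthetic:

```r
# Estimate a constant monthly growth rate g from a vector of monthly counts,
# assuming count_t is approximately count_0 * (1 + g)^t.
monthly_growth_rate <- function(counts) {
  stopifnot(all(counts > 0))  # log() needs strictly positive counts
  month_index <- seq_along(counts) - 1
  fit <- lm(log(counts) ~ month_index)
  # The fitted slope is log(1 + g), so invert it to recover g
  unname(exp(coef(fit)[["month_index"]]) - 1)
}

# Synthetic series growing by exactly 2% per month
monthly_growth_rate(100 * 1.02^(0:47))
```

To mirror the observation above, you would pass only the counts from 2012 onwards, since a single constant rate is unlikely to fit the early months.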
Creating a forecast
Once the data is in a data frame, it is easy to analyse it using any of the tools available in R and CRAN.
For example, I used Rob Hyndman's CRAN package, forecast, to create a forecast. This package has a collection of automated forecasting functions. I used auto.arima() to create a time series forecast for both the [python] and [r] tags.
This forecast indicates that we can expect Python questions to increase to around 17,000 per month within the next two years. We can also expect the number of R questions to increase to more than 6,000 every month.
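A sketch of the forecasting step with forecast::auto.arima(), run on a synthetic series so that it is self-contained; the numbers it produces say nothing about StackOverflow. The lambda = 0 (log) transformation, the assumed January 2010 start date and the 24-month horizon are choices made here to match the exponential growth and two-year outlook described above, not settings confirmed by the post.

```r
library(forecast)

# Synthetic monthly counts (illustration only, NOT StackOverflow data):
# roughly 3% growth per month with multiplicative noise
set.seed(42)
n <- 72
counts <- round(200 * 1.03^(0:(n - 1)) * exp(rnorm(n, sd = 0.05)))

# Monthly time series; the start date is an arbitrary assumption
y <- ts(counts, start = c(2010, 1), frequency = 12)

# auto.arima() searches over ARIMA model orders automatically.
# lambda = 0 fits the model to log(counts); forecasts are back-transformed.
fit <- auto.arima(y, lambda = 0)

# Forecast 24 months (two years) ahead, with prediction intervals
fc <- forecast(fit, h = 24)
```

With real data you would build one ts object per tag (for example, by splitting the downloaded data frame on its tag column) and forecast each one separately; plot(fc) draws the forecast with its intervals.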
My favourite question
Here is the code to download the data from StackExchange and analyse it in R: