by Daniel Moore
Director of Applied Statistics Engineering, Console Development
Microsoft
In Xbox Hardware, we are interested in the various ways our hardware is used, and especially in how that usage changes over time. We employ several time series analysis techniques to get a holistic view of usage of the Xbox console. The data we look at might be usage of an individual game or app, or usage of specific features of the console. For all of these, there are a few parts of the time series we are interested in. The first is the trend of usage over time. Many games are very popular when they are first released and then lose popularity as they age; some remain steady in their popularity. The consoles themselves may show increased usage during holiday periods. All of these would be reflected in the overall trend of the data. We are also interested in the weekly cycle of usage. As you can imagine, use of a gaming console goes up on the weekend and down during the week. This is probably even more so among children than adults, as they have more time (and permission!) to play on non-school days than on school days.
We use R extensively to perform time series analysis. In this post we’ll explore the initial analysis and decomposition of a time series into its component parts. Though some packages offer more complete time series analysis options, the base version of R has some good built-in features for this initial work. The data for this example is the usage of a single game on the Xbox One over more than a year, and the analysis is done with the following code:
data <- read.csv("dataset.csv")       # load the data set
plot(data, type = "l")                # plot the data first to get a look at it: a line plot
                                      # showing weekly periodicity over a trend
tsdata <- ts(data[, 1], start = 1, frequency = 7)   # define a time series from the data, with the first
                                                    # observation at 1 and a weekly frequency
                                                    # (stl() needs a univariate series, so we take the
                                                    # first column, assumed to hold the usage values)
decomposeddata <- stl(tsdata, s.window = 7)         # decompose with loess; s.window controls the seasonal smoothing
plot(decomposeddata)                  # four panes of line charts: the data, the seasonal
                                      # fluctuations, the trend, and the remainder
The object “decomposeddata” is of class “stl” and has several components useful for time series analysis. STL uses loess to decompose the time series, and many smoothing and other settings are available, depending on specific needs and the analysis being performed.
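For example, the individual pieces of the decomposition can be pulled straight out of the fitted object; the column names below are the ones base R's stl uses:

# The decomposed series are stored in the time.series component, a multiple
# time series with columns "seasonal", "trend" and "remainder".
head(decomposeddata$time.series)
seasonal  <- decomposeddata$time.series[, "seasonal"]
trend     <- decomposeddata$time.series[, "trend"]
remainder <- decomposeddata$time.series[, "remainder"]
# Sanity check: the three components add back up to the original series.
all.equal(as.numeric(seasonal + trend + remainder), as.numeric(tsdata))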
To highlight one thing this decomposition can reveal, look at the seasonal component in the chart below.
You’ll notice a period (highlighted) where the seasonal fluctuations dropped significantly. This is the summertime, when school is out and weekdays/weekends blend together for kids. It’s always fun to find those artifacts in the data!
How does this decomposition work?
Thank You.
Posted by: Royi | February 24, 2016 at 11:56
I know you're trying to use base R, but I think Twitter's anomaly detection package would do wonders for this dataset!
Posted by: Amit | February 25, 2016 at 01:37
Daniel,
thank you for the interesting example. I have a feeling that the data also have multiple seasonal components (summer/winter seasonality). Would it be interesting to try Fourier or wavelet transformations?
It looks like Xbox and TV usage have some weekly and monthly correlation.
Thanks,
Igor
Posted by: Igor | February 25, 2016 at 07:27
Thanks for the comments.
Royi - the decomposition uses the LOESS algorithm built into R. It's a non-parametric local regression: it fits a low-degree polynomial in the neighborhood of each data point using weighted least squares. The Wikipedia article is a good starting point with references for more detail - https://en.wikipedia.org/wiki/Local_regression
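As a rough illustration of the idea (not the exact settings stl uses internally), loess can be run directly in base R; the toy series and the span value here are made up just for the example:

x <- 1:100
y <- sin(x / 10) + rnorm(100, sd = 0.2)       # a noisy toy series, for illustration only
fit <- loess(y ~ x, degree = 2, span = 0.3)   # fit a low-degree polynomial locally around each point
plot(x, y)                                    # the raw points
lines(x, predict(fit), col = "red")           # the smoothed loess curve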
Amit - I agree. The anomaly detection package helps a lot!
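For anyone who wants to try it, here is a minimal sketch using the package Amit mentions (github.com/twitter/AnomalyDetection), assuming daily usage counts in a plain numeric vector; the parameter values are just examples:

# install with devtools::install_github("twitter/AnomalyDetection")
library(AnomalyDetection)
# AnomalyDetectionVec works on a plain numeric vector with a known period;
# period = 7 matches the weekly cycle used in the post.
res <- AnomalyDetectionVec(as.numeric(tsdata), max_anoms = 0.02,
                           period = 7, direction = "both", plot = TRUE)
res$anoms   # the observations flagged as anomalous
res$plot    # the accompanying plot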
Igor - thanks for your comments. I have used both Fourier and wavelet transformations when looking at these data sets to help reveal multiple seasonality. However, I find that simple decompositions lend themselves to easier explanations (useful when reporting results to non-data scientists/statisticians), even though they may not have the fidelity that Fourier or wavelet transformations have...
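If anyone wants to look for multiple seasonal cycles themselves, base R's spectrum() gives a quick periodogram of the tsdata object from the post; the interpretation in the comments is a sketch, not a definitive recipe:

# Raw periodogram of the series; peaks mark strong periodic components.
# Because tsdata was built with frequency = 7, the frequency axis is in
# cycles per week: the weekly cycle shows up as a peak at 1 (with harmonics
# at 2 and 3), while slower effects such as a summer/winter pattern
# appear as peaks close to zero.
spectrum(tsdata, log = "no")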
Posted by: Daniel Moore | March 02, 2016 at 09:17