by Joseph Rickert
In addition to the considerable benefit of being able to meet other, like-minded R users face-to-face, R user groups fill a niche in the world of R education by providing a forum for communicating technical information in an informal and engaging manner. Conferences such as useR!, JSM, and countless smaller statistical meetings solicit expert-level talks, and the many online sites do an excellent job of providing introductory material. However, there are few places that adequately address the "middle-level" talk, where a speaker can assume the audience has some experience with R and then go on to develop the R code to perform an analysis, illuminate an application, or show how to get started with a new package.
A recent talk on Hidden Markov Models (HMMs) that Joe Le Truc gave to the Singapore R User Group (RUGS) provides a very nice example of the kind of mid-level technical presentation I have in mind. I didn’t attend this talk myself, but the organizers were kind enough to post Joe’s slides and code on the RUGS meetup website.
The general idea of an HMM is easy enough to understand: one observes some time series or stochastic process and imagines that it has been generated by an unobserved or "hidden" Markov process. However, the details of formulating and fitting an HMM involve some specialized knowledge, and the sophisticated tools available to develop an HMM in R can add an additional level of complexity. Joe’s presentation helps a beginner to dive right in. He briefly states what HMMs are all about, presents some practical examples, and then goes on to show how to use the functions in the very powerful depmixS4 package to fit an HMM to a time series of S&P 500 returns.
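For readers who have never used depmixS4, here is a minimal sketch (not Joe's code) of the basic workflow on simulated data with two volatility regimes. The only calls needed are depmix() to specify the model, fit() to estimate it by EM, and posterior() to recover the state probabilities:

library(depmixS4)

set.seed(1)
y <- c(rnorm(200, 0, 1), rnorm(200, 0, 3))                         # two volatility regimes
d <- data.frame(y = y)
mod <- depmix(y ~ 1, data = d, nstates = 2, family = gaussian())   # 2-state Gaussian HMM
fm  <- fit(mod)                                                     # EM estimation
summary(fm)                                                         # emission parameters and transition matrix
head(posterior(fm))                                                 # most likely state and state probabilities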
The following slide from Joe’s presentation sets the stage for a concrete example:
Consider the following plot of the log returns for the S&P500 for the period from 1/1/1950 to 9/9/2012.
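Joe's code is linked below; as a rough sketch of how one might pull a similar series (the exact data source in his code may differ), the quantmod package will fetch the S&P 500 and compute the daily log returns:

library(quantmod)

getSymbols("^GSPC", from = "1950-01-01", to = "2012-09-09")   # creates the xts object GSPC
gspc.logret <- diff(log(Cl(GSPC)))[-1]                        # daily log returns of the closing price
plot(gspc.logret, main = "S&P 500 daily log returns, 1950 - 2012")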
The graph shows what looks like a more or less stationary process punctuated by a few spikes of extreme volatility, the most extreme being October 19, 1987. Joe's code shows how to construct a four-state HMM to model this process. The next plot zooms in on the period around the crash of October 1987 and also shows the probabilities of being in the first state of the HMM built with Joe's code.
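A hedged sketch of that step, building on the gspc.logret series above (the object names and plotting details here are assumptions, not Joe's actual code): fit a four-state Gaussian HMM to the log returns, pull the posterior probability of state 1 from the fitted model, and zoom in on the autumn of 1987.

library(depmixS4)
library(quantmod)                                       # for xts; loaded above

rets <- data.frame(logret = as.numeric(gspc.logret))
set.seed(1)
mod4 <- depmix(logret ~ 1, data = rets, nstates = 4, family = gaussian())
fm4  <- fit(mod4)                                       # may take a while on 60+ years of daily data
post <- posterior(fm4)                                  # columns: state, S1, S2, S3, S4
pS1  <- xts(post$S1, order.by = index(gspc.logret))     # re-attach the dates
plot(pS1["1987-09/1987-12"], main = "P(state 1) around October 1987")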
Note that the model shows zero probability of being in state 1 during the crash and at the other extreme low points. The general idea is that, by examining the probabilities associated with the various states and the transition matrix that determines the probabilities of moving from one state to another:
Transition matrix
toS1 toS2 toS3 toS4
fromS1 4.792678e-01 2.361060e-19 5.207322e-01 3.515133e-21
fromS2 7.503595e-01 4.377190e-10 2.496405e-01 2.669647e-24
fromS3 4.806678e-01 6.005592e-02 3.978485e-01 6.142784e-02
fromS4 8.655515e-35 1.923142e-01 2.286245e-48 8.076858e-01
one can gain some insight into the dynamics of the observable time series.
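Both pieces of information are printed by summary() on the fitted model, and getpars() returns all of the estimated parameters, including the transition probabilities, as a single vector (again using the hypothetical fm4 object from the sketch above):

summary(fm4)    # prints the prior, the 4 x 4 transition matrix, and the response parameters
getpars(fm4)    # all estimated parameters as a vector, including the transition probabilities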
Although Joe's code is only an incremental modification of the example given in the documentation for the depmixS4 package, I believe that it serves the valuable purpose of helping to popularize a package that otherwise might be a bit intimidating to someone who is not an expert in this area. The code to generate the plots shown above may be found here: Download HMM_blog_post.
For more material on HMMs, have a look at the Thinkinator post, A Little Book of R for Bioinformatics, or the very accessible and thorough treatment in Hidden Markov Models for Time Series: An Introduction Using R (Chapman & Hall) by Walter Zucchini and Iain L. MacDonald, which shows how to code HMMs in R from first principles.
With an opening slide like the one on the linked-to RUGS presentation, I can see why it's difficult to get more women interested in STEM fields.
Posted by: Bob Rudis | March 06, 2014 at 13:24
-- one can gain some insight into the dynamics of the observable time series.
This is, generally, the notion that causes quants to crash economies. The common logical leap is to predict the future based on "figuring out" past data. While I'm not inclined to go all out Taleb, quant analysis of data generated by (mostly) humans making financial policy and decisions is dangerous. While one might be able to predict gross moolah movements, The Great Recession proved that most quants couldn't even figure that out. There were reasons that all that money moved to US housing, and all of them were the result of explicit policy decisions. The money flows happened as a result of the policies, not the other way round. Policy drove (and predicts) the data.
Financial markets are overwhelmingly driven by changes to the rules of the game, which changes are often hidden and nearly always carried out by those who've got much skin in the game. While The Great Recession, although not necessarily its exact date of occurrence (pick a day), was easily seen in advance by looking at the trajectory of house prices and their unsustainability, it was not obvious in less granular data. The old fashioned non-quant macroeconomists were the first to figure it out. Not that anyone would listen. "We all have to keep dancing while the music still plays."
IOW, identifying the existence of prior black swans is of little use to predicting future ones, since the manipulations driving past ones are, more or less, made illegal in their aftermath. Future black swans will be caused by nefarious manipulation in other parts of the financial landscape, and likely by other operators.
The financial system doesn't follow neat and clean algorithms a la Newton, thermodynamics, or even Heisenberg.
Posted by: Robert Young | March 06, 2014 at 17:40