Something strange happened with Pollster.com's charts of the various polls of citizen sentiment in the US: there appeared to have been a recent uptick in responses to "Is the country on the wrong track?" and a downturn in responses to "Is the country going in the right direction?"
The curves are loess smooths of poll results from various polling firms over time. Some avid poll-watchers suggested the recent behaviour may have been the effect of two particular polling firms, and removing them from the sample indeed results in a flat trend on both questions over the past month or so. Problem solved, right?
But if you then redo the chart using only the two "problem" firms, the trend is also flat over the past month (and perhaps even shows a slight convergence). So where does the divergence in the omnibus chart come from?
The reason is that both groups of pollsters showed consistently flat trends over the past month, but at slightly different levels. The "problem" pollsters were sampled more frequently in recent days, and so received more weight in the loess smooth. Hence the apparent (but artificial) divergence. It's a good example of Simpson's Paradox in regression data.
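To make the mechanism concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the poll levels (58 and 64), the polling schedules, and the group names are made up, not Pollster.com's data. It also uses a plain least-squares slope as a simple stand-in for the loess smooth. The point is only that two perfectly flat series, pooled with uneven sampling, can produce a trend that neither series has.

```python
def ols_slope(days, values):
    """Ordinary least-squares slope of values against days."""
    n = len(days)
    mean_day = sum(days) / n
    mean_val = sum(values) / n
    sxx = sum((d - mean_day) ** 2 for d in days)
    sxy = sum((d - mean_day) * (v - mean_val) for d, v in zip(days, values))
    return sxy / sxx

# Hypothetical group A: polls every third day, always reporting 58% "wrong track".
days_a = list(range(0, 30, 3))
vals_a = [58.0] * len(days_a)

# Hypothetical group B ("problem" firms): always reports 64%, but polls
# every sixth day at first and then daily over the last ten days.
days_b = list(range(0, 20, 6)) + list(range(20, 30))
vals_b = [64.0] * len(days_b)

slope_a = ols_slope(days_a, vals_a)            # flat within group A
slope_b = ols_slope(days_b, vals_b)            # flat within group B
slope_pooled = ols_slope(days_a + days_b, vals_a + vals_b)

print("group A slope:", slope_a)
print("group B slope:", slope_b)
print("pooled slope: ", slope_pooled)
```

Each group's slope is zero, yet the pooled slope is positive, because the higher-level group contributes most of the recent observations. A local smoother such as loess reacts to the same imbalance in its most recent window.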
By the way, when I last checked (a couple of years ago), Pollster.com was using R for all their regressions and graphics. Since then they've become a major media player (and are apparently too big to answer my emails), and their FAQ has been "under construction" that entire time, but I presume they're still using R.
You know... the HBS article about gender bias in Twitter followers seems like it may involve Simpson's paradox as well. I may be misusing the term, but they seem to be messing up the denominator: http://blogs.law.harvard.edu/fireunderembers/2009/06/02/gender-bias-a-twitter-folly/
This doesn't explain, however, why my wife has more Twitter followers than I do. That's explained by my drunken twittering and her good looks.
Posted by: JD Long | June 03, 2009 at 14:53