The presidential forecasts from The Upshot, FiveThirtyEight, Predictwise and others varied over the campaign as new data came in, as you'd expect, but they consistently made Clinton a solid favorite, with a win probability topping 70% the day before election day. So what went wrong? As in any statistical forecast, there are three possibilities:
- The models were wrong. No model is perfect, but it seemed to me at least that the various forecasts, despite their differing methodologies, all captured the essential mechanisms of being elected President: the electoral college; the similar behaviours of some states; the influence of economic and demographic statistics; the relationship between polls and votes. Clearly, something was missed, but these models have been good enough before, and it's not clear why they weren't this time.
- The models were right, and this is a fluke. Even a 95% probability isn't a guaranteed outcome. It's entirely possible that the electorate behaved in exactly the way the models described, just at the most extreme Trump-favouring end of the predicted spectrum. Looking at the "residuals" of the model should give us some clue, like the county-by-county swings from the 2012 election shown above. To me, though, that looks like something systematic was missed in the model.
- The data were bad. Because the US election process is so haphazard, with inconsistent procedures across the country, it's unlikely that the actual election results were incorrect. That leaves the data going into the models. I see no reason why the economic and demographic data shouldn't be considered solid. That leaves the polling data: why, in this 2016 election, did the responses to polls translate into votes in a fundamentally different way than in previous elections? These models corrected for "house effects" and weighted for polling methodologies, so they shouldn't have been caught out unless data from prior elections are not a useful guide for this election. One possibility that comes to my mind is that the "feedback effect" -- the influence on voters from the poll results and projections themselves -- behaved unexpectedly this year, thanks to the power of personality of the candidates, and the increased influence of the network effects of social media.
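The "fluke" possibility can be made a little more concrete. The sketch below is only an illustration, not anything the forecasters themselves ran: `upset_probability` takes a forecast at face value, and `sign_test_p_value` is a plain exact sign test, asking how surprising a lopsided split of residuals would be if errors were equally likely to fall on either side. Both function names and the county counts in the usage lines are hypothetical, and the test assumes counties are independent, which real counties are not, so it overstates the evidence.

```python
from math import comb


def upset_probability(win_probability: float) -> float:
    """Chance the forecast favorite loses, taking the forecast at face value."""
    return 1.0 - win_probability


def sign_test_p_value(n_toward: int, n_total: int) -> float:
    """Two-sided exact sign test.

    Returns the probability of a split at least as lopsided as
    n_toward out of n_total, assuming each residual is independently
    and equally likely to fall on either side (a strong assumption).
    """
    k = max(n_toward, n_total - n_toward)
    tail = sum(comb(n_total, i) for i in range(k, n_total + 1)) / 2 ** n_total
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    # A 70% favorite still loses about 3 times in 10.
    print(upset_probability(0.70))

    # Hypothetical numbers, NOT real results: suppose 80 of 100
    # sampled counties swung the same way relative to the model.
    print(sign_test_p_value(80, 100))
```

A single upset against a 70% forecast is unremarkable on its own. A heavily one-sided pattern in the residuals is what would point toward something systematic rather than bad luck.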
These possibilities will be considered in depth online in the weeks and months ahead, and in political science theses for years to come, I'm sure. No one knows the answers yet.
But this is a sad day for the profession of political forecasting.