
November 09, 2016



Sometimes we are so blinded by our own beliefs that we cannot fathom anything else. Hence we follow a script blindly, miss the tectonic shifts, and miss new evidence that doesn't follow our script. This explains perfectly why the financial markets couldn't predict the housing and financial meltdown in 2008 (see The Big Short). In short, we develop complex, beautiful, useless models.

More than usual, I think we need to look at how much people disguised their intent in polling, and how good the sampling was, in an age when being at home to answer the landline might make you an outlier.

Was the model wrong or the data not accurate because people did not respond truthfully to the survey?

Michael Moore got it right some time ago with a simple conceptual model based on the 2012 electoral votes and Trump's focus on working for votes in the US Rust Belt.


In spite of that, I think there were problems with the polling methodology. E.g., did respondents answer differently to female pollsters, etc.?

How is this a sad day for political forecasting? If polling and predictions always worked, there would be little new to learn. With this "miss" (not a failure, if you're statistically inclined) there will be much learning. Some of it will apply to the mechanics of polling, predicting and so on. Much of it may help us in other arenas, including macroeconomics and demographics (where the data are considered "solid"; I'm not sure about that hypothesis today), political activism and more.

Null hypothesis: deviations from model predictions are unrelated to the type of voting machine (and the type of internet connectivity) at the polling location, and to the precinct's likelihood of changing the electoral outcome.

This seems a bit out there, but might be worth getting a p-value for the above.
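One way to make that hypothesis testable would be a chi-squared test of independence between machine type and deviation from the model. The sketch below uses entirely invented precinct counts purely for illustration; real precinct-level data would be needed for an actual p-value.

```python
# Hypothetical sketch: chi-squared test of independence between
# voting-machine type and whether a precinct deviated from the model.
# All counts below are invented for illustration only.
table = [[40, 160],   # optical scan:       deviated, matched
         [55, 145],   # touchscreen DRE:    deviated, matched
         [12, 88]]    # hand-counted paper: deviated, matched

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)

# Chi-squared statistic: sum over cells of (observed - expected)^2 / expected,
# where expected = (row total * column total) / grand total.
chi2 = sum(
    (table[i][j] - row_totals[i] * col_totals[j] / n) ** 2
    / (row_totals[i] * col_totals[j] / n)
    for i in range(len(table))
    for j in range(len(table[0]))
)
dof = (len(table) - 1) * (len(table[0]) - 1)
print(round(chi2, 2), dof)  # compare against the chi-squared distribution with `dof` degrees of freedom
```

With real data, the statistic would be compared against the chi-squared distribution with the given degrees of freedom to get the p-value the comment asks for.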

As some of the post-election numbers come in, we see that in some key places there were fewer blue voters than in 2012. For example, in Wisconsin, a tally I saw yesterday indicated that in 2012 there were 1,620,985 Democratic votes vs 1,407,966 Republican votes. In 2016 the numbers were 1,377,588 Democratic (-243,397) and 1,404,376 Republican (-3,590). While this could be the result of voter suppression, my thinking is that the models so convinced everybody of a Clinton win that some people on the blue side didn't make it a priority to go out and vote. And yes, there's the possibility that some Democratic voters stayed home because they didn't like either candidate. Perhaps it's possible to create such an atmosphere of overconfidence that you change the results your models predict?
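The year-over-year swing in the Wisconsin tallies quoted above can be checked directly; this snippet just reproduces the arithmetic from the comment.

```python
# Wisconsin presidential vote tallies as quoted in the comment above.
wi = {
    "democratic": {2012: 1_620_985, 2016: 1_377_588},
    "republican": {2012: 1_407_966, 2016: 1_404_376},
}

# Change from 2012 to 2016 for each party.
for party, votes in wi.items():
    delta = votes[2016] - votes[2012]
    print(f"{party}: {delta:+,}")
# democratic: -243,397
# republican: -3,590
```

The asymmetry is the commenter's point: the Democratic total fell by roughly a quarter million while the Republican total was nearly flat.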

I think MaryPCBUK identified the problem, and it's down to basic human psychology more than data or methodology. During the Brexit campaign here in the UK, people were reluctant to say they were voting to leave the EU, for fear of being labelled racist, because the immigration issue was such a highly discussed and emotive part of the entire campaign. So my guess is that the same happened in the USA: people would either not discuss how they would vote, or conceal it by saying they would vote for Clinton, because they were afraid of being tarred. The same effect was seen in Germany after WWII: no one would admit to having been a member of the Nazi party, even though millions were.

Great map and visualization, David! The Rust Belt shift comes through in stark relief. I am wondering if random survey samples drawn at the national level could have watered down the enormous swing that was coming in the upper Midwest (which proved decisive in the outcome), and if that might partly explain why there was systematic error in so many forecasts?

“If you torture the data long enough, it will confess.”

― Ronald H. Coase, Essays on Economics and Economists

Thanks William, but credit for the map goes to the NYT, not me. Pretty sure 538 does at least include national surveys in their predictors, but the bottom line is that with so little data it's basically impossible to predict a regional effect like that which hasn't been seen before, unless it can be explained by demographics/economics. Remember, there had only been 57 presidential elections before 2016. That's 57 data points, and only the most recent few are even relevant. Not much to make a forecast from.

Looking back at the UK 2015 General Election, a significant polling error then was that voters who were easy to contact had different preferences from voters who were hard to contact (as the British Election Study found in post-vote analysis). Quick, cheap polls were biased towards voters who were easy to contact, and I haven't seen anything from this US election that suggests pollsters tried to avoid the same issue.
