The Devil in the Polling Data

The same problem that caused the 2007 financial crisis also tripped up the polling data ahead of this year’s presidential election.

By Pradeep Mutalik

November 14, 2016

The devil in the data that left election forecasters with egg on their face […] an amazingly prescient article in which he predicted, in excruciating detail, exactly how Trump would win. Both of these reasons are ultimately related to the well-documented enthusiasm gap […] put it back in August, “If people could vote from their sofa via their Xbox or remote control, Hillary would win in a landslide.”

Other factors, such as the inability to contact rural voters, have been proposed, but it seems to me that good pollsters should have been able to overcome problems of that kind.

So even the best of the pollsters have a lot to learn. How about the modelers?

I think modelers need to make some changes too.

Consider a hypothetical state with numbers similar to Michigan’s this year. The raw polls showed Clinton with an edge of about 3.5 percentage points. I’ve tried to reverse engineer two simple models whose predictions and behavior resemble those of the FiveThirtyEight and PEC (Princeton Election Consortium) models, using the same kinds of tools they used. Imagine that Model 1 predicted a 70 percent probability of Clinton winning and Model 2 predicted a 99 percent probability. Here is how these predictions would have to be modified in the presence of a systematic, correlated polling error:

Correlated error (percentage points)  0     1     2     3     4
Model 1: probability of Clinton win   70%   65%   59%   53%   47%
Model 2: probability of Clinton win   99%   95%   84%   63%   37%
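The table's figures can be reproduced, to the nearest percent, with a very simple model. The sketch below is an illustration under stated assumptions, not necessarily how the two models above were built: it assumes each model treats the true margin as normally distributed around the 3.5-point polling lead, backs out the spread (sigma) that yields the model's headline probability, and represents a correlated error as a shift of the whole distribution against the leader. The names `implied_sigma` and `win_probability` are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def implied_sigma(poll_margin: float, win_prob: float) -> float:
    """Spread (in points) a normal model must assume so that a lead of
    `poll_margin` points translates into a win probability of `win_prob`."""
    return poll_margin / STD_NORMAL.inv_cdf(win_prob)


def win_probability(poll_margin: float, sigma: float, correlated_error: float) -> float:
    """P(true margin > 0), assuming (hypothetically) that a correlated error of
    `correlated_error` points shifts the true margin down by that amount."""
    true_margin = NormalDist(mu=poll_margin - correlated_error, sigma=sigma)
    return 1.0 - true_margin.cdf(0.0)


POLL_MARGIN = 3.5                            # Clinton's raw polling edge, in points
MODELS = {"Model 1": 0.70, "Model 2": 0.99}  # headline win probabilities

for name, p0 in MODELS.items():
    sigma = implied_sigma(POLL_MARGIN, p0)   # calibrate the spread to the headline number
    row = [win_probability(POLL_MARGIN, sigma, err) for err in range(5)]
    print(f"{name}: " + "  ".join(f"{p:.0%}" for p in row))
```

Running it prints one row per model, matching the table above. The only thing distinguishing the two models in this sketch is sigma: the more confident headline number forces a much narrower distribution.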

The actual correlated error for Michigan turned out to be four percentage points. If Model 1 had known and taken into account this magnitude of correlated error, its prediction of Clinton winning would have changed from 70 percent to just 47 percent, and Model 2’s prediction would have changed from 99 percent to 37 percent. Both models would have predicted a Trump win in this hypothetical scenario. What’s interesting is how large the swings in the probabilities are with very small changes in the correlated error.
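Under one simple assumption (a normal model with spread σ, calibrated so that a polling lead m gives the headline probability p_0, and a correlated error c that shifts the whole distribution against the leader), the size of these swings can be written down directly. This is an illustrative derivation, not the stated method of either model:

```latex
P(c) = \Phi\!\left(\frac{m - c}{\sigma}\right), \qquad
\sigma = \frac{m}{\Phi^{-1}(p_0)}, \qquad
\frac{dP}{dc} = -\frac{1}{\sigma}\,\varphi\!\left(\frac{m - c}{\sigma}\right)
```

With m = 3.5, Model 1 (p_0 = 0.70) implies σ ≈ 6.7 points and Model 2 (p_0 = 0.99) implies σ ≈ 1.5 points. The slope is steepest at c = m, where its magnitude is φ(0)/σ ≈ 0.399/σ: about 6 percentage points of win probability per point of correlated error for Model 1, but about 26.5 for Model 2. The table's drop from 63 to 37 percent between three and four points of error is consistent with this. The more confident the model, the smaller σ, and the more fragile its headline number.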

Some readers here defended Nate Silver’s forecast, which put the probability of a Clinton win at 71.4 percent, on the grounds that no one should be surprised when a roughly one-in-three chance (28.6 percent) comes to pass. Technically, that is correct. I also agree that Silver’s model had some built-in defense against correlated error, while the other models had much less or none. But remember how large the swings in probability were in the models above. The modelers knew about the Brexit fiasco, which had a correlated error of four points in a vote with a similar “enthusiasm gap.”

As I argued last month, it is extremely misleading to state such a potentially fragile probability to one decimal place: Doing so implies that you are confident in the prediction to the precision stated. Most people are not deeply familiar with the technical meaning of a probability and tend to read it as a “score” of the race, so they are easily misled by the false precision. As I recommended then, probabilistic election forecasts should be dispensed with altogether and replaced with the seven-point qualitative scale already in wide use. If probabilities must be stated, they should come with a hedging statement, a kind of “margin of error,” showing how much they would change in the presence of, say, a two- or four-point correlated error. Had forecasters done this, the potentially large swings would have discouraged people from taking the forecasts as gospel truth, and the entire field of election forecasting would have been spared public embarrassment.
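A hedging statement of this kind is easy to generate. The sketch below uses a hypothetical helper, `hedged_forecast`, built on a simplifying assumption: a forecast is treated as a normal model whose spread is implied by its headline probability, and a correlated error shifts the expected margin against the leader. It rounds to whole percentages to avoid false precision.

```python
from statistics import NormalDist


def hedged_forecast(candidate: str, poll_margin: float, win_prob: float,
                    error_scenarios: tuple = (2.0, 4.0)) -> str:
    """Format a headline win probability together with how it would change if
    the polls overstated the leader's margin by each amount in `error_scenarios`
    (percentage points). Assumes a simple normal model (an assumption)."""
    if poll_margin <= 0 or not 0.5 < win_prob < 1.0:
        raise ValueError("expects a positive lead and a probability in (0.5, 1)")
    # Spread (in points) that makes a `poll_margin` lead yield `win_prob`.
    sigma = poll_margin / NormalDist().inv_cdf(win_prob)
    hedges = []
    for err in error_scenarios:
        # A correlated error of `err` points moves the expected margin to poll_margin - err.
        shifted = 1.0 - NormalDist(poll_margin - err, sigma).cdf(0.0)
        hedges.append(f"{shifted:.0%} if polls overstate the lead by {err:g} points")
    return f"{candidate}: {win_prob:.0%} chance of winning ({'; '.join(hedges)})"


print(hedged_forecast("Clinton", 3.5, 0.99))
```

For the hypothetical state's Model 2 numbers this prints: Clinton: 99% chance of winning (84% if polls overstate the lead by 2 points; 37% if polls overstate the lead by 4 points). Seeing the 37 next to the 99 makes the fragility of the headline number hard to miss.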

Hopefully, further research will identify the causes of correlated polling errors and find ways to detect them, and the modelers will build on the lessons learned from this humbling experience.

Lead image: Lucy Reading-Ikkanda for Quanta Magazine

Pradeep Mutalik is a medical research scientist at the Yale Center for Medical Informatics and a lifelong puzzle enthusiast. He has published work in neurophysiology, animal behavior, artificial intelligence, radiology and consciousness. He wrote puzzle columns for The New York Times from 2009 to 2012 and curated the Enigma Cafe at the National Museum of Mathematics from 2012 to 2013.
