06 November 2020

What the porcupine tells us

The sighs of relief have for a brief moment silenced the alarm bells that started ringing last week. US voters chose a new president but they did not repudiate Trumpism. The pollsters got the binary result right, but they got everything else wrong. Like economic forecasting, opinion polling is no longer working. 

Statistics is simultaneously the most potent and most dangerous of the modern mathematical disciplines. Statistical methods help doctors detect cancers in grainy medical images and help signal analysts trace terrorists. But they can also lead us astray. There are known technical problems associated with political polling, like sample sizes that are too small. And of course, a few pollsters and forecasters may even be dishonest. But conspiracy theories cannot explain the mass failure of forecasting and polling. Not only do the pollsters get it wrong; they keep doing the same thing over and over again and expect different results. US pollsters underestimated Donald Trump and the Republicans in 2016, and then again in 2020. They did not learn.

My explanation is that the political divisions of our time have infected the professionals, perhaps unconsciously. I noted that the betting markets in the US did a better job. I don't think that the participants have better information, but skin in the game constitutes a counteracting force.

The behavioural psychologist Daniel Kahneman famously helped the Israeli army improve the quality of its recruitment by instituting an anonymous selection process. He discovered that senior officers had previously been prejudiced about who would make a good soldier and who would not. Once you took that prejudice out of the equation, the quality of the recruitment improved. 

Polling is at the stage where Israel's army was before it hired Kahneman. If you hate Trump and everything he stands for, you are less likely to direct your pollsters to the sections of the population that Trump manages to energise.

The same thing happened to polling in the UK. Brexit energised a previously apolitical section of the population on both sides of the debate. This is how pollsters came to underestimate the support for Brexit in 2016, the momentum behind Jeremy Corbyn in 2017 and Boris Johnson's triumph in the 2019 election. Even today, pollsters publish polls that show a big lead for the Remain position. They keep making the same mistake.

Doubling down is also the bane of economic forecasting. One of our all-time favourite charts is the porcupine, where the upward spikes represent the ECB's inflation forecasts. The problem is the same as with polling: a persistent bias. The ECB's inflation forecast is always too optimistic. The forecasting model has a built-in mean reversion. It is hardwired to predict that inflation will revert to the 2% inflation target. In a situation like this, you would be better off asking a fortune teller or simply throwing a die. The rational thing to do would be to dump the model. But this would constitute a repudiation of the macroeconomic mainstream of the last 40 years, and an admission that the ECB's inflation target has lost traction. Lack of honesty is behind most polling and forecasting errors.
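
For readers who want to see how a porcupine grows its spikes, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the ECB's actual model: the function name, the persistence value of 0.7 and the inflation readings are all made up for the purpose. The only thing taken from the argument above is the 2% target and the idea that every forecast is pulled back towards it.

```python
TARGET = 2.0  # the 2% inflation target mentioned in the text


def mean_reverting_forecast(current, target=TARGET, persistence=0.7, horizon=4):
    """Project inflation `horizon` steps ahead.

    Each step closes a fixed share of the remaining gap to the target,
    so every path bends towards the target regardless of where it starts.
    `persistence` is a hypothetical parameter chosen for illustration.
    """
    return [
        target + (persistence ** step) * (current - target)
        for step in range(1, horizon + 1)
    ]


# Hypothetical inflation readings that stay below target (not real data).
observed = [1.0, 0.8, 0.5, 0.6, 0.4]

# One spike of the porcupine per starting point.
for value in observed:
    path = mean_reverting_forecast(value)
    print(value, "->", [round(p, 2) for p in path])

# One-step-ahead forecast error: forecast minus what actually happened next.
errors = [
    mean_reverting_forecast(now, horizon=1)[0] - nxt
    for now, nxt in zip(observed, observed[1:])
]
print("errors:", [round(e, 2) for e in errors])
```

With these invented numbers every error has the same sign. That is what a persistent bias looks like: the model is not randomly wrong, it is wrong in the same direction each time, because the reversion to target is assumed rather than observed.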

The pandemic gave rise to a statistical delusion of a related kind: the notion that you can compare the spread of the virus across countries and across infection cycles within a country in real time. The problem is that the data are simply unusable. Some countries like the UK hardly did any testing during the first wave, while Germany tested everyone who asked for it. You are dealing with data that have an error margin of 1000% or more. I am not surprised to see economists, in their capacity as bloggers and tweeters, extend their dark arts to epidemiology. 

On this point, I also need to direct the criticism at my own profession. Data journalists should have been questioning those practices rather than amplifying them with fancy charts. 

What all these examples have in common is self-delusion. This leads me to make a forecast myself: machine learning techniques are the future of polling. They will outperform polls and forecasters simply because they couldn't care less who wins or loses.