A Theory of Prediction

Review of The Signal and the Noise by Nate Silver

The Signal and the Noise is probably the most informative non-technical book about the art of prediction ever written. It outlines what is best described as Nate Silver’s “Theory of Prediction”. Silver, creator of the data journalism site fivethirtyeight, takes readers on a tour of diverse fields including meteorology, baseball, poker, finance, and politics, documenting forecasts that either failed badly or succeeded spectacularly, along with the tactics employed by the forecasters who made them. Both the successful and the failed forecasts have much to teach us, and Silver distills the lessons from these examples into a cohesive message: we are inherently terrible at making predictions, but by adopting a few principles, we can improve our estimates and benefit at both the personal and national level. In my view, these principles fall under three rules:

  1. Think like a fox
  2. Think like a Bayesian
  3. Think like a (basketball) shooter

It’s hard to make out much about these rules from their names, so let’s walk through each one in turn.

# Think Like a Fox

At the outset of the book, Silver relates the contrasting viewpoints of the hedgehog and the fox as put forth by the philosopher Isaiah Berlin. While the hedgehog has one big idea, the fox has many small ideas. Everyone tends to fall on one side or the other: we either believe in a black-and-white world with well-defined boundaries, or a gray world in which nothing is ever completely certain. Each point of view has its own merits, but when it comes to prediction, the foxes will be significantly more successful. A fox can draw on a diverse range of sources and assume different perspectives, while the hedgehog is locked into a single line of thought.

This is best related to predictions through a simple example. Let’s say we are in a country with only two political parties (how ludicrous a concept!), the bears and the tigers. We are vying against a rival forecaster, trying to outperform him on predictions for the upcoming election. Whoever wins will appear on the prime news channel for the next four years. Our rival is a classic hedgehog; his one big idea is that the tigers will sweep every single one of the upcoming races. On the other hand, we have made the rational decision to be more fox-like. We believe that the overall trend indeed favors the tigers, but we are willing to look at individual races where the story line may not be straightforward. When it’s time to make our predictions, we examine each race in turn, reading all the relevant news and averaging multiple polls to select a winner. Our rival confidently picks tigers for every race, backing up his picks with the assertion that his view of the world is infallible. In the end, we handily win and earn the honor of appearing on the news every night for the next four years, because we were willing to see the nuance whereas our rival was blinded by a single idea.

This may be a contrived example, but it shows the dangers of adopting a simplistic right/wrong view of the world. When you are a hedgehog and hold only one belief, such as that the tigers will win all the elections, every piece of new information you hear will seem to confirm your beliefs. You will gladly put all your faith in the pro-tiger papers and ignore any conflicting evidence. Meanwhile, as foxes, we have no existing biases and are willing to objectively evaluate each and every source of data. The hedgehogs suffer from confirmation bias, in which they bend any evidence to fit their viewpoint. The foxes produce more accurate predictions because of their willingness to approach any problem with an open mind. In reality, it can be difficult to shed all of our preconceived beliefs, but we can work to counter them by collecting information from as many sources as possible, as fivethirtyeight does when constructing election predictions.

Unfortunately, as Silver points out, in the real world hedgehogs are often readily handed a bullhorn to broadcast their message. With TV news and the internet fracturing into two political sides with a massive no-man’s land in between, people still in the middle are drowned out by the fanatics at the extremes. When people tune into a news channel, they are not looking for objective coverage; they want to hear that their side is right. When it comes to predictions, they would rather listen to the hedgehog telling them their “team” will sweep the legislative branch than the fox with her wise forecasts detailing each race based on objective data. Thus, the majority of forecasters who appear on TV and in the news are hedgehogs who make bold claims, not foxes who make reasonable but less exciting predictions. Despite evidence that those who make more extreme predictions are the worst predictors, they get the majority of screen time. Nonetheless, this does not prevent us from being foxes in our day-to-day lives. In order to think like a fox, we need to shed our preconceived beliefs, collect information from diverse sources, listen to both sides of an argument, and make reasonable predictions that reflect the gray nature of the world.

# Think Like a Bayesian

Before you are scared off by the strange sound of “Bayesian,” let me explain: Thomas Bayes was an 18th-century statistician and minister known for formulating a method for updating our beliefs about the world based on new evidence. When we approach any new prediction problem, we first need to form an initial estimate of the situation, called a prior. Let’s say we want to predict whether or not we will get a promotion at the start of the next quarter. It’s one month out, and because we have been more productive lately and are on good terms with our manager, we put the initial odds at 50%. The idea behind Bayes’ theorem (the formal version of his ideas) is that as we gather information related to the problem, we update our first estimate in accordance with the data. If our manager sends us an email praising our work, we might increase the probability of a promotion to 75%. As the date of the event grows closer, our prediction should converge on the true probability if we are incorporating all relevant information. Evidence is weighted according to how much it decreases our uncertainty. The prior is taken into account at each step, but as the amount of new information (observations) increases, the weight of the prior in the prediction decreases. Our final prediction is a combination of the initial estimate and the observed data.
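To make this concrete, here is a minimal sketch of a single Bayesian update in Python. The two likelihoods (0.9 and 0.3) are hypothetical numbers, not from the book, chosen so that the praising email lifts the 50% prior to the 75% figure from the example above:

```python
def bayes_update(prior, p_evidence_if_true, p_evidence_if_false):
    """Apply Bayes' theorem for one piece of evidence:
    P(H | E) = P(E | H) * P(H) / P(E), where
    P(E) = P(E | H) * P(H) + P(E | not H) * (1 - P(H))."""
    numerator = p_evidence_if_true * prior
    evidence = numerator + p_evidence_if_false * (1 - prior)
    return numerator / evidence

# Prior: 50% chance of a promotion, as in the example above.
# Hypothetical likelihoods: a praising email is much more likely
# if a promotion is coming (90%) than if it is not (30%).
posterior = bayes_update(0.50, p_evidence_if_true=0.9, p_evidence_if_false=0.3)
print(f"Updated probability of promotion: {posterior:.0%}")  # -> 75%
```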

We use Bayesian reasoning all the time in our daily lives. When watching a sports event, we have an idea at the outset of what the end result will be, and as the game progresses, we update that estimate, until by the end, we can be 100% sure of the result. Likewise, the stock market rises and falls on news because investors believe the information reveals something about the worth of a company. The Bayesian point of view is in contrast to what is called the frequentist worldview, where the chance of something occurring is based only on the observed frequency of the event in past data. When forecasting the chance of snow on January 21, 2018 from two weeks out, the frequentist would predict the historical average chance of snow on that day and stick with that estimate until the day in question. As Bayesians, however, we would make an initial forecast, perhaps using the historical occurrence as a starting point, and then revise our estimate as the day got closer based on new information. Warmer-than-average weather might decrease our predicted probability of snow, while a snowstorm approaching from the west would increase it. The Bayesian is naturally at an advantage in making predictions because she is constantly changing her beliefs in response to evidence.
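The same logic can be chained as each new piece of evidence arrives, which is easiest in odds form, where every observation simply multiplies the current odds by a likelihood ratio. Here is a minimal sketch of the snow example, with the 30% base rate and both likelihood ratios invented for illustration:

```python
def update_odds(prob, likelihood_ratio):
    """Fold one piece of evidence into a probability. The likelihood
    ratio says how much more likely the evidence is if the hypothesis
    (snow) is true than if it is false."""
    odds = prob / (1 - prob) * likelihood_ratio
    return odds / (1 + odds)

# Start from a hypothetical historical base rate of snow on that date.
p_snow = 0.30

# Hypothetical evidence stream as the day approaches: a ratio below 1
# argues against snow, a ratio above 1 argues for it.
for label, ratio in [("warmer-than-average week", 0.5),
                     ("snowstorm approaching from the west", 4.0)]:
    p_snow = update_odds(p_snow, ratio)
    print(f"After {label}: {p_snow:.0%}")
```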

Another critical aspect of Bayesian thinking is that predictions are expressed as a probability rather than a yes/no. Although most people want a straight answer, we live in an uncertain world where a definitive yes or no is rarely possible. Any prediction will have uncertainty, and when someone predicts that the stock market will go up tomorrow, they are obscuring the fact that there is actually a range of possible values the stock market could take. The most likely scenario may be an increase, but there will always be the chance of a decrease. Again, answering prediction problems with a range of values or a probability will likely not get us a spot on the news, but it will mean that we will on average be closer to the truth. Responsible data scientists and forecasters must communicate results with the attached uncertainty. Answering with a range of values should not be interpreted as a lack of confidence but rather as a reflection of the shifting state of the world.
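One way to communicate a range instead of a point prediction is to simulate from a model of tomorrow’s outcome and report the spread. The return distribution below (mean +0.03%, standard deviation 1%) is entirely hypothetical and only meant to show the shape of the answer:

```python
import random
import statistics

random.seed(42)

# Hypothetical model of tomorrow's market return: normally distributed
# with a small positive mean (+0.03%) and a 1% standard deviation.
returns = [random.gauss(0.0003, 0.01) for _ in range(100_000)]

p_up = sum(r > 0 for r in returns) / len(returns)
q = statistics.quantiles(returns, n=20)  # cut points in 5% steps
low, high = q[0], q[-1]                  # 5th and 95th percentiles

print(f"P(market goes up tomorrow): {p_up:.0%}")
print(f"90% interval for the return: {low:+.2%} to {high:+.2%}")
```

The most likely direction is up, but the interval makes plain how wide the range of plausible outcomes really is.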

# Think Like a (Basketball) Shooter

The optimal way to get good at anything is to fail at it many times and make adjustments. The best basketball shooters in the world fail about half the time, but they do not let that deter them from taking the next shot. Each time they shoot and miss, they make minor adjustments and then shoot again. While repeated failure may be impractical in some domains (it’s great that we invented flight simulators to allow pilots the luxury of failing more than once), it is easy to implement in situations where the stakes are low and the feedback is immediate. Predictions, at least on a personal level, fit these conditions perfectly. We can very easily (and safely) predict our weight a week from now, the score of our favorite team’s next game, or the optimal length of time to cook popcorn without suffering dire consequences. The world would have a severe shortage of meteorologists if everyone were held to perfect standards for predictions. Instead, every time we miss the mark, we examine the causes of our failure and make the adjustments we think will help us next time.

Anyone can fail often; the key is examining our mistakes and using them to improve our performance. Silver’s team famously missed on the 2016 elections, and afterwards, they took a long look at what they had done wrong. Fivethirtyeight does not conduct polls, but aggregates a wide variety of them, weighting each one based on historical performance. In the case of 2016, it was clear that the polls were systematically biased towards Hillary Clinton, which Silver and his team will undoubtedly take into account for the next round of elections. Silver also documents a baseball model that made absolutely ridiculous predictions which, on inspection, turned out to be caused by a single mistyped letter. Had the creators dumped the model after the failures, their development time would have been wasted, but they had the wisdom to keep making new predictions and mistakes to isolate the problem. Thinking like a shooter means repeating the shoot-miss-adjust-shoot cycle continuously, improving performance with each iteration.
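As a toy illustration of the aggregation idea (a minimal sketch, not fivethirtyeight’s actual model, which also handles house effects, recency, and sample size), a performance-weighted polling average might look like this, with every number hypothetical:

```python
# Hypothetical polls: (pollster, candidate's share, accuracy weight),
# where the weight stands in for the pollster's historical performance.
polls = [
    ("Pollster A", 0.52, 0.9),
    ("Pollster B", 0.48, 0.5),
    ("Pollster C", 0.51, 0.7),
]

# A weighted average lets more reliable pollsters pull harder.
weighted_sum = sum(share * weight for _, share, weight in polls)
total_weight = sum(weight for _, _, weight in polls)
print(f"Weighted polling average: {weighted_sum / total_weight:.1%}")
```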

# Losing the Signal in the Noise

There are limitations to any prediction model, as Silver points out throughout the book. Humans have a tendency to see patterns where there are none, notably in the case of chance correlations. The Super Bowl indicator, which supposedly predicts the performance of the stock market based on which league wins the Super Bowl, has been correct 40 out of the past 50 years. However, there is no macroeconomic mechanism by which the winner of the NFL championship influences the markets; it is only a surprising correlation. Moreover, some systems, such as the weather, are extremely sensitive to initial conditions (these fall under the intriguing field of chaos theory). One minor change at the start of a weather simulation can lead to drastically different forecasts, which is why predicting the weather and climate is notoriously difficult. Nevertheless, thanks to many failures over time, weather prediction has improved remarkably in the past several decades. Weather services create hundreds of billions of dollars in value through storm warnings that reduce damage and climate predictions that improve harvests. Although we normally tend to prefer a simple model, sometimes we need a massive, complicated model for accurate predictions.
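Sensitivity to initial conditions is easy to demonstrate without a weather model. The logistic map below is a textbook toy example of chaos (my illustration, not Silver’s): two starting points differing by one part in ten million track each other for a while, then diverge completely:

```python
# The logistic map with r = 4 is a classic chaotic system; it is not a
# weather model, just the simplest system that shows the same effect.
def trajectory(x0, steps, r=4.0):
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return xs

a = trajectory(0.2000000, 50)
b = trajectory(0.2000001, 50)  # initial condition shifted by 1e-7

for step in (0, 10, 25, 50):
    print(f"step {step:2d}: {a[step]:.6f} vs {b[step]:.6f}")
```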

Another type of problem where predictions fail is the out-of-sample situation. These are situations that have never before occurred in our data and are therefore nearly impossible to see coming. Consider a pilot who has made the flight from Houston to New York 800 times without incident, always in clear weather. On his next flight, a massive hurricane is impacting the East Coast, and the airline must decide whether or not to cancel his flight. The pilot argues that he has never crashed before and therefore there is no chance of him running into trouble on this flight. However, this is an out-of-sample situation because each of his previous flights was made under perfect conditions. In this case, the prudent measure would be to ground the flight because there is too much uncertainty. The attack on Pearl Harbor by the Japanese is often considered an out-of-sample event because no foreign attack of that scale had ever before occurred on US soil. Despite indications that Japan was preparing for a large military operation, the US failed to predict the event. The signal was there but was ignored, because an attack like Pearl Harbor had never happened before.

In this age of blind belief in big data and complex models, Silver takes a needed critical view of predictions that rely only on statistics. Drawing on his past experience with baseball models, Silver explains how computers alone often cannot capture all the intricacies of human pursuits. He found that his own numbers-only models underperformed those built using both human intuition and data. Likewise, after computers first beat the best chess players, it was predicted that humans had lost all relevance in the game. However, subsequent open championships that allowed any combination of people and computers to compete were won by teams that used both a program and humans with knowledge of the game. In most fields, it is a good bet that domain knowledge plus computer models will surpass predictions relying on either alone.

# Recommendation

My test for judging a book is whether I could get all the relevant information from a five-minute summary, or whether reading the entire book is worthwhile for the extra insights. In the case of The Signal and the Noise, I recommend reading the whole work. In this post, I outlined the basics of the book but skipped over nearly all the real-world examples Silver uses to illustrate each of his points. The book covers many statistical concepts in an intuitive style, and for an academic-seeming topic such as prediction, it is quite readable. Anyone aspiring to be a data scientist, or who wants to examine forecasts skeptically, should read this informative work on the methods of accurate prediction.