Adapted from the new book, How Not to Be Wrong: The Power of Mathematical Thinking.
Consider Spike Albrecht. The freshman guard for Michigan's men's basketball team, standing at just 5-11 and a bench player most of the season, wasn't expected to play a big role when the Wolverines faced Louisville in the 2013 NCAA final. But Albrecht made five straight shots, four of them three-pointers, in a ten-minute span in the first half, leading Michigan to a 10-point lead over the heavily favored Cardinals. He had what basketball fans call "the hot hand"—the apparent inability to miss a shot, no matter how great the distance or how fierce the defense.
Except there's supposed to be no such thing.
In 1985, in one of the most famous contemporary papers in cognitive psychology, Thomas Gilovich, Robert Vallone, and Amos Tversky (hereafter GVT) took aim at the hot hand. They obtained records of every shot taken by the 1980-81 Philadelphia 76ers in their 48 home games, and analyzed them statistically. If players tended towards hot streaks and cold streaks, you might expect a player to be more likely to hit a shot following a basket than a shot following a miss. And when GVT surveyed NBA fans, they found this theory had broad support; nine of ten fans agreed that a player is more likely to sink a shot when he's just hit two or three baskets in a row.
But nothing of the kind was going on in Philadelphia. Julius Erving, the great Dr. J, was a 52 percent shooter overall. After three straight baskets, a situation which you'd think might indicate Erving was hot, his percentage went down to 48 percent. And after three straight misses, his field goal percentage stayed right at 52 percent. For other players, like Darryl "Chocolate Thunder" Dawkins, the negative effect was even more extreme. After a hit, his overall 62 percent shooting percentage dipped to 57 percent; after a miss, it shot up to 73 percent, exactly the opposite of the fan predictions. (One possible explanation: A missed shot suggests Dawkins was facing effective defenders on the perimeter, inducing him to drive to the basket for one of his trademark backboard-shattering dunks, which he gave names like "In Your Face Disgrace" and "Turbo Sexophonic Delight.")
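The comparison GVT ran on each player can be sketched in a few lines of Python. This is only an illustration: the function name `conditional_hit_rates` and the shot string are made up for the example, not taken from the 76ers records.

```python
def conditional_hit_rates(shots):
    """Return (overall, after_hit, after_miss) hit rates for a string of 'H'/'M'."""
    overall = shots.count("H") / len(shots)
    # Pair every shot with the one before it and split by the earlier outcome.
    after_hit = [cur for prev, cur in zip(shots, shots[1:]) if prev == "H"]
    after_miss = [cur for prev, cur in zip(shots, shots[1:]) if prev == "M"]

    def rate(outcomes):
        # None when the condition never occurred, to avoid dividing by zero.
        return outcomes.count("H") / len(outcomes) if outcomes else None

    return overall, rate(after_hit), rate(after_miss)


# Illustrative shot record, not real data.
shots = "HMHHHMHMMHHHHMMH"
overall, after_hit, after_miss = conditional_hit_rates(shots)
print(overall, after_hit, after_miss)
```

If the fans were right, the second number would sit well above the third across a whole season of shots.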
Does this mean there's no such thing as the hot hand? Not just yet. The hot hand, after all, isn't a general tendency for hits to follow hits and misses to follow misses. It's an evanescent thing, a brief possession by a superior basketball being that inhabits a player's body for a short glorious interval on the court, giving no warning of its arrival or departure. Spike Albrecht is Ray Allen for ten minutes, mercilessly raining down threes—then he's Spike Albrecht again. Can a statistical test see this? In principle, why not? GVT devised a clever way to check for these short intervals of unstoppability. They broke up each player's season into sequences of four shots each; so if Dr. J's sequence of hits and misses looked like:
HMHHHMHMMHHHHMMH…
then the sequences would be:
HMHH, HMHM, MHHH, HMMH, …
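Splitting a shot record this way is simple to sketch in Python. The helper name `four_shot_sequences` is hypothetical, the input string is just the four example sequences joined together, and dropping leftover shots at the end is an assumption of this sketch.

```python
def four_shot_sequences(shots, size=4):
    """Split a shot string into consecutive, non-overlapping chunks of `size` shots."""
    # Shots at the end that do not fill a whole chunk are dropped
    # (an assumption of this sketch).
    usable = len(shots) - len(shots) % size
    return [shots[i:i + size] for i in range(0, usable, size)]


print(four_shot_sequences("HMHHHMHMMHHHHMMH"))
```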
GVT then counted how many of the sequences were "good" (3 or 4 hits), "moderate" (2 hits), or "bad" (0 or 1 hits) for each of the nine players in the study. Then they considered what results to expect under what statisticians call the null hypothesis, namely the hypothesis that there's no such thing as the hot hand. For a 50 percent shooter like Dr. J, all 16 possible sequences should then be equally likely. Five of those sequences are good, five are bad, and six are moderate:
Good: HHHH, MHHH, HMHH, HHMH, HHHM
Moderate: HHMM, HMHM, HMMH, MHHM, MHMH, MMHH
Bad: HMMM, MHMM, MMHM, MMMH, MMMM
Each of the 16 sequences has the same probability because each shot is equally likely to be an H or an M. So you'd expect about 5/16, or 31.25 percent, of Dr. J's four-shot sequences to be good, with 37.5 percent moderate and 31.25 percent bad.
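The counting can be checked by brute force. The Python sketch below lists every four-shot sequence, labels it, and adds up the probabilities under the assumption that shots are independent. The names `classify` and `null_distribution` are made up for the example.

```python
from fractions import Fraction
from itertools import product


def classify(seq):
    """Label a four-shot sequence by its number of hits."""
    hits = seq.count("H")
    if hits >= 3:
        return "good"
    if hits == 2:
        return "moderate"
    return "bad"


def null_distribution(p_hit=Fraction(1, 2)):
    """Probability of each label when every shot is an independent hit with chance p_hit."""
    dist = {"good": Fraction(0), "moderate": Fraction(0), "bad": Fraction(0)}
    for shots in product("HM", repeat=4):
        seq = "".join(shots)
        prob = Fraction(1)
        for shot in seq:
            prob *= p_hit if shot == "H" else 1 - p_hit
        dist[classify(seq)] += prob
    return dist


dist = null_distribution()
for label, prob in dist.items():
    print(label, prob, float(prob))
```

For a shooter who isn't exactly at 50 percent, the 16 sequences are no longer equally likely; passing a different `p_hit`, such as `Fraction(52, 100)`, gives the adjusted proportions.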
But if Dr. J sometimes experienced the hot hand, you might expect a higher proportion of good sequences, contributed by those games where he just can't seem to miss. The more prone to hot and cold streaks you are, the more you're going to see HHHH and MMMM, and the less you're going to see HMHM.
The customary way scientists assess a hypothesis is a significance test, which asks, more or less: How likely would the actually observed outcome of the experiment be, were the null hypothesis to be correct?
So if the null hypothesis about the hot hand—that there is no such thing—were correct, would we be likely to see something like the results that were actually observed? And the answer turns out to be yes. The proportion of good, bad, and moderate sequences in the actual data is just about what chance would predict, any deviation falling well short of the statistically significant.
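One standard way to make that comparison is a chi-square goodness-of-fit test. It is used here as an illustration, not necessarily as the exact procedure in the GVT paper. In this Python sketch the observed counts are invented, and the function name `chi_square_gof` is hypothetical.

```python
import math


def chi_square_gof(observed, expected_probs):
    """Pearson chi-square statistic and p-value for exactly three categories."""
    if len(observed) != 3 or len(expected_probs) != 3:
        raise ValueError("this sketch only handles three categories")
    total = sum(observed)
    stat = sum((o - total * p) ** 2 / (total * p)
               for o, p in zip(observed, expected_probs))
    # Three categories leave 2 degrees of freedom, where the chi-square
    # tail probability has the closed form exp(-x / 2).
    p_value = math.exp(-stat / 2)
    return stat, p_value


# Hypothetical counts of good / moderate / bad sequences, not GVT's data.
observed = [33, 37, 30]
null_probs = [5 / 16, 6 / 16, 5 / 16]
stat, p_value = chi_square_gof(observed, null_probs)
print(f"chi-square = {stat:.3f}, p = {p_value:.3f}")
```

A large p-value means counts like these are unsurprising if there is no hot hand. By convention, only a p-value below 0.05 counts as statistically significant.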
"If the present results are surprising," GVT write, "it is because of the robustness with which the erroneous belief in the 'hot hand' is held by experienced and knowledgeable observers." And indeed, while their result was quickly taken up as conventional wisdom by psychologists and economists, it has been slow to gain traction in the basketball world. This didn't faze Tversky, who relished a good fight. "I've been in a thousand arguments over this topic," he said. "I've won them all, and I've convinced no one."
A significance test is a scientific instrument, and like any other instrument, it has a certain degree of precision. If you make the test more sensitive (by increasing the size of the studied population, for example), you enable yourself to see ever-smaller effects. That's the power of the method, but also its danger. The truth is, the null hypothesis is probably always false! When you drop a powerful drug into a patient's bloodstream, it's hard to believe the intervention literally has zero effect on the probability that the patient will develop esophageal cancer, or thrombosis, or bad breath. Each part of the body speaks to every other, in a complex feedback loop of influence and control. Everything you do either gives you cancer or prevents it. And in principle, if you carry out a powerful enough study, you can find out which it is. But those effects are usually so minuscule that they can be safely ignored. Just because we can detect them doesn't always mean they matter.
On the other side: If the test is less sensitive, it will declare the results of the experiment insignificant, whether or not there's really an effect. If you look at Mars with a research-grade telescope, you'll see moons; if you look with binoculars, you won't. But the moons are still there! And it's this problem that makes it so hard to pin down the hot hand.
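A little arithmetic shows how much the verdict depends on the size of the study. The Python sketch below takes the same hypothetical effect, a shooter who hits 51 percent instead of 50, and asks how many standard errors from the null it sits at two sample sizes. The numbers and the function name `z_score` are assumptions chosen for illustration.

```python
import math


def z_score(hits, shots, p_null=0.5):
    """How many standard errors the observed hit rate sits from p_null."""
    observed = hits / shots
    standard_error = math.sqrt(p_null * (1 - p_null) / shots)
    return (observed - p_null) / standard_error


# Same hypothetical 51 percent hit rate, very different sample sizes.
small = z_score(51, 100)
large = z_score(510_000, 1_000_000)
print(f"n = 100:       z = {small:.2f}")
print(f"n = 1,000,000: z = {large:.2f}")
```

The conventional two-sided cutoff for significance at the 5 percent level is a z of about 1.96. The small sample falls far short of it and the huge one sails past, even though the underlying effect is identical.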
GVT had answered only half the question: Namely, what if the null hypothesis were true, and there was no hot hand? Then, they say, the results would look very much like the ones observed in the real data.
But what if the null hypothesis is wrong? The hot hand, if it exists, is brief, and the effect, in strictly numerical terms, is small. The worst shooter in the league hits 40 percent of his shots and the best hits 60 percent; that's a big difference in basketball terms, but not so big statistically. What would the shot sequences look like, if the hot hand were real?
Computer scientists Kevin Korb and Michael Stillwell worked out exactly that in a 2003 paper. They generated simulations where a player's shooting percentage leaped up to 90 percent for two 10-shot "hot hand" intervals over the course of the trial, and ran these simulations more than a hundred times. In more than three-quarters of the trials, the significance test used by GVT reported that there was no reason to reject the null hypothesis, even though the null hypothesis was completely false. GVT's design was underpowered, destined to report the nonexistence of the hot hand whether or not the hot hand was real.
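An experiment in this spirit is easy to sketch in Python, though what follows is not Korb and Stillwell's code. Only the 90 percent figure and the two 10-shot hot intervals come from the description above. The 200-shot trial length, the 50 percent baseline, the chi-square test on four-shot sequences, and all function names are assumptions made for this illustration.

```python
import math
import random

NULL_PROBS = (5 / 16, 6 / 16, 5 / 16)  # good, moderate, bad for a 50 percent shooter


def simulate_shots(rng, n_shots=200, base=0.5, hot=0.9, hot_len=10, n_hot=2):
    """One shot record with n_hot randomly placed hot intervals of hot_len shots."""
    probs = [base] * n_shots
    for _ in range(n_hot):
        # Intervals are placed independently, so they may occasionally overlap.
        start = rng.randrange(n_shots - hot_len + 1)
        for i in range(start, start + hot_len):
            probs[i] = hot
    return ["H" if rng.random() < p else "M" for p in probs]


def rejects_null(shots, alpha=0.05):
    """Chi-square test on the good/moderate/bad counts of four-shot sequences."""
    counts = [0, 0, 0]
    for i in range(0, len(shots) - len(shots) % 4, 4):
        hits = shots[i:i + 4].count("H")
        counts[0 if hits >= 3 else 1 if hits == 2 else 2] += 1
    total = sum(counts)
    stat = sum((c - total * p) ** 2 / (total * p)
               for c, p in zip(counts, NULL_PROBS))
    # Exact chi-square tail probability for 2 degrees of freedom.
    return math.exp(-stat / 2) < alpha


def miss_rate(trials=1000, seed=0):
    """Fraction of simulated hot-hand shooters for whom the test fails to reject."""
    rng = random.Random(seed)
    misses = sum(not rejects_null(simulate_shots(rng)) for _ in range(trials))
    return misses / trials


print(miss_rate())
```

Every simulated shooter here really does have a hot hand, so each failure to reject is a wrong answer. The printed fraction depends on the assumed parameters; lengthening the trial or the hot intervals makes the test easier to pass.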
If you don't like simulations, consider reality. Not all teams are equal when it comes to preventing shots; last year, the stingy Indiana Pacers allowed opponents to make only 42 percent of their shots, while 47.6 percent of shots fell in against the Cleveland Cavaliers. So players really do have "hot spells"—namely, they're more likely to hit a shot when they're playing the Cavs. But this mild heat—maybe we should call it "the warm hand"—is something the tests used by Gilovich, Vallone, and Tversky aren't sensitive enough to feel.
The right question isn't "Do basketball players sometimes temporarily get better or worse at making shots?" That is the kind of yes/no question a significance test addresses. The right question is "How much does their ability vary with time, and to what extent can observers detect in real time whether a player is hot?" Here, the answer is surely "not as much as people think, and hardly at all." A recent study has shown that players who make the first of two free throws are slightly more likely to make the next one, but there's very little convincing evidence of any sizable hot hand in real-time gameplay, unless you count the subjective impressions of the players and coaches themselves. (A paper released just this February does appear to find a small, measurable effect.)
The short life of the hot hand, which makes it so hard to disprove, makes it just as hard to reliably detect. Gilovich, Vallone, and Tversky are absolutely correct in their central contention that human beings are quick to detect patterns where they don't exist and to overestimate their strength where they do. Any regular hoops-watcher will routinely see one player or another sink five shots in a row. Most of the time, surely, this is some combination of indifferent defense, wise shot selection, or, most likely of all, plain good luck, not a sudden burst of basketball transcendence. Which means there's no reason to expect a guy who's just hit five in a row to be particularly likely to make the next one.
Analyzing the performance of investment advisors presents the same problem. Whether there is such a thing as skill in investing, or whether differences between the performance of different funds are wholly due to luck, has been a vexed, murky, unsettled question for years. But if there are investors with a temporary or permanent hot hand, they're rare, so rare they make little to no dent in the kind of statistics contemplated by GVT. A fund that's beaten the market five years running is vastly more likely to have been lucky than good. Past performance is no guarantee of future returns.
If Michigan fans were counting on Spike Albrecht to carry the team all the way to a championship, they were badly disappointed; Albrecht missed every shot he took in the second half, and the Wolverines ended up losing by 6.
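How often plain luck alone produces a streak like five in a row can be computed directly. The Python sketch below assumes independent shots with a fixed hit probability, which is exactly the no-hot-hand picture. The function name `prob_streak` and the 20-shot example are made up for illustration.

```python
def prob_streak(n_shots, streak, p_hit=0.5):
    """Chance that n_shots independent shots contain at least `streak` hits in a row."""
    # state[r] = probability of being on a run of exactly r hits
    # without having completed a full streak yet.
    state = [1.0] + [0.0] * (streak - 1)
    for _ in range(n_shots):
        new = [0.0] * streak
        new[0] = (1 - p_hit) * sum(state)   # a miss resets the run to zero
        for r in range(streak - 1):
            new[r + 1] = p_hit * state[r]   # a hit extends the run by one
        # A hit from a run of streak - 1 completes the streak; that
        # probability leaves the table and is counted in the final result.
        state = new
    return 1.0 - sum(state)


# A 50 percent shooter taking a hypothetical 20 shots.
print(prob_streak(20, 5))
```

Multiply that chance across every player in every game a fan watches, and lucky streaks stop looking rare.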
A 2009 study by John Huizinga and Sandy Weil suggests that it might be a good idea for players to disbelieve in the hot hand, even if it really exists! In a much larger data set than GVT's, they found a similar effect; after making a basket, players were less likely to succeed on their next shot. But Huizinga and Weil had records not only of shot success, but shot location. And that data showed a striking potential explanation; players who had just made a shot were more likely to take a more difficult shot on their next attempt. Yigal Attali, in 2013, found even more intriguing results along these lines. A player who made a layup was no more likely to shoot from distance than a player who just missed a layup. Layups are easy and shouldn't give the player a strong sense of being hot. But a player is much more likely to try a long shot after a three-point basket than after a three-point miss. In other words, the hot hand might "cancel itself out"—players, believing themselves to be hot, get overconfident and take shots they shouldn't.
Jordan Ellenberg is a professor of mathematics at the University of Wisconsin. He blogs at Quomodocumque, and you can buy his new book, How Not to Be Wrong: The Power of Mathematical Thinking here.
Image by Jim Cooke