The eight finalists for the paper contest at MIT's Sloan Conference—the most prestigious and most branded sports analytics conference in the U.S.—were announced this week. They're a fascinating mix of creative problem solving and impressive number-wrangling, and if you read through them, you'll see a lot of great data discussed. Data you probably can't look at.
Winning Sloan is a pretty big deal, not only because of the conference's stature, but also because of the $20,000 grand prize. Here's a quick rundown of the ones that made the cut; all eight are worth a read:
- POINTWISE: Predicting Points and Valuing Decisions in Real Time with NBA Optical Tracking Data. Link to paper; we covered this one on Regressing yesterday.
- A Data-driven Method for In-game Decision Making in MLB. Link to paper.
- "Win at Home and Draw Away": Automatic Formation Analysis Highlighting the Differences in Home and Away Team Behaviors. Link to paper.
- What Does it Take to Call a Strike? Three Biases in Umpire Decision Making. Link to paper.
- The Three Dimensions of Rebounding. Link to paper.
- The Hot Hand: A New Approach to an Old "Fallacy". Link to paper.
- Automatically Recognizing On-Ball Screens. Link to paper.
- Can't Buy Much Love: Why money is not baseball's most valuable currency. Link to paper.
All of these sound great (and we'll be at the conference later this month as they're presented). The thing is, the very data that allows them to be so good raises a complicated issue for a scientific conference.
Michael Lopez—whose own paper on the NHL point system was accepted as a Sloan semi-finalist—breaks down some of the issues with the process over on his blog:
By my estimation, six of the eight data sets used are proprietary. Sloan is not sponsoring a research conference as much as it's sponsoring a data access contest. ... So why does this matter? It means that six of the eight papers are not reproducible. Specifically, the best statisticians and sports researchers in the country could not know for sure if the paper's findings are correct, simply because the data is private. Who double checks SportVU's data? Who can double check Goldsberry and his colleagues' work? The answer is no one.
Reproducibility is a cornerstone of good research, and having six of the eight finalists rely on proprietary data is a big deal. It also appears that three of the papers were presented at previous conferences, and there are concerns that some of the papers failed to acknowledge relevant concurrent research. Go check out his post for more details.