Ben Blatt of the Harvard Sports Analysis Collective used word frequency and Bayesian statistics to determine, well, nothing really except that Rick Reilly, Bill Simmons, and Jason Whitlock write like Reilly, Simmons, and Whitlock, respectively. But he found some cool stuff, too.
Blatt, a Harvard sophomore, selected a sample of the three writers' work and applied the same analysis used in Inference and Disputed Authorship: The Federalist, a 1964 study by Frederick Mosteller and David L. Wallace. Mosteller and Wallace used word frequency and Bayesian inference to determine the authorship of the disputed Federalist papers.
Blatt's research is impressive, and not just because he actually knows what he's talking about, unlike me. He found the 10 words that best predict authorship by using a formula that provides "the probability that if you saw only one word from an article that the article belonged to a particular author":
Simmons writes a lot about Boston, movies, movie scenes, and "biggest" things. For Whitlock, it's quarterbacks and "Tiger's" fill-in-the-blank. Middle-aged Reilly seems to have gotten over his rather unnatural and perhaps Freudian teeth fixation, and is now focused on beer, moms, and golf. Fitting.
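For the curious, the "saw only one word" probability reads like a straightforward application of Bayes' rule. Here is a minimal Python sketch of that idea, not Blatt's actual formula. It assumes each author is equally likely before you see the word, and that a word's likelihood is just its relative frequency in that author's sample. The function name `word_posteriors` and the toy text are made up for illustration.

```python
from collections import Counter


def word_posteriors(word, corpora, priors=None):
    """Return P(author | word) for each author via Bayes' rule.

    corpora maps an author name to a string of sample text.
    priors maps an author name to a prior probability
    (assumed uniform when omitted).
    """
    authors = list(corpora)
    if priors is None:
        priors = {a: 1 / len(authors) for a in authors}

    # Likelihood P(word | author): the word's share of that author's sample.
    likelihood = {}
    for a in authors:
        counts = Counter(corpora[a].lower().split())
        total = sum(counts.values())
        likelihood[a] = counts[word] / total if total else 0.0

    # Evidence P(word): likelihoods weighted by the priors.
    evidence = sum(likelihood[a] * priors[a] for a in authors)
    if evidence == 0:
        # Nobody used the word, so it tells us nothing; keep the priors.
        return dict(priors)
    return {a: likelihood[a] * priors[a] / evidence for a in authors}


# Toy data, invented for illustration only.
toy = {
    "Reilly": "beer golf beer mom",
    "Simmons": "golf movie",
}
print(word_posteriors("golf", toy))
```

A word makes the top 10 when one author's posterior is close to 1, meaning that author uses it far more often than the other two.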
Blatt used an additional formula shown below (I won't pretend to be able to explain it) to correctly predict which author had written six different columns:
He found that "Simmons' most recent article had about a 1 in 50 tredecillion chance of being written by Reilly." I think we'd all agree with that in principle, but it's always refreshing to see something like that corroborated by the maths, too. Thanks, science!
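If you're wondering how a number that absurd can come out of a word count, here is a rough Python sketch of whole-column classification. To be clear, this is a stand-in and not the formula Blatt used: it is a plain multinomial naive Bayes classifier with add-one smoothing and equal priors, and the function names and sample text are hypothetical.

```python
import math
from collections import Counter


def train(samples):
    """Count words per author and collect the shared vocabulary."""
    counts = {a: Counter(t.lower().split()) for a, t in samples.items()}
    vocab = set().union(*counts.values())
    return counts, vocab


def log_posteriors(text, counts, vocab):
    """Unnormalized log P(author | text), assuming equal priors."""
    scores = {}
    for author, c in counts.items():
        total = sum(c.values())
        score = math.log(1 / len(counts))
        for w in text.lower().split():
            if w in vocab:  # skip words no author ever used
                # Add-one smoothing keeps unseen words from zeroing the score.
                score += math.log((c[w] + 1) / (total + len(vocab)))
        scores[author] = score
    return scores


def normalize(scores):
    """Turn log scores into probabilities that sum to 1."""
    top = max(scores.values())
    exp = {a: math.exp(s - top) for a, s in scores.items()}
    z = sum(exp.values())
    return {a: v / z for a, v in exp.items()}


# Toy samples, invented for illustration only.
samples = {
    "Simmons": "boston movie scene boston biggest",
    "Reilly": "beer golf mom beer golf",
}
counts, vocab = train(samples)
probs = normalize(log_posteriors("boston movie boston", counts, vocab))
print(max(probs, key=probs.get), probs)
```

Each word multiplies in its own likelihood ratio, so over a full-length column those factors compound. That is how the odds against the wrong author can shrink to something like 1 in 50 tredecillion.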