When Good Statistics Go Bad: The Case Against The Case Against R.A. Dickey

R.A. Dickey, objectively speaking, is the greatest human being in history. His knuckleball destroys cities and he climbed a huge fucking mountain. But should he win the Cy Young Award?

As of yesterday, he has 20 wins. That, along with his playing in a major market, will go a long way toward a Cy Young. Longer than it should, as any responsible analyst would tell you. Pitcher wins are a relic of the dark ages, an artifact of luck and run support and bullpen help. We'd be better off if they fell off the face of the earth.

But there's a good chance that writers who disdain voting for a Twenty-Game Winner could vote against Dickey on the basis of equally inappropriate statistics. When a knuckleballer is in the race, careless sabermetrics can create more problems than it solves.

One of the most popular sabermetric pitching stats among writers and fans is Fielding-Independent Pitching, or FIP. For most pitchers, under most circumstances, what happens when the ball is in play is out of their control, and it's subject to huge random variations. This is one of the great revolutionary findings of baseball analysis. The same pitcher, pitching the same way, might allow a .350 average on balls in play one year and .250 the next.

So FIP cuts out the noise by leaving balls in play out of the calculation altogether, which in effect flattens every pitcher's results on them to the league average, usually somewhere around .300. The pitcher who has 35 percent of balls in play fall for hits is treated the same as one who has 25 percent.
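
To see how little FIP actually looks at, here is a minimal Python sketch of the commonly published formula. The function name, the invented stat lines, and the 3.10 constant are assumptions for illustration only; the real constant is recalculated every season so that league FIP matches league ERA. Notice that hits allowed never show up as an input.

```python
def fip(hr, bb, hbp, k, ip, constant=3.10):
    """Fielding-Independent Pitching, per the commonly published formula.

    constant is an assumed, illustrative value; in practice it is
    recomputed each season so that league-average FIP equals league ERA.
    ip is innings pitched as a plain number (220.333..., not "220.1").
    """
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


# Two hypothetical pitchers with the same homers, walks, hit batters,
# strikeouts and innings. One allowed far more hits on balls in play,
# but FIP has nowhere to put that information.
few_hits = {"hits": 160, "fip": fip(hr=20, bb=50, hbp=5, k=200, ip=220)}
many_hits = {"hits": 220, "fip": fip(hr=20, bb=50, hbp=5, k=200, ip=220)}

print(round(few_hits["fip"], 2), round(many_hits["fip"], 2))
```

Both pitchers come out with the same number, which is exactly the point: whatever happened on balls in play has been thrown away.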

Unfortunately, for a few pitchers, it removes a great deal of signal along with the noise. There are some pitchers who do have the ability to induce bad contact at a significantly higher rate than their peers. But because batting average on balls in play (BABIP) is affected by so many factors outside of pitcher skill, it can take years to establish that someone can reliably get outs in the field more often than the average pitcher does.
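
For reference, BABIP is usually computed as non-homer hits divided by the number of balls put in play. A rough Python sketch follows; the function name and the stat line are invented for illustration and do not belong to any real pitcher.

```python
def babip(h, hr, ab, k, sf):
    """Batting average on balls in play, using the usual public definition.

    Home runs come out of both the hits and the at-bats because they never
    reach a fielder; strikeouts come out of the at-bats because no ball was
    put in play; sacrifice flies are added back because they were.
    """
    balls_in_play = ab - k - hr + sf
    return (h - hr) / balls_in_play


# Hypothetical season line, made up for this example.
example = babip(h=190, hr=20, ab=820, k=200, sf=5)
print(round(example, 3))
```

With only a few hundred balls in play per season in the denominator, a couple dozen bloopers either way can swing the figure a long way, which is why a single year of BABIP says so little.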

Matt Cain is an example of career sample size bearing out the ability: He's posted a .264 BABIP in over 1,500 career innings, so his .261 figure in 2012 is right in line. Hitters really have trouble making good contact off him.

FIP, however, still treats his ability as luck. And so Fangraphs' Wins Above Replacement calculation, which uses FIP in combination with innings pitched, gives Cain a WAR of just 4.0 on the season even though his contribution was almost certainly greater. This has happened almost every year of his career.

Dickey might not have Cain's sample size, but his knuckleball is a Get Out of FIP Free card. The pitch has caused hitters to make weak contact throughout its history. Dickey's BABIP in 2012 is .273, 20 points below this year's league average—and right around his career average since he became a knuckleballer exclusively.

FIP, however, ignores this aspect of Dickey's success. So while Dickey has the fourth-best ERA in the majors, his FIP ranks only 11th. Consequently, his Fangraphs WAR is half a win lower than that of Gio Gonzalez. If this difference holds through the end of the season, some voters might use it as a rationale to give Gonzalez the award over Dickey. There are valid reasons for preferring Gonzalez, but FIP isn't one of them.

As an alternative, Baseball Reference's WAR uses a runs component that includes balls in play results, which makes it a better stat for this race. Dickey is seven-tenths of a win ahead of Gonzalez on that list—but four-tenths of a win behind overall leader Johnny Cueto. This is not an easy Cy Young race.

There are lots of other run and win estimators out there (Baseball Prospectus' Fair Run Average is a good one too) that have all kinds of advantages and disadvantages over their competitors. Behind each statistic is a different epistemology about what makes a pitcher the best. Is it how he should have done, theoretically, under fair and neutral conditions? Or is it how he did do, in the specific and arbitrary conditions he encountered? What is the best way to separate luck from skill? Is it even possible to make a distinction about who the best pitcher is?

These are the questions that make award discussions worth having. My worry is that unless voters are really interested in statistics, they'll pick a Cy Young winner, then pick the stat that makes him seem the most impressive. That's just as bad as picking the pitcher with the most wins. If I had a vote, I'd go with Dickey for his life story and a great knuckleball. But it's hard to fault anyone for picking Cueto, Gonzalez, or Clayton Kershaw. There are compelling reasons to choose any of them. The quality of an NL Cy Young vote this year depends not on the quality of the candidate chosen, but on the quality of the thought process behind the choice.