Waiting For The Revolution At Soccer Analytics Bootcamp

Illustration: Benjamin Currie (G/O Media)

When the 2019 Champions League final between Liverpool and Tottenham kicked off, I wasn’t jammed into a sports bar downing beers with hundreds of other soccer-mad Americans, as I had originally planned to be. Instead, I was dead sober, watching the game in a lecture hall situated on the Columbia University campus in Manhattan, trying to keep my brain from being calcified by all the decimal points and low-resolution charts I’d been consuming for the last six hours.

I was at a soccer analytics bootcamp, which, whenever someone asked why I was making a weekend trip to New York, was both difficult to explain and embarrassing to say out loud. If I’d wanted to make the journey sound as lofty as possible, I could have said that I was going to get a firsthand glimpse at the future of how soccer is consumed, analyzed, and possibly even played.

About 10 minutes into the game, Tottenham’s Moussa Sissoko collected the ball in Liverpool’s half and looked to be lining up a shot from 30 yards out. “Have it,” one of my classmates shouted ironically. Perhaps in that moment he was thinking the same thing I was: That a shot from that distance, as we’d learned earlier in the day, came with an expected goal value of around .02. String a few passes together from that position, though, and a team might be able to create a chance worth, say, 0.25 expected goals.

But Sissoko obliged, thumping a shot high and wide while everyone in the lecture hall laughed. Ted Knutson, the course’s organizer, looked up from his Twitter feed to join in on the fun. He then looked back down and typed away, scolding BT Sport commentator Jermaine Jenas.

Knutson is the co-founder, CEO, and public face of StatsBomb Services, an analytics company and a relative giant—“relative” being the operative word—in soccer’s nascent analytics community. He never played the sport. They didn’t even have it where he grew up, in Indiana farm country. And even when he did become a fan after the 1998 World Cup, the only soccer he could watch was a two-hour highlights show on Fox Sports World. But 20 years later, the former betting modeler is on an almost messianic mission to revolutionize the sport.

StatsBomb, which was founded in 2016, has worked with clubs in at least 12 leagues around the world, including all of Europe’s top five. It claims a year-over-year point total increase of 20 percent for its clientele, which spans from giants like PSG to literal startups like LAFC. Before StatsBomb, Knutson cut his teeth in house, first at Brentford in England’s second tier and then at Midtjylland in the Danish Superliga. The former is now a mainstay in the Championship despite a shoestring budget; the latter won its first ever league title during Knutson’s tenure.

Knutson’s operation is as well-marketed as it is impressive. Sky Sports, Financial Times, and SBNation are media partners, using StatsBomb’s data to broadcast or embed shiny graphics. Mike Goodman, whose bylines include The Ringer, The Athletic, Grantland, and ESPN, edits the firm’s in-house publication.

Knutson and co. have cultivated a public-facing image to match their on-field success, which explains why they could get people like me to make the trek to New York and pay $300 to attend their daylong seminar.

The attendees were all enthusiastic about soccer and data analysis in equal measure, but a few dozen people getting together to crunch numbers for a few hours is a far cry from a revolution. Meanwhile, analytics booms in baseball and basketball have not only filled front offices with quants and data-crunchers, but changed the very nature of how those sports are played. Soccer isn’t there quite yet, but if Knutson has his way, it will be soon.

Finding the bootcamp at Columbia’s massive 26-acre athletic complex turned out to be pretty simple. Amid a horde of eight-year-olds exiting a school bus and scampering toward the soccer fields, there was a slow trickle of 20-something men in backpacks walking into a ritzy administrative building. Those were my friends for the day.

Once inside, Knutson was even easier to spot. In what can only be described as an act of self-caricature, he arrived wearing a Naby Keïta Liverpool jersey. The Guinean midfielder, who had just finished a debut season in the Premier League in which he made 25 appearances and 16 starts, is a darling of the analytics community. To wear a Keïta jersey is to signal your deeper understanding of the game, much the same way wearing a Kevin Youkilis jersey in 2005 would have told people just how much you appreciated the often overlooked value of on-base percentage.

Naby Keïta
Photo: Nathan Stirk (Getty)

This class was the first one StatsBomb had hosted outside their home turf of Bath, England, and the start was a little shaky. Co-teacher and StatsBomb analyst Derrick Yam—who just landed himself a job in an NFL front office—handed out worksheets to the 30-odd people in the room. He then promptly cut the lights, and nobody could see the size-11 Arial font in front of them.

He showed us a video compilation of shots from this past season, and asked us to estimate the probability of a shot becoming a goal in various situations. Everyone in the classroom focused in, giggling at some point-blank misses, straining to determine the expected value of a far-off Harry Kane free kick. In one clip, Liverpool midfielder Jordan Henderson, who scored once all of last season, took aim from 30 yards out and shanked the ball into the stands. “And that’s Jordan Henderson,” Knutson quipped to a chorus of laughter.

The point of the exercise was to show how expected goals, or xG, work. If there’s one advanced statistic that’s come close to penetrating conversations had by casual soccer fans the same way WAR or OPS has in baseball, it’s xG. StatsBomb and a few similar companies have compiled and analyzed tens of thousands of shots to determine the probability of a goal being produced by a shot taken from any given location on the field. Knutson believes StatsBomb’s model is the most advanced, applying variables like the footedness of the shot, the positioning of the goalkeeper, and how many defenders are between the ball and the net.

xG and its opposite, xG allowed, or xGA, are at the core of soccer analytics, and they are intuitively easy to understand—a shot with a .01 xG is bad, but one with a .84 xG is very good. At its most useful, xG can tell you which teams are under- or overperforming. If a team keeps winning games 1-0 despite producing around 0.5 xG and 1.2 xGA per game, you might want to tab them as a squad that’s due for a regression in the table. At its most annoying, the stat can be wielded like a cudgel by fans who want to pick out a one-game xG map and argue about who actually “deserved” to win the game.
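For readers who like to see the arithmetic, here is a minimal sketch of how per-shot xG values add up, and how the hypothetical 1-0 team above would be flagged as overperforming. The function names are illustrative inventions, not anything from StatsBomb's tooling, and the sketch assumes shots are independent of one another, which real matches do not guarantee.

```python
from math import prod


def expected_goals(shot_xgs):
    """Sum the per-shot xG values to get the goals an average team would expect."""
    return sum(shot_xgs)


def prob_at_least_one_goal(shot_xgs):
    """Chance of scoring at least once, assuming the shots are independent."""
    return 1 - prod(1 - xg for xg in shot_xgs)


def overperformance(goals_for, goals_against, xg, xga):
    """Actual goal difference minus expected goal difference, per game."""
    return (goals_for - goals_against) - (xg - xga)


# The article's hypothetical team: wins 1-0 on about 0.5 xG and 1.2 xGA.
per_game_gap = overperformance(goals_for=1, goals_against=0, xg=0.5, xga=1.2)
print(round(per_game_gap, 2))  # about 1.7 goals per game better than the chances suggest

# One bad shot (.01) plus one very good shot (.84) from the same paragraph.
print(round(expected_goals([0.01, 0.84]), 2))
print(round(prob_at_least_one_goal([0.01, 0.84]), 4))
```

A gap that large, sustained over many games, is the kind of signal that would mark a team as due for regression; over a single game it says very little.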

Almost everyone in that lecture hall already knew this stuff. The guy sitting next to me was the founder of a soccer analytics blog focused on the Brazilian league. One woman I talked to had a master’s degree in sports analytics. The collected knowledge of the attendees meant that the group wasn’t easily impressed, and a lot of them had specific questions about how exactly xG, which is far from being a perfect statistic, is calculated. One questioner got Knutson to reveal that player skill is not yet something that is taken into consideration. As Knutson told me the day before, most soccer people “don’t question assumptions the way the nerds do.”

After the initial segment on xG, we moved into offensive and defensive tactics. The basic question was, if a team wants to maximize xG and minimize xGA, how does it do so?

Knutson showed us a video of a Liverpool attack ending with a fatal Virgil van Dijk shot worth .01 xG. Then he diagramed a still of the play. A pass to James Milner, located closer to goal and more centrally, would raise the move’s expected value on its own. It would also unlock the kinds of passes into the box that could create a really good chance. It all came together with a swanky visualization that showed how a sub-0.1 xG chance can become one worth 0.7 with a throughball and a pass across goal.
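The trade-off in that diagram can be sketched as simple expected-value arithmetic. This is a rough illustration only: the .01 and 0.7 figures come from the example above, but the pass completion rates are made-up placeholders, and the sketch ignores what happens after a turnover.

```python
def sequence_value(step_success_probs, final_xg):
    """Expected goals from a move in which every step must succeed before the shot."""
    value = final_xg
    for p in step_success_probs:
        value *= p
    return value


def break_even_success(shot_now_xg, final_xg):
    """Minimum overall chance the move must come off to beat shooting right away."""
    return shot_now_xg / final_xg


long_shot_xg = 0.01      # the van Dijk shot from the video
worked_chance_xg = 0.7   # the chance after the throughball and the pass across goal

threshold = break_even_success(long_shot_xg, worked_chance_xg)

# Hypothetical completion rates: pass to Milner, throughball, pass across goal.
move_value = sequence_value([0.6, 0.4, 0.5], worked_chance_xg)

print(round(threshold, 4))   # the move only has to work about 1.4 percent of the time
print(round(move_value, 3))
```

Under these assumptions, even a move that breaks down far more often than it succeeds is worth more than the long shot, which is the heart of the argument against shooting from distance.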

Not everyone was convinced.

One man, the only attendee over 40 and by far the most engaged person at the bootcamp, pointed out, with some degree of truth, that Milner was sandwiched between two defenders. His exact words were, “He’s not open.” Knutson was unmoved. “I assure you he can make that pass,” he responded.

Knutson’s a pleasant and talkative guy, but he also has astounding confidence in his models and the conclusions they produce. Throughballs, cutbacks, and far-post runs are efficient offense. Long shots and high, deep crosses are not. Pressing prevents shots, and is therefore often good defense. Failing to close down—high up or in a defensive block—is not.

Sitting in that classroom, it was hard not to be swayed to his way of thinking. Knutson and Yam weren’t making assertions about how soccer works just by throwing a bunch of numbers and acronyms in our faces, but by showing us the specific plays that gave meaning to all those numbers and acronyms. We saw Eibar pressing the shit out of Barcelona to thwart the Catalonians’ attack before it could start. We saw Manchester City drawing defenders out wide with throughballs to open up the box for tic-tac-toe cutbacks. The logic was tight, and the video analysis was just as good as, if not better than, anything you’d see from Gary Neville and Jamie Carragher on Sky Sports. Knutson and Yam know their stuff, and that’s why clubs hire them.

Not as many clubs as they’d probably like, though.

“This is a very niche field,” Ravi Ramineni, who heads up analytics for the Seattle Sounders, told me. He and Knutson estimate that only about 20 to 25 percent of clubs in Europe’s top five leagues have substantial data operations, or employ outside consulting companies like StatsBomb. Some clubs may be avoiding the analytics movement because they don’t believe in it, but for those in smaller leagues, it’s also a matter of cost. Several years ago, Knutson asked an MLS counterpart why he didn’t just fork over the $75,000 needed to purchase higher-quality data. Pretty simple: “That’s a player for us.”

Heading into it, I was a little worried that StatsBomb’s class would feel something like a pitch from a for-profit college, offering attendees the tools they’d need to enter into an exciting and lucrative industry that in reality didn’t yet fully exist. After all, peddling insider knowledge is a booming business in sports. Conferences, classes, and pay-to-see-the-numbers websites have popped up everywhere in the last decade. The world’s preeminent stats orgy, the Sloan Sports Analytics Conference at MIT, welcomed just over 100 attendees to its first iteration in 2007. This year, more than 3,500 people made the trip to Cambridge, and standard tickets went for $850 a pop.

At a hair under $300, StatsBomb’s bootcamp wasn’t cheap. But Knutson wasn’t giving out false promises about a lucrative new career, either. When I asked him whether the company was breaking even on the course, his face suggested the answer was somewhere between “no” and “barely.”

Both he and Ramineni were transparent about the difficulty of becoming analysts. They recommended holding down a full-time job and producing unpaid public work on the side, perhaps for several years, with no guarantees at the end of the tunnel. “It’s hard to put all your eggs in this basket,” Ramineni said.

Ramineni himself got bored of parsing through Bing databases while employed at Microsoft, and eventually started doing side work for the Sounders. He would occasionally meet his team contact at a coffee shop to pass along his findings, which eventually led to a full-time offer.

Knutson spent several years producing models for a gambling company called Pinnacle Sports. Diagnosed with cancer in 2012 and forced to take a month off of work for chemotherapy, he passed the time tinkering with soccer stats and writing. That month led to the first iteration of StatsBomb, a blog brimming with geeky tangents and witty breakdowns. Brentford and Midtjylland snapped him up in 2014 after fellow betting savant Matthew Benham purchased both clubs.

Knutson seems to be on a genuine mission to carve out a place for more analysts and statisticians in the sport. His work at Midtjylland, and his set-piece routines in particular, became a medium-size deal in certain circles. But, upon his departure from the club, he was disappointed to see that copycats hadn’t followed his lead into jobs at other clubs. “I came back out and nothing had changed,” he told me. He says that frustration is in part why he founded StatsBomb instead of going back in-house at a bigger club.

Knutson is convinced that we are close to an impending data explosion. “We’re at the very cusp of Moneyball for soccer,” he said, comparing the situation to the NBA a decade ago, when fewer than a dozen revolutionary analysts were snapped up by teams and went on to radically transform the game. He even has a slide deck outlining the three phases of data integration in sports.

Soccer’s movement through those three phases has been slower than any other major sport’s, with the possible exception of hockey. The path to the revolution that Knutson wants to see followed currently features some large obstacles. One is that the advanced stats themselves aren’t even all that advanced yet. We learned about xG, the efficiency of different types of attacks, the advantage of certain set-piece tactics, and how to evaluate goalkeepers during the bootcamp, but nearly all of the key performance metrics we were introduced to flagged on-ball actions in the final third of the field. The center of the pitch, where most would argue games are decided, remains a messy, confusing, model-less morass.

How to make more sense out of that morass is still anyone’s guess. Soccer is a uniquely dynamic sport, with 22 players in continuous free motion on a massive field. It presents far greater computational challenges than baseball, which was seemingly designed to be filtered through spreadsheets, and even basketball, which for all its associations with jazzy improvisation can still be broken down into discrete actions producing concrete results. Similar actions can be found in soccer, but they exist in a much wider and deeper sea, making their effects much more difficult to quantify.

Even the initial act of compiling the data is fraught. Events within a soccer game can’t just be logged algorithmically, and thus an operation like StatsBomb requires the work of scores of data collectors who manually sift through and flag relevant events. It’s easy to see how this might lead to problems with the data itself—was James Milner really open? Any moment in a soccer game can be interpreted differently based on who is doing the interpreting.

StatsBomb tries to circumvent this problem by attempting to eliminate bias from the collection phase. The company employs a small army of 70 data collectors who log 3,500 events per match from an office building in Cairo. Their job isn’t watching soccer so much as marking down that Player A at one coordinate on the pitch passed to Player B at another coordinate, bypassing Player C from the opposing team. Every event gets the eyes of two different collectors, with a quality assurance manager on hand to break any ties. “We don’t want people that are either experts or are pretending to be experts to then create stuff people will argue with,” Knutson told me.
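The two-collectors-plus-tiebreaker process can be pictured with a short sketch. Everything here is a guess at the general shape of such a check: the `Event` fields, the `reconcile` function, and the stand-in QA rule are all hypothetical, and none of it reflects StatsBomb's actual systems.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One logged action; the field names are illustrative assumptions."""
    kind: str       # e.g. "pass"
    player: str     # who performed the action
    x: float        # pitch coordinate where it happened
    y: float


def reconcile(first, second, qa_decide):
    """Keep the event when both collectors agree; otherwise defer to QA."""
    if first == second:
        return first
    return qa_decide(first, second)


def qa_manager(first, second):
    """Stand-in for a human ruling; here it simply sides with the first collector."""
    return first


collector_a = Event("pass", "Player A", 62.0, 30.5)
collector_b = Event("pass", "Player A", 64.0, 30.5)  # disagrees on the x coordinate

final_event = reconcile(collector_a, collector_b, qa_manager)
print(final_event)
```

The point of the design is that no single collector's reading of a moment goes into the dataset unchecked.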

Instead, he wants them to create heaps of raw data that StatsBomb can then analyze and turn into conclusions. Pressures, goalkeeping errors, passes, runs: everything can and should be quantified. Because in the mind of the data scientist, numbers and models are more elegant, more “scalable,” and less biased than the trained eye.

Even after the data is compiled, there remains the challenge of how to legibly present it to potential customers. StatsBomb has had some success here, with their player radars becoming increasingly ubiquitous in analytical conversations about soccer. The company produces graphics in which various advanced stats are assigned sectors of a radial plot, and the amount that each section is or isn’t filled in is meant to provide a profile of what the player in question is good or bad at doing on the field. It’s hard to deny their visual appeal.

The radars are nice to look at, but they come with problems of their own. In 2017, StatsBomb indirectly caught some shit from Luke Bornn, the Sacramento Kings’ vice president of strategy and analytics, when he called out radar plots for being visually misleading. His critique set up Rockets GM Daryl Morey for an alley-oop, which Morey finished by declaring that “No analytics person worth his salt” uses radar plots.

To his credit, Knutson handled the situation good-naturedly, and published an article on StatsBomb in which he acknowledged the shortcomings of radar plots, but maintained that they are useful analytical tools in part because they are so visually appealing. He explained:

Most of the people ranting about radar charts on Twitter yesterday are pretty hardcore quants. To many of them, sacrificing precision for anything is strictly verboten. The problem with this perspective for me was: radars aren’t for you. Hell, radars aren’t even for me.

I work in the database, and my conclusions are largely drawn from that perspective. The minor inaccuracy issues of radars don’t affect my work. BUT I wanted to talk to a resistant public about soccer stats, and this enabled discussion. I needed to talk to coaches about skill sets and recruitment, and this was a vital way of bringing statistics into that discussion while comparing potential recruits to their own players.

As I designed them, radars exist to help you open the door with statistical novices, and from that perspective they have been wildly successful.

Even in 2017, football/soccer doesn’t have the volume of knowledgeable fans that basketball and baseball have in the U.S. We also don’t have coaches who are comfortable with almost any statistical discourse, although that is definitely changing in the last year.

Figuring out how to present their work to coaches, players, and executives in ways that the audience will find both understandable and enlightening is a problem that every sports quant has to solve, but doing so in soccer is particularly challenging. Baseball found its silver bullet in WAR, and the points-per-possession breakdowns of various basketball plays are highly intuitive. Soccer is still searching for the right vessels to carry its analytics movement forward. Good data needs to be collected and analyzed, but it also needs to be sold. The problem, as Knutson is all too aware, is that people don’t like to buy things that make them feel stupid or confused.

Those of us at the bootcamp were spared the watered-down analysis that might be included in a sales pitch to a gruff manager from England’s second tier. Everyone there was predisposed to thinking about soccer, or anything really, the same way Knutson does. The attendees mostly had backgrounds in data or tech. One of the guys with whom I ate lunch crunched numbers for Bloomberg in New York. The other left his job at PwC to do data ops for a Democratic presidential campaign. He doesn’t expect his candidate to make it past the third debate, but no matter. This stuff is taking over every industry, he pointed out.

The crowd was white-collar. And that’s no real coincidence. What a statistician does isn’t all that different from what a data wiz at Facebook might do. (A Facebook intern was indeed there, by the way.) The datasets are different, but the processes and logic are the same.

And would-be statisticians don’t hold down just any full-time job before making the transition to soccer. They tend to hold down elite jobs. Ramineni was at Microsoft. Liverpool’s Ian Graham earned a doctorate in theoretical physics from Cambridge. Before becoming CEO of StatDNA, a data firm Arsenal purchased in 2012, Jaeson Rosenfeld was a McKinsey consultant for 14 years. Knutson told me that clubs aren’t competing with other clubs for employees, but with a who’s who of 21st century plutocrats, from tech giants to hedge funds.

Liverpool lifts the Champions League trophy
Photo: Laurence Griffiths (Getty)

It shouldn’t come as much of a surprise, then, that the longer the bootcamp went on, the more it felt like an executive education course for Fortune 500 SVPs. Each module began with its own “learning outcomes” slide. Hell, the attendees could’ve been the male half of Bain & Company’s incoming analyst class.

The final session of the bootcamp was an abbreviated version of a typical StatsBomb opposition analysis primer. We examined a middling MLS club, tracking the shot values of its best strikers, its goalkeeper’s action charts, and its anomalous vulnerability on the right side of defense. It was no doubt some high-level scouting. It was also not much different from a SWOT analysis.

Beyond the mere aesthetics of the bootcamp, its narrow analytical focus and efficiency goals were also classically data-scientific. The toolbox for soccer analysts may be limited right now, but this course was all about optimizing performance using whatever numerical tools are available—“drawing conclusions off of incomplete information,” as Knutson put it.

It’s easy to see how quants could come off as condescending to coaches, players, and even scouts. “Data and video are the same thing,” if they’re doing their jobs correctly, Yam claimed. In other words, if the models are sound, they will neatly align with the conclusions that smart managers have already been coming to simply by watching the game closely. Managers bucking the numbers, then, are doing themselves a disservice. Knutson put it more bluntly: “There are lots of coaches who do not know how the game actually works.”

Parts of the bootcamp verged on paternalistic. Knutson called some coaches “relentless learners,” praising them for putting their trust in intelligent-sounding data gurus with glossy CVs. He loves to tell an anecdote about consulting on direct free kicks for a Champions League side. He explained the logic of positioning attackers next to the opposition wall in order to limit the goalkeeper’s reaction time. And he made sure to give his recommendations “gently,” as if the manager of this elite club needed special coddling to understand basic physics. Knutson believes the coach just wasn’t questioning his assumptions enough. (StatsBomb never got the contract, and the team continued to waste free kicks.)

Other moments were downright cringe-inducing. Knutson told attendees that broadcasters could leverage advanced statistics into increased revenue. Why? He reasoned that intelligent analysis speaks to educated viewers with the kind of disposable income advertisers lust over.

Knutson’s ultimate goal may be to revolutionize the sport, but in the meantime he has to worry about running a business. For now, third-party companies like his are major players in the soccer analytics movement. The consultant-client relationship allows people like Knutson to reach more clubs than they could in-house, but it also gives clubs the chance to keep them at arm’s length.

“The culture in the teams has not put a priority on saying ‘we need to do this in house and we’ll find an advantage,’” Ramineni said. He reasoned that clubs instead ask, “Why don’t we just pay X amount of money and they’ll give us what we want?” They are consultants, after all.

Of course, many teams still forgo analytics altogether. The issue is in part structural. Some top-tier leagues have agreements with data providers to provide raw numbers to all their clubs, but stats are tough to translate across leagues, and different providers collect data according to different glossaries and different standards.

Things get even messier at the team level. Clubs that have incorporated advanced stats most successfully are often stable organizations that get buy-in from top to bottom. Liverpool owner John Henry used an algorithm to become a self-made billionaire, and his club now has a world-class analytics department and a managerial staff that appears willing to consider the department’s recommendations.

Liverpool is not the norm. Soccer clubs can be volatile, with the on-field product sometimes masking, and oftentimes reflecting, internecine power struggles behind the scenes. Knutson said that StatsBomb will frequently start work with a club only to run into a skeptical manager or an indifferent scouting department after being retained.

But he’s still convinced stats will take off. He has an almost teleological view of sports—the data revolution happened in baseball, and basketball, and even football. “This sport is not different,” he says.

Knutson portrays soccer’s numerical backwardness as if American sports had a head start. They did, to an extent. But Opta Sports has been around since 2001, and economists and data scientists have been studying soccer for decades. All those clubs that have yet to buy into statistics as fully as Knutson would prefer can’t just be ignoring this stuff. They must have reasons why they don’t want to establish a dedicated department, or hire a consultant like StatsBomb. Or, as Knutson theorizes, “They will create reasons why.”

It’s possible that there’s an aesthetic component to the sport’s continued shyness. In Inverting the Pyramid, Jonathan Wilson’s bible on soccer tactics—literally the methods managers use to win matches—Wilson writes, “It is not even so simple, though, as to say that the ‘correct’ way of playing is the one that wins most often, because only the dourest of Gradgrinds would claim that success is measured merely in points and trophies; there must also be room for romance.” That may sound like pie-in-the-sky idealism, but stylistic concerns do still inform plenty of decisions at major clubs. This is a sport in which managers, sometimes to their great detriment, will adhere to visually pleasing, attack-minded philosophies even when their team’s talent level would seem to demand a more dour approach.

Knutson told me that managers usually know how to train players for specific styles of soccer, making it tough to integrate broad analytical mandates like “start pressing higher up the field.” It gets even tougher when those managers are fired and replaced by their footballing opposite on a seemingly yearly basis.

It’s more likely, however, that the work being produced by people like Knutson hasn’t yet reached a level of maturity necessary to force the sport’s hand. For as fascinating and innovative as the visualizations presented by StatsBomb’s client platform can be, a lot of the metrics it produces—“deep progressions” or “pressure regains,” for instance—feel like no more than the result of a smart person counting things. And as Yam himself admitted, a lot of the insights that StatsBomb’s analysis provides are already aligned with the sort of conclusions smart managers can come to through simple video scouting.

The success of any analytics movement in sports depends not on the data that already confirms established doctrines (in most cases this describes a majority of the work being done, particularly in the early stages of the movement) but on its ability to find the small but meaningful advantages on the margins. It’s not until all those little advantages are added up, and effectively leveraged by smart teams, that a true analytics revolution begins. Knutson thinks many of the obvious inefficiencies are already on their way out. But the number of clubs still wasting corner kicks, sending in dozens of hit-and-pray crosses, and failing to press seems to indicate that there’s still a fair bit of convincing left to be done.

In the end, getting to that point will probably have a lot less to do with radar plots and xG and a lot more to do with the success of clubs like Liverpool, whose analytics work is all happening in-house and behind closed doors. A consultation with StatsBomb might be able to convince a club that they are doing some things wrong, but watching Liverpool lift a trophy or two will do a lot more to convince them that wrong is something they can no longer afford to be.

Nate Wolf is a Washington, DC-based writer whose bylines include These Football Times, NBA Math, and BBall-Index.