I love baseball a lot. I'm not generally a sports fan; my love of baseball derives, as L. would call it, from narrative--the stuff that's born from your dad taking you to games when you're small and telling you about seeing Bob Gibson and Stan Musial, how he and his brother used to go to Browns games when the Cardinals were sold out, and so on. Lots of guys love baseball this way, but L. is the first and so far only woman I've ever met who loves baseball for this reason (in her case it was her grandmother who supplied the narrative). The other reason I love it, also fairly common, is the myriad and constantly evolving techniques used to model it.
If you don't follow baseball, or just follow it for the narrative, things like OPS, VORP, EqA, DIPS, and UZR have no meaning for you, but they're part of the family of new statistics that attempt to determine how valuable an individual is to a team, in context with other players at the same or different positions, how much of what appears to be good or bad pitching is actually due to the defense playing behind the pitcher, and so on. If you weren't in the business of actually putting together a baseball team, why would you care about these things? Because you're a dork (and by "you're a," I mean, "I'm a"). Anyway, your or my dorkiness, or lack thereof, is not my theme for today. The theme is a particular stat. Somewhere in the vicinity of two years ago they came up with my all-time favorite statistic of...um...all time. It's called WPA, for "Win Probability Added," and it's pretty much "The Odds Are One" applied to baseball.
What WPA does is look at every baseball game that has ever been played (or as large a sample as you can find; you can do this now that you've got computers) and, given a particular situation that's occurring in the particular game you're watching (e.g. the home team batting in the bottom of the ninth inning, down by a run, with two outs and a runner on second), look for all the similar situations that have ever occurred in baseball. Then it looks at what percentage of those games were won by the home team and what percentage were won by the away team. So let's say in the above situation, with a runner on and two outs, 75% of the time the away team won and 25% of the time the home team won. Now you go back to the game you're watching and see what happens. The batter strikes out? The probability of the visiting team winning is 100% because, well, they won. The pitcher gets a +.25 WPA because he took his team's chances of winning from .75 to 1, and the batter gets a -.25 WPA because by striking out he took his team's chances of winning from .25 to 0. If the batter hits a home run, he takes his team's chance of winning from .25 to 1, and gets +.75 WPA, and the pitcher gets a -.75 WPA. If, on the other hand, the batter singles and the runner scores, the score is tied and the game is still going, so you have to go back to all games where the score was tied with two outs and a runner on first in the bottom of the ninth and see what percentage of the time the home team won and what percentage of the time the away team won in order to figure out the WPA of that play. Say it's roughly 50/50. Then, since the home team's chances went from 25% to 50%, the hitter gets +.25 WPA and the pitcher gets -.25 WPA.
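For the fellow dorks, here's roughly what that arithmetic looks like in Python. This is only a sketch: the "history" is a made-up list of past games rigged to give the 25/75 and 50/50 splits from the example, and the names (win_probability_table, wpa, the situation labels) are my own inventions, not anybody's real WPA code.

```python
from collections import defaultdict

def win_probability_table(history):
    """Map each situation to the fraction of past games the home team won."""
    wins = defaultdict(int)
    total = defaultdict(int)
    for situation, home_won in history:
        total[situation] += 1
        wins[situation] += int(home_won)
    return {s: wins[s] / total[s] for s in total}

def wpa(home_wp_before, home_wp_after):
    """Return (batter_wpa, pitcher_wpa) for a play with the home team batting."""
    delta = home_wp_after - home_wp_before
    return delta, -delta

# Hypothetical situation labels; a real table would key on inning, score,
# outs, and base runners.
DOWN_ONE = "bottom 9, home down 1, two outs, runner on second"
TIED = "bottom 9, tied, two outs, runner on first"

# Invented history: 100 past games for each situation.
history = (
    [(DOWN_ONE, True)] * 25 + [(DOWN_ONE, False)] * 75
    + [(TIED, True)] * 50 + [(TIED, False)] * 50
)
table = win_probability_table(history)

strikeout = wpa(table[DOWN_ONE], 0.0)        # game over, home team lost
home_run = wpa(table[DOWN_ONE], 1.0)         # game over, home team won
single = wpa(table[DOWN_ONE], table[TIED])   # tied, game continues

print(strikeout, home_run, single)
```

The only design decision worth mentioning is that everything is tracked as the home team's win probability, so the pitcher's number is always just the batter's number with the sign flipped.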
If you think this statistic sounds completely insane, you are correct. This statistic is completely insane. There are people out there, however, who are doing this for every single Major League Baseball game that's played. Here, for instance, is the graph from last night's Mariners/Angels game. The Mariners were always ahead in this game, so the line tracks steadily upwards, with sharp dips where the Angels got hits that caught them up. What makes WPA cool is that it quantifies the individual contributions of players and when they occur in the game--e.g. if you're a pitcher in a 0-0 game and you give up a run in the bottom of the 4th inning, it doesn't matter that much because your team has time to catch up. If you're a pitcher in a 0-0 game and you give up a run in the bottom of the 9th, it matters a lot, because you just lost the game. At the end of the game you can add up everybody's WPA and see who contributed, and how much, to the win or loss.
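The adding-up step at the end is the easy part. Here's a minimal sketch, assuming you already have a play-by-play log where each play records the batter, the pitcher, and how far the play moved the batter's team's win probability. The players and the swings below are invented for illustration.

```python
from collections import defaultdict

def total_wpa(plays):
    """Sum per-play WPA for each player.

    Each play is (batter, pitcher, swing), where swing is the change in the
    batter's team's win probability. The pitcher is charged the opposite.
    """
    totals = defaultdict(float)
    for batter, pitcher, swing in plays:
        totals[batter] += swing
        totals[pitcher] -= swing
    return dict(totals)

# Invented three-play game; the home team's win probability goes
# 0.50 -> 0.60 -> 0.45 -> 1.00.
plays = [
    ("Home Hitter 1", "Visiting Pitcher", 0.10),   # early run for the home team
    ("Visiting Hitter", "Home Pitcher", 0.15),     # visitors catch up
    ("Home Hitter 2", "Visiting Pitcher", 0.55),   # walk-off hit
]

totals = total_wpa(plays)
for player, value in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{player}: {value:+.2f}")
```

Since a game starts at 50/50 and ends with one team at 1, the winning side's totals should come out to +.5 and the losing side's to -.5, which is a handy sanity check on the bookkeeping.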
What makes WPA completely insane is that, as you already know because you read OaO, it's completely meaningless (and I mean this in the usual Odds Are One way--it's meaningless in the statistical sense. It tells us quite a lot, in this case, about how we think about baseball games). As a statistic it has a number of rather inherent flaws, the most obvious of which is that there are often important plays where it's not clear which individual should get the credit, or how you should score it (if a player hits what's otherwise going to be a home run to win the game, but a fielder climbs the wall and catches the ball for an out, do you penalize the pitcher for losing the game, but then reward the fielder for un-losing it? Or do you treat it as an ordinary out? Do you penalize the hitter for making an out even though it took an extraordinary play to get him out?). It's also not in any way predictive. It looks at games that have already happened and then uses them to quantify plays in the particular game...that have already happened. The only thing it predicts is that when the game is over, one team will have won (Win Probability 1) and one will have lost (Win Probability 0).
Win Probability Added is all about...wait for it...narrative. It takes each event in the game and decides how much it contributed to the final story of that game. As with actual narratives, big events that happen at the end tend to be much more important to the final story than little events at the beginning, and the major characters in the story will have a very high or very negative WPA, and the minor characters will have a WPA near 0. All of us who are compiling or studying these metrics are, yes, big dorks. But we're also just making narrative.
Next: Still More Narrative!
Tags: Baseball, Sabermetrics, Narrative