Wednesday, April 12, 2006

Utterly Predictable Sequel to the Inevitable Entry About Baseball

It's a week into baseball season. Everything is just better when it's baseball season--it's suddenly getting warmer and the days are longer, and, since I live on the West Coast, when I arrive at work somebody somewhere is already playing a game that I can listen to online. There are games all day. I go home at night and there is baseball being played almost until I go to bed. So here's one more baseball/OaO-themed post, and then I'll move on. For a while, anyway.

Last time I touched on some of the ways that baseball is being analyzed, and on statistics versus narrative. One offshoot of this statistical work, not just in baseball but in sports in general, was a series of studies about athletes being "in the zone," or "having the hot hand." Everyone in sports, and pretty much in life, recognizes this phenomenon: sometimes players just seem to be "on"--everything they swing at gets hit hard somewhere, or they're striking everybody out, or whatever. However, if you're a statistician (or read this blog), you know that actual, real-life randomness has this creepy tendency to look not-at-all random--flip a coin 32 times and you are more likely than not to find a streak of five heads or five tails somewhere in there, for instance.
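If you'd rather not take the coin-flip claim on faith, here is a rough simulation sketch. It assumes a fair coin and independent flips; the function names, the trial count, and the seed are all made up for illustration.

```python
import random

def has_streak(flips, length):
    """Return True if `flips` contains `length` identical outcomes in a row."""
    run = 0
    prev = None
    for cur in flips:
        # Extend the current run if this flip matches the last one, else restart.
        run = run + 1 if cur == prev else 1
        prev = cur
        if run >= length:
            return True
    return False

def estimate_streak_chance(n_flips=32, length=5, trials=20_000, seed=0):
    """Estimate by simulation the chance of a heads-or-tails streak of `length`."""
    rng = random.Random(seed)  # seeded so repeated runs give the same estimate
    hits = 0
    for _ in range(trials):
        flips = [rng.random() < 0.5 for _ in range(n_flips)]
        if has_streak(flips, length):
            hits += 1
    return hits / trials

print(estimate_streak_chance())
```

The estimate wobbles a bit with the seed and the number of trials, so treat it as a sanity check rather than an exact answer.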

(You could, of course, immediately conclude that "being in the zone" is a post-hoc narrative interpretation of random statistical phenomena, as statistical studies have generally shown, so that of course it doesn't exist, but then of course it also does exist because of narrative. Then you could, you know, stop reading this post, and also this blog, forever, because as you've realized by now I pretty much only ever say one thing. But then what if I suddenly switch things up on you, as I'm going to do at the end of this post? Then you'd miss out.)

So, there was Ted Williams. Boston Red Sox 1939-1960, easily one of the top five hitters of all time; Bostonians, tired of hearing about that freakin' guy they sold to the Yankees after the 1919 season, frequently argue that he was number one. In 1957, Ted Williams set the major league record by reaching base in 16 consecutive plate appearances. This is about as hot as you can get--an average baseball player will make an out somewhere between 6 and 7 out of 10 times he comes to bat. Ted Williams was no average major-leaguer, but still--for almost four straight games, they just couldn't get this guy out. Now say you want to figure out the odds of this happening...


The Stoat: Touché.

Say you're not me and you want to figure out the odds of this happening. You'd probably take Williams' lifetime on-base percentage (.482) (which is insanely high, by the way)--he had a 48.2% chance of reaching base safely in a given plate appearance. Two times in a row? 48.2% * 48.2% = 23.2%. Three times? 48.2% * 48.2% * 48.2% = 11.2%, so sixteen times in a row: (.482)^16 = .0000085. 8.5 in a million, or a little less than one chance in 100,000. Seemingly, this was an amazing hot streak.
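For anyone who wants to redo that multiplication, a minimal sketch follows. It assumes each plate appearance is an independent trial with the same .482 chance, which is the simplification the paragraph above makes; the function name is just for illustration.

```python
OBP = 0.482  # Williams' lifetime on-base percentage, as quoted above

def chance_of_n_in_a_row(p, n):
    """Chance of reaching base in n specific consecutive plate appearances."""
    return p ** n

for n in (2, 3, 16):
    print(n, chance_of_n_in_a_row(OBP, n))
```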

There's a catch, though (no, not the usual one. A different one). What I've just done is calculate (very roughly) the odds that, at any given point in his career, you would see Ted Williams come to the plate and get on base the next 16 times in a row. During his career Ted Williams came to the plate 9,791 times. As we've learned because iPods read our minds, random data doesn't look random. If you've got a coin that flips tails 48.2% of the time, and you flip that coin 9,791 times, the percentage chance you'll find a streak of 12 tails in a row somewhere in there is about 95%. The percentage chance that you'll find a streak of 14 tails in a row somewhere is greater than 50%. The chance that you'll find a streak of 16 tails in a row is a little less than 8%.
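One way to work out "a streak somewhere in 9,791 flips" without simulating is to track, flip by flip, how long the current run is. The sketch below does that under the same weighted-coin assumption (independent flips, fixed 48.2% chance). It is not necessarily the method behind the rounded figures above, so its output may not match them exactly. The function name is hypothetical.

```python
def chance_of_streak_somewhere(n_flips, streak, p):
    """Chance that n_flips independent trials contain `streak` successes in a row."""
    # state[j] = chance that no full streak has happened yet and the
    # current run of successes is exactly j long (j = 0 .. streak - 1).
    state = [0.0] * streak
    state[0] = 1.0
    done = 0.0  # chance that a full streak has already occurred
    for _ in range(n_flips):
        nxt = [0.0] * streak
        nxt[0] = sum(state) * (1 - p)      # a failure resets the run to zero
        for j in range(streak - 1):
            nxt[j + 1] = state[j] * p      # a success extends the run by one
        done += state[streak - 1] * p      # a success here completes the streak
        state = nxt
    return done

for k in (12, 14, 16):
    print(k, chance_of_streak_somewhere(9791, k, 0.482))
```

The useful thing here is how quickly the chance falls off as the streak gets longer, even with thousands of tries.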

Calvino: Yeah! I knew it--wait. I don't get it. Shouldn't the point have been that we were virtually certain to see such a streak somewhere in Ted Williams' career? What about random distributions and humans wanting to see patterns where there are none?

The Stoat: Well, the first point is that (statistically speaking, not Odds Are One speaking) the "real" odds of Ted Williams having the on-base streak that he had, which at first seemed incredibly small, actually turn out to not be that far-fetched at all.

Calvino: So, I mean, other than the fact that it happened and so the odds are one, what are you saying? That the odds are really 2 in 25? Or that the major league record for consecutive plate appearances getting on base should really be 12, or maybe 14, but because of baseball and narrative, it's 16?

The Stoat: Well, another way to think about it is that if you had a splinter of Ted Williamses, say 13 of them, one of them would likely have such a streak. There haven't been 13 Ted Williamses, there's only been one, but there have been thousands of people who have hit a major league baseball. Even the very very good ones had a smaller chance of randomly getting on base 16 times in a row, but given all of them together, it's not surprising that such a streak exists. It happens to belong to the one of them with the highest career on-base percentage, but to me that's the Odds-Are-One part of it. We talk about it (where we = "baseball dorks") because it belongs to such a hitter (rather than John Olerud or Barry Bonds, who are tied for second with streaks of 15 plate appearances. John Olerud was a fine player, but is a marginal Hall-of-Famer. Barry Bonds is, well, still to be judged by history).

Calvino: So, the statisticians are right. There's no such thing as being "in the zone." It can all be explained by the vagaries of random chance.

The Stoat: No.

Calvino: No?

The Stoat: Not quite all of it. There's still Joltin' Joe.

There's still Joe DiMaggio. In 1941, Joltin' Joe hit safely in 56 straight major league games. Joe D. hit .325 for his career, and had 6,821 at bats in 1,736 games played. With those numbers he had about a 78% chance of getting a hit in any given game. With random statistical distribution, a streak like his 56-gamer would show up in his career 0.24% of the time. Not 24 in 100, 24 in 10,000. Given the actions of random chance, you'd need roughly 400 careers like DiMaggio's to expect to find one such streak. There haven't been 400 careers like DiMaggio's in baseball history. There haven't been 10.
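Here is one plausible way to get from the career numbers to a per-game hit chance near 78%. It is a sketch that assumes every game has the career-average number of at bats and every at bat is an independent .325 trial. That is a guess at the reasoning, not necessarily how the figure above was computed, and the variable names are made up.

```python
AVG = 0.325      # career batting average, from the text
AT_BATS = 6821   # career at bats, from the text
GAMES = 1736     # career games played, from the text
STREAK = 56

# Assumption: every game has the same (fractional) number of at bats.
at_bats_per_game = AT_BATS / GAMES

# Chance of at least one hit in a game = 1 - chance of going hitless.
p_hit_in_game = 1 - (1 - AVG) ** at_bats_per_game

# Chance that one specific 56-game stretch has a hit in every game.
p_specific_stretch = p_hit_in_game ** STREAK

print(round(at_bats_per_game, 2), round(p_hit_in_game, 3), p_specific_stretch)
```

That last number covers a single, pre-chosen stretch of 56 games. The chance of a streak turning up somewhere in the career is bigger, since there are many possible starting points.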

And here are a couple more things, strictly for narrative: Joe DiMaggio also owns the Pacific Coast League record for consecutive-game hitting streaks, with 61. And in the game on July 17th, 1941, when DiMaggio finally went hitless after 56 games, he was twice retired due to great plays by Cleveland third baseman Ken Keltner on balls DiMaggio ripped down the third base line that might have otherwise gone for hits. He hit safely in the next 16 games after that.

Next: Same thing, different packaging!

1 comment:

Rebecca said...

this post reminds me of why I love watching ESPN and SportsCenter. unlike you and your other 50%, I did not have any positive formative experiences surrounding sports or really sports watching. but I am a watcher of media, of reporting, of visual culture.

and I love Peter Gammons. because he has these commentaries on baseball that are clearly incisive, brilliant, and help the viewer to really get a handle on what's going on in the game.

and yet I never have any clue what the hell he's talking about.

it's awesome. each word, I could define for you. the totality? no idea. I love watching him. it's like magic, or listening to another language that you just barely know. fascinating.