Thursday, April 20, 2006

Ephemeral Fame is Mine, and Other Thoughts

  • Going through my regular internet stops this morning whilst eating my bowl of cereal, I came across this article on The Hardball Times, which is the blog bible for statistical analysts of baseball, and the people who love them, everywhere. The article this morning is about Win Probability, and has linked back to my post on the subject. It's not the first time that somebody I don't know has acknowledged my existence, but on the other hand, dude: The Hardball Times! They're, like, real bloggers and stuff.

  • Sam and TenaciousMcD started up a thread (here, and then here) from the end of my last post, which addresses the question "if all is randomness, whither order? Whither morals? Whither justice?" (I'm paraphrasing. A lot). TenaciousMcD's question:
    How, for example, could any structure--even one we, creatures of nature that we are, superimpose--come from shere [sic] abject randomness, or how sense out of nonsense?
    Sam's response:
    When I read [TenaciousMcD's question], when I hear it, I don't think the question has the 'hook', for lack of a better word, that [Mr. Tenacious] expects it to. Perhaps I should be, but I'm not really all that bothered about how sense emerges from nonsense. Indeed, I think that's just how sense emerges –from nonsense. We produce meaning in the world, we read it into a world without meaning, and as Paul tells us continually, we construct narratives that give that meaning a place to rest, and a place from which to emerge. Structure emerges out of randomness (see Paul's many iPod posts); order comes from chaos. I'm OK with that.
    I take Sam's position in this, though I want to radicalize it a little. Or hone it. Or something. I once likened the way order seems to emerge from chaos, at least in the evolution of life on earth, to the action of an editor--he or she had no control whatsoever over what was produced; The Editor could only look at the results and, essentially, say yea or nay (pretty much everybody else is calling this "Natural Selection" these days). In a truly random model of evolution, it seems like this mechanism is always glossed over. I think this is the "hook" to TMcD's question that Sam is looking for. It just seems so obvious that organisms or systems that are more "fit" would naturally survive, even if they emerge totally by chance, because...well, because they're more "fit." It's a hidden tautology. This is something I've long been meaning to blog about--maybe next post.

  • I'd also like to observe, as TMcD brings up apparently non-random constructions like morals and justice, that these things are also (literally) evolving, and that there's no particular reason to think that they too aren't born out of an endless frenzy of random attempts to optimize the interactions of communities or societies or entire species.

  • Tarn asks:
    [S]uddenly I am hearing many voices -- both the familiar and alien -- and though their stories and their histories are, in detail, not alike, something tugs from deep within the seat of my capacity and nudges at me to learn a specific lesson common to All The Variety of Living....Does anyone understand this language I'm speaking? If so, please help me identify it. Seriously. I'm not joking. Anyone? Anyone?
    L. and I frequently ask this question, in a different form, usually when we keep running into the same obstacles in life over and over again. "Clearly," says one of us to the other, "There is something we are meant to be learning from this that we are not learning." In one instance, where we eventually got an answer, we had an ongoing (about three year) bout with trying to figure out where we were going to live, Seattle versus Baltimore (versus Florida. Don't ask). The first time, after an incredibly agonizing process, we decided Baltimore. Then, a year later, the decision came up again and we decided Seattle, but only temporarily, because we were going to move to Florida. Then it came time to move to Florida, and lo! The same incredibly agonizing decision came up again. Eventually, we figured out that we wanted, or the universe wanted us, to live in Seattle. This goes back to my earlier recurring point: life is hard when it's happening to you. To Tarn I have this to say: yes, I understand your language. Give me specifics, I can probably come up with an answer for you.

Next: More Bullet Points!

Friday, April 14, 2006


I'd be remiss if I didn't point out that the statistics I presented in my last post are far from canonical, and that there is in fact a fair amount of disagreement (among, you know, anyone who cares about this at all) about how likely Joe DiMaggio's 56 game hit streak was. My methodology was extremely simplistic, essentially treating each game in DiMaggio's career as a coin flip. Here, by contrast, is a study that tries to estimate how likely a 56 game hitting streak was to occur in a season like Joe DiMaggio's 1941. In this season he hit .357, but of course a season's worth of games is far fewer than a career's worth. They conclude that by random chance, in about 16,000 seasons like 1941, Joe DiMaggio would have a 56 game streak exactly once (by comparison, I said 24 in 10000 careers; DiMaggio's career was 15 years long, so my estimate was more or less 1 streak in 6250 seasons).

You might at this point note that all of this statisticalizing pretty much flies in the face of everything I profess to believe about statistics. The fact is, DiMaggio's 56 game streak already happened, so did his 61 game minor league streak, so did his .325 lifetime batting average. As such, my statistical analysis above is just as valid as the above linked folks who figured out the odds of hitting in 56 straight games in 1941, that is to say: not. The odds of these phenomena happening are one, because they happened. In the universe in which they didn't (which I don't believe exists, but whatever), this entry is an analysis of some entirely different phenomenon from baseball or somewhere else in the world of happenstance.

The nice part of Odds-are-One-ness is that it cuts through a lot of otherwise intractable problems--for instance, a direct corollary of it is that the disagreement about the statistical odds of DiMaggio's streak exists precisely because the question has no statistical answer. The more difficult part of OaO-ness is avoiding the resulting nihilist pitfall, e.g. saying that DiMaggio's streak was meaningless. It clearly wasn't: it happened 65 years ago and statisticians and baseball fans alike still talk about it. Once you figure out that you're posing the question in such a way as to be unanswerable--"What are the odds that Joe DiMaggio will hit in 56 consecutive games?"--OaO-ness doesn't really direct you other than to say, "Mr. DiMaggio has already done this. Can I interest you in rephrasing your question such that I can give you a more informative answer?"

So what is it about things that, while they have in fact occurred, somehow seem unlikely to have occurred? It's the only thing that's keeping the Creationists going at this point, for instance. It seems to point to a hole in our model of history, of how events transpire, of how we got from there to here. I'm still working on the new model. I'm sure I'll have it for you any day now.

Next: Propositions! Lemmas! Corollaries!

Wednesday, April 12, 2006

Utterly Predictable Sequel to the Inevitable Entry About Baseball

It's a week into baseball season. Everything is just better when it's baseball season--it's suddenly getting warmer and the days are longer, and, being that I live on the West Coast, when I arrive at work somebody somewhere is already playing a game that I can listen to online. There are games all day. I go home at night and there is baseball being played almost until I go to bed. So here's one more baseball/OaO-themed post, and then I'll move on. For awhile, anyway.

Last time I referenced some of the ways that baseball is being analyzed and also statistics versus narrative. One offshoot of these statistical studies, not just in baseball, but in sports in general, was a series of statistical studies about athletes being "in the zone," or "having the hot hand." Everyone in sports, and pretty much in life, recognizes this phenomenon: sometimes players just seem to be "on," everything they swing at gets hit hard somewhere, or they're striking everybody out, or whatever. However, if you're a statistician (or read this blog), you know that actual, real-life randomness has this creepy tendency to look not-at-all random--you flip a coin 32 times, you are more than likely to find a streak of five heads or five tails somewhere in there, for instance.
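
For anyone who wants to check the coin-flip claim rather than take it on faith, here is a minimal Python sketch. It assumes a fair coin and independent flips, and the function name is made up for illustration. It walks through the flips while tracking the length of the current run, and counts how much probability ends up in "a run of five has shown up."

```python
from fractions import Fraction


def prob_run_either(n_flips: int, run_len: int) -> Fraction:
    """Exact chance that n_flips fair flips contain a run of at least
    run_len identical faces (heads or tails).
    Assumes n_flips >= 1 and run_len >= 2."""
    half = Fraction(1, 2)
    # state[j]: probability that the current run has length j + 1 and
    # no run of run_len has been seen so far.
    state = [Fraction(0)] * (run_len - 1)
    state[0] = Fraction(1)  # after the first flip, the run has length 1
    for _ in range(n_flips - 1):
        nxt = [Fraction(0)] * (run_len - 1)
        for j, pr in enumerate(state):
            nxt[0] += pr * half  # face changes: run restarts at length 1
            if j + 1 < run_len - 1:
                nxt[j + 1] += pr * half  # same face: run grows by one
            # otherwise the run reaches run_len, and that probability
            # leaves the "no streak yet" states
        state = nxt
    return 1 - sum(state)


print(float(prob_run_either(32, 5)))
```

Under those assumptions the answer lands at roughly 0.65, so "more than likely" holds up.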

(You could, of course, immediately conclude that "being in the zone" is a post-hoc narrative interpretation of random statistical phenomena, as statistical studies have generally shown, so that of course it doesn't exist, but then of course it also does exist because of narrative. Then you could, you know, stop reading this post, and also this blog, forever, because as you've realized by now I pretty much only ever say one thing. But then what if I suddenly switch things up on you, as I'm going to do at the end of this post? Then you'd miss out.)

So, there was Ted Williams. Boston Red Sox 1939-1960, easily one of the top five hitters of all time; Bostonians, tired of hearing about that freakin' guy they sold to the Yankees after the 1919 season, frequently argue that he was number one. In 1957, Ted Williams set the major league record by reaching base in 16 consecutive plate appearances. This is about as hot as you can get--an average baseball player will make an out somewhere between 6 and 7 out of 10 times he comes to bat. Ted Williams was no average major-leaguer, but still--for almost four straight games, they just couldn't get this guy out. Now say you want to figure out the odds of this happening...


The Stoat: Touché.

Say you're not me and you want to figure out the odds of this happening. You'd probably take Williams' lifetime on-base percentage (.482) (which is insanely high, by the way)--he had a 48.2% chance of reaching base safely in a given plate appearance. Two times in a row? 48.2% * 48.2% = 23.2%. Three times? 48.2% * 48.2% * 48.2% = 11.2%, so sixteen times in a row: (.482)^16 = .0000085. 8.5 in a million, or a little less than one chance in 100,000. Seemingly, this was an amazing hot streak.
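
The multiplication above is easy to reproduce. This is a small Python sketch, assuming every plate appearance is an independent event with the same .482 chance of success, which is the simplification the paragraph is already making. The function name is just a label for illustration.

```python
OBP = 0.482  # Williams' lifetime on-base percentage, as quoted above


def streak_from_now(p: float, length: int) -> float:
    """Chance that the next `length` independent trials all succeed."""
    return p ** length


for k in (2, 3, 16):
    print(k, f"{streak_from_now(OBP, k):.7f}")
```

The last line printed is the 8.5-in-a-million figure: the chance of the streak starting at one particular plate appearance, not the chance of it turning up somewhere in a career.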

There's a catch, though (no, not the usual one. A different one). What I've just done is calculated (very roughly) the odds that at any point in his career, you would see Ted Williams come to the plate and get on base the next 16 times in a row. During his career Ted Williams came to the plate 9,791 times. As we've learned because iPods read our minds, random data doesn't look random. If you've got a coin that flips tails 48.2% of the time, and you flip that coin 9,791 times, the percentage chance you'll find a streak of 12 tails in a row somewhere in there is about 95%. The percentage chance that you'll find a streak of 14 tails in a row somewhere is greater than 50%. The chance that you'll find a streak of 16 tails in a row is a little less than 8%.
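
The "somewhere in 9,791 flips" question can be worked out exactly for the weighted coin. The Python sketch below is one way to do it, not necessarily how the figures quoted above were produced. It assumes independent plate appearances with a fixed 48.2% success rate, and `prob_streak` is a name invented here. The exact values it gives depend on those modelling choices, so they need not match the rounded percentages in the paragraph.

```python
def prob_streak(n_trials: int, p: float, streak: int) -> float:
    """Chance of at least one run of `streak` consecutive successes
    somewhere in n_trials independent trials with success rate p."""
    # state[j]: probability that the current run of successes has
    # length j and no full streak has happened yet (j = 0..streak-1).
    state = [0.0] * streak
    state[0] = 1.0
    for _ in range(n_trials):
        nxt = [0.0] * streak
        nxt[0] = (1.0 - p) * sum(state)  # a failure resets the run
        for j in range(streak - 1):
            nxt[j + 1] = p * state[j]  # a success extends the run
        # p * state[streak - 1] completes a streak and drops out of
        # the "no streak yet" states
        state = nxt
    return 1.0 - sum(state)


for k in (12, 14, 16):
    print(k, round(prob_streak(9791, 0.482, k), 4))
```

The design choice is to track only "how long is the current run" rather than the whole sequence, which keeps the work to a few hundred thousand multiplications.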

Calvino: Yeah! I knew it, wait. I don't get it. Shouldn't the point have been that we were virtually certain to see such a streak somewhere in Ted Williams' career? What about random distributions and humans wanting to see patterns where there are none?

The Stoat: Well, the first point is that (statistically speaking, not Odds Are One speaking) the "real" odds of Ted Williams having the on-base streak that he had, which at first seemed incredibly small, actually turn out to not be that far-fetched at all.

Calvino: So, I mean, other than the fact that it happened and so the odds are one, what are you saying? That the odds are really 2 in 25? Or that the major league record for consecutive plate appearances getting on base should really be 12, or maybe 14, but because of baseball and narrative, it's 16?

The Stoat: Well, another way to think about it is that if you had a splinter of Ted Williamses, say 13 of them, one of them would likely have such a streak. There haven't been 13 Ted Williamses, there's only been one, but there have been thousands of people who have hit a major league baseball. Even the very very good ones had a smaller chance of randomly getting on base 16 times in a row, but given all of them together, it's not surprising that such a streak exists. It happens to belong to the one of them that has the highest career on-base percentage, but to me that's the Odds-Are-One part of it. We talk about it (where we = "baseball dorks") because it belongs to such a hitter (rather than John Olerud or Barry Bonds, who are tied for second with streaks of 15 plate appearances. John Olerud was a fine player, but is a marginal Hall-of-Famer. Barry Bonds is, well, still to be judged by history).

Calvino: So, the statisticians are right. There's no such thing as being "in the zone." It can all be explained by the vagaries of random chance.

The Stoat: No.

Calvino: No?

The Stoat: Not quite all of it. There's still Joltin' Joe.

There's still Joe DiMaggio. In 1941, Joltin' Joe hit safely in 56 straight major league games. Joe D. hit .325 for his career, and had 6821 at bats in 1736 games played. With those numbers he had about a 78% chance of getting a hit during a game. With random statistical distribution, a streak like his 56 gamer would show up in his career .24% of the time. Not 24 in 100, 24 in 10000. Given the actions of random chance, in 400 careers like DiMaggio's, you'd expect to find one streak. There haven't been 400 careers like DiMaggio's in baseball history. There haven't been 10.
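
Here is a rough Python sketch of how the 78% figure can be reached from the career numbers above. It assumes every game has the same (fractional) number of at bats and that each at bat is an independent .325 chance, which is my reading of the coin-flip model rather than a documented method. The career estimate at the end simply counts one chance per game, which overstates things a bit, and whether the .24% above was computed this way is a guess.

```python
def per_game_hit_prob(avg: float, at_bats: int, games: int) -> float:
    """Chance of at least one hit in a game, treating every game as
    having the career-average number of at bats (a simplification)."""
    ab_per_game = at_bats / games
    return 1.0 - (1.0 - avg) ** ab_per_game


GAMES = 1736
p_game = per_game_hit_prob(0.325, 6821, GAMES)

# Chance that one particular stretch of 56 games is all hits.
p_56_from_now = p_game ** 56

# Crude career-level estimate: one chance per game played.
rough_career_chance = GAMES * p_56_from_now

print(round(p_game, 3))
print(f"{rough_career_chance:.4%}")
```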

And here are a couple more things, strictly for narrative: Joe DiMaggio also put together a 61 game hitting streak in the minor leagues, still the Pacific Coast League record. And in the game on July 17th, 1941, when DiMaggio finally went hitless after 56 games, he was twice retired due to great plays by Cleveland third baseman Ken Keltner on balls DiMaggio ripped down the third base line that might have otherwise gone for hits. He hit safely in the next 16 games after that.

Next: Same thing, different packaging!

Monday, April 10, 2006

Right Now

Looking from the windows of our office building, which overlooks downtown Seattle, there is a line of people as wide as the four lane street they're walking down, which stretches as far as I can see. They are heading through Seattle's International District, presumably heading for the King County courthouse buildings. There are easily upwards of 50,000 people, maybe more like 100,000, and certainly more than at any point during the WTO protests here several years ago (Edit: Today's (4/11/2006) Seattle P-I pegs the number at 15,000+, so my crowd estimating skills are apparently not what they could be). They are marching in protest of the proposed zero-tolerance immigration policies in Congress, and right now they are having more of an effect on national policy than all of the left's protesting, screaming, and hand-wringing over the last six years. This goes to show you something important. But I don't know what it is.

Next: Dammit. Dammit. Dammit.

Thursday, April 06, 2006

Inevitable Entry About Baseball

I love baseball a lot. I'm not generally a sports fan; my love of baseball derives, as L. would call it, from narrative--the stuff that's born from your dad taking you to games when you're small and telling you about seeing Bob Gibson and Stan Musial, how he and his brother used to go to Browns games when the Cardinals were sold out, and so on. Lots of guys love baseball this way, but L. is the first and so far only woman I've ever met who loves baseball for this reason (in her case it was her grandmother who supplied the narrative). The other class of reasons I love it, also fairly common, is the myriad and constantly evolving techniques used to model it.

If you don't follow baseball, or just follow it for the narrative, things like OPS, VORP, EqA, DIPS, and UZR have no meaning for you, but they're part of the family of new statistics that attempt to determine how valuable an individual is to a team, in context with other players at the same or different positions, how much of what appears to be good or bad pitching is actually due to the defense playing behind the pitcher, and so on. If you weren't in the business of actually putting together a baseball team, why would you care about these things? Because you're a dork (and by "you're a," I mean, "I'm a"). Anyway, your or my dorkiness, or lack thereof, is not my theme for today. The theme is a particular stat. Somewhere in the vicinity of two years ago they came up with my all-time favorite statistic. It's called WPA, for "Win Probability Added," and it's pretty much "The Odds Are One" applied to baseball.

What WPA does is look at every baseball game that has ever been played (or as large a sample as you can find. You can do this now that you've got computers) and, given a particular situation that's occurring in the particular game you're watching (e.g. the home team batting in the bottom of the ninth inning, down by a run, with two outs and a runner on second), look for all the similar situations that have ever occurred in baseball. Then it looks at what percentage of those games were won by the home team and what percentage were won by the away team. So let's say in the above situation, with a runner on and two outs, 75% of the time the away team won and 25% of the time the home team won. Now you go back to the game you're watching and see what happens. The batter strikes out? The probability of the visiting team winning is 100% because, well, they won. The pitcher gets a +.25 WPA because he took his team's chances of winning from .75 to 1, and the batter gets a -.25 WPA because by striking out he took his team's chances of winning from .25 to 0. If the batter hits a home run, he takes his team's chance of winning from .25 to 1, and gets +.75 WPA, and the pitcher gets a -.75 WPA. If, on the other hand, the batter singles and the runner scores, the score is tied and the game is still going, so you have to go back to all games where the score was tied with two outs in the bottom of the ninth and see what percentage of the time the home team won and what percentage of the time the away team won in order to figure out the WPA of that play. Say it's roughly 50/50. Then, since it went from 25/75, the hitter gets +.25 WPA and the pitcher gets -.25 WPA.
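
The bookkeeping in that example is simple enough to sketch in a few lines of Python. Everything here is illustrative: the `Play` class and `wpa` function are names invented for this post, and the win probabilities are the made-up 25%, 50% and so on from the paragraph above, not values looked up from real game data.

```python
from dataclasses import dataclass


@dataclass
class Play:
    description: str
    home_wp_before: float  # home team's win probability before the play
    home_wp_after: float   # home team's win probability after the play


def wpa(play: Play, batter_is_home: bool = True) -> tuple[float, float]:
    """Return (batter_wpa, pitcher_wpa) for a single play.

    The batter is credited with the change in his own team's win
    probability; the pitcher gets the mirror image."""
    delta = play.home_wp_after - play.home_wp_before
    batter = delta if batter_is_home else -delta
    return batter, -batter


plays = [
    Play("strikeout, game over", 0.25, 0.0),
    Play("two-run homer, game over", 0.25, 1.0),
    Play("single, tying run scores", 0.25, 0.5),
]
for play in plays:
    batter, pitcher = wpa(play)
    print(f"{play.description}: batter {batter:+.2f}, pitcher {pitcher:+.2f}")
```

The hard part of the real statistic is the lookup of "before" and "after" win probabilities from historical games. The sketch skips that entirely and only shows the subtraction.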

If you think this statistic sounds completely insane, you are correct. This statistic is completely insane. There are people out there, however, who are doing this for every single Major League Baseball game that's played. Here, for instance, is the graph from last night's Mariners/Angels game. The Mariners were always ahead in this game, so the line tracks steadily upwards, with sharp dips where the Angels got hits that caught them up. What makes WPA cool is that it quantifies the individual contributions of players and when they occur in the game--e.g. if you're a pitcher in a 0-0 game and you give up a run in the bottom of the 4th inning, it doesn't matter that much because your team has time to catch up. If you're a pitcher in a 0-0 game and you give up a run in the bottom of the 9th, it matters a lot, because you just lost the game. At the end of the game you can add up everybody's WPA and see who contributed, and how much, to the win or loss.

What makes WPA completely insane is that, as you already know because you read OaO, it's completely meaningless (and I mean this in the usual Odds Are One way--it's meaningless in the statistical sense. It tells us quite a lot, in this case, about how we think about baseball games). As a statistic it has a number of rather inherent flaws, the most obvious of which is that there are often important plays where it's not clear which individual should get the credit, or how you should score it (if a player hits what's otherwise going to be a home run to win the game, but a fielder climbs the wall and catches the ball for an out, do you penalize the pitcher for losing the game, but then reward the fielder for un-losing it? Or do you treat it as an ordinary out? Do you penalize the hitter for making an out even though it took an extraordinary play to get him out?). It's also not in any way predictive. It looks at games that have already happened and then uses them to quantify plays in the particular game...that have already happened. The only thing it predicts is that when the game is over, one team will have won (Win Probability 1) and one will have lost (Win Probability 0).

Win Probability Added is all about...wait for it...narrative. It takes each event in the game and decides how much it contributed to the final story of that game. As with actual narratives, big events that happen at the end tend to be much more important to the final story than little events at the beginning, and the major characters in the story will have very high WPA or low negative WPA, and the minor characters will have a WPA near 0. All of us who are compiling or studying these metrics are, yes, big dorks. But we're also just making narrative.

Next: Still More Narrative!

Tuesday, April 04, 2006

It's a "Metaphor"

What is blogging like? I mean this question to be framed in the recurring theme of modern life about which I blog from time to time. The paraphrase of this idea is something like this: "The things that seem new and different about the modern human condition are repetitions of the same old things in new context. The things that seem like they never change are, in fact, new." My argument for this is somewhat tenuous and perhaps tautological, but it is more or less that the Nature of Things is for them to continuously change, so that a phenomenon that seems constant is actually having to resist the sort of natural order of things. I realize that this doesn't entirely make sense. But, for the purposes of reading this post, take it as an a priori. After that you can dismiss it.

If one's operating theory is that something that seemed like an utterly new phenomenon was in fact the natural evolution of an old one, that the new one had simply grown out of the old one, what was blogging before it was blogging? Blogging is nigh ubiquitous now, some people get paid to do it, and I was noticing yesterday that an alarming amount of my information about the outside world is coming from blogs. It is, at least right now, relatively democratic--anybody with internet access can write one for free, anybody with internet access can read anybody else's blog (offer not valid in China). (nearly) Everybody's writing. (nearly) Everybody's reading. So I don't know, what's that like?

Next: next!

Sunday, April 02, 2006

Dreaming Brain Redux

L. sometimes wakes up in the middle of the night and says things to me. I am an extremely heavy sleeper, and otherwise don't wake up for almost anything, but when she starts talking, I'm immediately awake and (I think) relatively coherent, trying to figure out what she's saying. It tends, as a rule, to be kind of vaguely creepy. One night several months ago she woke up and said, "Oh...oh my god, there are two cats in the bed with us." This, of course, was rather alarming, seeing as how we don't have any cats. Well, that, and her general sense of panic at suddenly waking up to find cats in our bed. Except, of course, there were no cats. I looked around to verify this fact and told her that in fact there were not any cats in the bed with us, and she said "Oh...okay," and went immediately back to sleep.

The other thing is, she doesn't remember that this happens. So, for instance, this blog entry will be the first she's heard of the incident from the other night that I'm about to relate. I woke up and she was saying this: "I understand now. The air from outside is coming in through the door of the basement." And then she immediately rolled over and went back to sleep.

I was lying there, now momentarily awake, and the weird thing was that I could pretty much piece together where her mind had been when it had decided to wake up and say that. I can't quite recall what it was now, but I remember thinking I ought to go downstairs and make sure the basement door was locked, but then realizing that this wasn't what she was telling me. She was telling me something else.

Next: More mysteries!