But They Don't Lie

The subject of the damned lies of statistics is something I've been meaning to blog about forever, and this post would be, well, untimely, except that Liz has just posted a response to reading Freakonomics. Somewhere in this book, apparently (I haven't read it, just a variety of summaries of it), appears the old saw, "Numbers don't lie."

In the vast wide history of time, everyone who has ever uttered or written this phrase was, in fact, lying. Liz says this:
Not only do numbers lie, but they actual exist in a natural state of lying. It's not intentional. After all, numbers aren't actually sentient and making the choice to lie. I would also argue that the purveyors of numbers are (generally) not intending to lie, but are unable to gather all the information necessary to force the numbers to tell the truth.
While I'm writing a critique of Liz's argument, I agree pretty much straight up with Liz's larger point about numbers, right up to the a priori of the last sentence I've excerpted here. Numbers don't lie. Numbers don't tell the truth, either.

A quick quiz for you: name the statistical method (or methods) used to show cause (vis à vis effect). That is to say, given an observed phenomenon X (e.g. a drop in crime rates), how do you mathematically demonstrate that X was caused by another phenomenon Y (e.g. the availability of abortions to poor women during the late '70s and '80s, one of the arguments made in Freakonomics)? If you can't remember the exact method, you're probably at least aware that it's a difficult thing to calculate. Mull that over for a second.

Lying is a function of language. Numbers don't lie because they don't mean. Meaning (and lying, and misrepresenting, and misconstruing, and so on) occurs when somebody tries to take the result of an equation and meld it to a result in the real world--that is, when somebody takes numbers and tries to model something with them. Some of these models work quite well (Calculus to the motion of accelerating bodies, e.g.) and some of them work for crap (attempts to model the weather or the stock market). But even in the models that are extremely good at making predictions there's nothing that inherently ties the model to the phenomenon it models. The planets orbited the sun before Calculus was invented, and they'll continue to do so if it's ever forgotten.

So how do you mathematically demonstrate that a drop in crime was due to the availability of abortions to poor women and not to, say, better policing methods? You can't. It was a trick question (ha ha on you). There is no way to demonstrate cause mathematically. Cause and effect is a narrative interpretation of phenomena that arise together (were I Buddhist I'd just come right out and say there's no such thing as cause and effect, but I'm not there yet). You can demonstrate correlation until the cows come home, but any more than that is subject to interpretation. What Freakonomics demonstrated (apparently--I haven't read it) is that a mathematical correlation exists between the fall in crime rates and the availability of abortions, and that this correlation is stronger than the one between falling crime rates and new policing methods. Correlation, though, is an extremely tricky thing to interpret. There is, for instance, a very high general correlation between shoe size and IQ scores, but nobody out there is claiming one causes the other (you would find the same correlation between age and IQ scores, and between age and shoe size). There was also that rather infamous study a few years back that claimed that race and IQ score were linked. Now, this correlation demonstrably exists. African Americans do, on average, have lower IQ scores than white Americans. Here are some other correlations that also demonstrably exist: poverty and lower IQ. Lack of access to health care and lower IQ. Lower levels or quality of education and lower IQ. What is to be learned from this? Probably a lot more about the authors of The Bell Curve than about anything having to do with genetic factors and IQ. Of course, that's just my interpretation.
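For what it's worth, the shoe-size example is easy to fake up. What follows is a minimal sketch in Python, not anything taken from Freakonomics or from a real study: the age range, slopes, and noise levels are all invented, and the "score" is a raw test score that rises with age rather than a properly normed IQ. Shoe size and score are each generated from age alone, so neither can possibly cause the other, and yet they correlate strongly until you look only at kids of roughly the same age.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

def simulate(n=5000, seed=0):
    """Invented data: age drives both shoe size and score, independently."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        age = rng.uniform(5, 18)                 # hypothetical school-age kids
        shoe = 0.5 * age + rng.gauss(0, 1.0)     # made-up slope and noise
        score = 3.0 * age + rng.gauss(0, 6.0)    # made-up slope and noise
        rows.append((age, shoe, score))
    return rows

rows = simulate()

# Everyone lumped together: shoe size "predicts" the score.
overall = pearson([r[1] for r in rows], [r[2] for r in rows])

# Hold age roughly fixed (ten-year-olds only) and the link mostly vanishes.
ten_year_olds = [r for r in rows if 10 <= r[0] < 11]
within = pearson([r[1] for r in ten_year_olds], [r[2] for r in ten_year_olds])

print(f"all ages:      r = {overall:.2f}")
print(f"ten-year-olds: r = {within:.2f}")
```

The numbers that come out are exactly as trustworthy as the ones I typed in, which is rather the point: the correlation is real, and the story about what it means is supplied by whoever is reading it.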

Nostalgia Happens When You Realize How Beautiful You Were Then

Something I realized just now.

Amazon S3

Another event that went down while I was out in the field was the launch of Amazon S3, a web service for storing data. Amazon S3 was known internally as "Amazon Storage" until about a week before launch, at which point the folks running it sat down with Jeff Bezos, who opined that soon we'd be competing with Google Storage and Yahoo Storage, and how thus would we ever successfully brand or trademark our particular service? Hence Amazon S3 (S3 = "Simple Storage Service") was born. And thus have you learned an important lesson about being the CEO of a major corporation.

Being able to get generic storage space somewhere out in the ether is not a particularly spectacular innovation. When the first iMac came out (which was, what, ten years ago at this point? Good lord) and there was general uproar over the fact that it lacked a floppy drive, some wonk offered everybody with an email account a couple of megs of storage space on his server so that they wouldn't need one. L. backs up her data by attaching .doc files to emails and sending them to herself at her Yahoo mail account, being that they now provide a Gigabyte of storage for your mail. I suspect this practice is quite widespread. There are, though, two actual innovations with Amazon S3, one lesser and one greater: the lesser one is that this is an enterprise-level service (which is code for it being reliable and available enough for Amazon's own use--we're drinking our own Kool-Aid, as it were). The greater innovation here is that we're now able to meter the usage and charge for it (Capitalism: it's what's for dinner).

S3 was slashdotted (here 'slashdot' appears in its past participle form, meaning "to post an item for comment on") a day or so after it launched, and 99% of the posts imagined it as an ethereal backup for one's personal data. You encrypt it (and here I'll make my little pitch again: any electronic data that you would mind if everyone in the world were to read, you should be encrypting. This, sadly, now includes stuff you think is only ever going to be sitting on your hard drive. Don't ask why. Just do it) and send it to S3, where it's reliably stored and available from anywhere. Even with a superfast internet connection, retrieval times aren't going to approach the speed of, e.g., a firewire drive, but if you're just storing and retrieving once in a while, it works fine. At $0.15 a Gig per month, and $0.20 a Gig for bandwidth (to store or retrieve data), it's a pretty reasonable price for a highly reliable archive. But personal (or even business) archive storage like this isn't the point of S3.
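To put the pricing in concrete terms, here's a back-of-the-envelope sketch in Python using only the two rates quoted above. The 20 Gig archive and the function name are hypothetical, and I'm assuming the simplest possible model (a flat per-Gig monthly charge plus a one-time per-Gig transfer charge), which is not necessarily how the actual bill is computed.

```python
STORAGE_PER_GB_MONTH = 0.15  # dollars per Gig per month, as quoted above
TRANSFER_PER_GB = 0.20       # dollars per Gig moved in or out, as quoted above

def archive_cost(stored_gb, transferred_gb, months=1):
    """Rough cost in dollars: flat storage charge plus one-off transfer charge."""
    storage = stored_gb * STORAGE_PER_GB_MONTH * months
    transfer = transferred_gb * TRANSFER_PER_GB
    return storage + transfer

# Hypothetical: upload a 20 Gig archive once and leave it alone for a year.
first_year = archive_cost(stored_gb=20, transferred_gb=20, months=12)
print(f"first year: ${first_year:.2f}")
```

Under those assumptions, a year of parking 20 Gigs costs about as much as a night out, with most of it going to storage rather than bandwidth.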

If you're a regular OaO reader and pay very close attention to my postings on things Web 2.0 (hi Sam), you might notice that this is one more piece to the puzzle I laid out six months ago, just after I joined Amazon Web Services. S3 is a virtual hard drive. One use for such a virtual hard drive is to store your data. But in the brave new world of Web 2.0, it's also a data store and information server for this brave new virtual computer running brave new virtual software. The virtual computer is not a machine (hence the word 'virtual'), it's a distributed network of computers that are all somewhere out in the world, and the software isn't a piece of code that's loaded onto a machine, it's tens or hundreds or thousands of distributed functions that, again, run on arbitrary machines somewhere.

So, possibly you're not as excited about this as I. Maybe it really isn't that cool. It might just end up being a place to store your data, that's all, the end. Or it might not.

Our love is like the border between Greece and Albania

I've returned from vacation and I'm sitting here, on this Tuesday morning, with twenty-thousand things to post on, but a dearth of ideas on how to winnow them down and present them. There was, for instance, this interview with Edward Wilson on the topics surrounding evolution v. religion. Or this complete mind-fuck, a quantum computer program that can solve a problem without ever running. Perhaps the emerging (or, for those who are not upper-middle class and white, ever-present) battle for access to birth control that Rebecca's covering is more your cup of tea. And then there's the alternative medicine seminar I went to the weekend before last, about which I have long been meaning to post something.

Surely I can tie all those threads together here in one neat paragraph, then go fetch some lunch and get back to work...wait...thinking...nope, can't do it today. Instead what I've got is this overarching thing that's been on my mind lately--something about the massive distributed system that makes up the entirety of my being, or your being. I, you, they out there in the world, are the sum of differentiated cells that know nothing of the being of which they are part. They respond to changes in pressure or light or chemicals or electrical charge that occur in their immediate environment; when a change occurs, the cell alters in some way. You and I are one giant collection of stimulus and response--at this level there is no happy or sad, or pain or relief, or love or hate. There are merely chemicals and patterns of electricity.

But I don't feel like a machine that's the sum of these things. Maybe the human condition exists because we have chemical signals telling us to survive and reproduce and that really everything we do is part of the strategy to make sure that our genetic material is propagated--red in tooth and claw and all that. Maybe everything else is, for lack of a better word, narrative that we make up to convince ourselves otherwise. Maybe religion is a narrative created to fill in that gap between the massive network of cells and the resulting experience that occurs when you are that massive network of cells. Maybe all of that. But then, and I don't know how it is for you, but I don't feel that way.

Someone Still Loves You Boris Yeltsin is a rocking band I have just discovered because of Pandora, a.k.a. "the music genome project." I am probably the last person on earth to try this idea of custom-tailored internet radio, but now I have, and I am here to tell you: Dude, it's fucking awesome. For the last two days I've been sitting at work meandering through a playlist that I started by telling it I liked Neil Finn and going from there. Subsequently hearing songs by 20 different bands that I a) had never in my life heard of, and b) freaking loved is only one of the cool things about it. There's also the slow drift through genres that it has done based on whether I give particular songs the thumbs up or thumbs down--yesterday I was in kind of a wispy, folk-influenced kind of place and so after five songs of feedback it was playing this steady stream of Elliott Smith-y, Iron & Wine-y (and the aforementioned 20 folks of whom I'd never previously heard) kind of stuff, and today I'm in a little more of an emo rock kind of place, which led me to thumbs-down some of the folky stuff, so I've been hearing such things as "I Am Warm + Powerful" by these Yeltsin-loving fellows from Springfield, Missouri (which, oddly, is where my mom's family used to live).

The third cool thing is this melding of the unfamiliar with the familiar (stuff I already own) and the ubiquitous (stuff one would hear on the radio), which creates all manner of new contexts for music. The next song Pandora played after the Yeltsins was Stone Temple Pilots' "Big Empty," a song that had previously drifted right over my consciousness, and right now I'm listening to a song by some sort of Poor-Man's John Mayer teen rocker which I'd immediately dismiss in the real world. But the fact that this music algorithm (which affiliates songs by the type of singer, instrumentation, underlying musical influence, etc.--my favorite criterion so far which it's inferred that I like is "a good dose of acoustic guitar pickin'") plays mostly indie stuff overcomes my built-in coolness filter and I end up listening to the songs with a different ear. I mean, okay, that teen-rock song still sucks, but today it gets a pass.

I suppose there's some comparison to be made between the iPod playlist, which isn't really reading your thoughts, and this, which is trying to do just that. Like last time, I'm leaving that up to you. It's time you all stop riding on my coattails and learn to synthesize for yourselves.

iPod Moments

  • 9, 3, 10, 1, 5, 8, 2, 6, 4, 7
  • 1, 4, 5, 4, 5, 4, 8, 10, 6, 7

One of these two strings of numbers was generated by a computer using a random number function. The other one was generated by a human (me). Look, ponder, and then file that fact away for a moment.

This morning on my stroll to work I had a nice little iPod Moment. An iPod Moment is when the iPod does something droll whilst it is shuffling randomly through its playlist. A sort of mundane one would be playing two songs by the same artist or from the same album back to back. A more interesting one is when it plays a song and then a cover of that song back to back. This morning's glimpse from Serendip was the selection of Jason Falkner's "Before My Heart Attacks," immediately followed by Falkner's former band, Jellyfish, doing "I Wanna Stay Home."

I am not alone in noticing this phenomenon. I believe, e.g., that the father of one OaO reader has actually called him up on occasion when the iPod does something witty with its random playlist. And there's apparently a small band of fanatics out there who believe that the iPod is reading their minds, or at the very least that the random shuffle algorithm that the iPod uses is not truly random. I have wondered aloud at times whether there's a little humanness built into the algorithm, especially after we hear three songs in a row from the same artist but from different albums.

Generating true randomness is indeed a notoriously difficult task for a computer, and it's caused a lot of people a lot of problems when sets of numbers that were thought to be random turned out not to be. What makes this a somewhat more obtuse problem is that things that truly are random tend to be full of instantly noticeable patterns. Go back to those numbers at the top. There's an absolute dead giveaway that one of them is machine generated and one of them is human generated (actually, there are two). In the first string there are no discernible patterns--the numbers aren't in order, they don't repeat, there's basically no connection between one number and the next. The second sequence has that unusual string of repeating 4s and 5s.

You are clever OaO readers, so you already know that the second pattern is the machine generated one (I didn't manipulate that second pattern in any way--I generated ten random numbers between 1 and 10 and those are the ten numbers I got. I was expecting some sort of pattern to show up somewhere, and indeed one did). The other list was carefully manipulated to remove everything that a human might think wasn't random (the first dead giveaway that it wasn't truly random) and also to use each number exactly once (the second dead giveaway -- a truly random number generating routine will generate a string like this about .04% of the time).
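If you'd rather check that second giveaway than take my word for it, here's a small Python sketch. It assumes the ten draws are independent and uniform over 1 to 10, which is what an honest random number function should give you; the seed and the trial count are arbitrary choices of mine. Ten draws that use each number exactly once amount to one of the 10! orderings out of 10^10 possible strings, so the exact figure is 10!/10^10, roughly 0.036%.

```python
import random
from math import factorial

def prob_all_distinct(n=10):
    """Chance that n independent uniform draws from 1..n never repeat."""
    return factorial(n) / n ** n

def has_repeat(seq):
    """True if any value shows up more than once in seq."""
    return len(set(seq)) < len(seq)

# The two strings from the top of the post.
human = [9, 3, 10, 1, 5, 8, 2, 6, 4, 7]
machine = [1, 4, 5, 4, 5, 4, 8, 10, 6, 7]
print("human list repeats:  ", has_repeat(human))
print("machine list repeats:", has_repeat(machine))

exact = prob_all_distinct(10)
print(f"exact chance of no repeats: {exact:.4%}")

# Sanity check by brute force: count how often ten draws come out repeat-free.
rng = random.Random(2006)   # arbitrary seed, only so the run is repeatable
trials = 100_000
hits = sum(
    not has_repeat([rng.randint(1, 10) for _ in range(10)])
    for _ in range(trials)
)
print(f"simulated: {hits} repeat-free strings in {trials} tries")
```

Put the other way around, nearly every honestly random string of ten contains at least one repeat, which is why the tidy list is the suspicious one.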

At this point I'd usually offer you some sort of narrative tag that ties all this together neatly, but today I'm all about randomness. So screw you. Make your own narrative.

