Thursday, March 30, 2006

But They Don't Lie

The subject of the damned lies of statistics is something I've been meaning to blog about forever, and this post would be, well, untimely, except that Liz has just posted a response to reading Freakonomics. Somewhere in this book, apparently (I haven't read it, just a variety of summaries of it), appears the old saw, "Numbers don't lie."

In the vast wide history of time, everyone who has ever uttered or written this phrase was, in fact, lying. Liz says this:
"Not only do numbers lie, but they actual[ly] exist in a natural state of lying. It's not intentional. After all, numbers aren't actually sentient and making the choice to lie. I would also argue that the purveyors of numbers are (generally) not intending to lie, but are unable to gather all the information necessary to force the numbers to tell the truth."
While I'm writing a critique of Liz's argument, I agree pretty much straight up with Liz's larger point about numbers, right up to the a priori of the last sentence I've excerpted here. Numbers don't lie. Numbers don't tell the truth, either.

A quick quiz for you: name the statistical method (or methods) used to show cause (vis à vis effect). That is to say, given an observed phenomenon X (e.g. a drop in crime rates), how do you mathematically demonstrate that X was caused by another phenomenon Y (e.g. the availability of abortions to poor women during the late 70's and 80's, one of the arguments made in Freakonomics)? If you can't remember the exact method, you're probably at least aware that it's a difficult thing to calculate. Mull that over for a second.

Lying is a function of language. Numbers don't lie because they don't mean. Meaning (and lying, and misrepresenting, and misconstruing, and so on) occurs when somebody tries to take the result of an equation and meld it to a result in the real world--that is, when somebody takes numbers and tries to model something with them. Some of these models work quite well (Calculus to the motion of accelerating bodies, e.g.) and some of them work for crap (attempts to model the weather or the stock market). But even in the models that are extremely good at making predictions there's nothing that inherently ties the model to the phenomenon it models. The planets orbited the sun before Calculus was invented, and they'll continue to do so if it's ever forgotten.

So how do you mathematically demonstrate that a drop in crime was due to the availability of abortions to poor women and not to, say, better policing methods? You can't. It was a trick question (ha ha on you). There is no way to demonstrate cause mathematically. Cause and effect is a narrative interpretation of phenomena that arise together (were I Buddhist I'd just come right out and say there's no such thing as cause and effect, but I'm not there yet). You can demonstrate correlation until the cows come home, but any more than that is subject to interpretation. What Freakonomics demonstrated (apparently--I haven't read it) is that a mathematical correlation exists between falling crime rates and the availability of abortions, and that this correlation is stronger than the one between falling crime rates and new policing methods.

Correlation, though, is an extremely tricky thing to interpret. There is, for instance, a very high general correlation between children's shoe size and their scores on mental tests, but nobody out there is claiming one causes the other (both simply rise with age: you would find the same correlation between age and test scores, and between age and shoe size). There was also that rather infamous study a few years back that claimed that race and IQ score were linked. Now, this correlation demonstrably exists. African Americans do, on average, have lower IQ scores than white Americans. Here are some other correlations that also demonstrably exist: poverty and lower IQ. Lack of access to health care and lower IQ. Lower levels or quality of education and lower IQ. What is to be learned from this? Probably a lot more about the authors of The Bell Curve than about anything having to do with genetic factors and IQ. Of course, that's just my interpretation.
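For anyone who'd rather see the shoe-size problem than take my word for it, here's a toy simulation in Python. It's a rough sketch, not real data: every number in it (the age range, the slopes, the noise) is made up, and the names `simulate`, `pearson` and `residuals` are just mine. The one thing it assumes is that age drives both shoe size and test score, and that neither affects the other.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def residuals(ys, xs):
    """What is left of ys after subtracting a least-squares line fit on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]


def simulate(n=5000, seed=0):
    """Invented children: age drives shoe size AND test score, nothing else does."""
    rng = random.Random(seed)
    ages = [rng.uniform(5, 15) for _ in range(n)]
    # Made-up relationships: each variable is age plus independent noise.
    shoe = [15 + 1.0 * a + rng.gauss(0, 1.0) for a in ages]
    score = [10 + 5.0 * a + rng.gauss(0, 5.0) for a in ages]

    raw = pearson(shoe, score)
    # Correlation between the parts of shoe size and score NOT explained by age.
    adjusted = pearson(residuals(shoe, ages), residuals(score, ages))
    return raw, adjusted


if __name__ == "__main__":
    raw, adjusted = simulate()
    print(f"shoe size vs. score:              {raw:.2f}")
    print(f"same, after accounting for age:   {adjusted:.2f}")
```

The first number comes out high and the second comes out close to zero, even though shoe size never touches the score anywhere in the code. Note what made the second number possible, though: I had to already suspect age was the culprit before I could subtract it out. The arithmetic didn't tell me that. The story did.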

Next: Less ranting!

1 comment:

Sam said...

It's simple: to get from correlation to causation (assuming you believe in the latter) you've got to have an argument for how or why x caused y. You have to have a theory. And this means that only something beyond statistics or numbers can allow you to move from correlation to causation. And, in truth, any good statistics class should teach this point (or a similar one). Alas, most members of the media never take good statistics classes, or any statistics classes. Do they even take classes?