Tuesday, November 28, 2006

Applied Mathematics

A good article about the perils of mass screenings (for security, disease, etc.) appears in Slate today. It points out the proverbial sharpness of the other edge of the Sword of Inference (please don't strain yourself going after that metaphor; it's not worth it). There are perils in making discrete inferences from larger trends, and there are perils in applying a trend as a discrete principle to large sets of data.

I took one applied math course in college, and in it I learned a bunch of things I have never since applied to anything, anywhere, ever. I really liked the professor, though; his name was Dr. Elderkin and he had these enormous hands that he would flap open and closed while he was lecturing, creating gale force winds that blew chalk dust around the room. Mrs. T.G. recently pointed out that I do this myself sometimes, so apparently the habit had quite an effect on me.

One of the useful things I learned from Dr. Elderkin is why screening for, e.g., diseases across populations is counter-productive and makes for bad social policy. Here's an example illustrating the painfully stretched metaphor I constructed above.
  • Say the blood test for HIV antibodies correctly identifies the presence of the antibodies 99% of the time, and 99% of the time it correctly tells an uninfected person that he or she is not infected. Let's further guess that one million people in the US are infected with HIV (I'm making all these numbers up, but they're reasonably close to the actual numbers).
  • We test all adults--say, 100 million people--for HIV, and again we'll estimate that 1 million of them are actually HIV positive.
  • The test is 99% effective, so (.99 * 1,000,000 = ) 990,000 HIV positive people learn that they are HIV positive. But it also gives a false positive 1% of the time, so of the remaining population, (.01 * 99,000,000 = ) 990,000 are given false positive diagnoses. That's 1,980,000 positive results, half of which are wrong.
  • Your HIV test, which is quite accurate for the individual, turns out to be only 50% accurate across an entire population.
In actuality, the HIV test is only something like 96% accurate, so the results aren't even going to be as accurate as my example. Moral of the story? Applied math: not generally applicable.
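The arithmetic in the bullets above is easy to check for yourself. Here's a short Python sketch using the same made-up numbers from the post (the variable names are mine):

```python
# Back-of-the-envelope screening arithmetic, using the made-up
# numbers from the post (not real HIV statistics).
population = 100_000_000   # adults tested
infected = 1_000_000       # actually HIV positive
sensitivity = 0.99         # chance an infected person tests positive
specificity = 0.99         # chance an uninfected person tests negative

true_positives = round(sensitivity * infected)                         # 990,000
false_positives = round((1 - specificity) * (population - infected))   # 990,000
total_positives = true_positives + false_positives                     # 1,980,000

# Chance that any given positive result is correct
# (what epidemiologists call the positive predictive value):
ppv = true_positives / total_positives
print(f"{ppv:.0%}")  # 50%
```

Plug in a rarer disease (say, 100,000 infected instead of a million) and the share of positives that are real drops even further, which is the whole point of the example.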

Next: inference vis-à-vis implication!


Sam said...

OK, so I know that's actually some very simple math, but still, that sort of blows my mind.

Lingual Mania(h) said...

So in one of my other lives, I get a master's in public health, in which math like this is essential, and I go out and save the world by intelligently channelling charitable contributions from multiple sources, as I have most likely explained to you in the past. The point being, the only thing holding me back from this life is math like this. Shoot.

The Tetrast said...

Your HIV test, which is quite accurate for the individual, turns out to be only 50% accurate across an entire population.

That seems wrong. Sometimes probability problems work out counterintuitively, but in this case it seems to me that either--

A: 1% among the test's positives are false, with one false positive for 99 true positives, and therefore, if 1% of a population actually have HIV, then there'll be 10,000 false positives for every 990,000 true positives -- and
* testing positive will be reliable for 99% of all positives, and
* an individual's testing positive means a 99% chance of actually having HIV


B: 1% among the test's total results are false positives, which means, in a test population of 100mn in which 1mn actually have HIV,
* 990,000 positives true for the right reasons,
* 10,000 positives true even though the test would have given a false positive to those without HIV,
* 1,000,000 false positives.
(You didn't mention false negatives, so I'll leave them out of the hypothetical scenario).
And in this scenario
* the test will be true for only 50% of all 2mn positives, and
* an individual's testing positive means a merely 50% chance of his/her actually having HIV.

Periapse said...

I believe tetrast's second scenario is more accurate. To get just a bit formal here, the accuracy of a test is usually expressed in terms of two conditional probabilities (here I use the numbers from the original post):

False positive p(TP | NHIV) = 0.01 (The probability of testing positive given no actual HIV seroconversion).

False negative p(TN | HIV) = 0.01
(probability of testing negative given HIV)

Additionally we have p(HIV) = 0.01
(populationwide probability of having HIV), and we also have p(TP)= 0.0198 (the probability that a test from the population will be positive).

What we're interested in is p(NHIV|TP), the probability of *not* having HIV even though you have a positive test result. Applying Bayes' theorem gives:

p(NHIV|TP) = p(TP|NHIV)*p(NHIV)/p(TP) = (0.01)*(1-0.01)/(0.0198) = 0.50

Thus a positive test result only means a 50% chance of actually having HIV antibodies. The test is *not* accurate on either an individual or population-wide scale.
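Periapse's Bayes'-theorem arithmetic checks out. As a sanity check, here's the same calculation in a few lines of Python, mirroring the notation in the comment above (variable names are mine):

```python
# Sanity check of the Bayes' theorem calculation in the comment above,
# using the same made-up numbers.
p_hiv = 0.01            # p(HIV): prevalence in the tested population
p_tp_given_nhiv = 0.01  # p(TP | NHIV): false positive rate
p_tn_given_hiv = 0.01   # p(TN | HIV): false negative rate

# Law of total probability for a positive result, p(TP):
p_tp = (1 - p_tn_given_hiv) * p_hiv + p_tp_given_nhiv * (1 - p_hiv)

# Bayes' theorem: p(NHIV | TP) = p(TP | NHIV) * p(NHIV) / p(TP)
p_nhiv_given_tp = p_tp_given_nhiv * (1 - p_hiv) / p_tp

print(round(p_tp, 4), round(p_nhiv_given_tp, 2))  # 0.0198 0.5
```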