Dishonest statisticians can produce misleading data but some misleading data are the result of curious flukes of statistics. Simpson’s Paradox is one of these flukes:
Here’s an example. Baseball player Mickey has a better batting average than Babe in both April and May. So, in terms of batting average, Mickey is a better baseball player than Babe. Right?
It turns out that Babe’s combined batting average for April and May can be higher than Mickey’s. In fact, Mickey can have a better batting average than Babe every month of the baseball season and Babe may still be a better hitter. How? That’s Simpson’s Paradox… Robert J. Marks, “Simpson’s Paradox: Big Data Can Lie” at Mind Matters
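To make Marks's scenario concrete, here is a minimal Python sketch with invented numbers. The hits and at-bats below are hypothetical and are not real statistics for any player. Mickey out-hits Babe in each month. However, most of Mickey's at-bats fall in his weaker month and most of Babe's fall in his stronger one, so Babe's combined average comes out higher.

```python
# Hypothetical (hits, at_bats) per month; invented numbers, not real player data.
stats = {
    "Mickey": {"April": (4, 10), "May": (25, 100)},
    "Babe": {"April": (35, 100), "May": (2, 10)},
}


def average(hits, at_bats):
    """Batting average: hits divided by at-bats."""
    return hits / at_bats


def combined(months):
    """Pool hits and at-bats across all months, then divide once."""
    hits = sum(h for h, _ in months.values())
    at_bats = sum(ab for _, ab in months.values())
    return hits / at_bats


for month in ("April", "May"):
    m = average(*stats["Mickey"][month])
    b = average(*stats["Babe"][month])
    print(f"{month}: Mickey {m:.3f}, Babe {b:.3f}")  # Mickey leads in each month

print(f"Combined: Mickey {combined(stats['Mickey']):.3f}, "
      f"Babe {combined(stats['Babe']):.3f}")  # Babe leads overall
```

Running it prints .400 vs .350 for April, .250 vs .200 for May, and .264 vs .336 combined. The reversal comes from weighting. A monthly average says nothing about how many at-bats stand behind it, but the combined figure is weighted by those at-bats.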
One outcome of Simpson’s Paradox is that machines cannot replace statisticians in analysing results. A great deal depends on interpretation, as Marks shows. “Clustering remains largely an art.”
Also by Robert J. Marks: Things Exist That Are Unknowable: A tutorial on Chaitin’s number
See also: Too Big to Fail Safe? (cautions on overuse of Big Data in medicine)
Machines cannot take over: Fundamental constraints in nature make nonsense of the claim. Great sci-fi plots, though.
16 Replies to “Simpson’s Paradox: Numbers are stranger than we think”
This struck me as fascinating. Thanks for posting this, News!
The basic idea here is taught in beginning stats in high school, although not necessarily with this specific type of example. If Joe had a batting average of .400 in May and .300 in June, we can’t conclude he’s batting .350 over the two months. You can’t average the averages unless the sample sizes (here, the at-bats in each month) are the same.
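The commenter's point can be checked with a short sketch. The split of at-bats (10 in May, 90 in June) is a hypothetical choice made for illustration. Exact fractions from Python's standard `fractions` module are used to avoid floating-point noise.

```python
from fractions import Fraction

# Hypothetical (hits, at_bats): .400 in May on 10 at-bats, .300 in June on 90.
unequal = [(4, 10), (27, 90)]
# The same two monthly averages, but on equal at-bats (50 each).
equal = [(20, 50), (15, 50)]


def mean_of_averages(samples):
    """The tempting shortcut: average each month's batting average."""
    return sum(Fraction(h, ab) for h, ab in samples) / len(samples)


def pooled_average(samples):
    """The true overall average: total hits over total at-bats."""
    return Fraction(sum(h for h, _ in samples), sum(ab for _, ab in samples))


print(float(mean_of_averages(unequal)), float(pooled_average(unequal)))
print(float(mean_of_averages(equal)), float(pooled_average(equal)))
```

With unequal at-bats, the shortcut gives .350 but Joe actually hit .310. With equal at-bats, the two numbers agree at .350.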
Isn’t this more about using statistics and math improperly than anything wrong with statistics and math?
The opening sentence gave it away, dinnit?
“Dishonest statisticians can produce misleading data but some misleading data are the result of curious flukes of statistics.”?
Yes. Stats and probability are tricky, even to those who understand them. People can unscrupulously bend the facts (like playing with the scale on the vertical axis of a graph), but I think the most common situation is that it takes diligence and some skill to properly interpret data.
Do you ever wonder why nobody ever attempts to interact with you?
Hazel@5, I agree. Obviously there are some who manipulate the stats to favor their own flavor of reality. But most simply don’t question the assumptions if the initial run supports their bias.
I wasn’t even talking about bias, BB. I’m just saying stats and probability are tricky, and even well-educated laypersons can draw erroneous, or at least fairly unsupported, conclusions from data. A good analyst (this is true in all fields) looks at their results and tentative conclusions, and then thinks about whether they can double-check their results by doing such things as looking at things with a different approach, or discussing conclusions with a colleague, or publishing where hard criticism is welcome.
Do you ever wonder why everyone always corrects what you spew? Do you ever tire of posting nonsense that is easily refuted?
You act as if you are saying something when in fact you are painfully ignorant- as are all of your socks, spearshake.
Is like a true Scotsman…
Hazel, I don’t disagree, but my point is that researchers, being human, are going to be (subconsciously) less diligent in examining stats that support their biases than in examining data that oppose them.
Most bias in science is “innocent”, not nefarious.
Yes, BB, I agree that people are prone to what you say, but a good, conscientious analyst knows that and does as I suggested in order to be as unbiased as possible. Despite the cynicism, and bias against analysts, that some have, I think most people in most jobs work to do a good, accurate job of analyzing data.
But analyzing data takes judgment as well as skill – it’s an art and a science, so to speak, which, as I said, is why people share their perspective with others to balance out the individual perspective that each person has.
Hazel@12, I don’t disagree, but my only point is that even the most conscientious of researchers is subconsciously affected by their bias. In hindsight, I know I was. So is KF, you, Andrew, and everyone else. ET is ruled by his. 🙂 Anyone who says they are not is, with respect, delusional.
Anyone who uses statistics alone is already lost where science is concerned. But then again Brother Acartia wouldn’t understand that.
David Berlinski once wrote:
Climate change is statistically driven and biased also. So in that respect it is anthropic in nature. 😎
Yes, I am ruled by my bias. My bias being reality. And reality being that which can be observed, experienced and tested.
Everyone in state “A” has a higher IQ than anyone in state “B”. If the dumbest person in A moves to B, that will obviously raise the average IQ of both states. Thus it “seems” that this should raise the average IQ of the entire country, yet no one entered or left the country.
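The joke's arithmetic can be verified with a small sketch of what is sometimes called the Will Rogers phenomenon. The IQ values are invented for illustration. All that matters is the premise that everyone in A scores higher than everyone in B.

```python
from fractions import Fraction

# Hypothetical IQ scores; the only requirement is min(A) > max(B).
state_a = [110, 120, 130]
state_b = [90, 95, 100]


def mean(xs):
    """Exact arithmetic mean as a Fraction."""
    return Fraction(sum(xs), len(xs))


def move_lowest(a, b):
    """Move the lowest scorer in a over to b, returning new lists."""
    new_a = list(a)
    mover = min(new_a)
    new_a.remove(mover)  # removes exactly one occurrence
    return new_a, list(b) + [mover]


assert min(state_a) > max(state_b)  # the joke's premise

new_a, new_b = move_lowest(state_a, state_b)
before = (mean(state_a), mean(state_b), mean(state_a + state_b))
after = (mean(new_a), mean(new_b), mean(new_a + new_b))

print("before:", [float(x) for x in before])
print("after: ", [float(x) for x in after])
```

Both state averages rise (A from 120 to 125, B from 95 to 98.75), while the national average stays at 107.5. The country's total score and head count are unchanged, so its mean cannot move.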
Of course when I heard the joke, states A and B had specific names, but I decided to be careful…