Uncommon Descent Serving The Intelligent Design Community

Simpson’s Paradox: Numbers are stranger than we think


Dishonest statisticians can produce misleading data but some misleading data are the result of curious flukes of statistics. Simpson’s Paradox is one of these flukes:

Here’s an example. Baseball player Mickey has a better batting average than Babe in both April and May. So, in terms of batting average, Mickey is a better baseball player than Babe. Right?

No.

It turns out that Babe’s combined batting average for April and May can be higher than Mickey’s. In fact, Mickey can have a better batting average than Babe every month of the baseball season and Babe may still be a better hitter. How? That’s Simpson’s Paradox…

Robert J. Marks, “Simpson’s Paradox: Big Data Can Lie” at Mind Matters
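The batting-average reversal described in the quote can be checked with a few lines of arithmetic. The sketch below uses made-up hit and at-bat counts (they are assumptions for illustration, not figures from Marks's article), and the names `batting_average` and `combined_average` are hypothetical helpers.

```python
from fractions import Fraction

# Hypothetical (hits, at-bats) records; these numbers are invented for
# illustration and do not come from Marks's article.
mickey = {"April": (4, 10), "May": (25, 100)}
babe = {"April": (35, 100), "May": (2, 10)}


def batting_average(hits, at_bats):
    """Hits divided by at-bats, kept exact as a fraction."""
    return Fraction(hits, at_bats)


def combined_average(record):
    """Pool hits and at-bats over all months before dividing."""
    total_hits = sum(hits for hits, _ in record.values())
    total_at_bats = sum(at_bats for _, at_bats in record.values())
    return Fraction(total_hits, total_at_bats)


# Month by month, Mickey's average is the higher one.
for month in mickey:
    m = batting_average(*mickey[month])
    b = batting_average(*babe[month])
    print(f"{month}: Mickey {float(m):.3f} vs Babe {float(b):.3f}")

# Pooled over both months, the ordering flips in Babe's favor.
print(f"Combined: Mickey {float(combined_average(mickey)):.3f} "
      f"vs Babe {float(combined_average(babe)):.3f}")
```

With these invented counts, Mickey takes most of his at-bats in his weaker month and Babe takes most of his in his stronger month. The combined average weights each month by at-bats, which is what allows the ordering to reverse.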

One outcome of Simpson’s Paradox is that machines cannot replace statisticians in analysing results. A great deal depends on interpretation, as Marks shows. “Clustering remains largely an art.”


Also by Robert J. Marks: Things Exist That Are Unknowable: A tutorial on Chaitin’s number

See also: Too Big to Fail Safe? (cautions on overuse of Big Data in medicine)

and

Machines cannot take over: Fundamental constraints in nature make nonsense of the claim. Great sci-fi plots though.

Comments
Everyone in state "A" has a higher IQ than anyone in state "B". If the dumbest person in A moves to B, that will obviously raise the average IQ of both states. Thus it "seems" that should raise the average IQ of the entire country, yet no one entered or left the country. Of course when I heard the joke, states A and B had specific names, but I decided to be careful...
Granville Sewell
April 18, 2019, 05:52 PM PDT
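The two-state joke can be checked numerically. This is a minimal sketch with invented IQ scores (an assumption for illustration only); `mean` is a small hypothetical helper that keeps the arithmetic exact.

```python
from fractions import Fraction

# Hypothetical IQ scores, invented for illustration; every score in
# state A is higher than every score in state B, as in the joke.
state_a = [110, 120, 130]
state_b = [80, 90, 100]


def mean(scores):
    """Exact arithmetic mean of a list of scores."""
    return Fraction(sum(scores), len(scores))


before_a = mean(state_a)
before_b = mean(state_b)
before_country = mean(state_a + state_b)

# The lowest scorer in A moves to B (scores here are all distinct).
mover = min(state_a)
new_a = [score for score in state_a if score != mover]
new_b = state_b + [mover]

print("A:", float(before_a), "->", float(mean(new_a)))
print("B:", float(before_b), "->", float(mean(new_b)))
print("Country:", float(before_country), "->", float(mean(new_a + new_b)))
```

Both state averages rise while the country average stays the same. The country average weights each state by its population, and the move shifts weight from the higher-scoring state to the lower-scoring one, which cancels the two increases.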
Yes, I am ruled by my bias. My bias being reality. And reality being that which can be observed, experienced and tested.
ET
April 16, 2019, 06:44 PM PDT
Anyone who uses statistics alone is already lost where science is concerned. But then again Brother Acartia wouldn't understand that. David Berlinski once wrote:
Natural selection disappears as a biological force and reappears as a statistical artifact. The change is not trivial. It is one thing to say that nothing in biology makes sense except in the light of evolution; it is quite another thing to say that nothing in biology makes sense except in the light of various regression correlations between quantitative characteristics. It hardly appears obvious that if natural selection is simply a matter of correlations established between quantitative traits, that Darwin’s theory has any content beyond the phenomenological, and in the most obvious sense, is no theory at all.
Climate change is statistically driven and biased also. So in that respect it is anthropic in nature. :cool:
ET
April 16, 2019, 04:54 PM PDT
Hazel@12, I don’t disagree, but my only point is that even the most conscientious of researchers is subconsciously affected by their bias. In hindsight, I know I was. So is KF, you, Andrew, and everyone else. ET is ruled by his. :) Anyone who says they are not is, with respect, delusional.
Brother Brian
April 16, 2019, 04:37 PM PDT
Yes, BB, I agree that people are prone to what you say, but a good, conscientious analyst knows that and does as I suggested in order to be as unbiased as possible. Despite the cynicism, and bias against analysts, that some have, I think most people in most jobs work to do a good, accurate job of analyzing data. But analyzing data takes judgment as well as skill - it's an art and a science, so to speak, which, as I said, is why people share their perspective with others to balance out the individual perspective that each person has.
hazel
April 16, 2019, 04:03 PM PDT
Hazel, I don’t disagree, but my point is that researchers, being human, are going to be less diligent (subconsciously) examining stats that support their biases than they will data that oppose them. Most bias in science is “innocent”, not nefarious.
Brother Brian
April 16, 2019, 03:54 PM PDT
hazel:
A good analyst...
Is like a true Scotsman...
ET
April 16, 2019, 03:50 PM PDT
Brother Brian:
Do you ever wonder why nobody ever attempts to interact with you?
Do you ever wonder why everyone always corrects what you spew? Do you ever tire of posting nonsense that is easily refuted? You act as if you are saying something when in fact you are painfully ignorant- as are all of your socks, spearshake.
ET
April 16, 2019, 03:48 PM PDT
I wasn't even talking about bias, BB. I'm just saying stats and probability are tricky, and even well-educated laypersons can draw erroneous, or at least fairly unsupported, conclusions from data. A good analyst (this is true in all fields) looks at their results and tentative conclusions, and then thinks about whether they can double-check their results by doing such things as looking at things with a different approach, or discussing conclusions with a colleague, or publishing where hard criticism is welcome.
hazel
April 16, 2019, 03:40 PM PDT
Hazel@5, I agree. Obviously there are some who manipulate the stats to favor their own flavor of reality. But most simply don’t question the assumptions if the initial run supports their bias.
Brother Brian
April 16, 2019, 03:27 PM PDT
ET
The opening sentence gave it away, dinnit?
Do you ever wonder why nobody ever attempts to interact with you?
Brother Brian
April 16, 2019, 03:21 PM PDT
Yes. Stats and probability are tricky, even to those who understand them. People can unscrupulously bend the facts (like playing with the scale on the vertical axis of a graph), but I think the most common situation is that it takes diligence and some skill to properly interpret data.
hazel
April 16, 2019, 02:58 PM PDT
Brother Brian:
Isn’t this more about using statistics and math improperly than anything wrong with statistics and math?
The opening sentence gave it away, dinnit? "Dishonest statisticians can produce misleading data but some misleading data are the result of curious flukes of statistics."?
ET
April 16, 2019, 02:58 PM PDT
Isn’t this more about using statistics and math improperly than anything wrong with statistics and math?
Brother Brian
April 16, 2019, 02:43 PM PDT
The basic idea here is taught in beginning stats in high school, although not necessarily this specific type of example. If Joe had a batting average of .400 in May and .300 in June, we can't conclude he's averaging .350. You can't average the averages unless the total quantities in each sample are the same.
hazel
April 16, 2019, 02:27 PM PDT
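hazel's point about averaging averages can be illustrated with Joe's two months. The at-bat counts below are invented (the comment gives only the averages), and `pooled_average` and `mean_of_averages` are hypothetical helper names.

```python
from fractions import Fraction


def pooled_average(months):
    """Total hits over total at-bats across all months."""
    hits = sum(h for h, _ in months)
    at_bats = sum(ab for _, ab in months)
    return Fraction(hits, at_bats)


def mean_of_averages(months):
    """Naive average of the monthly averages, ignoring at-bat counts."""
    return sum(Fraction(h, ab) for h, ab in months) / len(months)


# Hypothetical (hits, at-bats) giving .400 in May and .300 in June.
unequal = [(8, 20), (24, 80)]   # far more at-bats in June
equal = [(20, 50), (15, 50)]    # same number of at-bats each month

for label, months in (("unequal", unequal), ("equal", equal)):
    print(f"{label}: pooled {float(pooled_average(months)):.3f}, "
          f"mean of averages {float(mean_of_averages(months)):.3f}")
```

With unequal at-bats the pooled average is pulled toward the month with more at-bats, so it differs from the naive .350. With equal at-bats the two calculations agree.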
This struck me as fascinating. Thanks for posting this, News! Andrew
asauber
April 16, 2019, 01:40 PM PDT
