Dishonest statisticians can produce misleading data, but some misleading data are simply the result of curious statistical flukes. Simpson’s Paradox is one of these flukes:
Here’s an example. Baseball player Mickey has a better batting average than Babe in both April and May. So, in terms of batting average, Mickey is a better baseball player than Babe. Right?
No.
It turns out that Babe’s combined batting average for April and May can be higher than Mickey’s. In fact, Mickey can have a better batting average than Babe every month of the baseball season and Babe may still be a better hitter. How? That’s Simpson’s Paradox… Robert J. Marks, “Simpson’s Paradox: Big Data Can Lie” at Mind Matters
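To see how such a reversal is possible, here is a minimal Python sketch. The hit and at-bat counts are hypothetical numbers invented for illustration; they are not taken from Marks’s article. The helper names (`average`, `monthly_averages`, `combined_average`) are likewise illustrative. The sketch computes each player’s monthly averages and then a season average that pools hits and at-bats.

```python
# Hypothetical (hits, at_bats) per month; illustrative numbers only,
# not taken from Marks's article.
stats = {
    "Mickey": {"April": (4, 10), "May": (25, 100)},
    "Babe": {"April": (35, 100), "May": (2, 10)},
}


def average(hits, at_bats):
    """Batting average: hits divided by at-bats."""
    return hits / at_bats


def monthly_averages(player):
    """Return a {month: batting average} dict for one player."""
    return {month: average(h, ab) for month, (h, ab) in stats[player].items()}


def combined_average(player):
    """Pool hits and at-bats across months, then divide.

    This is NOT the mean of the monthly averages; each month is
    implicitly weighted by how many at-bats the player had in it.
    """
    total_hits = sum(h for h, _ in stats[player].values())
    total_at_bats = sum(ab for _, ab in stats[player].values())
    return average(total_hits, total_at_bats)


for month in ("April", "May"):
    m = monthly_averages("Mickey")[month]
    b = monthly_averages("Babe")[month]
    print(f"{month}: Mickey {m:.3f}, Babe {b:.3f}")

print(f"Combined: Mickey {combined_average('Mickey'):.3f}, "
      f"Babe {combined_average('Babe'):.3f}")
```

With these made-up numbers Mickey leads in April (0.400 vs 0.350) and in May (0.250 vs 0.200), yet Babe’s combined average is higher (37/110, about 0.336, vs 29/110, about 0.264). The combined figure is a weighted average of the monthly ones, and the weights differ: most of Babe’s at-bats fall in the month when both players hit well, while most of Mickey’s fall in the month when both hit poorly.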
One implication of Simpson’s Paradox is that machines cannot replace statisticians in analysing results. A great deal depends on interpretation, as Marks shows. In his words, “Clustering remains largely an art.”
Also by Robert J. Marks: Things Exist That Are Unknowable: A tutorial on Chaitin’s number
See also: Too Big to Fail Safe? (cautions on overuse of Big Data in medicine)
and
Machines cannot take over: Fundamental constraints in nature make nonsense of the claim. Great sci-fi plots, though.