Dishonest statisticians can produce misleading data, but some misleading data are the result of curious flukes of statistics. Simpson’s Paradox is one of these flukes:

Here’s an example. Baseball player Mickey has a better batting average than Babe in both April and May. So, in terms of batting average, Mickey is a better baseball player than Babe. Right?

No.

It turns out that Babe’s combined batting average for April and May can be higher than Mickey’s. In fact, Mickey can have a better batting average than Babe every month of the baseball season and Babe may still be a better hitter. How? That’s Simpson’s Paradox…

Robert J. Marks, “Simpson’s Paradox: Big Data Can Lie” at Mind Matters
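To see how the numbers can work out this way, here is a minimal Python sketch. The monthly hits and at-bats for Mickey and Babe are hypothetical figures invented for illustration; they are not taken from Marks's article. The key assumption is that the two players bat very different numbers of times in each month.

```python
from fractions import Fraction

# Hypothetical (hits, at_bats) per month; illustrative numbers, not real data.
STATS = {
    "Mickey": {"April": (30, 100), "May": (4, 10)},
    "Babe":   {"April": (2, 10),   "May": (35, 100)},
}

def batting_average(hits: int, at_bats: int) -> Fraction:
    """Hits divided by at-bats, kept exact with Fraction."""
    return Fraction(hits, at_bats)

def monthly_averages(player: str) -> dict:
    """Batting average for each month separately."""
    return {month: batting_average(h, ab) for month, (h, ab) in STATS[player].items()}

def combined_average(player: str) -> Fraction:
    """Pool hits and at-bats across months (not the mean of the monthly averages)."""
    hits = sum(h for h, _ in STATS[player].values())
    at_bats = sum(ab for _, ab in STATS[player].values())
    return batting_average(hits, at_bats)

if __name__ == "__main__":
    for month in ("April", "May"):
        m = monthly_averages("Mickey")[month]
        b = monthly_averages("Babe")[month]
        print(f"{month}: Mickey {float(m):.3f} vs Babe {float(b):.3f}")
    print(f"Combined: Mickey {float(combined_average('Mickey')):.3f} "
          f"vs Babe {float(combined_average('Babe')):.3f}")
```

With these made-up figures, Mickey leads in April (.300 vs .200) and in May (.400 vs .350), yet Babe leads overall (37/110, about .336, vs 34/110, about .309). The reversal happens because the combined average weights each month by at-bats: most of Mickey's at-bats fall in the low-scoring month, while most of Babe's fall in the high-scoring one.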

One implication of Simpson’s Paradox is that machines cannot replace statisticians in analysing results. A great deal depends on interpretation; as Marks puts it, “Clustering remains largely an art.”


*Also by* Robert J. Marks: Things Exist That Are Unknowable: A tutorial on Chaitin’s number

*See also:* Too Big to Fail Safe? (cautions on overuse of Big Data in medicine)

and

Machines cannot take over: Fundamental constraints in nature make nonsense of the claim. Great sci-fi plots, though.