Uncommon Descent Serving The Intelligent Design Community

The perils of data mining in science


It is a plague, not a cure, says Pomona College statistics prof Gary Smith:

Decades ago, data mining was considered a sin comparable to plagiarism. Today, the data mining plague is seemingly everywhere, cropping up in medicine, economics, management, and, now, history. Scientific historical analyses are inevitably based on data: documents, fossils, drawings, oral traditions, artifacts, and more. But now, historians are being urged to embrace the data deluge as teams systematically assemble large digital collections of historical data that can be data mined…

The promise is that an embrace of formal statistical tests can make history more scientific. The peril is the ill-founded idea that useful models can be revealed by discovering unanticipated patterns in large databases where meaningless patterns are endemic. Statisticians bearing algorithms are a poor substitute for expertise.

For example, one algorithm that was used to generate missing values in a historical database concluded that Cuzco, the capital of the Inca Empire, once had only 62 inhabitants, while its largest settlement had 17,856 inhabitants. Humans would know better.

Gary Smith, “Data mining: A plague, not a cure” at Mind Matters News

He adds, “Finding patterns in data is easy. Finding meaningful patterns that have a logical basis and can be used to make accurate predictions is elusive. We can see this from 18th-century attempts to cure scurvy through 21st-century claims about the stock market or history.”
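To make the point about meaningless patterns concrete, here is a minimal Python sketch (not from Smith's article) of what an undirected search does: it correlates one random "target" series with many random "candidate" series and keeps every candidate whose correlation clears a threshold. All data are Gaussian noise, and the sizes used (20 observations, 1,000 candidates, a threshold of |r| ≥ 0.5, seed 0) are arbitrary illustrative assumptions; the function names are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def mine_noise(n_obs=20, n_candidates=1000, threshold=0.5, seed=0):
    """Correlate a pure-noise target with many pure-noise candidates and
    return the correlations whose absolute value reaches the threshold.
    Every returned value is, by construction, a spurious 'discovery'."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_obs)]
    hits = []
    for _ in range(n_candidates):
        candidate = [rng.gauss(0, 1) for _ in range(n_obs)]
        r = pearson(target, candidate)
        if abs(r) >= threshold:
            hits.append(r)
    return hits


if __name__ == "__main__":
    hits = mine_noise()
    print(f"{len(hits)} of 1000 random series have |r| >= 0.5 with a random target")
    if hits:
        print(f"strongest spurious correlation: {max(hits, key=abs):+.2f}")
```

Since no candidate has any real relationship to the target, every hit is a false positive. With 20 observations, roughly 2 to 3 percent of pure-noise series can be expected to reach |r| ≥ 0.5, so a search across thousands of variables is all but guaranteed to "find" something.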

Coronavirus: Is data mining failing its first really big test? Computers scanning thousands of papers don’t seem to be providing answers for COVID-19. (Robert J. Marks)

Comments
ACE2 receptor - CoViD19? Hmm... https://www.nih.gov/news-events/news-releases/study-determine-incidence-novel-coronavirus-infection-us-children-begins

jawa, May 12, 2020, 06:08 AM PDT
The important distinction is passive vs active. Looking at patterns from the outside, via piles of acquired metadata, doesn't work well. The only way to see the patterns of Nature is to actively engage with Nature. Not necessarily formal controlled experiments, but always some kind of experiment. Examine a real animal, build a real circuit or mechanism. Wiggle one part and see what happens to the other parts. This is also true of markets and history. The people who know the patterns are the pattern-makers, the active wigglers, the central banks and Deepstates. They take great pains to hide their active wiggling behind barrages of fake movements, so distant observers are unlikely to be looking in the right places. (I'm thinking of Denyse's brilliant remark about cats and hunting...)

polistra, May 12, 2020, 05:42 AM PDT
