Rob Sheldon on why statisticians are in a panic

News
September 2, 2019

Categories: Intelligent Design, Mathematics, Peer review


Yes, recently, we learned from a highly official source that statisticians are in some kind of a panic:

While the crisis of statistics has made it to the headlines, that of mathematical modelling hasn’t. Something can be learned comparing the two, and looking at other instances of production of numbers. Sociology of quantification and post-normal science can help.
While statistical and mathematical modelling share important features, they don’t seem to share the same sense of crisis. Statisticians appear mired in an academic and mediatic debate where even the concept of significance appears challenged, while more sedate tones prevail in the various communities of mathematical modelling. This is perhaps because, unlike statistics, mathematical modelling is not a discipline. It cannot discuss possible fixes in disciplinary fora under the supervision of recognised leaders. It cannot issue authoritative statements of concern from relevant institutions such as e.g., the American Statistical Association or the columns of Nature.
Andrea Saltelli, “A short comment on statistical versus mathematical modelling” at Nature

So what’s going on? Our physics color commentator Rob Sheldon offers,

The author of this article is contrasting the growing sense of panic among statisticians with the complacency of modelers.

The panic in sociology, psychology, nutrition science, and pharmacology has been growing as >70% of papers with “p-values” smaller than 0.05 are discovered to be unrepeatable.

Since the “p-value” is a statistical quantity invented by Ronald Fisher and is tied to “frequentist” statistics, the competing “Bayesian” statisticians have claimed that the method is deeply flawed. That battle is not new, having been fought since the year that Fisher introduced his p-value, but until recently it had been won by the frequentists. Today, Bayesian methods are not just widely popular, but have replaced frequentist methods in many niche fields, so that the “irreproducibility” crisis is not simply pointing the finger at a few fraudulent bad apples, but at an entire educational system that promoted p-hacking.

By contrast, modellers have been growing in prestige and fame year upon year. For example, in 2018, nine Neanderthal genomes had been sequenced, and one Denisovan genome.

Yet we have a news item this week, typical of recent news items, which claims that Neanderthals carry 1% of their genes from previous encounters with modern humans.


How do they figure this out? Especially since we have zero genomes from Modern humans that predate Neanderthals?

Models.

But how do we know, asks Andrea Saltelli, if our models are valid? Can we run calibration tests on them with known answers? How about simple consistency checks? What about stating all our assumptions up front?

Nope, nope, and double nope. Modellers get a free pass, while statisticians get the bright lights in their eyes and the grilling from unseen questioners, with the threat of retracted papers and tenure-destroying expulsion.

Saltelli then goes on to show a rather disturbing plot. The more complicated our model becomes, the more ability it has to match our actual data. If you have only two data points, a model needs only two free parameters, and it can find a line through both those points. If you have 3 points, you can find a curve, a quadratic polynomial that will go through them. As long as you have as many free parameters as there are data points, there is always a curve that goes directly through all the points.
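The point about free parameters can be checked numerically. What follows is a minimal sketch, assuming NumPy is available; the data points and the helper names `interpolate` and `max_residual` are made up for illustration and do not come from Saltelli's comment. It fits a polynomial with exactly as many coefficients as there are points and confirms that the curve passes through every one of them.

```python
import numpy as np


def interpolate(x, y):
    """Fit a polynomial with as many free coefficients as data points."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    degree = len(x) - 1  # a degree n-1 polynomial has n coefficients
    return np.polyfit(x, y, degree)


def max_residual(x, y, coeffs):
    """Largest vertical gap between the fitted curve and the data."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.max(np.abs(np.polyval(coeffs, x) - y)))


# Hypothetical data sets: 2 points (line), 3 points (quadratic), 5 points (quartic).
examples = {
    "2 points": ([0.0, 1.0], [1.0, 3.0]),
    "3 points": ([0.0, 1.0, 2.0], [1.0, 0.0, 5.0]),
    "5 points": ([0.0, 1.0, 2.0, 3.0, 4.0], [2.0, -1.0, 4.0, 0.5, 3.0]),
}

for label, (xs, ys) in examples.items():
    coeffs = interpolate(xs, ys)
    print(label, "-> coefficients:", len(coeffs),
          "max residual:", max_residual(xs, ys, coeffs))
```

A residual that is zero up to floating-point rounding says nothing about whether the curve is right, only that it had enough freedom to hit every point.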

But is this increasingly complex mathematical model valid?

The way to test it is to find one additional point and see whether the curve fitted to the other n-1 points matches this last point. And weirdly enough, when the model has too many free parameters, it gets more and more “unstable”, more and more “wiggly” as it strains to perfectly match the previous data, with less and less likelihood of matching new data. This is what Saltelli’s disturbing plot shows, that the model error is minimized somewhere in the middle of the “complexity” axis.
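The hold-one-point-back test described here can be sketched as leave-one-out validation. This is an illustration under stated assumptions, not Saltelli's own calculation: the toy data (a sine curve plus Gaussian noise), the random seed, and the helper names `fit_error` and `loo_error` are all invented for the example, with polynomial degree standing in for the “complexity” axis.

```python
import numpy as np


def fit_error(x, y, degree):
    """Mean squared error on the same points the polynomial was fitted to."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    coeffs = np.polyfit(x, y, degree)
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))


def loo_error(x, y, degree):
    """Mean squared error when each point is predicted from the other n-1."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    squared_errors = []
    for i in range(len(x)):
        keep = np.arange(len(x)) != i  # hold out point i
        coeffs = np.polyfit(x[keep], y[keep], degree)
        prediction = np.polyval(coeffs, x[i])
        squared_errors.append((prediction - y[i]) ** 2)
    return float(np.mean(squared_errors))


# Assumed toy data: a smooth signal plus noise, 12 points on [-1, 1].
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 12)
y = np.sin(3.0 * x) + rng.normal(scale=0.2, size=x.size)

for degree in range(0, 9):
    print(f"degree {degree}: fit MSE = {fit_error(x, y, degree):.4f}, "
          f"held-out MSE = {loo_error(x, y, degree):.4f}")
```

The fit error can only fall as the degree rises, because each larger polynomial contains the smaller ones as special cases. The held-out column is the one to compare against the plot described above; its exact shape depends on the assumed data and noise level.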

So rather than complimenting our modellers (think global climate models) for matching past data perfectly by adding in adjustable variables (aerosols, feedback), we should be suspicious that they are actually making their predictions worse by overcomplicating them.

And it isn’t just Neanderthal genetics and global climate models. This is true for every area of science, from cosmology to particle physics to cladistics and AI. This is why Google is abandoning “Deep Mind.” The problem wasn’t fixed by throwing more complexity at it.

So rather than being complacent, modellers ought to be in the same state of panic as statisticians. Saltelli is not abandoning modelling; he just wants it to be ethical. From his concluding paragraph: “While this vision is gaining new traction [sociology of modelers working with suppliers of data and users of models] more could be done. A new ethics of quantification must be nurtured.”

Perhaps this is all part of the Paley renaissance, recognizing that the days of coddled dogmatics and their supporting cast of modellers are coming to an end.

See also: Confirmed: Deep Mind’s deepest mind is on leave. The chess champ computer system just never made money

Note: Rob Sheldon is the author of Genesis: The Long Ascent


Comments

News, I was taught from the beginning that models are useful [if empirically reliable and even better, accurate at predictions . . . ] rather than true. Indeed, later, I saw that a key difference for theories was that they had some possibility of being true [= accurate to reality]. Yet later, I realised there is a debate between [chastened?] scientific realists and anti-realists [think, Feyerabend, Lakatos and Kuhn] with the ghost of the pessimistic induction on the history of falsified theories haunting the discussion. It looks a lot like we are falling into an abyss of reducing everything to modelling, then locking in some models as effectively sacrosanct because of ideologies such as evolutionary materialistic scientism. Maybe, it is time for serious reconsideration. Back to my son-assigned homework, reading Bishop Berkeley: just what is matter in an era of the Casimir effect and linked quantum field theory? Lurking, and what is mind too? KF

kairosfocus, September 5, 2019, 03:42 AM PDT

From the Nature article:
For a start, modelling is less amenable than statistics to structured remedies. A statistical experiment in medicine or psychology can be pre-registered, to prevent changing the hypothesis after the results are known. The preregistration of a modelling exercise before the model is coded is unheard of, although without assessing model purpose one cannot judge its quality. For this reason, while a rhetorical or ritual use of methods is lamented in statistics, it is perhaps even more frequent in modelling. What is meant here by ritual is the going through the motions of a scientific process of quantification while in fact producing vacuous numbers.
From Dembski's 1998 paper, Intelligent Design as a Theory of Information (http://www.arn.org/docs/dembski/wd_idtheory.htm):

What is it for a possibility to be identifiable by means of an independently given pattern? A full exposition of specification requires a detailed answer to this question. Unfortunately, such an exposition is beyond the scope of this paper. The key conceptual difficulty here is to characterize the independence condition between patterns and information. This independence condition breaks into two subsidiary conditions: (1) a condition to stochastic conditional independence between the information in question and certain relevant background knowledge; and (2) a tractability condition whereby the pattern in question can be constructed from the aforementioned background knowledge. Although these conditions make good intuitive sense, they are not easily formalized. For the details refer to my monograph The Design Inference.

This is exactly what Dembski's efforts sought to address, and he gives a full explanation of it in No Free Lunch. Most people, even those who should know better, can't differentiate between knowing the pattern beforehand and then discovering it, and just discovering the pattern after the fact. For the former, the question is, "Is a pattern present prior to the generation of the pattern I now recognize?", while for the latter, the question that is asked is simply, "Do I recognize a pattern?" They're not the same thing. The latter is no more than "ritual."

PaV, September 2, 2019, 10:51 AM PDT

Rob:
So rather than complimenting our modellers (think global climate models) for matching past data perfectly by adding in adjustable variables (aerosols, feedback), we should be suspicious that they are actually making their predictions worse by overcomplicating them.
There's the case of Lord Monckton's "simple model," which uses a much simpler, and likely more apt, formula for 'feedback,' and which "models" recent climate well, much better than the other climate modelers do.
And weirdly enough, when the model has too many free parameters, it gets more and more “unstable”, more and more “wiggly” as it strains to perfectly match the previous data, with less and less likelihood of matching new data. This is what Saltelli’s disturbing plot shows, that the model error is minimized somewhere in the middle of the “complexity” axis.
So, instead of a "line," we get a "cloud." And within that "cloud" everything is related to everything else in an almost equal way. Which means you end up with no correlation at all.

PaV, September 2, 2019, 10:38 AM PDT

The short, short version is, we want our models to generalize our data. Generalization means that the model should be smaller than the data. It should be smaller still based on the amount of error. While this doesn't guarantee fit, it at least seems to give a good starting point.

johnnyb, September 2, 2019, 09:56 AM PDT

Interestingly, just a few months ago Eric Holloway and I published a mechanism for testing models against their complexity. It isn't the first (or last) word on model complexity testing, but it offers a straightforward way of criticizing overly-complex models. Generalized Information: A Straightforward Method for Judging Machine Learning Models.

johnnyb, September 2, 2019, 09:53 AM PDT
