
Eric Holloway: How ID can help business

Holloway points out that companies today are awash in information. But which patterns are real? Which are cloud bunnies?

One way business intelligence can address this problem (false positives) is hypothesis testing. The data analyst can generate a figure for the probability that a pattern is real, not imagined. The difficulty is that, for strong guarantees, the patterns must be proposed before they are seen in the data. But the more the analyst looks at the data to derive a pattern, the more that analyst falls prey to seeing patterns that are not really there. Thus, the need to state all patterns up front is a huge restriction and deadlocks our ability to gain insight from the data.

Welcome to Data Deadlock. Should we just go home now?

Intelligent design theory might help us make new headway in the fields of information theory and statistics. The problem is familiar: how can we be sure that a pattern we see, for example, apparent design in the biological record, is not merely a chance outcome? Intelligent design theory makes the novel proposal that we can derive patterns from the data after the fact while retaining the strong guarantees of hypothesis testing.

Eric Holloway, “How Business Intelligence Can Break the Data Deadlock” at Mind Matters News

ID theory, he says, offers a way around false positives.
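To make the false-positive problem concrete, here is a minimal simulation sketch (not from Holloway's article; it assumes Python with NumPy and SciPy, and every name and number in it is illustrative). One hundred candidate "patterns" are tested against pure noise, and a handful clear the usual 5% significance threshold by chance alone:

# Illustrative sketch only: every "pattern" below is pure noise, yet a few
# will still look statistically significant at the 0.05 level.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_rows, n_candidates = 200, 100

outcome = rng.normal(size=n_rows)                      # no real signal anywhere
candidates = rng.normal(size=(n_rows, n_candidates))   # 100 candidate "patterns"

false_positives = 0
for j in range(n_candidates):
    _, p_value = stats.pearsonr(candidates[:, j], outcome)
    if p_value < 0.05:
        false_positives += 1

# Expect roughly 5 spurious hits out of 100 purely random candidates.
print(f"{false_positives} of {n_candidates} noise patterns look 'real'")

This is why classical Fisher/Pearson testing insists that hypotheses be stated before the data are examined, or that the significance threshold be corrected for the number of patterns tried.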

See also: Does information theory support design in nature? William Dembski makes a convincing case, using accepted information theory principles relevant to computer science.

Follow UD News at Twitter!

Comments
Thanks Vmahuna and Polistra for your thoughtful comments.

Regarding the huge waste from current statistics, I would argue this demonstrates my point. Because Fisher/Pearson hypothesis testing requires highly conservative models to be stated up front, a huge amount of waste is inevitable: the models cannot be updated after the fact to deal with actual observed performance. Being able to derive statistically robust models after the fact would reduce over-engineering. The need to hear data on the ground is another implication of this concept, since it is again a matter of updating the models with after-the-fact information. In fact, the field of software engineering has taken this approach on in full force with the current practices of agile and DevOps, which seek constant feedback and rapid iterative improvement on prototypes instead of a priori, exhaustively defined specifications that then flow into waterfall or spiral design methodologies. So the general point is that ID describes how engineering and data analysis actually happen, or should happen, much better than traditional statistical modeling does. Furthermore, ID offers a robust methodology that formally justifies what actually happens.

When I was taking a machine learning class in grad school, this distinction between theory and practice became very stark. The theory of machine learning strongly discourages "data snooping," since it would invalidate the statistical guarantees of the Fisher/Pearson hypothesis testing methodology. Yet on the most successful data science platform, Kaggle, the winning contestants engage in enormous amounts of "data snooping" and still end up producing models with very high predictive accuracy on out-of-sample datasets. If the real world operated according to Fisher/Pearson, this would not happen: the data snoopers would dirty up their models with so much overfitting that they could not produce winning predictions. The only reason Kaggle is so successful is that the ID theory of hypothesis testing is correct, and we can indeed update our models after the fact.

Finally, Fisher was well aware of this problem. He wanted to include some way to make ad hoc model updates, but the statistical theory of his time did not allow for it. Only once we established good formalisms of randomness, such as Kolmogorov's axioms of probability and algorithmic information, have we been able to get past the a priori model deadlock and formally allow for the induction and abductive reasoning that actually occur in practice. The only problem, from a naturalistic standpoint, is that making formal sense of the ad hoc approach seems to require non-materialistic agencies, such as free will and halting oracles. So, as you might expect, a steadfast commitment to naturalism turns out to hold back progress in statistical modeling.

For an example of the unfortunate tendency of methodological naturalism to be a science stopper, see these two questions I asked on public biology and bioinformatics forums about whether we can usefully interpret the genome as a programming language:

https://bioinformatics.stackexchange.com/questions/8890/is-the-genome-a-programming-language-i-e-lisp-can-we-analyze-it-with-compute/
https://biology.stackexchange.com/questions/85289/is-the-genome-a-programming-language-e-g-like-lisp-can-we-analyze-it-as-soft

Note the almost complete lack of substantive answers, along with outright calls to shut down the questions. I propose it is the implicit commitment to naturalism that makes it so difficult for the forum members to treat the questions fairly and give rational answers to what seems, at least to me, a potentially quite useful question.

If my thesis is correct, then, turning full circle back to the question of business intelligence, the commitment to naturalism and the elimination of post hoc models are producing enormous amounts of waste and missed insights for businesses. Thus, any business adopting the ID approach will be at a competitive advantage and will, ironically, be more fit to survive in the Darwinian corporate world.

EricMH
July 6, 2019 at 11:40 AM PDT
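As a rough sketch of the Kaggle-style workflow EricMH describes above (none of this is his code; scikit-learn, the ridge model, and all the numbers are illustrative assumptions), candidate models are tuned by repeatedly peeking at a validation split, and the chosen model is then scored once on data that played no part in any decision:

# Illustrative sketch, assuming NumPy and scikit-learn are available.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(600, 20))
y = X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=600)   # only 2 real signals

# Training split, a validation split we "snoop" on freely, and a final test split.
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.5, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

# Snoop: try several regularization strengths, keep whichever looks best on validation.
best_alpha, best_val_mse = None, float("inf")
for alpha in [0.01, 0.1, 1.0, 10.0, 100.0]:
    model = Ridge(alpha=alpha).fit(X_train, y_train)
    val_mse = mean_squared_error(y_val, model.predict(X_val))
    if val_mse < best_val_mse:
        best_alpha, best_val_mse = alpha, val_mse

# One-shot out-of-sample score on data never used for any modeling choice.
final_model = Ridge(alpha=best_alpha).fit(X_train, y_train)
print("chosen alpha:", best_alpha)
print("held-out test MSE:", mean_squared_error(y_test, final_model.predict(X_test)))

The sketch only shows the mechanics of the out-of-sample check; whether that kind of after-the-fact model selection can be given formal statistical guarantees is precisely the question the comment raises.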
No theory can help much. Finding valid patterns requires real subjective experience. Holloway's example of repeated lottery tickets proves it. A pro gambler knows without math whether a sequence is a possible random streak or a cheater with a system.

In earlier times, big business relied on a network of dealers who understood consumer preferences through LOCAL and MODULAR experience. When a carmaker introduced a model that wouldn't sell, the dealers told them about it. Smart carmakers listened to the dealers before tooling up for a failed model. Now big business gathers worldwide data into one blob and tries to use math to tease out the MODULAR and LOCAL tendencies. Can't be done.

polistra
July 5, 2019 at 03:44 PM PDT
I'm sorry, but I did analysis of government data for... carry the 2... gotta take my shoes off here... um, a WHOLE lotta years, and the only problem with statistical analysis I've ever seen is managers who refuse to acknowledge that, oh, the PREDICTED rate of failure (functional failure, failure to perform one of the required functions) is WAY higher than the observed failures in operation. Although there is also this HUGE wastefulness where a part shows zero failures in 20 years of operation.

I read someplace that back in the early 20th century, Henry Ford sent engineers out to junk yards across the country to look for Model T Fords that had been junked. What he wanted them to find were parts of the junked cars that STILL WORKED. Ford then had those SUPERIOR parts CHEAPENED, with the goal of having the ENTIRE Model T fall apart in your driveway at the same time. Now THAT'S statistical analysis in action.

So, if humans die from heart disease while they still have functional kidneys, wouldn't an Intelligent Designer have designed ALL human organs (including the skin) to fail at least within a month of each other, if not on the same day?

vmahuna
July 5, 2019 at 03:25 PM PDT
