How, exactly, to construct a gibberish paper that gets accepted by journals

Okay, if someone asks, we will of course say, don’t do this:

Publishers withdraw more than 120 gibberish papers

Conference proceedings removed from subscription databases after scientist reveals that they were computer-generated.

But if you want to know how the offenders actually do it, here’s physicist Rob Sheldon with a lay-friendly explanation:

I finally had a chance to read the Labbe paper that found all the computer-generated papers. Here’s what they found:

The SCIgen fake paper generator is a rather long “Mad-Libs” template. The sentences are constructed by hand, but blanks are left for “scientific_adjective”, “noun-for-process”, etc. Glossaries of 50 or 100 words are supplied for these adjectives and nouns, and then the paper is constructed by filling the blanks randomly. So the grammar is correct, even the logic is correct; it is just that the content is made up. The code for this generator was written by three grad students at MIT in 2005, and originally the blanks were all “computer-science” words. The references are constructed likewise. The students did this as a prank, to demonstrate that many meetings don’t really care about “peer-reviewing” the entries; they care about making money off the registrants. What is scary is that many of the publishers that accepted the papers (IEEE and Springer-Verlag) are respected and their proceedings are “peer reviewed”.
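To make the “Mad-Libs” idea concrete, here is a minimal sketch of the fill-in-the-blank step described above. It is not SCIgen's actual code: the glossaries, the templates, and the names `GLOSSARY`, `TEMPLATES`, `fill`, and `make_paragraph` are all invented for illustration, and the real word lists are far longer.

```python
import random
import re

# Hypothetical glossaries, one per kind of blank (illustration only).
GLOSSARY = {
    "scientific_adjective": ["scalable", "probabilistic", "robust"],
    "noun_for_process": ["synthesis", "emulation", "refinement"],
    "noun_for_thing": ["hash tables", "neural networks", "compilers"],
}

# Hand-written sentences with named blanks (also hypothetical).
TEMPLATES = [
    "We present a {scientific_adjective} approach to the "
    "{noun_for_process} of {noun_for_thing}.",
    "The {noun_for_process} of {noun_for_thing} is a "
    "{scientific_adjective} problem.",
]

BLANK = re.compile(r"\{(\w+)\}")


def fill(template, rng):
    """Replace every {blank} with a random word from its glossary."""
    return BLANK.sub(lambda m: rng.choice(GLOSSARY[m.group(1)]), template)


def make_paragraph(n_sentences, seed=None):
    """Pick templates at random and fill them; a seed makes it repeatable."""
    rng = random.Random(seed)
    return " ".join(
        fill(rng.choice(TEMPLATES), rng) for _ in range(n_sentences)
    )


print(make_paragraph(3, seed=1))
```

Every sentence that comes out is grammatical because a human wrote the template; only the words in the blanks are random, which is why the result reads smoothly and means nothing.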

The students made their “mad lib” code available to everybody, and since 2005, other uses have been found for it. For example, both in China and in Eastern Europe, promotions are based on getting peer-reviewed articles into English-speaking journals. Many scientists there have next to nil English writing ability, and this is a way to raise their publication count.

The Labbe paper showed that you could also fool the Google Scholar metrics with what is called a “quote farm”. Google tracks how many people quote you, and follows the chain of quotes a couple of references backward. So Labbe and his wife created 100 SCIgen papers, carefully putting the other 99 papers in the references of each. They didn’t need to publish them, because Google Scholar just pulls them off the internet. Thus a closed universe of self-quoting papers was created. When the Google metrics hit this, they ran around in circles finding out that “Dr Antkare” was being quoted by everyone!
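The arithmetic of such a closed universe is easy to check. The sketch below builds 100 papers that each reference the other 99 and counts the citations each one receives. The function names are made up for this example, and the h-index is used only as one familiar citation metric; the text above does not say which metrics Google Scholar computed.

```python
def build_citation_farm(n_papers):
    """Return {paper_id: [cited paper_ids]}; each paper cites all others."""
    return {
        p: [q for q in range(n_papers) if q != p] for p in range(n_papers)
    }


def citation_counts(references):
    """Count how many times each paper appears in a reference list."""
    counts = {p: 0 for p in references}
    for cited_list in references.values():
        for q in cited_list:
            counts[q] += 1
    return counts


def h_index(counts):
    """Largest h such that h papers have at least h citations each."""
    h = 0
    for rank, c in enumerate(sorted(counts, reverse=True), start=1):
        if c >= rank:
            h = rank
        else:
            break
    return h


farm = build_citation_farm(100)
counts = citation_counts(farm)
print(min(counts.values()), h_index(counts.values()))  # prints: 99 99
```

No paper in the set is cited by anyone outside it, yet every one of them shows 99 citations, which is all a counting metric can see.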

Labbe also discovered that when they used the “More Like This One” feature button on the web browsers developed by Google or Nature or whoever, they could feed in SCIgen papers they had made and find numerous others in the literature! Over 120 papers were found this way, and they then dutifully told the publishers that these were machine generated.

Several variants of the SCIgen paper have also been constructed. One does for high-energy particle physics exactly what SCIgen did for computer science. (Lubos Motl has sarcastically referred to this version as a clone of Lee Smolin’s brain. I too know a colleague who could double for Smolin.) Labbe has offered his “find SCIgen” program to the public, but it simply recognizes that the sentences in SCIgen are “fill in the blank” and therefore never depart from a recognizable script. The opening sentence is always one of two forms, which is a dead giveaway.
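The “dead giveaway” can be sketched in a few lines: turn each known opening template into a pattern and test whether a paper's first sentence fits it. This is only an illustration of the idea as described here, not Labbe's program; the two templates and the names `template_to_regex` and `looks_templated` are assumptions made for the example.

```python
import re

# Hypothetical opening templates; the blanks are written as {name}.
OPENING_TEMPLATES = [
    "Many {people} would agree that, had it not been for {thing}, "
    "the {process} of {thing} might never have occurred.",
    "The implications of {adjective} {thing} have been far-reaching "
    "and pervasive.",
]

BLANK = re.compile(r"\{\w+\}")


def template_to_regex(template):
    """Keep the fixed words literal and let each blank match any text."""
    literal_parts = BLANK.split(template)
    pattern = r"(.+?)".join(re.escape(part) for part in literal_parts)
    return re.compile(pattern, re.IGNORECASE)


OPENING_PATTERNS = [template_to_regex(t) for t in OPENING_TEMPLATES]


def looks_templated(sentence):
    """True if the whole sentence fits one of the known templates."""
    text = sentence.strip()
    return any(p.fullmatch(text) for p in OPENING_PATTERNS)


print(looks_templated(
    "The implications of probabilistic archetypes have been "
    "far-reaching and pervasive."
))
print(looks_templated("We measured the boiling point of water at altitude."))
```

A check like this only works while the templates are known and fixed, which is exactly the weakness the next paragraph worries about.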

Of course, now those MIT grads will probably respond to Labbe’s algorithm by making a “Mad Lib” template generator, so an infinite variety of templates can be produced. It will be a challenge detecting the second round of fake papers. But more importantly, the ease with which these papers can be created, combined with the apparent difficulty for reviewers to recognize them, doesn’t bode well for any of these fields. It looks like there are far more charlatans than we thought, or alternatively, the economic benefits of securing tenure far outweigh the punishment for getting caught.

Here’s a challenge for ID types: There must be a way to tell when a sentence is gibberish and when it is meaningful.

Rob offers a suggestion on the fly:

Claude Shannon went through a text and eliminated letters to see if humans could reconstruct the words from the letters remaining. He then had a metric for how much information was encoded in each letter. Has anyone done this for words? How about sentences? Surely we can apply this sort of metric to a mad libs paper, and detect its information content.
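As a rough starting point for the word-level version of that question, one can at least measure how predictable the words of a text are. The sketch below computes two standard Shannon quantities from raw word counts: the entropy of the word distribution, and the conditional entropy of a word given the word before it. It does not reproduce Shannon's experiment with human readers, and nothing here shows that these numbers separate gibberish from meaning; the function names are invented for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def entropy_bits(counter):
    """Shannon entropy, in bits, of the distribution given by the counts."""
    total = sum(counter.values())
    return -sum(
        (c / total) * math.log2(c / total) for c in counter.values()
    )


def word_entropy(text):
    """Average information per word, treating words as independent."""
    return entropy_bits(Counter(tokenize(text)))


def next_word_entropy(text):
    """Conditional entropy H(next word | previous word), in bits."""
    words = tokenize(text)
    pairs = Counter(zip(words, words[1:]))
    firsts = Counter(words[:-1])
    total = sum(pairs.values())
    h = 0.0
    for (w1, _w2), c in pairs.items():
        # p(w1, w2) * log2 p(w2 | w1)
        h -= (c / total) * math.log2(c / firsts[w1])
    return h


sample = "the cat sat on the mat and the dog sat on the rug"
print(round(word_entropy(sample), 3), round(next_word_entropy(sample), 3))
```

On a single short paper these estimates are noisy, so a serious test would need a large reference corpus to supply the probabilities.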

Further reading: This is not a hoax: 120 computer-generated nonsense papers are being removed from science papers database


Computer-generated nonsense research papers story trending

2 Replies to “How, exactly, to construct a gibberish paper that gets accepted by journals”

  1.
    Dionisio says:

    oh well, what else is new? there we go again…
    cheating and deceiving is part of our human nature…
    so what’s wrong with all this?
    to those who did it, it was alright.
    anyway, who cares? no absolute truth, no absolute moral law, no absolute authority, no integrity, no responsibility,…
    just primordial soup, natural selection, survival of the fittest, the ends justify the means, whatever,… 🙁

    “A testing and a questioning has been all my travelling:- and verily, one must also learn to answer such questioning! That, however,- is my taste:
    -Neither a good nor a bad taste, but my taste, of which I have no longer either shame or secrecy.
    “This- is now my way,- where is yours?” Thus did I answer those who asked me “the way.” For the way- it does not exist!
    Thus spoke Zarathustra.”
    —Friedrich Wilhelm Nietzsche (Also sprach Zarathustra: Ein Buch für Alle und Keinen)

  2.
    Querius says:

    Are any of the computer-generated papers any good? Maybe, with peer review, the fittest will survive and produce new theories.

    For example, start out with bored high school students eliminating the most obvious ones, then mutate a meme here and there until the paper gets accepted. Then repeat the process with lower division college students, and so on up the ladder.

    Finally eliminate the flakiest ones that have already been published such as multi-verses, blind watchmaker, and the like.

    What’s left should be absolutely brilliant, and emulate the finest, most insightful papers ever published!

    I’d call the method, “The Blind Researcher” or “Methinks it’s a Weasel Paper.”

