Uncommon Descent Serving The Intelligent Design Community

That time they invented scientists as well as research papers…


That time is today. Here’s another addition to the Trust the Science files:

In 2010 Labbé showed how citation counts could be inflated. In a few short months, he elevated an imaginary computer scientist (Ike Antkare, pronounced, “I can’t care”) to “one of the greatest stars in the scientific firmament.” Since Google Scholar only indexes papers that reference a paper already in Google Scholar, Labbé used SCIgen to create a fake paper, purportedly authored by Antkare, which referenced real papers, and then used SCIgen to generate 100 additional bogus papers supposedly authored by Antkare, each of which cited itself, the other 99 papers, and the initial fake paper. Finally, Labbé created a web page listing the titles and abstracts of all 101 papers, with links to PDF files, and waited for Google’s web crawler to find the bogus cross-referenced papers.

The Googlebot soon found the papers and Antkare was credited with 101 papers that had been cited by 101 papers, which propelled him to 21st on Google’s list of the most cited scientists of all time, behind Freud but well ahead of Einstein, and first among computer scientists.

If imitation is the sincerest form of flattery, Labbé should be flattered. Soon after he reported the Antkare stunt, three Spanish researchers who specialize in bibliometrics and research evaluation reported inflating their Google Scholar Citation profiles by posting six fictitious papers on one of their university websites, with each of the six papers citing 129 papers written by the authors. As expected, “The citation explosion was thrilling, especially in the case of the youngest researchers whose citation rates multiplied by six.” They published an account of their manipulation of citation counts because their intent was not to game the system, but to show how the system cannot be trusted because it can be gamed.

Even if researchers do not do an all-out Ike Antkare, they can still easily game citation metrics. In every paper they write, they cite as many of their other papers as the editors will let them get away with.

Journals can also game citation counts by publishing lots of papers that cite papers previously published in the journal. On more than one occasion, I have had journal editors ask me to add references to articles published in their journal.

Gary Smith, “A vulnerable system: fake papers and imaginary scientists” at Mind Matters News
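The arithmetic behind the Antkare stunt in the excerpt above can be sketched in a few lines of Python. This is only a toy model of the citation pattern as the excerpt describes it: the labels `seed` and `gen0`, `gen1`, ... are hypothetical, the real papers cited by the initial fake paper are not modeled, and nothing here reproduces how Google Scholar actually counts. The h-index is not mentioned in the excerpt; it is included only as one common citation metric.

```python
from collections import Counter


def build_antkare_citations(n_generated=100):
    """Return {paper: set of papers it cites} for the setup in the excerpt."""
    seed = "seed"  # hypothetical label for the initial fake paper
    generated = [f"gen{i}" for i in range(n_generated)]
    # Each generated paper cites itself, the other generated papers, and the seed.
    cites = {paper: set(generated) | {seed} for paper in generated}
    # The seed cites only real papers, which this toy model leaves out.
    cites[seed] = set()
    return cites


def incoming_counts(cites, include_self=True):
    """Count how many times each modeled paper is cited."""
    counts = Counter({paper: 0 for paper in cites})
    for source, targets in cites.items():
        for target in targets:
            if target in cites and (include_self or target != source):
                counts[target] += 1
    return counts


def h_index(counts):
    """Largest h such that h papers each have at least h citations."""
    ranked = sorted(counts.values(), reverse=True)
    h = 0
    for rank, citations in enumerate(ranked, start=1):
        if citations >= rank:
            h = rank
    return h


if __name__ == "__main__":
    cites = build_antkare_citations()
    with_self = incoming_counts(cites)
    without_self = incoming_counts(cites, include_self=False)
    print("papers:", len(with_self))
    print("total citations (self-citations counted):", sum(with_self.values()))
    print("h-index (self-citations counted):", h_index(with_self))
    print("h-index (self-citations excluded):", h_index(without_self))
```

Even with self-citations excluded, every paper in this model keeps a citation count close to the number of generated papers, because the cross-citations among the bogus papers do almost all of the work.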

Blowharding about “Trust the science” is necessarily losing its shine. Like a wooden nickel.

You may also wish to read the first two articles in this three-part series:

Publish or Perish — Another Example of Goodhart’s Law. In becoming a target, publication has ceased to be a good measure. Researchers game the system to beat the publish-or-perish culture, which undermines the usefulness of publication and citation counts. (Gary Smith)

Gaming the System: The Flaws in Peer Review. Peer review is well-intentioned, but flawed in many ways. Predatory journals, dishonest researchers, and escalating costs in academic journals reveal the weaknesses in peer review. (Gary Smith)

Bibliometric analysis using citations is difficult and full of well-known pitfalls. There are lots of papers dealing with such analyses. A really informative text on the problems of citations can be found here (published in 1981!): Smith (1981), Citation Analysis, Library Trends, Vol. 30, pp. 83-106 https://www.ideals.illinois.edu/handle/2142/7190 AndyClue
A paper in the journal Scientometrics concluded that Google Scholar is not a useful tool for scientometric analysis: "Is Google Scholar useful for bibliometrics? A webometric analysis", 2011 https://link.springer.com/article/10.1007/s11192-011-0582-8
The individual analysis show that universities from China, Brazil, Spain, Taiwan or Indonesia are far better ranked than expected. In some cases, large international or national databases, or repositories are responsible for the high numbers found. However, in many others, the local contents, including papers in low impact journals, popular scientific literature, and unpublished reports or teaching supporting materials are clearly overrepresented. Google Scholar lacks the quality control needed for its use as a bibliometric tool; the larger coverage it provides consists in some cases of items not comparable with those provided by other similar databases.
Ouch. This is one reason why we shouldn't trust Google Scholar. Another is that it makes me one of the top scientists at my university (basically because my name is on a software package, where my contribution was some code to implement a method I wrote for a paper that said "don't use this method"). Bob O'H
