In a project that could unlock the world’s research papers for easier computerized analysis, an American technologist has released online a gigantic index of the words and short phrases contained in more than 100 million journal articles — including many paywalled papers. The catalogue, which was released on 7 October and is free to use, holds tables of more than 355 billion words and sentence fragments listed next to the articles in which they appear. It is an effort to help scientists use software to glean insights from published work even if they have no legal access to the underlying papers, says its creator, Carl Malamud. He released the files under the auspices of Public Resource, a non-profit corporation in Sebastopol, California that he founded.
Holly Else, “Giant, free index to world’s research papers released online” at Nature 26 October 2021
Malamud isn’t breaching copyright, he says, because his quotations are only five words or so. In the article, Else warns that publishers might question how he could have created the index in the first place.
Some of us are left wondering why, given that the public generally pays for science, the public isn’t allowed to read it for free.