We can think we’ve invented the wheel when actually …
From Adam Kirsch at The New Republic:
Certainly, if we ask the data unsophisticated or banal questions, we will get only unsophisticated and banal answers. That is the lesson of Uncharted, in which Erez Aiden and Jean-Baptiste Michel play tricks with the Google Ngram Viewer. In an odd but revealing moment, they quote a list of things that publications said about their invention when it launched, including this: “Mother Jones hailed it as ‘perhaps the greatest timewaster in the history of the Internet.’” “Hailed” does not seem like quite the right word here, but Aiden and Michel don’t care: what matters is not the quality of the attention but the fact that “the interwebs were atwitter, and the Twitter was abuzz.”
The Google Ngram Viewer allows the user to search all of Google Books for strings of characters. This sounds like a powerful tool, but as Aiden and Michel put it through its paces, it turns out once again that the digital analysis of literature tells us what we already know rather than leading us in new directions. It is not surprising to learn, for instance, that the incidence in print of the name of any given year is most common in that year itself, so that more books containing “1950” were published in 1950 than in any other year. One reason this is not surprising is that all books’ copyright pages include the year of publication; but Aiden and Michel ignore this fact, which tends to nullify their conclusions about the “forgetting curve.” Once again, meta-knowledge—knowledge about the conditions of the data you are manipulating—proves to be crucial for understanding anything a computer tells you. Ask a badly phrased question and you get a meaningless answer.
At another point Aiden and Michel use the Ngram Viewer to document the suppression of certain names in German-language books published between 1933 and 1945. They show that banned artists such as Chagall and Beckmann virtually disappear from German books under the Nazis, and then rebound spectacularly after the war, as interest in their work revives. This is another example of data illustrating a truism rather than discovering a truth. After all, we wouldn’t think to search for those names in that time period unless we knew what we were going to find, and why; and the same holds true for the other examples of censorship that Aiden and Michel cite—the word “Tiananmen” in Chinese after 1989, for instance. The faux naïveté of some of these digital tools, their proud innocence of prior historical knowledge and literary interpretation, is partly responsible for the thinness of their findings.
Indeed, Aiden and Michel write that when they posed the same question about artists’ names to “a scholar from Yad Vashem,” she was able to predict exactly “which names would appear at which end of the curve. We didn’t give her access to our data or to our results, and we didn’t even tell her why we were asking. All she got from us was the list of names. Nevertheless, her answers agreed with ours the vast majority of the time.” Of course they did: she was a scholar! Aiden and Michel do not seem to recognize that this example, far from making the case for the usefulness of Ngrams, completely destroys it, by turning them into fancy reiterations of conventional wisdom.
There’s nothing wrong with Ngrams, but original ideas are just the sort of thing that can’t by their nature be automated. The iThinkbot isn’t going to work out any better than the iCarebot.