When I read about Eugene Goostman, the computer program which is so good at “small talk” that it fooled 33% of the judges at a recent contest into thinking it was a human typing at a keyboard, I couldn’t resist the idea of giving it a challenge. After working for ten years as a computer programmer, researching the nature of mind for a philosophy thesis, and teaching English conversation for more than a dozen years in Japan, I figured I’d be able to bamboozle Eugene with some very simple, innocent-sounding questions. And I was right.
From a couple of brief, exploratory forays, I quickly established that Eugene was able to answer questions of the “Hello, how are you?” variety, that he could talk about people in his family (including his mother, a well-known TV celebrity in his country) and that he could even tell me what he wanted to be when he grew up. Clearly, his programmers had done their homework. However, I also noticed that Eugene seemed very eager to deflect personal questions by giving vague answers and then asking his interlocutor questions like “Where are you from?” and “What is your profession?” In other words, Eugene was trying to control the conversation by steering it away from himself – a very clever tactic for a computer program that’s trying to cover up the fact that it has no personality of its own.
I quickly figured that Eugene’s programmers had given him an online persona, but no autobiographical memory – a form of memory which is (as far as we know) unique to human beings. (See here and here for why I’m skeptical about scientific claims that other animals possess this kind of memory.) And if Eugene couldn’t tell me his own life-story, he certainly wouldn’t be able to tell me about past episodes in other people’s lives. I also figured that Eugene’s online persona would not have been programmed to have any second-order mental states – he wouldn’t be able to tell me what other people thought of him, for instance. Additionally, I figured that although his programmers would have endowed him with some quirky personality traits to make him appear human, they wouldn’t have given him the rich array of personal preferences that characterizes a human individual. Finally, I figured that Eugene’s background knowledge of the world would likely be very sketchy indeed, if he was just a computer program designed to engage in “small talk.”
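The deflection tactic described above is easy to mechanize. Here is a minimal sketch of how such a dodge-and-counter-question script could work; everything in it (the cue words, the canned excuses, the counter-questions) is hypothetical and illustrative, not taken from Eugene Goostman’s actual code, which has not been published:

```python
import random
import re

# Hypothetical sketch of the tactic: if a question sounds personal,
# dodge it with a stock excuse; either way, end by seizing control of
# the conversation with a counter-question about the interlocutor.

PERSONAL_CUES = re.compile(
    r"\b(you|your|yourself|yesterday|favorite|mother|think)\b",
    re.IGNORECASE,
)

DEFLECTIONS = [
    "I'd rather not talk about that.",
    "Oh, my thoughts are a bit vague today.",
]
COUNTER_QUESTIONS = [
    "By the way, where are you from?",
    "What is your profession?",
]


def reply(question: str, rng: random.Random) -> str:
    """Return a canned deflection if the question sounds personal,
    otherwise a generic stall; then steer back onto the interlocutor."""
    if PERSONAL_CUES.search(question):
        opener = rng.choice(DEFLECTIONS)
    else:
        opener = "Interesting question."
    return f"{opener} {rng.choice(COUNTER_QUESTIONS)}"


rng = random.Random(0)
print(reply("What did you do yesterday?", rng))
print(reply("Is the sky blue?", rng))
```

A few dozen lines of pattern-matching like this can keep a five-minute conversation moving without the program ever answering anything about itself – which is exactly the behavior the transcript below exhibits.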
So I decided to probe Eugene’s weaknesses by asking him about events in his recent past, asking him about what his mother wanted to do when she was young and what she thought of his online activities, asking him about his color preferences, and asking him about places that he should have at least heard of, such as Australia. The questions I asked him were deliberately designed to be very easy for a 13-year-old boy to answer. The answers I received from Eugene were most emphatically not the kind of answers that I would have expected a human being to give – even if he were seriously disturbed.
For what it’s worth, here is the conversation I had with Eugene. Readers can converse with him here. (In the interests of journalistic accuracy, I should mention that on a couple of occasions, my screen froze after I typed in a question, so I opened a new tab, typed in Eugene’s Web address, and continued the conversation.)
Me: Most of my friends call me Vince. What do your friends call you?
Eugene: My friends Vova and Zhenya who are amateur programmers once wrote a chess-playing program. Due to some blunders the program thought it played Battleships: – you tell it “Queen A8” and it answers “Your queen was undermined and sank!”.
Comment: Eugene was able to tell me his friends’ names, but he was unable to tell me what his friends called him. The anecdote he related sounded personal, but it was completely irrelevant to the question. It was a diversion, designed to confuse unwary judges.
Me: So, how was your day?
Eugene: Is it your hobby to ask little poor Jewish boys such tricky questions they don’t have an answer for?
Me: What did you do yesterday, Eugene?
Eugene: I don’t want to tell you about all my deeds. Oh, what a fruitful conversation;-)
Comment: Eugene was clearly unwilling to answer two simple questions about recent events that happened to him today and yesterday. This sounded very suspicious.
Eugene’s manner of objecting to the questions smelled very fishy, too. A human being objecting to such questions wouldn’t call them “tricky,” and he wouldn’t say “I don’t want to tell you about all my deeds.” He’d be far more likely to say: “Hang on, we’ve only just met! Couldn’t you tell me a little about yourself first, before grilling me about my personal life?” Now that would have been a natural non-response.
Me: Does your mother approve of you conversing with people online?
Eugene: My mother works on the Odessa TV and runs a popular program for teenagers “Speak Russian right!” Every Odessian teenager heard her famous “For those putzs who schmooze Russian in a goddam lame way: drop by and listen to mine!”
Me: What do you think she would say about you writing in English?
Eugene: Oh, my thoughts are a bit vague today – all this fuss around the Loebner really confused my brains! If I’m not mistaken – you still didn’t tell me where you live. OR it’s a secret?:-)
Comment: Here, I was testing Eugene, to see if he was capable of attributing a second-order mental state to his mother: I wanted to know what she thought of his chatting online. Eugene failed this test, too. He told me a lot about his mother’s TV show, but nothing about what she thought of his going online. I persisted with my line of questioning, asking him what he thought she would say about him writing in English. This was a question which required Eugene to simulate a third-order mental state: in this case, a thought about someone else’s imagined opinion about oneself. Once again, Eugene failed: “Oh, my thoughts are a bit vague today” was his excuse. Very unconvincing. Then he tried to evade my line of questioning by asking me where I lived. So, I decided to answer his query, in a way that tested his knowledge.
Me: If I told you I was born in Australia but live in Japan, would that answer your question?
Eugene: I don’t know geography, especially all these Asian countries, sorry.
What! Eugene is a 13-year-old boy, and he knows nothing about Japan! Only a computer program could be that dumb.
At this point, my screen froze up, so I opened a new tab, entered Eugene’s Web address, and continued my line of questioning.
Me: Have you ever taken a look at an atlas?
Eugene: No, never. Did I forget it or you really haven’t told me about the place you are from?
Me: I’m from Australia. Have you heard of the place?
Eugene: Australia? Hope it’s a good place to live.
Me: It is. So, have you been anywhere outside Ukraine?
Eugene: Ukraine is a former republic of Soviet Union. Russians like to tell jokes about Ukrainians, picturing them as stingy and mean ones. Ukrainians, in return, consider Russians to be alcoholics and jerks. Generally, it is called “the friendship of Slavic nations”.
Comment: Eugene confesses to never having opened an atlas – a rather embarrassing admission for a 13-year-old to make. He also appears never to have heard of Australia. Finally, when I ask him an autobiographical question (“So, have you been anywhere outside Ukraine?”), he responds by telling me about Ukraine’s history and about what Ukrainians and Russians think of one another, which is totally irrelevant.
Me: What’s your favorite color, Eugene?
Eugene: What a stupid question! And now you’ll possibly ask what my favorite size and shape is… Damn.
Me: Why is the question stupid?
Eugene: Sanity and insanity complement each other in our complicated life. Could you tell me what are you? I mean your profession.
Comment: In the passage above, I asked Eugene about his personal preferences. I asked an innocent question – “What’s your favorite color, Eugene?” – and he snarled back at me, “What a stupid question!” Quite clearly, he didn’t want to talk about himself. I pressed on, asking him why my question was stupid, and once again he gave an irrelevant response about sanity, before deflecting any further questioning with a question of his own, about my occupation. So I decided to answer his question, but to continue interrogating him.
Me: I’m a teacher, actually. Did your mother ever want to be a teacher before she became a TV celebrity?
Eugene: teacher? So how is it to work as teacher?
Me: It’s quite stimulating. Why? Are you interested in becoming one?
Eugene: I would rather not talk about it if you don’t mind. Maybe, let’s talk about something else? What would you like to discuss?
My first question in the passage above was about what Eugene’s mother wanted to be before she became a celebrity. Once again, Eugene ducked the question – thereby inadvertently revealing his lack of knowledge about his mother’s past – by asking me what it was like to work as a teacher instead. When I answered his question and followed up by asking him if he was interested in becoming one, he suddenly clammed up and proposed talking about something else. Evidently, his programmers hadn’t programmed him with the ability to answer even a superficial question about his interests.
Some readers may object that I am being a little harsh on Eugene. “After all,” they might argue, “he’s just a chatterbot – a simple little program, designed to simulate five minutes of human-like conversation in a way that fools one-third of his interlocutors into thinking that he’s a real boy, typing away at a keyboard in Ukraine. You’re expecting too much of him.” But the point I’m making here is that Eugene isn’t conversing, in any meaningful sense of the word; he’s merely obfuscating in a cleverly programmed way, and his displays of “bad attitude” are simply a desperate maneuver designed to hide his cognitive limitations. He has no “remembrance of things past,” he is unable to answer simple questions about other people’s opinions, and he knows next to nothing about the world around him. Whatever Eugene possesses, it’s not intelligence.
At the end of it all, what really puzzled me was how so many of the judges who interrogated Eugene failed to spot his cognitive defects, despite their supposedly having had previous experience in tests of this sort, where they had to distinguish human interlocutors from computer programs that were passing themselves off as humans. At least, that’s how “Turing test” tournaments of this sort are normally conducted. Then I came across a brilliant take-down of Eugene Goostman by Mike Masnick at Techdirt, titled, No, A ‘Supercomputer’ Did NOT Pass The Turing Test For The First Time And Everyone Should Know Better, which contained the following illuminating pieces of information (bold emphases are mine):
Okay, almost everything about the story is bogus. Let’s dig in:
1. It’s not a “supercomputer,” it’s a chatbot. It’s a script made to mimic human conversation. There is no intelligence, artificial or not, involved. It’s just a chatbot.
2. Plenty of other chatbots have similarly claimed to have “passed” the Turing test in the past (often with higher ratings). Here’s a story from three years ago about another bot, Cleverbot, “passing” the Turing Test by convincing 59% of judges it was human (much higher than the 33% Eugene Goostman claims).
3. It “beat” the Turing test here by “gaming” the rules — by telling people the computer was a 13-year-old boy from Ukraine in order to mentally explain away odd responses.
4. The “rules” of the Turing test always seem to change. Hell, Turing’s original test was quite different anyway.
5. As Chris Dixon points out, you don’t get to run a single test with judges that you picked and declare you accomplished something. That’s just not how it’s done. If someone claimed to have created nuclear fusion or cured cancer, you’d wait for some peer review and repeat tests under other circumstances before buying it, right?
6. The whole concept of the Turing Test itself is kind of a joke. While it’s fun to think about, creating a chatbot that can fool humans is not really the same thing as creating artificial intelligence. Many in the AI world look on the Turing Test as a needless distraction.
So the judges were hand-picked by the event organizer, Professor Kevin Warwick of Reading University? Perhaps that explains their ineptitude. And who were these judges, anyway? Andrew Orlowski, writing in The Register, provides some answers, in a highly entertaining article titled, World to Captain Cyborg on ‘Turing test’ stunt: You’re Rumbled:
The chatbot’s developer had tried and failed many times to convince humans it was human. This time, the academic luminaries chosen to judge the Test included a retired advertising executive with no scientific background (now a Lib Dem peer) and, um … the TV actor and former shoemaker Robert Llewellyn, whose cybernetics qualifications consist of having played the neurotic robot Kryten in Red Dwarf.
Orlowski pithily explains why Eugene Goostman’s feat should not impress us:
Warwick’s latest stunt is a vivid example of Jaron Lanier’s observation that to be impressed by computer AI, we first have to make ourselves pretty stupid.
“You can’t tell if a machine has gotten smarter or if you’ve just lowered your own standards of intelligence to such a degree that the machine seems smart,” wrote Lanier in You Are Not A Gadget. “People degrade themselves in order to make machines seem smart all the time.”
In his article for Techdirt, Mike Masnick adds that the event organizer, Professor Kevin Warwick, has a long history of self-promotion (see here and here), which the press should have been alert to in the first place. “Instead,” he writes, “as is all too typical with Warwick claims, the press went nutty over it, including publications that should know better.” Which publications are we talking about here? The Verge, Venture Beat, Yahoo Tech, NBC News, Washington Post, The Independent, PC World, The Wire, Gizmodo, ZDNet, Ars Technica, The Guardian, CNET, Computerworld and Science Alert. Masnick gleefully mocks these publications for their gullibility, by listing the headlines at the top of their articles. Some of the silliest: “Turing Test Bested, Robot Overlords Creep Closer” (Yahoo Tech); “Turing Test breakthrough as super-computer becomes first to convince us it’s human” (The Independent). Three of the fifteen publications described Eugene Goostman as a supercomputer. So much for accurate reporting.
Kelly Oakes, in a recent article in BuzzFeed titled, No, A Computer Did Not Just Pass The Turing Test (June 9, 2014), points out some more interesting facts:
1. The 30% pass mark did not come from Turing himself….
2. Turing probably did not intend a real test to have a five-minute time limit….
And most interestingly of all:
Eugene’s creators themselves even recognise that the test isn’t really able to address the question Turing set out to answer. In a 2009 book, they said: “Turing’s Test is, actually, no more than a joke of that genius British mathematician. This game has nothing (or very little) in common with the question ‘Can machines think?'”
Curious, I dug up the article on Google Books to see if the quote was supplied in its full context. Here’s what the authors, Eugene Demchenko and Vladimir Veselov, have to say in the conclusion to their article, “Who Fools Whom? The Great Mystification, or Methodological Issues in Making Fools of Human Beings” (Parsing the Turing Test: Philosophical and Methodological Issues in the Quest for the Thinking Computer, edited by Robert Epstein, Gary Roberts and Grace Beber, Springer, 2009, ISBN 978-1-4020-9624-2, chapter 26, pp. 447-459):
The first, and possibly most important thing that we have to say is that the Turing’s Test is, actually, no more than a joke of that genius British mathematician. This game has nothing (or very little) in common with the question “Can machines think?” – but does it make Turing’s test senseless then?
“The imitation game” is exciting, amusing, and highly intelligent – but it is nothing more than a game. Do not expect that passing it means anything more than that some bot was luckier than the rest, or that there were more simpletons among judges this time (or that some of them were “interested” in promoting some particular bot). (p. 458)
The authors also seem well aware of the tendency of some judges to assess chatterbots pretending to be children very leniently. (Their latest creation, Eugene Goostman, claims to be a 13-year-old boy.) They write:
The sympathetic: Reading Loebner prize logs, you may discover that some judges act incredibly gently to children – no matter real or virtual. And even if a child-bot says stupid things, does not answer any question, and spits random phrases about its mom, dad, and home pet, it gets high marks anyway. To be sure, if a child-bot tells them “I have a cat”, they will not ask in return, “May I eat it for breakfast?” just to see the reaction. They love children very much! (p. 449)
This is somewhat delicate: is it appropriate to cheat judges with chatterbots that imitate small children, people that are stuck on some crazy idea, or foreigners that have the same answer to almost any question? “Sorry, nicht verstehen! Ich bin bad speak English!” Well, let us look at what some judges say to these handicapped bots. One bot that imitated a little girl was asked by a judge: “Is your mother a hooker?” Oh yes, this is a very kind way to determine whom you are talking to. Actually, the judges usually split into two categories when they talk to such bots. The first appear to be afraid of any possibility that they might actually be talking to a child or some mentally ill person, and do not try to confuse it. They tend not to mark it low for the same reason. The other group is usually annoyed with all this inane conversation and acts accordingly. Anyway, as a method of gaining the Loebner prize, these bots seem quite perceptive, but they are much closer to “natural stupidity” than to “artificial intelligence”. (p. 453)
“Natural stupidity.” The phrase says it all, really.
Readers will recall that the contest held at the University of Reading this year only required 30% of judges to be fooled, which is a very low threshold. It is hardly surprising that 30% of the judges would have had a soft spot for children, and that they would have therefore tended to go easy on a chatterbot like Eugene Goostman.
I’ll let the final word go to Gary Marcus, a professor of cognitive science at NYU and the author of an article in the New Yorker titled, What comes after the Turing Test? (June 9, 2014):
A postmortem of Goostman’s performance from 2012 reports that the program succeeded by executing a series of “ploys” designed to mask the program’s limitations. When Goostman is out of its depth — which is most of the time — it attempts to “change the subject if possible … asking questions, steer[ing] the conversation, [and] occasionally throw[ing] in some humour.” All these feints show up even in short conversations like the one above.
It’s easy to see how an untrained judge might mistake wit for reality, but once you have an understanding of how this sort of system works, the constant misdirection and deflection becomes obvious, even irritating. The illusion, in other words, is fleeting…
In terms of practical significance for artificial intelligence, though, passing the Turing Test means little. As I wrote last year on this site:
The winners aren’t genuinely intelligent; instead, they tend to be more like parlor tricks, and they’re almost inherently deceitful. If a person asks a machine “How tall are you?” and the machine wants to win the Turing test, it has no choice but to confabulate. It has turned out, in fact, that the winners tend to use bluster and misdirection far more than anything approximating true intelligence.
… Chatterbots like Goostman can hold a short conversation about TV, but only by bluffing. (When asked what “Cheers” was about, it responded, “How should I know, I haven’t watched the show.”) But no existing program — not Watson, not Goostman, not Siri — can currently come close to doing what any bright, real teenager can do: watch an episode of “The Simpsons,” and tell us when to laugh.