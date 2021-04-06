IBM has come up with an automatic debating system called Project Debater that researches a topic, presents an argument, listens to a human rebuttal and formulates its own rebuttal. But does it pass the Turing test? Or does the Turing test matter anymore?
The Turing test was first introduced in 1950, often cited as year-one for AI research. It asks, “Can machines think?”. Today we’re more interested in machines that can intelligently make restaurant recommendations, drive our car along the tedious highway to and from work, or identify the surprising-looking flower we just stumbled upon. These all fit the definition of AI as a machine that can perform a task normally requiring the intelligence of a human. Though, as you’ll see below, the game Turing described wasn’t framed as a test of intelligence or of thinking at all, but as a way to determine a test subject’s sex.
The Imitation Game
The Turing test as we know it today is to see if a machine can fool someone into thinking that it’s a human. It involves an interrogator and a machine, with the machine hidden from the interrogator. The interrogator asks questions of the machine using only keyboard and screen. The purpose of the interrogator’s questions is to help him decide if he’s talking to a machine or a human. If he can’t tell, then the machine passes the Turing test.
Often the test is done with a number of interrogators and the measure of success is the percentage of interrogators who can’t tell. In one example, to give the machine an advantage, the test was to tell if it was a machine or a 13-year-old Ukrainian boy. The young age excused much of the strangeness in its conversation. It fooled 33% of the interrogators.
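The multi-interrogator scoring described above comes down to a simple fraction. Here is a minimal sketch of that calculation. The `Verdict` class, the `fooled_fraction` function, and the 30-judge panel are all hypothetical, chosen for illustration rather than taken from any real test, and “fooled” is assumed to mean the interrogator judged the hidden party to be human.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Verdict:
    interrogator: str
    judged_human: bool  # True if this interrogator took the machine for a human


def fooled_fraction(verdicts: List[Verdict]) -> float:
    """Return the fraction of interrogators who mistook the machine for a human."""
    if not verdicts:
        raise ValueError("need at least one verdict")
    fooled = sum(1 for v in verdicts if v.judged_human)
    return fooled / len(verdicts)


# Hypothetical panel: 30 interrogators, the first 10 of whom were fooled.
verdicts = [Verdict(f"judge-{i}", judged_human=(i < 10)) for i in range(30)]
rate = fooled_fraction(verdicts)
print(f"{rate:.0%} of interrogators were fooled")
```

Whether a given fraction counts as “passing” is a separate decision that the scoring itself doesn’t settle.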
Naturally Turing didn’t call his test “the Turing test”. Instead he called it the imitation game, since the goal was to imitate a human. In Turing’s paper, he gives two versions of the test. The first involves three people: the interrogator, a man and a woman. The man and woman sit in a separate room from the interrogator, and in Turing’s time the communication was ideally via teleprinter. The goal is for the interrogator to guess who is male and who is female. The man’s goal is to fool the interrogator into making the wrong decision and the woman’s is to help him make the right one.
The second test in Turing’s paper replaces the woman with a machine but the machine is now the deceiver and the man tries to help the interrogator make the right decision. The interrogator still tries to guess who is male and who is female.
But don’t let that goal fool you. The real purpose of the game was as a replacement for his question of “Can a machine think?”. If the machine played the game successfully, then Turing figured that his question would have been answered. Today, we’re both more sophisticated about what constitutes “thinking” and “intelligence”, and we’re also content with the machine displaying intelligent behavior, whether or not it’s “thinking”. To unpack all this, let’s put IBM’s recent Project Debater under the microscope.
The Great Debater
IBM’s Project Debater is an example of what we’d call a composite AI as opposed to a narrow AI. An example of narrow AI would be to present an image to a neural network and the neural network would label objects in that image, a narrowly defined task. A composite AI, however, performs a more complex task requiring a number of steps, much more akin to a human brain.
Project Debater is first given the motion to be argued. You can read the paper on IBM’s webpage for the details of what it does next, but basically it spends 15 minutes researching and formulating a 4-minute opening speech supporting one side of the motion. It also converts the speech to natural language and delivers it to an audience. During those initial 15 minutes, it also compiles leads for the opposing argument and formulates responses. This is in preparation for its later rebuttal. It then listens to its opponent’s rebuttal, converting it to text using IBM’s own Watson speech-to-text. It analyzes the text and, in combination with the responses it had previously formulated, comes up with its own 4-minute rebuttal. It converts that to speech and ends with a summary 2-minute speech.
All of those steps, some of them considered narrow AI, add up to a composite AI. The whole is done with neural networks along with conventional data mining, processing, and analysis.
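To make the “composite” idea concrete, here is a toy sketch of stages chained into one pipeline. It is not IBM’s implementation. Every function name, dictionary key, and string below is a made-up stand-in, and each stage does trivial string handling in place of the real research, speech-to-text, and argument generation.

```python
from typing import Callable, Dict, List

State = Dict[str, str]
Stage = Callable[[State], State]


def prepare_opening(state: State) -> State:
    # Stand-in for the 15-minute research phase: draft an opening speech
    # and stash responses to the arguments the opponent is expected to make.
    return {
        **state,
        "opening": f"We support the motion: {state['motion']}",
        "prepared_responses": "anticipated counterpoints",
    }


def transcribe_opponent(state: State) -> State:
    # Stand-in for speech-to-text; the "audio" here is already a string.
    return {**state, "opponent_text": state["opponent_audio"].strip().lower()}


def build_rebuttal(state: State) -> State:
    # Combine what the opponent said with the responses prepared earlier.
    rebuttal = (
        f"In reply to '{state['opponent_text']}', "
        f"drawing on {state['prepared_responses']}"
    )
    return {**state, "rebuttal": rebuttal}


def summarize(state: State) -> State:
    return {**state, "summary": f"In summary: {state['opening']}"}


def run_pipeline(stages: List[Stage], state: State) -> State:
    # Each narrow stage reads the shared state and returns an extended copy.
    for stage in stages:
        state = stage(state)
    return state


PIPELINE: List[Stage] = [prepare_opening, transcribe_opponent, build_rebuttal, summarize]

result = run_pipeline(
    PIPELINE,
    {"motion": "example motion", "opponent_audio": "  Example Counterargument "},
)
for key in ("opening", "rebuttal", "summary"):
    print(f"{key}: {result[key]}")
```

The point of the sketch is only the shape: each stage is narrow on its own, and the composite behavior comes from running them in sequence over shared state.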
The following video is of a live debate between Project Debater and Harish Natarajan, world record holder for the number of debate competitions won. Judge for yourself how well it works.
Does Project Debater pass the Turing test? It didn’t take the formal test; however, you can judge for yourself by imagining reading a transcript of what Project Debater had to say. Could you tell whether it was produced by a machine or a human? If you could mistake it for a human then it may pass the Turing test. It also responds to the human debater’s argument, similar to answering questions in the Turing test.
Keep in mind though that Project Debater had 15 minutes to prepare for the opening speech and no numbers are given on how long it took to come up with the other speeches, so if time-to-answer is a factor then it may lose there. But does it matter?
Does The Turing Test Matter?
Does it matter if any of today’s AIs can pass the Turing test? That’s most often not the goal. Most AIs end up as marketed products, even the ones that don’t start out that way. After all, eventually someone has to pay for the research. As long as they do the job then it doesn’t matter.
IBM’s goal for Project Debater is to produce persuasive arguments and make well informed decisions free of personal bias, a useful tool to sell to businesses and governments. Tesla’s goal for its AI is to drive vehicles. Chatbots abound for handling specific phone and online requests. All of them do something normally requiring the intelligence of a human with varying degrees of success. The test that matters then is whether or not they do their tasks well enough for people to pay for them.
Maybe asking if a machine can think, or even if it can pass for a human, isn’t really relevant. The ways we’re using them require only that they can complete their tasks. Sometimes this can require “human-like” behavior, but most often not. If we’re not using AI to trick people anyway, is the Turing test still relevant?
I still think the big problem with the Turing test is the basic idea that a *human* is the right tester. Humans are notoriously awful at constructing tests for themselves. We construct tests that ignore a *huge* amount of what we actually have learned. It’s like the famous example of asking a human to give a string of random numbers, or the apocryphal “physicist with a barometer” question. I feel like it’s a similar situation here.
I mean, drop an AI into some reality TV show and I get the feeling you’d be able to tell the difference within *minutes*.
Byte magazine once had an article (humorous) that said in effect, machines will never think like humans until they learn Artificial Stupidity.
Machines will have arrived when they understand humor.
The most important idea of the Turing Test is that it provides an answer to the question “can machines think” without having to define what you mean by “think”, but instead simply compare the capabilities of the machine to those of a human.
The setup of the test, as described by Turing, is just one possible way to do that. Being able to drive a car from A to B with similar comfort, speed and safety as a human driver is another.
The driving example might be an example of artificial stupidity.
It doesn’t establish that, because it swaps the question to a different one – although related.
It asks “can machines imitate people?”, which does not answer whether they can think, however you define “think”.
Sufficiently advanced imitation is indistinguishable from real thinking. It’s pointless to try to go any further than that.
Project Debater. One small step for computer science, one giant leap for social media.
I think the confusion arises because “AI” has evolved into something distinct from what we used to call AI. We recognize that AI like expert systems and self-driving cars aren’t “thinking” even if they are performing at a high level. What we used to think of as AI is now generally called “AGI” for Artificial General Intelligence. IBM’s debater notwithstanding, even the best AI fails miserably at random human conversation. These AI systems are OK, sometimes even superior to humans at performing expected tasks, but they cannot handle edge cases or switch modes the way humans can. And that has been an issue for self-driving cars, because once you leave the big highway systems the world’s roads are a yarn ball of edge cases.
The distinction between narrow AI and general AGI is pretty straightforward, so I wonder where all the confusion is coming from.
It’s a general problem.
Media because there’s something to sell, and those who don’t want another AI winter.
>The Turing test was first introduced in 1950, often cited as year-one for AI research. It asks, “Can machines think?”.
This isn’t true. Turing never intended it to test the machine, but to test our perception of intelligence because the whole concept was/is too ill-defined to make the question meaningful.
The Turing test is among the most misunderstood, misquoted and misapplied concepts in computer science, right alongside Moore’s law. Literally everybody gets it wrong, and even when you point out it’s wrong people just stare at you blankly, turn away, and repeat the false version.
Direct AI research into improving humans so our perceptions get better. :-p
I propose another AI test: Can they create content? No, not just writing a few sentences, but creating plots for stories, scripts for movies, or even… memes?
I think this is a barrier that few, if any, AI have crossed, and it sort of lines up with Turing’s methodology. I’m no expert, but I think this concept of content-creation will serve as the basis for evaluating AI in the future.
