- BQO - https://www.bigquestionsonline.com -

How Did Alan Turing Propose to Test Whether a Computer Can Think?

Editors’ Note: This is the third in a series of four essays, written by Jack Copeland, spotlighting Alan Turing, who is considered the father of modern computing and whose work breaking German codes changed the course of World War II.

How could researchers tell if a computer—whether a humanoid robot or a disembodied supercomputer—is capable of thought? This is not an easy question. For one thing neuroscience is still in its infancy. Scientists don’t know exactly what is going on in our brains when we think about tomorrow’s weather, or plan out a trip to the beach—let alone when we write poetry, or do complex mathematics in our minds. But even if we did know everything there is to know about the functioning of the brain, we might still be left completely uncertain as to whether entities without a human (or mammalian) brain could think. Imagine that a party of extraterrestrials find their way to Earth, and impress us with their mathematics and poetry. We discover they have no organ resembling a human brain; inside they are just a seething mixture of gases, say. Does the fact that these hypothetical aliens contain nothing like human brain cells imply that they do not think? Or is their mathematics and poetry proof enough that they must think—and so also proof that the mammalian brain is not the only way of doing whatever it is that we call thinking?

Of course, this imaginary scenario about aliens is supposed to sharpen up a question that’s much nearer to home. For alien, substitute computer. When computers start to impress us with their poetry and creative mathematics—if they don’t already—is this evidence that they can think? Or do we have to probe more deeply, and examine the inner processes responsible for producing the poetry and the mathematics, before we can say whether or not the computer is thinking? Deeper probing wouldn’t necessarily help much in the case of the aliens—because ex hypothesi the processes going on inside them are nothing like what goes on in the human brain. Even if we never managed to understand the complex gaseous processes occurring inside the aliens, we might nevertheless come to feel fully convinced that they think, because of the way they lead their lives and the way they interact with us. So does this mean that in order to tell whether a computer thinks, we only have to look at what it does—at how good its poetry is—without caring about what processes are going on inside it?

That was certainly what Alan Turing believed. He suggested a kind of driving test for thinking, a viva voce examination that pays no attention at all to whatever causal processes are going on inside the candidate—just as the examiner in a driving test cares only about the candidate’s automobile-handling behavior, and not at all about the nature of the internal processes that produce the behavior. Turing called his test the “imitation game,” but nowadays it is known universally as the Turing test.

Turing’s test works equally well for computers or aliens. It involves three players: the candidate and two human beings. One of the humans is the examiner, or “judge,” and the other is the “foil,” or comparator. The idea of the test is that the judge must try to figure out which of the other two participants is which, human or non-human, simply by chatting with them. The session is repeated a number of times, using different judges and foils, and if the judges are mistaken often enough about which contestant is which, the computer (or alien) is said to have passed the test. Turing stipulated that the people selected as judges “should not be expert about machines.”
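The scoring procedure just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Turing's own specification: the names `Session` and `candidate_passes` are invented here, and the `min_error_rate` threshold is hypothetical, since the text says only that the judges must be mistaken "often enough."

```python
from dataclasses import dataclass


@dataclass
class Session:
    """One round of the test: a judge chats with the candidate and a human foil."""
    judge: str
    judge_guess: str   # seat the judge named as the human: "A" or "B"
    human_seat: str    # seat the human foil actually occupied: "A" or "B"

    @property
    def judge_mistaken(self) -> bool:
        return self.judge_guess != self.human_seat


def candidate_passes(sessions: list[Session], min_error_rate: float) -> bool:
    """Return True if judges were wrong in at least `min_error_rate` of sessions.

    `min_error_rate` is a hypothetical parameter; the essay gives no figure.
    """
    if not sessions:
        raise ValueError("at least one session is required")
    mistakes = sum(s.judge_mistaken for s in sessions)
    return mistakes / len(sessions) >= min_error_rate


# Four made-up sessions with different judges; two judges guess wrongly.
sessions = [
    Session(judge="judge 1", judge_guess="A", human_seat="B"),  # mistaken
    Session(judge="judge 2", judge_guess="B", human_seat="B"),  # correct
    Session(judge="judge 3", judge_guess="A", human_seat="B"),  # mistaken
    Session(judge="judge 4", judge_guess="A", human_seat="A"),  # correct
]
print(candidate_passes(sessions, min_error_rate=0.5))  # prints True
```

Note that nothing in the sketch inspects how the candidate produces its replies; only the judges' verdicts are counted, which is the point of the driving-test analogy.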

Turing imagined conducting these conversations via an old-fashioned teleprinter, but nowadays we would use email or text messages. Apart from chatting, the judges must be kept strictly out of contact with the contestants—no peeping is allowed. Nor, obviously, are the judges allowed to measure the candidates’ magnetic fields, or their internal temperatures, or their processing speeds. Only Q & A is permitted, and the judges must not bring any scientific equipment along to the venue. Justifying his test, Turing said: “The question and answer method seems to be suitable for introducing almost any one of the fields of human endeavour that we wish to include” (and his own list of suitable fields for testing included mathematics, chess, poetry, and flirting). Turing added drolly, “We do not wish to penalize the machine for its inability to shine in beauty competitions,” making the point that his question-and-answer test excludes irrelevant factors.

The judges may ask questions as wide-ranging and penetrating as they like, and the computer is permitted to use “all sorts of tricks” to force a wrong identification, Turing said. So smart moves for the computer would be to reply “No” in response to “Are you a computer?” and to follow a request to multiply one huge number by another with a long pause and an incorrect answer—but a plausibly incorrect answer, not simply a random number. In order to fend off especially awkward questioning, the computer might even pretend to be from a different (human) culture than the judge. In fact, it is a good idea to select the test personnel so that, from time to time, the judge and the foil are themselves from different cultures. Here is Turing’s own example, dating from 1950, of the sort of conversation that could occur between a judge and a computer that successfully evades identification:
Judge: In the first line of your sonnet which reads “Shall I compare thee to a summer’s day,” would not “a spring day” do as well or better?

Machine: It wouldn’t scan.

Judge: How about “a winter’s day?” That would scan all right.

Machine: Yes, but nobody wants to be compared to a winter’s day.

Judge: Would you say Mr. Pickwick reminded you of Christmas?

Machine: In a way.

Judge: Yet Christmas is a winter’s day, and I do not think Mr. Pickwick would mind the comparison.

Machine: I don’t think you’re serious. By a winter’s day one means a typical winter’s day, rather than a special one like Christmas.

Turing was a little cagey about what would actually be demonstrated if a computer were to pass his test. He said that the question “Can machines pass the test?” is “not the same as ‘Do machines think,’” but, he continued, “it seems near enough for our present purpose, and raises much the same difficulties.” In one of his philosophical papers, he even cast doubt on the meaningfulness of the question “Can machines think?” saying (rather rashly) that the question is “too meaningless to deserve discussion.” However, he himself indulged in such discussion with gusto. In fact he spoke very positively about the project of “programming a machine to think” (his words), saying “The whole thinking process is still rather mysterious to us, but I believe that the attempt to make a thinking machine will help us greatly in finding out how we think ourselves.”

Turing was also cagey about how long he thought it would be before a computer passes the test. He said (in 1952) that it would be “at least 100 years” before a machine stood any chance of passing his test with no questions barred. This was a sensibly vague prediction, making it clear that Turing appreciated the colossal difficulty of equipping a computer to pass the test. Unfortunately, though, there is an urban myth that Turing predicted machines would pass his test by the end of the twentieth century, with the result that he has been unfairly criticised not only for being wrong, but also for being “far too optimistic about the task of programming computers to achieve a command of natural language equivalent to that of every normal person,” as one of his critics, Martin Davis, put it. Given Turing’s actual words (“at least 100 years”) this is misguided criticism.

There is another widespread misunderstanding concerning what Turing said. He is repeatedly described in the (now gigantic) literature about the Turing test as having intended his test to form a definition of thinking. However, the test does not provide a satisfactory definition of thinking, and so this misunderstanding of Turing’s views lays him open to spurious objections. Turing did make it completely clear that his intention was not to define thinking, saying “I don’t really see that we need to agree on a definition at all,” but his words were not heeded. “I don’t want to give a definition of thinking,” he said, “but if I had to I should probably be unable to say anything more about it than that it was a sort of buzzing that went on inside my head.”

Someone who takes Turing’s test to be intended as a definition of thinking will find it easy to object to the definition, since an entity that thinks could fail the test. For example, a thinking alien might fail simply because its responses are distinctively non-human. However, since Turing didn’t intend his test as a definition, this objection misses the point. Like many perfectly good tests, Turing’s test is informative if the candidate passes, but uninformative if the candidate fails. If you fail an academic exam, it might be because you didn’t know the material, or because you had terrible flu on the day of the exam, or for some other reason—but if you pass fair and square, then you have unquestionably demonstrated that you know the material. Similarly, if a computer passes Turing’s test then the computer thinks, but if it fails, nothing can be concluded.

One currently influential criticism of the Turing test is based on this mistaken idea that Turing intended his test as a definition of thinking. The criticism is this: a gigantic database storing every conceivable (finite) English conversation could, in principle, pass the Turing test (assuming the test is held in English). Whatever the judge says to the database, the database’s operating system just searches for the appropriate stored conversation and regurgitates the canned reply to what the judge has said. As philosopher Ned Block put it, this database no more thinks than a jukebox does, yet in principle it would succeed in passing the Turing test. Block agrees that this hypothetical database is in fact “too vast to exist”—it simply could not be built and operated in the real world, since the total number of possible conversations is astronomical—but he maintains that, nevertheless, this hypothetical counterexample proves the Turing test is faulty.
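The lookup scheme in this objection can be made concrete with a toy sketch. The table `CANNED` and the function `canned_reply` are hypothetical names invented for illustration; the table holds two entries, where Block's imagined database would hold every finite English conversation.

```python
from typing import Optional

# Keys are the whole conversation so far (the judge's remarks and the
# machine's replies, in order); values are the stored reply to give next.
CANNED: dict[tuple[str, ...], str] = {
    ("Are you a computer?",): "No.",
    ("Are you a computer?", "No.", "What is 2 + 2?"): "4, last time I checked.",
}


def canned_reply(history: tuple[str, ...]) -> Optional[str]:
    """Look up the stored reply for this exact history; no reasoning occurs."""
    return CANNED.get(history)


print(canned_reply(("Are you a computer?",)))   # prints: No.
print(canned_reply(("Tell me a joke.",)))       # prints: None (not in the toy table)
```

The sketch answers only what has been stored in advance, which is why the jukebox comparison fits: the replies are retrieved, not worked out.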

It’s true that the database example would be a problem if the Turing test were supposed to be a definition of thinking, since the definition would entail that this monster database thinks, when obviously it does not. But the test is not supposed to be a definition and the database example is in fact harmless. Turing’s interest was the real computational world, and the unthinking database could not pass the Turing test in the real world—only in a sort of fairyland, where the laws of the universe would be very different. In the real world, there might simply not be enough atoms in existence for this huge store of information to be constructed; and even if it could be, it would operate so slowly—because of the vast numbers of stored conversations that must be searched—as to be easily distinguishable from a human conversationalist. In fact, the judge and the foil might die before the database produced more than its first few responses.
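A rough count suggests why such a store could not be built. The figures below are illustrative assumptions, not taken from the essay: a vocabulary of 10,000 words, conversations exactly 100 words long, and the commonly quoted order-of-magnitude estimate of 10^80 atoms in the observable universe.

```python
VOCABULARY_SIZE = 10_000        # assumed number of distinct English words
CONVERSATION_WORDS = 100        # assumed length of a short conversation, in words
ATOMS_IN_UNIVERSE = 10 ** 80    # rough, commonly quoted order of magnitude


def word_sequences(vocabulary_size: int, length: int) -> int:
    """Count every possible sequence of exactly `length` words."""
    return vocabulary_size ** length


sequences = word_sequences(VOCABULARY_SIZE, CONVERSATION_WORDS)
print(len(str(sequences)) - 1)          # power of ten: prints 400
print(sequences > ATOMS_IN_UNIVERSE)    # prints True
```

Most of these word sequences are gibberish, so the count overstates the number of sensible conversations. Even so, if only one sequence in 10^300 were sensible, 10^100 would remain, still far more than the assumed number of atoms.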

Another famous (but misguided) criticism of the Turing test is by philosopher John Searle. Searle is one of AI’s greatest critics, and a leading exponent of the view that running a computer program can never be sufficient to produce thought. His objection to the Turing test is simply stated. Let’s imagine that a team in China, say, produces a computer program that successfully passes a Turing test in Chinese. Searle ingeniously proposes an independent method for testing whether running the program really produces thought. This is to run the program on a human computer and then ask the human, “Since you are running the program—does it enable you to understand the Chinese?” Searle imagines himself as the human computer. He is in a room provided with many rulebooks containing the program written out in plain English; and he has an unlimited supply of paper and pencils. As with every computer program, the individual steps in the program are all simple binary operations that a human being can easily carry out using pencil and paper, given enough time.

In Searle’s Turing test scenario, the judge writes his or her remarks on paper, in Chinese characters, and pushes these into the room through a slot labeled INPUT. Inside the room, Searle painstakingly follows the zillions of instructions in the rulebooks and eventually pushes more Chinese characters through a slot labeled OUTPUT. As far as the judge is concerned, these symbols are a thoughtful, intelligent response to the input. But when Searle, a monolingual English speaker, is asked whether running the program is enabling him to understand the Chinese characters, he replies “No, they’re all just squiggles and squoggles to me—I have no idea what they mean.” Yet he is doing everything relevant that an electronic computer running the program would do: The program is literally running on a human computer.

This is Searle’s famous “Chinese Room” thought experiment. He says the thought experiment shows that running a mere computer program can never produce thought or understanding, even though the program may pass the Turing test. However, there is a subtle fallacy. Is Searle in his role as human computer the right person to tell us whether running the program produces understanding? After all, there is another conversationalist in the Chinese Room—the program itself, whose replies to the judge’s questions Searle delivers through the output slot. If the judge asks (in Chinese) “Please tell me your name,” the program responds “My name is Amy Chung.” And if the judge asks “Amy Chung, do you understand these Chinese characters,” the program responds “Yes, I certainly do!”

Should we believe the program when it says “Yes, I am able to think and understand?” This is effectively the very same question that we started out with—is a computer really capable of thought? So Searle’s gedankenexperiment has uselessly taken us round in a circle. Far from providing a means of settling this question in the negative, the Chinese Room thought experiment leaves the question dangling unanswered. There is nothing in the Chinese Room scenario that can help us decide whether or not to believe the program’s pronouncement “I think.” It certainly does not follow from the fact that Searle (beavering away in the room) cannot understand the Chinese characters that Amy Chung does not understand them.

Alan Turing’s test has been attacked by some of the sharpest minds in the business. To date, however, it stands unrefuted. In fact, it’s the only viable proposal on the table for testing whether a computer is capable of thought.

Questions for Discussion:

Will computers pass the Turing test in our lifetimes?

Could a computer really think?

Does a flawless simulation of thinking count as thinking?

Is there any systematic way of unmasking the computer in the Turing test (without infringing the rules of the test)?

Could IBM’s “Watson” pass the Turing test?

Is there a difference between being able to think and being conscious?

Is there a difference between being intelligent and being able to think?

If computers do eventually pass the Turing test, what will follow for the human race?

An AI expert once predicted that, if we are lucky, the superintelligent computers of the future may keep us as pets. How real is the danger that Artificial Intelligences will take over?

Are human beings soft cuddly computers?

If the ability to think is not exclusive to humans, is there any quality that would distinguish human beings from the products of technology?

Does Searle’s Chinese Room thought experiment show that thinking is more than number-crunching and symbol-crunching?

If everything counts as a computer, then the human brain is a computer, and so it’s trivially true that computers can think. Is the brain really a computer? Is there anything that isn’t a computer, or does every object have some level of description relative to which it is a computer?

If computers can think, then can neuroscience tell us anything about thinking?