Part I · Chapter 1

Genesis

Frank Rosenblatt's Perceptron in 1958, the Navy's thinking machine, and Marvin Minsky's campaign that buried it. → The origin story and the original sin that sidelined neural nets for fifteen years.

“New Navy Device Learns by Doing; Psychologist Shows Embryo of Computer Designed to Read and Grow Wiser.” — The New York Times, July 8, 1958

On the afternoon of July 7, 1958, a small crowd of reporters gathered in a room at the United States Weather Bureau in Washington to watch a machine learn. The machine was not, strictly speaking, in the room. What sat in front of them was a connection to an IBM 704, a room-filling mainframe in a building elsewhere, and the man running the demonstration was a thirty-year-old psychologist named Frank Rosenblatt. The Navy had paid for the work and the Navy had called the press, and the press, given a machine and an officer in uniform and a confident young scientist, did what the press does. The next morning, readers of the country’s most serious newspaper learned that the Navy had built the embryo of a computer it expected would one day walk, talk, see, write, reproduce itself, and be conscious of its existence.

The demonstration itself was modest. Rosenblatt fed the machine punch cards marked on the left or the right, and at first it guessed. It was wrong about as often as a coin. Then, card after card, it began to be right, and after fifty cards it was reliably telling left from right, having been told nothing about left or right except whether each of its guesses had been correct. It had not been programmed to do this in the way computers were then programmed, with a human writing out every rule in advance. Nobody had written a rule. The machine had found the rule by adjusting itself. To the reporters watching, and to Rosenblatt, that distinction was the whole point. A machine that followed instructions was a tool. A machine that improved with experience was something closer to a mind.

The reporters did not write down “a machine improved with experience.” They wrote down what the Navy told them the machine would someday become, and the gap between the two was enormous. The perceptron, Rosenblatt and the officers present suggested, was a first step toward devices that would recognize people by sight and greet them by name, that would hear one language and write down another, that would be built rather than programmed and would design their own successors. Newspapers across the country picked it up and ran with it, some under headlines invoking electronic brains and mechanical men, the standard vocabulary of an era that had just learned the word “computer” and could not yet tell a calculator from a creature. The coverage was not entirely the press’s invention. It was, to a meaningful degree, the message the Navy and Rosenblatt had handed them. That mattered, because the bill for a prophecy is paid not by the prophet who makes it but by everyone who has to live with the disappointment when it does not arrive on time.

Rosenblatt called his invention the perceptron, and the idea behind it was both very old and, in 1958, genuinely new. The old part came from biology. The brain is a web of neurons, each one collecting signals from thousands of others and firing, or not firing, depending on whether the incoming signals add up past some threshold. Two researchers, Warren McCulloch and Walter Pitts, had shown in 1943 that you could describe a neuron as a simple logical device, and a few years later the psychologist Donald Hebb had proposed how such neurons might learn: connections that fire together grow stronger. Rosenblatt’s contribution was to take these ideas off the page and build them into a procedure a machine could run. The perceptron was a single artificial neuron, or a layer of them, wired to a set of inputs through connections whose strengths, called weights, could be turned up or down. When it guessed wrong, an automatic rule nudged the weights in the direction that would have produced the right answer. Do that enough times, with enough examples, and the weights settle into a configuration that solves the problem. The machine was not told the answer. It was shown examples and corrected, and it converged.

This was learning in a sense narrow enough to define precisely, which was exactly why it mattered. For decades the question of whether a machine could think had been a matter for philosophers and science-fiction writers. Rosenblatt had turned it into a question an engineer could answer with a stopwatch and a stack of cards. And the answer, on that July afternoon, was yes, at least for the small problem in front of it. The perceptron was a thinking machine in the most literal and least romantic way, and the gap between those two readings of the word “thinking” would define the next thirty years of the field, and very nearly kill it.

The most remarkable thing about the perceptron was not the demonstration but a guarantee that came with it. Rosenblatt had proved a theorem, the perceptron convergence theorem, which said that if a problem could be solved by a single-layer perceptron at all, the learning rule was certain to find a solution in a finite number of steps. This was not a hope or an empirical observation. It was a mathematical promise. Feed the machine examples of a problem it was capable of solving, correct it when it erred, and it would converge on a working set of weights, every time, no matter how badly it started. For a field that had spent years arguing in vague terms about whether machines could in principle learn, this was a different kind of statement entirely. It put a learning machine on the same footing as a proof. The trouble, as Minsky would later make painfully clear, was hidden in the phrase “a problem it was capable of solving.” The theorem said nothing about which problems those were, and the set turned out to be smaller than anyone watching the Weather Bureau demonstration imagined.

Rosenblatt was an unlikely figure to be at the center of any of this. He had trained as a psychologist at Cornell, written his dissertation on the statistics of personality, and ended up at the Cornell Aeronautical Laboratory in Buffalo, a contract research outfit where the questions tended to come with a defense application attached. He was a polymath of the type the era produced and then mostly forgot: he flew small planes, did astronomy, wrote music, and built, in addition to the perceptron, a machine he hoped might one day transfer learning from one rat’s brain to another’s. He believed, sincerely and without much hedging, that the brain was a machine and that the perceptron was a step toward understanding it. When he spoke to reporters, the hedges that a more cautious scientist would have inserted tended to fall away, and the Navy’s public-relations apparatus was happy to let them fall. The result was the most famous case of overpromising in the early history of artificial intelligence, and it would be held against the whole idea long after Rosenblatt himself was gone.

His real interest, the one underneath the headlines, was never the machine for its own sake. Rosenblatt had come to the perceptron from psychology, and what he wanted from it was a theory of the brain. He did not think the perceptron was a clever engineering trick that happened to recognize shapes. He thought it was a stripped-down, testable model of how a nervous system might actually store and organize the world it perceived, and he titled the long, dense paper he published that year accordingly: a probabilistic model for information storage and organization in the brain. The word “brain” was not a flourish. It was the subject. Where Minsky and the engineers around him would come to see the brain as one possible implementation of intelligence and not necessarily the best one, Rosenblatt saw the brain as the thing to be explained and the machine as the explanation. This difference in motive ran underneath everything that followed. The two camps disagreed about more than how to build a thinking machine. They disagreed about whether the goal was to build a mind or to understand one, and a scientist who is trying to model the brain will tolerate a system he cannot fully explain in a way that an engineer trying to build a reliable product will not.

He was, by the accounts of the students who passed through his lab at Cornell, a generous and inspiring teacher, the kind who talked about the work as though it were the most exciting thing in the world because to him it was. He took students up in his plane. He let them chase strange ideas. The rat-brain experiments, in which he ground up the brains of trained rats and injected the extract into untrained ones to see whether learning could be chemically transferred, sound today like something out of a pulp magazine, and they did not work, but they came from the same conviction that drove the perceptron: that mind was matter, that learning left a physical trace, and that a sufficiently clever experiment could get hold of it. The conviction was right even where the experiments were wrong. It was the kind of mind that produces a genuine breakthrough and a genuine embarrassment from the same impulse, and Rosenblatt produced both.

The hardware came later, and when it did it was impressive in a way the IBM 704 demonstration had not been. The Mark I Perceptron, built at the Cornell lab with Office of Naval Research money, was a dedicated machine the size of a wall of refrigerators. Its eye was a grid of four hundred photocells, twenty by twenty, that took in an image the way a crude camera would. Those cells fed into a layer of artificial neurons through a tangle of wires, and the weight on each connection was a physical thing: a potentiometer, a dial, whose setting determined how strongly that input pushed the neuron toward firing. The genius and the absurdity of the machine were the same detail. When the perceptron learned, it did not update a number in memory. It turned the dials. Small electric motors, hundreds of them, physically rotated the potentiometers to new positions, so that learning in the Mark I was an audible, mechanical event, a room full of motors whirring as the machine corrected its mistakes. It could learn to tell simple shapes apart, a square from a circle, a printed letter from another. It was, for its moment, a real and working pattern recognizer, and it was nothing at all like the conscious, self-reproducing creature the headlines had promised.

One feature of the design would look prophetic decades later, though no one could have known it then. The wiring between the photocells and the first bank of units was deliberately set at random and left fixed; only the final layer of weights, the dials the motors turned, was allowed to learn. This was partly a practical compromise, since nobody knew how to train the earlier connections, and partly a guess about how brains might work, with a layer of fixed feature-detectors feeding a layer that learned. The structure foreshadowed, in crude form, the layered networks that would eventually dominate the field. But it also contained the seed of the machine’s undoing. The features the perceptron could combine were only the features its fixed random wiring happened to hand it, and a single trainable layer could draw only one kind of boundary through them. The architecture that made the Mark I buildable in 1960, with motors and dials and surplus parts, was the same architecture whose limits Minsky would expose in 1969.

That distance between the headline and the dials is the chapter’s whole drama, and it had a second character, who had been sitting in the audience of the field from the beginning. Marvin Minsky had grown up in New York and gone, as Rosenblatt had, to the Bronx High School of Science, the public school that funneled the city’s most gifted children toward technical careers and that produced, in those years, an improbable density of future scientists. The two were nearly contemporaries, a year or so apart, two clever boys from the same hallway who would spend their adult lives on opposite sides of the same question. Minsky was, if anything, the more brilliant. As a young researcher he had built one of the first machines that simulated a network of neurons, a contraption called the SNARC wired together out of surplus parts and vacuum tubes. He had, in other words, started where Rosenblatt was. He did not stay there.

By the early 1960s Minsky had moved to MIT and concluded that the connectionist approach, the project of building intelligence out of brain-like networks of weighted connections, was a dead end. His own bet had gone the other way, toward what came to be called symbolic AI: the idea that intelligence was a matter of manipulating symbols and logical rules, that thinking was something like proof, and that the way to build a mind was to write down, explicitly and in advance, the knowledge and the reasoning a mind would need. The two visions differed in their engineering, but the deeper split was philosophical. One held that intelligence emerged from the bottom up, from many simple units adjusting themselves against data, with no one able to say exactly what any single weight meant. The other held that intelligence was built from the top down, in clean, inspectable rules a human could read. Rosenblatt’s machine was a black box that worked for reasons even its maker could not fully articulate. Minsky wanted a glass box. Sixty-five years later the field would still be arguing about which of them was right, but in 1969 the argument was about to be settled, at least as a matter of funding, in Minsky’s favor.

Minsky was not a marginal figure taking a swing at a famous one. In the summer of 1956 he had been one of the organizers of the workshop at Dartmouth College that gave the field its name, the meeting where a handful of mathematicians and engineers spent two months proposing that “every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it.” Minsky was at the center of that founding optimism and would spend his career as one of its standard-bearers. He had built his neuron machine and then walked away from neurons, and the walking away was deliberate. He had concluded that the brain’s architecture was a distraction, that the way to make a machine intelligent was to give it the explicit symbolic knowledge a person uses to reason, and that fiddling with weighted connections was a slow road that led, at best, to pattern recognition and never to thought. By 1969 he ran one of the two or three most powerful AI laboratories in the world, and the question of which approach deserved the field’s scarce money was not abstract to him. It was a question about his own program’s future, and he answered it with the full weight of his authority.

The clash between the two men ran deeper than the science. Rosenblatt and Minsky tangled at conferences in the early 1960s, in sharp exchanges in front of colleagues, the kind of personal heat that gathers around a young field where the prizes are reputations and grant dollars and there is not yet enough of either to go around. They had known each other since boyhood, two products of the same intense New York school, and there is a particular edge to a rivalry between people who started together and chose opposite roads. Each could see in the other the path he had not taken. Minsky had built a neural-network machine and abandoned the approach; Rosenblatt had taken that same approach and built it into a public sensation. Rosenblatt did not help his own case. He continued to make large claims, published a dense and difficult book on the theory of brain mechanisms, and presented himself as something between a scientist and a prophet, which made him an easy target for a critic who preferred his arguments tight and his promises small. Minsky, for his part, went further than disagreement. He set out to prove, with mathematics, that the perceptron could not do what Rosenblatt said it would.

The proof arrived in 1969, in a slim, austere book called Perceptrons, written by Minsky and his MIT colleague Seymour Papert. It was, by the standards of the field, a serious and rigorous piece of work, and that was what made it lethal. The book did not rant. It calculated. Its central result concerned a class of problems that a simple perceptron, the kind Rosenblatt had demonstrated, provably could not solve. The cleanest example was a logical operation called exclusive-or, XOR: a function that returns true when exactly one of two inputs is on, and false when both or neither are. A child can learn it in seconds. A single-layer perceptron, Minsky and Papert showed, never can, because the answer cannot be carved out by a single straight dividing line, and a single layer of weights is, in effect, just a line. The machine that had so impressed the reporters at the Weather Bureau could be defeated by a problem with two inputs.

There was an important qualification, and it would haunt the field for the next fifteen years. The limitation Minsky and Papert proved applied to perceptrons with a single layer of adjustable weights. Stack the neurons into multiple layers, with a hidden layer in the middle, and the network can represent XOR and far more besides. Researchers, including Rosenblatt, knew this even in the 1960s. The catch was that nobody had a reliable way to train such a multi-layer network: the elegant rule that nudged a single layer’s weights toward the right answer did not extend, in any obvious way, to the hidden layers in between. There was no known method to assign blame for an error backward through the layers, to tell an interior neuron how it should have behaved. The promise of deep networks was visible. The means to realize it was missing, and would stay missing, in widely usable form, until 1986. Perceptrons was technically careful about all of this. Its readers were not. The lasting impression the book left, fairly or not, was that neural networks had been mathematically proven to be a toy, and that a serious person should work on something else.

What happened next was less a refutation than an evacuation. Funding agencies, which had poured money into the optimism of the late 1950s, grew skeptical of artificial intelligence broadly and of the brain-inspired branch in particular. The general AI winter that descended in the 1970s was driven by many disappointments, but for the connectionists the chill was specific and personal. Grants dried up. Graduate students were steered toward symbolic methods, where Minsky’s school held the prestige and the funding. The few researchers who kept faith with neural networks found their papers harder to publish and their careers harder to advance. An idea does not die because it has been disproven; it dies because the people who might have developed it are paid to do something else, and the next generation never learns it was alive. That is roughly what happened. For the better part of two decades, the most promising lead in the entire project of building thinking machines was treated as a closed question, settled by a book that had, in fact, settled only the narrow case it analyzed.

The mechanics of this mattered, because the money in early artificial intelligence did not come from many small sources but from a few very large ones, and chief among them was the Pentagon’s research arm. A field funded by a handful of program managers is a field that can be steered by a handful of opinions, and the opinions that counted held that symbolic AI was the serious work and neural networks were the discredited fad of the previous decade. Through the 1970s Washington’s appetite for blue-sky AI research cooled across the board, squeezed by disappointments and by a political climate impatient with research that could not point to a near-term payoff. When the budget tightened, the work nearest the door was the work that already carried a reputation for having failed. Connectionism had that reputation, and Perceptrons had given it. The book did not have to convince every researcher in the field. It only had to convince the small number of people who decided where the next grant went, and on that audience it landed with full force.

Rosenblatt did not live to see any of it reversed. He had moved on from the perceptron to neurobiology, to the experiments on rat brains, drifting toward the edge of a field that was drifting away from him. On July 11, 1971, his forty-third birthday, he drowned in a boating accident on Chesapeake Bay. He was not famous when he died. The machine that had made every front page in 1958 was, by 1971, a footnote in textbooks that mentioned it mainly to explain why it had failed. The man who had built the first learning machine to capture the public imagination died with his life’s idea in disgrace, and the disgrace was, in a precise and unusual sense, manufactured. The perceptron had not stopped working. The world had stopped looking.

It is tempting, with the benefit of everything that came after, to read this as a simple story of a visionary crushed by a reactionary, of Rosenblatt right and Minsky wrong. The record does not support so clean a verdict. Minsky was not a fool attacking a genius; he was a genius who had started in the same place and reached a defensible, mathematically grounded conclusion that the approach as it then stood could not deliver what its champion claimed. He was correct about the single-layer perceptron. His mistake, and Papert’s, lay in how confidently the limited result was allowed to generalize, and in how thoroughly the field internalized a verdict the mathematics did not actually warrant. Rosenblatt, for his part, had oversold a real thing into a fantasy, and the backlash he provoked was partly the bill coming due for the Navy’s talk of conscious, self-reproducing machines. Two men from the same Bronx classroom, both right about something and both wrong about something, had between them set the schedule of the entire field back by a generation.

What the episode established, more than any technical fact, was a pattern the rest of this story would repeat. The promise would always run ahead of the demonstration, and the demonstration would always run ahead of the understanding. A working result would be packaged into a prophecy, the prophecy would invite a reckoning, and the reckoning would overshoot. The idea that a machine could learn from examples by adjusting weighted connections was correct in 1958 and is the basis of nearly every system this book will describe. It simply arrived without the two things it needed to prove itself: a way to train networks more than one layer deep, and enough computing power to make the training worth the trouble. Both were decades away. In 1969, with the funding gone and the prestige drained, those decades looked less like a delay than a death.

But ideas that are true have a way of finding the few people stubborn enough to keep working on them when there is no reward for doing so, and there were such people. In Britain, a young man who had bounced from physics to psychology to carpentry was about to enter a graduate program in artificial intelligence at the exact moment the field’s first winter set in around him, drawn to the discredited idea of learning machines precisely because almost no one else believed in it anymore. His name was Geoffrey Hinton, and he would spend the next thirty years proving that the verdict of 1969 had been wrong.