Part IV · Chapter 21

X Factor

The forward look written before ChatGPT: the predictions, and the door left open. The baseline against which everything in the parts that follow can be measured.

“The revolution will not be supervised.” — Yann LeCun, closing slide of his talks on the future of machine learning

The question always came at the end. A man would stand at a lectern in a hotel ballroom or a university hall, walk an audience through the work, the benchmarks, the curves that all bent the same hopeful way, and then a hand would go up somewhere in the seats. The question behind every other question. Where is this going? How fast, how far, and what happens to the rest of us when it gets there? And the man at the lectern, who had spent thirty years being told he was chasing a dead idea and had recently been proven right in the most public way a scientist can be, would do something that looked, at first, like modesty. He would hedge.

Geoffrey Hinton hedged with jokes. By the second half of the 2010s he had become, against every probability of his early career, a public figure, the grand old man of a discipline that had treated his life’s work as a backwater for most of the time he was doing it. People asked him to forecast its future more or less constantly, and he had earned the standing to make sweeping pronouncements. He mostly declined. When he did commit to one, it tended to be specific, falsifiable, and a little reckless, which is a different quality from confidence. In 2016, at a gathering on machine learning and the economy in Toronto, he was asked about medicine, and he answered with a line that would trail him for the rest of the decade. People should stop training radiologists now, he said. Deep learning was going to read medical scans better than any human could, and the conclusion was so plain that the responsible thing was to stop producing the humans who would soon have nothing to do. He reached for an image. The radiologist, he said, was like the coyote in the cartoon who has already sprinted off the edge of the cliff and simply hasn’t looked down yet. Five years, he guessed. Maybe ten. The result was already baked in; the world had just not caught up to it.

He was wrong, or at least the clock was, and the way he was wrong became one of the most repeated cautionary tales in the field. Five years on there were more radiologists at work than before, not fewer. The systems that read scans had turned into instruments radiologists picked up, not replacements for the radiologists who picked them up. Nobody drew the lesson that Hinton had been foolish. He was among the most careful minds the discipline had produced and had been right about more, and earlier, than almost anyone alive. The lesson was sharper than that. The man with the best record in the history of the field, forecasting the single application he understood most deeply, had still missed it by a decade and in the wrong direction. If Hinton could not see five years out, the unspoken corollary went, then no one in the room could either.

This was the field as it stood at the close of its first modern decade, in the stretch between the machine that taught itself to play Go and the thing that did not yet have a name. It had won. The argument that had eaten Hinton’s youth, whether neural networks were a serious path to intelligence or a discredited toy that serious people had outgrown, was finished, and his side had taken it so completely that many of the people who had once helped bury the idea were dead, or retired, or quietly retraining themselves on the very methods they had dismissed. The money had arrived in amounts that embarrassed universities. The talent had been identified, bid up, and locked behind the badge readers of corporate labs. The technology was reading handwritten checks and captioning photographs and translating between languages it had never been explicitly taught to pair, and the founders of the labs that owned it were standing on stages saying, with varying degrees of literalness, that they intended to build a mind. And still, when you asked any of them the plain question a curious child would ask, they hedged. Because the people who had built the most consequential technology of the era genuinely did not know what it would become, and the honest ones said so out loud.

There were not many of them. Five or six names carried real weight, and by the end of the decade they had settled into recognizable forecasting positions the way columnists do. You could, by 2019, predict roughly what each would say about the future before he said it. What is striking, going back to what they actually committed to on the record, is how cleanly their disagreements sorted. They agreed almost entirely about the past. They split, hard and along consistent lines, on three questions only: whether scaling would keep working, what was still missing, and how long any of it would take.

The first question turned out to be the one that mattered most, and almost no one with authority framed it correctly. The breakthrough that had restarted everything, the convolutional network that crushed the field’s flagship image contest in 2012, had not been a new idea at all. It was an old idea, decades old, fed more data and run on faster graphics chips that happened to be very good at the arithmetic neural networks required. A reasonable observer might have drawn a blunt conclusion from that: the road ahead was simply more. More data, more compute, bigger models, run harder, until something cracked. A small number of younger researchers had drawn exactly that conclusion and were acting on it inside Google and inside a new lab in San Francisco, building models larger than anything before them and watching the models get reliably, almost tediously better as they grew. But the senior figures, the ones at the lecterns, mostly did not believe scale alone was the answer. They took it for a phase. A useful phase, even an impressive one, but a phase that would soon run dry and expose the need for the deeper ideas they had spent their lives pursuing.

Yann LeCun was the clearest voice on this, and the most fascinating to read back, because he was right and wrong at once in a way that would take years to fully untangle. Through the 2010s LeCun had argued that the supervised learning everyone was celebrating, the kind that learns from millions of human-labeled examples, was a trap dressed as a triumph. A child, he liked to say, does not need to see a thousand tagged photographs of cars to learn what a car is. The child learns by watching the world go by, by building an internal model of how objects move and fall and stay put when no one is looking at them, soaking up the structure of reality with no one narrating it. That, LeCun insisted, was the real prize. Unsupervised learning, or what he had taken to calling self-supervised learning, in which a system learns the shape of the world by trying to predict the parts of it that are hidden from view. He had a slogan, which he dropped onto the closing slide of talk after talk, lifted and twisted from an old protest song. The revolution will not be supervised. He had a metaphor too, and it became the most quoted thing he ever said. If intelligence were a cake, then supervised learning was the icing, and reinforcement learning, the trial-and-error method that had powered the Go machine, was the cherry on top. The cake itself, the dense bulk of it, was self-supervised learning. And nobody, LeCun admitted, knew how to bake the cake.

He was right that the cake was the thing. He was about to be blindsided by who would bake it, and by how ordinary the recipe would look. The architecture that would let self-supervised learning finally work at scale, that would let a model absorb the structure of human language by doing nothing more glamorous than guessing the next word, over and over, across a meaningful fraction of everything ever written, already existed while LeCun was giving those talks. It had been published by a group at Google in 2017, under a title that read like a dare. LeCun, like nearly everyone else holding a microphone, had not yet understood that the cake he kept describing and the recipe sitting quietly in that paper were the same object. His prophecy was sound. The prophet did not recognize its fulfillment when it landed on his own desk.

Demis Hassabis forecast from somewhere else entirely, because he was not really predicting a technology. He was running an institution built around a single sentence, the one his London lab had been founded on: solve intelligence, and then use it to solve everything else. By the late 2010s he had a result, the Go victory in Seoul, that made the sentence sound less like a founder’s pitch and more like a project plan. What stayed with Hassabis about that match was its strangeness, the way a learning system could find things that human beings, with all their accumulated tradition, had simply never seen. He spoke about the future in the calm register of a scientist describing an instrument he was still calibrating. Real general intelligence, he thought, was achievable, perhaps within a couple of decades, and the reason to build it was discovery itself. Curing disease, decoding the folding of proteins, compressing centuries of scientific progress into years. In public he was disciplined about timelines. The people around him understood the mission to be entirely literal. What Hassabis was not doing in those years was building anything an ordinary person could open and use. The lab solved problems; the public did not touch the solutions. That gap, between an intelligence solved in a research building and an intelligence placed in a stranger’s hands, was about to be filled by people who were not in his lab and not on anyone’s list of names.

Yoshua Bengio sat closest to the middle, which suited him. He was the third of the three men who would soon share the field’s highest honor, less famous than Hinton and LeCun, and by most accounts the most purely scientific of the three, a Montreal academic who had stayed an academic while the other two accepted industrial titles. Bengio believed deep learning was missing something large, and unlike most of his colleagues he was precise about what. The systems could perceive but they could not reason. They could recognize a face or a phrase but could not think a problem through in deliberate steps, could not represent cause and effect, could not picture a situation they had never been shown and work out what would happen in it. He framed the gap in the vocabulary of the psychologist Daniel Kahneman, who had split human thought into a fast, automatic mode, the snap judgment, and a slow, effortful mode, the deliberate chain of reasoning. Deep learning, Bengio argued, had built a magnificent version of the fast mode and had not even begun on the slow one. The machines had intuition with no deliberation behind it. Closing that gap was, he believed, the central unsolved problem of the field, and he made a point of not pretending to know how long it would take, or whether the current approach could ever cross it at all.

That left the dissenter, and every field that agrees with itself too comfortably produces one. Gary Marcus had spent the decade as the discipline’s designated heretic, the cognitive scientist who insisted, to the visible irritation of the men at the lecterns, that the whole structure was raised on sand. The substance of that argument, and the onstage clash with Bengio it produced, belongs to its own chapter; what matters for the forecasts is the single line he drew. Neural networks, Marcus held, were pattern-matchers that interpolated beautifully inside their training data and came apart the instant they were pushed outside it, and no quantity of scale would repair a flaw that was architectural rather than incidental. Marcus said scaling would hit a wall. The scaling optimists, the younger ones actually doing the scaling, said not yet, and perhaps not ever. Two camps looked at the identical systems and saw opposite futures, and there was no experiment available to settle the matter, because the experiment was the next ten years, and the next ten years had not happened.

Beneath the technical quarrels ran a geopolitical one that everyone took seriously now that the Go matches had set the United States and China openly competing to own the technology. The people at the lecterns had opinions, mostly uneasy ones, and the opinions sat awkwardly against their other commitments. The same scientists who preached that open publishing and free sharing were the engine of the entire enterprise were now being asked whether the enterprise was a contest between nations that could not afford to share anything. They did not resolve the contradiction. They noted it and moved on, which was its own quiet forecast, that the openness of the early years had been a feature of a moment, and the moment might be closing.

The one thing none of them would do was name a date, and that refusal was the discipline that united every forecast across all its disagreements. Hinton’s radiology timeline was the exception that made the rule famous; it had become a punchline precisely because he had broken the rule. Everyone else had learned, watching the field ambush its own founders again and again, that timelines were where forecasters went to be humiliated. The history was littered with the wreckage. The first generation had been certain the hard problems would fall within a single career and they had not. The researchers of the 1980s had watched their own confident predictions curdle into a long winter of lost funding and abandoned labs. Even the recent wins had a way of arriving from the wrong direction entirely: the people who built the speech systems had not foreseen the leap in machine vision, and the people who built the vision systems had not foreseen the leap in translation. The field’s record at predicting itself was, on the plain evidence, dismal, and the senior people knew it intimately, which is exactly why the ones with the most authority spent it the least freely. They had been right about the one enormous thing. They had also learned that being right once does not make a person a prophet. It makes him someone the prophecy happened to favor.

So the forecasts, gathered up and read side by side, formed a strange and useful document. Set down in the late 2010s, they amount to a portrait of a discipline that had won its founding argument and had no clear idea what came next. The machines could see and hear and translate, and play games at a level beyond any human, and they could not reason, could not understand, could not take a single step outside the world they had been shown. Scale was working, and might keep working, or might stall. The missing ingredient was reasoning, or world models, or common sense, or a body to learn through, or symbols, depending entirely on whom you asked, and no two of them fully agreed. The technology was either months from a wall or decades from a mind, and the same evidence honestly supported both readings. Something large was plainly on its way. No one could say what shape it would take, and the people best positioned to know were the ones most willing to confess that they couldn’t.

There is a particular quality to a field caught in this condition, poised and uncertain and aware of its own blindness, that does not survive contact with whatever comes after. Once the future arrives it rewrites the past in its own image. The open questions of an era get sorted, in hindsight, into the ones that turned out to matter and the ones that turned out to be noise, until it becomes genuinely hard to remember that at the time they had all looked equally alive, equally worth a career. The value of catching the field here, before the sorting, is that you can see what the smartest people in it actually believed when they still had to guess. They believed scale would help for a while and then run out. They believed the cake was self-supervised learning and that no one alive knew how to bake it. They believed reasoning was the missing piece and was years off at best. They believed general intelligence was real and coming and impossible to put on a calendar. And most of them believed, deep down, that the next great leap would be an idea. A new architecture, a new principle, conceived by someone clever in a moment of insight, the way every leap they had personally lived through had been conceived.

The architecture that would prove most of them wrong about most of this had already been written down, in a paper most of them had read and none of them had fully grasped, by a small group of researchers at Google. It did not require a new principle. It required scale, the very thing the senior people had bet against, pointed at prediction, the very thing LeCun had insisted was the prize, applied at a size none of them had imagined anyone would be willing to pay for. The cake was about to be baked, by people whose names appeared on none of the forecasts, and the men at the lecterns would spend the years that followed reacting to a future they had stood closer to than anyone and seen least clearly of all. The door was open. They could all feel the air moving through it. Not one of them could see what was on the other side.