Part II · Chapter 8

Hype

IBM Watson, self-driving promises, and voice assistants — the marketing of intelligence. → When deep learning escaped the labs and became a marketing department.

“I, for one, welcome our new computer overlords.” — Ken Jennings, written beneath his Final Jeopardy! answer, February 16, 2011

The strangest thing about the machine that beat Ken Jennings was where it sat. The cameras showed a sleek black panel between the two human champions, a glowing avatar of swirling threads that pulsed when the computer was thinking and shifted color when it grew confident. That avatar was theater. The actual machine lived in a separate room down the hall: ninety IBM Power 750 servers in ten racks, drawing enough electricity to heat a house and demanding enough cooling to chill one. It had no eyes and no ears. It could not hear Alex Trebek read the clues; the questions were fed to it as text the instant they appeared on the players’ screens. It could not see the board. What it could do, for a few days in February 2011, was answer trivia faster and more accurately than the two best Jeopardy! players who had ever lived, and do it on national television in front of an audience that had never watched a computer think before.

The match ran across three nights, February 14 through 16. Jennings had once won seventy-four games in a row, the longest streak in the show’s history. Brad Rutter had never lost to a human and had banked more prize money than anyone. They were not chosen because they were good; they were chosen because they were the ceiling. IBM wanted to lose to them or beat them, with nothing ambiguous in between. By the end of the second night the outcome was no longer in doubt. Watson, the machine, had built a lead that the humans could not close. On the final clue of the final night, with the outcome already decided, Jennings wrote out his response and added a line beneath it, a joke borrowed from The Simpsons: “I, for one, welcome our new computer overlords.” The studio laughed. The clip went everywhere. For most of the people who saw it, it was the first time a computer had ever looked, on camera, like it might actually be smart.

It was a genuine achievement, and it is worth being precise about what kind. Watson was not a deep-learning system. It did not learn to play Jeopardy! the way Alex Krizhevsky’s network would learn to recognize a cat, by training on examples until the right behavior emerged from the weights. Watson was the product of a vast hand-built engineering project named DeepQA, the work of two dozen IBM researchers led by a computer scientist named David Ferrucci over four years. It ran hundreds of separate algorithms in parallel, each generating candidate answers and evidence, then scored and combined them to produce a single confident guess with a probability attached. It leaned on statistics and machine learning in places, but its core was the older tradition of carefully assembled rules and curated knowledge, the kind of artificial intelligence that the connectionists in Toronto and New York had spent their careers fighting. The irony would matter later. The machine that taught the public to fear and admire AI was built on the very approach that the breakthrough in Lake Tahoe, less than two years later, would render obsolete.

None of that registered with the audience, and none of it registered with IBM’s marketing department, which understood exactly what it had. A century-old company that sold mainframes and consulting contracts had just produced the most-watched demonstration of machine intelligence in history. The challenge had cost IBM something on the order of a few million dollars to stage, against an advertising value that was incalculable. The question inside the company was no longer whether Watson was impressive. It was how to sell it.

The answer IBM settled on was that Watson was a general machine, and the trivia had only been a proving ground. The same system that could parse a punning Jeopardy! clue, the reasoning went, could parse a patient’s medical chart, a body of legal precedent, a quarterly earnings report. IBM rebranded the technology as a cognitive computing platform and built a business unit around it. The most ambitious bet was in medicine. If Watson could read the entire corpus of cancer research, every journal article and clinical trial, far more than any oncologist could hold in a human head, then it could read a patient’s record and recommend the best treatment. IBM called it Watson Health and described it, in the press and to hospital executives, as a partner that would help doctors cure cancer.

In October 2013 the University of Texas MD Anderson Cancer Center, one of the most respected oncology hospitals in the world, signed on. The plan was to build an “Oncology Expert Advisor,” a Watson-powered tool that would guide treatment for leukemia patients and eventually others. The promise was enormous and the language was unguarded. Executives spoke of moonshots and of a tool that could think alongside the best physicians. For a few years the partnership was IBM’s proof of the idea, the demonstration that the Jeopardy! magic could be turned into something that saved lives.

It did not work. The medical reality was harder than the demo had suggested in every direction at once. Watson had triumphed on Jeopardy! because trivia is a closed world: the question is unambiguous, the answer is a single fact, and a right answer exists and can be checked. Cancer treatment is the opposite. The data lives in unstructured doctors’ notes full of abbreviations and contradictions; the right answer is contested even among experts; and the consequences of a confident wrong recommendation are not a lost dollar amount but a dead patient. Feeding the system real medical records turned out to be a massive, manual, expensive undertaking, and the recommendations that came out were sometimes generic, sometimes wrong, and rarely better than what a competent oncologist already knew. By 2016 MD Anderson had quietly stopped working with IBM on the project. A University of Texas internal audit, released early in 2017, found that the center had spent more than sixty million dollars on the effort and that the tool was not ready for use on patients, all before the hospital had even integrated its new electronic medical record system, which the Watson tool would have needed to talk to anyway. The flagship had run aground before it left the harbor.

MD Anderson was the most visible failure but not the only one. Reporting that followed found that Watson for Oncology had been trained not on a sweep of the world’s cancer literature but in large part on the treatment preferences of a small group of doctors at a single hospital, Memorial Sloan Kettering, which the system then dispensed as if they were the considered judgment of all of medicine. In July 2018 the health-news outlet STAT reported, citing internal IBM documents, that the system had recommended treatments described inside the company as “unsafe and incorrect.” The gap between the brochure and the bedside was the whole story.

What had happened was a particular kind of corporate transformation, and it would repeat across the industry for the next decade. A real research result, narrow and hard-won, escaped the lab and met a sales force. The sales force did not lie, exactly. It extrapolated. It took a system that did one thing genuinely well and described the thing it wished the system could do, on the assumption that the engineers would close the gap before any customer noticed. The customers noticed. The engineers, who knew the limits of what they had built, watched their careful work repackaged into promises they had never made and could not keep. Inside IBM, the researchers asked to stand behind the cancer recommendations were not confident. Marketing was confident. That split, between what a lab can demonstrate and what its company tells the world, opened in 2011 and has never fully closed.

IBM was the loudest case but far from the only one, because the same years that produced Watson also produced two other technologies that the public could touch, and both arrived wrapped in the same inflated language.

The first was the self-driving car. Google had begun a secret autonomous-vehicle project in 2009, run out of the same Google X division that would later house the cat-detector experiment and led by Sebastian Thrun, a Stanford roboticist whose team had won a Pentagon driving challenge across the Mojave Desert. By 2010 the cars were quietly logging miles on California highways with a safety driver behind the wheel, and the early progress was real and astonishing. A car that could stay in its lane, read a traffic light, and brake for a cyclist was a genuine triumph of perception, and deep learning was increasingly the engine of that perception. The trouble came in the timeline. Thrun and others spoke as though full autonomy, a car with no steering wheel that could go anywhere a human could, was a handful of years away. In 2012 Sergey Brin predicted that ordinary people would have access to self-driving cars within five years. Elon Musk, whose Tesla shipped its Autopilot driver-assist feature in 2015, told audiences repeatedly that a Tesla would be able to drive itself coast to coast, with no human touching the controls, very soon, and named dates that came and went. The consensus of those years, repeated in keynotes and on magazine covers, was that the human driver was nearly extinct.

The early progress had been deceptive in a way that is easy to understand only in hindsight. Getting a car to handle ninety percent of driving turned out to be a few years of work. Getting it to handle the last fraction, the construction zone with a cop waving traffic through a red light, the plastic bag that looks like a rock, the pedestrian who makes eye contact and then steps out anyway, turned out to be the actual problem, and it held most of the difficulty, hidden behind a flattering demo. It was the same shape as Watson: the closed, clean cases were nearly solved, and the open, messy world was where intelligence actually lived. Google’s project survived, spun out as Waymo in 2016, and a decade later did run real driverless taxis in a few cities, which is no small thing. But the coast-to-coast Tesla and the steering-wheel-free car of 2017 did not arrive in 2017, and the gap between the promise and the road became one of the most expensive lessons in the technology’s history.

The second technology the public could touch was the voice assistant, and here the hype was aimed directly at the consumer. Apple bought a small company called Siri and shipped its assistant inside the iPhone 4S in October 2011, eight months after Watson’s win, marketing it as a virtual aide that understood plain spoken English and answered back. Amazon went further in November 2014 with the Echo, a cylindrical speaker that listened for the name “Alexa” and promised to run a household by voice, and Google folded its own assistant into its phones and, later, its own speakers. The advertising sold understanding. The pitch was that you could talk to these devices the way you talked to a person, and they would comprehend.

The speech recognition underneath was, by this point, genuinely transformed, and deep learning deserved much of the credit for it. The accuracy of turning sound into text had improved dramatically, enough that the devices usually heard the words correctly. But hearing the words and understanding the request are different problems, and the second one was nowhere near solved. Ask Siri to set a timer and it worked. Ask it anything that required actually following a chain of meaning, and it fell back on a canned line or a web search, exposing the seam between the marketing and the machine. The public learned the limits quickly. The assistants became, for most people, voice-controlled remote controls for timers, weather, and music, useful but a long way from the thinking companion the commercials had implied. The word “intelligence” had been attached to a narrow competence and sold as a broad one.

The people most uncomfortable with all of this were the researchers who had spent decades getting the underlying methods to work. They had lived through two AI winters, both of which had begun with exactly this pattern: a real result, oversold by people who did not understand its limits, followed by a backlash when the promises failed and funding fled. Geoffrey Hinton and Yann LeCun, now inside Google and Facebook respectively, had watched the word “neural” become radioactive once before, in the 1990s, and they were wary of seeing it happen again, this time with their own work as the bait. The danger of hype went well past embarrassment. The next disappointment, they feared, would be blamed on the science rather than the sales pitch, and the whole field would be punished for promises it had never authorized.

The discomfort ran deeper than reputation. It was about accuracy, which the working scientists cared about as a matter of professional honor. A neural network that classified images with high accuracy was not “seeing” in any sense a human would recognize, and the researchers knew it. It had no concept of a cat, only a statistical talent for distinguishing the pixels that tended to accompany the label “cat.” Calling that understanding, the way the marketing did, was worse than imprecise. It set the public up to expect a kind of competence the systems did not have, which guaranteed eventual disappointment and obscured the genuinely remarkable thing that had actually been accomplished. LeCun in particular grew fond, in this period and after, of puncturing the inflation, reminding audiences that the most advanced AI of the day had less common sense than a house cat, that a teenager learns to drive a car in twenty hours and the machines needed millions of miles, that the systems were narrow specialists and nothing close to general minds. Hinton’s worry, when it came, would run the other way, less about whether the machines were overhyped than about what might happen if one day they were not. But that was years off, and for now he kept it to himself.

The gap that opened in these years had a peculiar structure. The actual science was, if anything, ahead of where most outsiders understood it to be; the breakthrough in Lake Tahoe was real, and the techniques were about to generalize across one field after another. The marketing was also, in a sense, ahead, promising machines that could cure cancer and drive coast to coast. But the two were ahead in different directions. The science was ahead on narrow, measurable tasks, the closed worlds where a right answer existed and could be checked. The marketing was ahead on the open, human world, the messy domain of judgment and meaning and consequence, which is exactly where the science was furthest behind. IBM had sold a Jeopardy! champion as an oncologist. The car companies had sold a highway lane-keeper as a chauffeur. The phone makers had sold a speech recognizer as a confidant. In each case the demonstration was real and the product was a different thing entirely.

There was a version of this story in which the hype was harmless, a froth of advertising that would settle once the products matured. That is not how it went. The overselling shaped what got built and what got funded, drew billions of dollars toward applications that were not ready, and seeded a public expectation that the technology would inevitably fail to meet. The disillusionment took years to climb out of. IBM’s Watson Health, after years of losses, would eventually be sold off for parts. The self-driving timeline would slip by a decade or more. The voice assistants would settle into their modest competence and stay there for years.

But the deepest consequence of the hype was not the disappointment it guaranteed. It was the reaction it provoked from a small group of thinkers who looked at the same demonstrations and drew the opposite conclusion. Where the marketing departments saw a machine that could not yet do enough, and the working researchers saw a narrow tool oversold, these thinkers saw something else in the trajectory: a technology improving fast, escaping the labs, and being handed to companies that plainly did not understand or respect its limits. Their fear ran in the opposite direction from everyone else’s. They had begun to worry, with a seriousness that the people selling Watson would have found absurd, that one day the machines might do far too much.