Neuron Makers
Part IX · Chapter 39

The Quant Builds a Lab

A Chinese hedge-fund founder stockpiles Nvidia GPUs and builds, in under two years, the lab that will frighten Silicon Valley and Wall Street. → How the most disruptive AI lab came from outside the system.

“We often say that there is a gap of one or two years between Chinese and American AI, but the real gap is the difference between originality and imitation.” — Liang Wenfeng, interviewed by 36Kr’s “Waves,” July 2024

In the summer of 2024, a Chinese technology reporter sat down across from a man almost no one in the West had heard of and asked him why he was doing something that, by every rule of the industry he worked in, made no sense. Liang Wenfeng ran a quantitative hedge fund in Hangzhou. His business was statistical arbitrage, the practice of using machines to find tiny pricing errors in financial markets and harvest them at scale before anyone else noticed. It had made him and his partners rich. And then, instead of compounding that money the way a fund manager is supposed to, he had begun pouring it into a research lab that had no product, no revenue, no plan to raise outside capital, and no obvious way of ever paying for itself. The lab was called DeepSeek. When the reporter pressed him on the economics of it, Liang gave an answer that read, even at the time, less like a business plan than a creed.

The gap between Chinese and American artificial intelligence, he said, was not really a gap in money or even in talent. People liked to say China was one or two years behind. He thought that framing missed the point. The real difference was between originality and imitation. China had spent two decades as a follower, taking ideas invented elsewhere and applying them faster and cheaper. If that did not change, he said, China would always be only a follower. He was not building DeepSeek to catch up. He was building it to find out whether a Chinese lab could invent something first.

This was an unusual thing for a hedge-fund founder to say, and an even more unusual thing to spend a fortune proving. Liang gave almost no interviews. The 36Kr conversation was one of perhaps two substantive public appearances he made before his name became famous, and the people who did meet him described someone who fit no template the industry had on file. He was no showman, unlike the American lab founders who lived on conference stages and social media, and no politically connected operator working the corridors in Beijing. He was a quiet, intense engineer who talked about the work and almost nothing else. One profile would later call him a tech madman. To understand how he got there, and why the lab he built would, within six months of that interview, briefly become the most consequential AI company on earth, you have to start with the money, and with how he had learned to think about machines that learn.

Liang was born in 1985 in the Zhanjiang area of Guangdong province, in the far south of the country, a long way from Beijing and Shenzhen and the centers of Chinese technology. He studied electronic information and communication engineering at Zhejiang University, one of the country’s strong technical schools, and he came up through the world of algorithmic trading rather than the world of academic computer science. In the mid-2010s, with a group of his university classmates, he co-founded a quantitative hedge fund called High-Flyer. The Chinese name, 幻方, carries a sense of the fantastical, a magic square. The fund used machine-learning models to trade Chinese equities, and it grew quickly. Within a few years its assets under management climbed past 100 billion yuan, making High-Flyer one of the largest quant funds in the country.

Running a quant fund taught Liang a particular lesson, one that turned out to matter enormously later. The thing that made a fund like High-Flyer competitive was not any single brilliant idea. It was infrastructure. To find signals in market data and act on them before rivals did, you needed enormous computing power, and you needed to wring every last drop of performance out of the hardware you had. High-Flyer became, in effect, a company that was very good at building and operating large GPU clusters and at writing the low-level code that made those clusters run fast. In 2019 the fund spun up a dedicated AI research arm and began assembling supercomputers it named Fire-Flyer. The second of these, Fire-Flyer 2, was reportedly budgeted at around a billion yuan, and by the time it was done Liang had, by most accounts, acquired some ten thousand of Nvidia’s A100 graphics chips, the workhorse processor of the deep-learning boom.

The timing of that purchase would prove decisive. High-Flyer bought its A100s before October 2022. That date matters because of what happened in Washington.

On October 7, 2022, the U.S. Bureau of Industry and Security issued a set of export controls aimed squarely at China’s ability to build advanced AI. The rules drew lines based on a chip’s raw computing performance and the speed at which it could talk to other chips, and they were calibrated to block exactly the processors that frontier AI training requires: Nvidia’s A100 and its newer, faster successor, the H100. The logic was straightforward. The United States believed its lead in artificial intelligence rested on a lead in the specialized hardware that trained the models, and that lead in turn rested on Nvidia, an American company. Deny China the chips, the thinking went, and you deny China the frontier.

Nvidia, which did not want to lose the Chinese market, responded the way a company responds when a regulation is written in terms of measurable thresholds. It engineered chips that fell just under the lines. The A800 and the H800 were deliberately hobbled versions of the A100 and H100, with their chip-to-chip communication speeds throttled down enough to satisfy the letter of the rule while remaining usable for training. For a year, this loophole stayed open. A Chinese lab that moved quickly could still buy large quantities of H800s, and DeepSeek did, stockpiling them before October 2023, when the Bureau tightened the rules again and closed the gap that the A800 and H800 had slipped through.

By the time the controls had fully hardened, then, Liang’s operation already held something most observers did not realize it held: a substantial cluster of high-end Nvidia silicon, legally acquired at the moment of purchase, sitting in data centers in China while the rest of the country’s AI labs scrambled for whatever compute they could find. The hedge fund had become an accidental armory.

DeepSeek itself was formally founded on July 17, 2023, spun out of High-Flyer as a separate entity and wholly funded by it. There were no venture capitalists, no plans for an initial public offering, no outside board pushing for a return. This was the structural fact that gave Liang his freedom, and it is the thing the dollar figures that came later tend to obscure. He had a profitable hedge fund underneath him, willing to absorb the cost of a research lab indefinitely. When he told the Waves interviewer that “money has never been the problem for us; bans on the shipment of advanced chips are the problem,” he was not posturing. He had the rarest thing in AI: a deep, patient pool of capital that answered to no one but him, and a team that already knew how to make GPUs run hard.

What he did with that freedom looked, at first, like a series of competent open-source releases that the wider world barely noticed. On November 2, 2023, DeepSeek put out DeepSeek Coder, a family of code-generation models released openly. On November 29, it released its first general-purpose large language models, DeepSeek LLM, in seven-billion and sixty-seven-billion-parameter sizes. In early 2024 came DeepSeek-MoE, the lab’s first public use of a mixture-of-experts design, the architecture that would become its signature. None of these set off alarms in Silicon Valley. They were the kind of solid, derivative work that confirmed the prevailing assumption: Chinese labs were capable engineers who followed where the American frontier led.

Then, in May 2024, DeepSeek did something that got people’s attention, though not yet for the right reasons. It released DeepSeek-V2 and priced access to the model through its API at a level that was almost insulting to competitors: roughly one yuan per million tokens of input, a fraction of what anyone else charged. The move detonated a price war inside China. Within days, Alibaba, Baidu, Tencent, ByteDance, and Zhipu all slashed their own prices to keep up. Chinese commentators gave DeepSeek a nickname that captured both the admiration and the unease: the Pinduoduo of AI, after the discount e-commerce platform that had bludgeoned its way to dominance by underpricing everyone. The nickname framed DeepSeek as a ruthless price-cutter. It missed what was actually happening, which was that DeepSeek could charge so little because it had built a model that was genuinely cheaper to run, and it had done that on purpose.

This is the part of the story that the later panic flattened, and it is worth slowing down for, because the efficiency was real even when the headline number about it was not.

The constraint Liang faced was the constraint Washington had imposed. He could not get unlimited quantities of the fastest chips. His H800s were deliberately bandwidth-limited; the throttled communication between chips was exactly the bottleneck that hurts most when you are training a giant model across thousands of processors at once. An American lab with racks of unrestricted H100s could paper over inefficiency with brute force. DeepSeek could not. So it was forced to do something American labs, flush with compute, had less reason to do: make the training itself dramatically more efficient.

There is a counterintuitive thing about constraint that the DeepSeek story illustrates cleanly. The lab everyone assumed was at a disadvantage because of the chip controls had, in a narrow sense, been handed an advantage by them. Abundance breeds waste. A team that can always buy more compute will reach for more compute before it reaches for a cleverer algorithm, because more compute is the faster, surer path. A team that cannot has to think harder. The export controls had taken the easy path away from DeepSeek and left it only the hard one, and the hard one, it turned out, led somewhere the well-supplied American labs had less incentive to go.

The DeepSeek team attacked the problem from several directions at once, and the results showed up in the model they released on December 26, 2024, DeepSeek-V3, alongside a technical report posted to the arXiv preprint server. V3 was a mixture-of-experts model with 671 billion total parameters but only 37 billion of them activated for any given token. The idea behind a mixture of experts is intuitive once you see it: instead of running every word through the entire enormous network, you train many smaller specialized sub-networks and, for each piece of input, route it to just the few “experts” best suited to handle it. The full model is huge, which makes it capable, but the slice doing work at any moment is small, which makes it cheap. DeepSeek pushed this design hard. It also relied on an attention mechanism it called multi-head latent attention, introduced earlier that year in V2, which compressed the memory the model needed to keep track of what it had already read, and it trained the model using eight-bit floating-point numbers, a lower-precision format that roughly halves the memory and bandwidth cost of every value stored in it, compared with the standard sixteen-bit approach, if you can keep it numerically stable. Keeping it stable across a model that size was the hard engineering, and it was exactly the kind of low-level, squeeze-the-hardware work that High-Flyer’s quant background had trained the team to do.
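For readers who think more easily in code, the routing idea can be sketched in a few lines of Python. This is a generic top-k mixture-of-experts toy, not DeepSeek's router: the eight experts, the choice of two per token, the softmax gate, and every function name here are illustrative assumptions. Only the 671-billion and 37-billion parameter counts come from the text.

```python
import math
import random


def softmax(scores):
    """Turn raw router scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(router_scores, k):
    """Pick the k highest-scoring experts and renormalize their weights."""
    probs = softmax(router_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return {i: probs[i] / norm for i in top}


def moe_layer(token, experts, router_scores, k):
    """Run only the selected experts and blend their outputs by gate weight."""
    gates = route_token(router_scores, k)
    output = sum(weight * experts[i](token) for i, weight in gates.items())
    return output, gates


# Toy configuration (assumed for illustration, not DeepSeek's settings).
NUM_EXPERTS, TOP_K = 8, 2
# Each "expert" is a trivial scalar function standing in for a sub-network.
experts = [lambda x, scale=i + 1: scale * x for i in range(NUM_EXPERTS)]

rng = random.Random(0)
router_scores = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
output, gates = moe_layer(1.0, experts, router_scores, TOP_K)

# Figures from the text: 671B total parameters, 37B active per token.
TOTAL_PARAMS_B, ACTIVE_PARAMS_B = 671, 37
active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B  # about 5.5 percent

print(f"experts used: {sorted(gates)} of {NUM_EXPERTS}")
print(f"active share of V3 parameters: {active_fraction:.1%}")
```

The point of the sketch is the ratio at the end: only about one parameter in eighteen does any work on a given token, which is where the savings come from.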

The technical report contained a number that would, a month later, travel around the world detached from every qualification its authors had attached to it. DeepSeek reported that the full training run for V3, pre-training plus the shorter context-extension and post-training stages, had consumed about 2.788 million hours of H800 GPU time, which the report converted, at an assumed rental rate of two dollars per GPU-hour, to roughly 5.576 million dollars. Five and a half million dollars. For a model that scored on major benchmarks in the neighborhood of systems that American labs had spent vastly more to build.
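The arithmetic behind the headline figure is simple enough to check by hand, and a short Python sketch makes plain how much of it rests on a single assumed price. The GPU-hour total and the two-dollar rate are the figures quoted above; the alternative rates in the loop are hypothetical, included only to show how directly the total tracks that assumption.

```python
GPU_HOURS = 2_788_000          # H800 GPU-hours reported for the run
RATE_USD_PER_GPU_HOUR = 2.0    # rental rate assumed in the report

cost_usd = GPU_HOURS * RATE_USD_PER_GPU_HOUR
print(f"reported basis: ${cost_usd:,.0f}")  # reported basis: $5,576,000

# Hypothetical rates, for illustration only: the total scales one-for-one
# with whatever hourly price is assumed.
for rate in (1.0, 2.0, 4.0):
    print(f"at ${rate:.2f} per GPU-hour: ${GPU_HOURS * rate:,.0f}")
```

Nothing in the calculation accounts for buying the chips, housing them, or paying the people who ran them; it prices rented machine time for one run and nothing else.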

It is important to be precise about what that figure was and was not, because the imprecision is the whole story of what came next. DeepSeek’s own report said plainly that the 5.576 million dollars covered the final training run only. It explicitly excluded the cost of all the earlier research, the failed experiments, the architecture work, the data preparation, and, above all, the hardware itself, the thousands of GPUs that had to exist before a single training run could begin. The number was real. It was also narrow. It was the cost of the last lap, not the cost of the car or the team or the years of practice. To say DeepSeek built a frontier model for 5.576 million dollars was a little like saying a transcontinental flight costs only the price of the jet fuel.

When outside analysts tried to reckon the true cost of the whole operation, they arrived at figures orders of magnitude larger. The research firm SemiAnalysis estimated that DeepSeek’s broader compute footprint amounted to something like fifty thousand Nvidia Hopper-generation chips of various kinds, representing on the order of 1.6 billion dollars in server capital expenditure and hundreds of millions more in operating costs, with the lab’s best researchers reportedly paid more than a million dollars a year. Those numbers were estimates, not figures DeepSeek confirmed, and they should be treated as such. But they pointed at a truth that the 5.576 million dollar figure obscured: this was not a scrappy startup that had stumbled onto a miracle in a garage. It was the lavishly capitalized research arm of one of China’s largest hedge funds, sitting on a GPU stockpile it had built before the door closed.

Two facts sat side by side here, and any honest account had to carry both. The efficiency breakthroughs were genuine and significant; DeepSeek really had figured out how to train a competitive model using far less compute than the prevailing assumption said was necessary. And the claim that it had done so for the price of a nice house in California was misleading, a marginal cost dressed up as a total cost. The collision of those two facts, the real engineering and the unreal sticker price, was the loaded weapon. It only needed a moment to go off.

What made the engineering more than an internal curiosity was the decision about what to do with it. DeepSeek released its models openly, weights and technical reports both, for anyone to download, inspect, and run. This was not the careful, gated, terms-and-conditions openness that Meta had built around Llama. DeepSeek simply put the models out under permissive licenses and published how it had done the work. The choice flowed directly from Liang’s conviction, the one he had laid out in that summer interview, that in a field moving as fast as this one, the advantage a company gained from keeping its work secret was temporary. A closed model, he argued, bought you a head start measured in months. An open one bought you something more durable: a community building on top of your work, a reputation, and a position as the foundation others stood on. He was betting that openness was strategy, not charity, and that the strategy would compound.

He was not alone in China in making that bet, which is part of why the chapter that DeepSeek would soon write was not really about a single company. By the end of 2024, a cohort of Chinese labs had concluded that open weights were the way to win attention and influence in a market they could not yet win on raw frontier capability or on access to the best chips. The reasoning was partly defensive and partly cultural. A lab that gave its models away built goodwill and a developer base it could not have bought; it also planted a flag, signaling that Chinese AI was a contributor to the global commons rather than a walled-off rival. And in a country where the largest technology firms had spent the prior years under intense regulatory pressure, a research lab that published openly and charged little looked less like a threat to be reined in than a national asset to be celebrated. DeepSeek was the boldest of the cohort but not the only one. Alibaba’s Qwen team was shipping open model after open model; Moonshot AI, led by Yang Zhilin, was building its Kimi line; Zhipu, MiniMax, ByteDance’s Doubao team, and Kai-Fu Lee’s startup 01.AI were all in the field. The American export-control regime had been designed to choke off China’s frontier ambitions. What it had produced instead was a generation of Chinese labs that had turned constraint into a method, optimizing relentlessly for efficiency and giving their work away to build position. The policy meant to slow China down had, at minimum, changed the direction it ran.

Jensen Huang, whose company sat at the center of all of it, understood the bind better than the policymakers did. Nvidia made the chips the controls were written to deny. It engineered the cut-down versions to keep selling into China, and it lobbied, persistently, to keep that market open. The controls were a structural force shaping the whole story, but they were also a structure that the actors inside it kept finding ways around, and Huang was the figure who profited no matter which way the argument resolved. His machines were the currency of the boom in America and the prize being rationed in China at the same time.

When DeepSeek shipped V3 on the day after Christmas in 2024, the response was muted relative to what was coming. Researchers who read the technical report recognized the engineering for what it was. A few American AI scientists noted, with some unease, that an open Chinese model had arrived at the closed American frontier and shown its work in the process. But there was no panic, no market reaction, no calls from the White House. V3 was a chat-and-completion model, impressive but not the kind of thing that captured the public imagination. The world had, by late 2024, grown used to a new frontier model arriving every few weeks.

What the world did not yet know was that DeepSeek had built V3 to be a base. Inside the lab, the team was already using it as the foundation for something else, a model that would do more than answer. It would reason, thinking step by step before it spoke, in the new style that OpenAI had introduced with its o1 system and that the whole industry was now racing to match. That model was being trained through reinforcement learning, taught to reason by reward rather than by imitation. It would be released openly, like everything else DeepSeek made, under one of the most permissive licenses available. And it would carry a price tag that turned a quiet engineering achievement into an international event.

The efficiency was real. The cost claim was primed. The reasoning model was nearly ready. Liang Wenfeng, the reclusive quant who believed China could invent rather than imitate, had built, in barely eighteen months and almost entirely outside the system everyone else was playing in, the lab that was about to give Wall Street one of its most punishing Mondays in years.