Graphcore Launches 3rd-Gen AI With Wafer-On-Wafer (WoW!) Technology

by | Mar 11, 2022 | In the News

Startup also teases $120M Brain-Scale AI “Good Computer” For 2024

Remember Intel’s old Tick-Tock processor roadmap? The company alternated between two approaches: releasing a new generation with architectural enhancements, then shrinking that design onto a smaller process node. Each step typically delivered about 15% better performance. Historically, these are the two levers chip makers have had at their disposal to stay on the Moore’s Law train.

But now there is a third approach being pioneered by AI unicorn Graphcore and TSMC. The idea is to use two stacked wafers, one with the Graphcore Colossus logic die and one for efficient power delivery. Then TSMC slices the double wafer into dies for normal packaging, in contrast to the full-sized wafer approach of another startup, Cerebras.

What is a BOW IPU?

Graphcore developed a new version of the Colossus chip used in its 2nd-generation IPU, adding power connections to a second, power-delivery wafer stacked against the logic wafer. This approach delivers power to the transistors over far shorter distances than traditional power regulators placed alongside the chip’s package. By adding deep trench capacitors in the power delivery die, right next to the processing cores and memory, Graphcore is able to deliver power much more efficiently, enabling 350 TeraFLOPS of AI compute and nearly 40% more performance across a wide range of AI models, with 16% better power efficiency.

Graphcore’s resulting new BOW Intelligence Processing Unit (IPU) delivers better performance without any programmer-visible changes to the chip’s logic, and without requiring a new manufacturing process node. All this translates to lower development costs and faster time to market for Graphcore, and zero code changes for developers. To our knowledge, this has not previously been attempted, let alone accomplished, in a commercially available semiconductor.

Graphcore said it can deliver the BOW POD256 now, with the POD1024 shipping soon. (Image: Graphcore)

As for system configurations, Graphcore was able to convert all of its existing PODs to BOW, thanks to compatibility with the existing IPU Machine. Graphcore also said that there is no increase in pricing, which will further extend what the company claims is price/performance leadership. The flagship Bow Pod256 delivers more than 89 PetaFLOPS of AI compute, while the Bow Pod1024 delivers 350 PetaFLOPS of AI compute, with enough memory across the complex to handle the largest AI models currently in development.
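The pod figures can be sanity-checked against the per-chip number with simple arithmetic. The sketch below is a rough check rather than Graphcore’s own calculation: it assumes the number in each pod name is its IPU count and that peak throughput scales linearly with the number of IPUs. The function and constant names are invented here for illustration.

```python
# Rough check of the pod-level peak compute quoted above.
# Assumptions (not from Graphcore): the pod name gives the IPU count,
# and peak throughput scales linearly with the number of IPUs.

PEAK_TFLOPS_PER_IPU = 350  # per-IPU peak AI compute quoted in the article


def pod_petaflops(num_ipus, tflops_per_ipu=PEAK_TFLOPS_PER_IPU):
    """Return the aggregate peak compute of a pod in PetaFLOPS."""
    return num_ipus * tflops_per_ipu / 1000  # 1 PetaFLOPS = 1000 TeraFLOPS


if __name__ == "__main__":
    for ipus in (256, 1024):
        print(f"{ipus} IPUs: {pod_petaflops(ipus):.1f} PetaFLOPS peak")
```

Under these assumptions the 256-IPU pod comes out at 89.6 PetaFLOPS, consistent with "more than 89", and the 1024-IPU pod at 358.4 PetaFLOPS, slightly above the rounded 350 figure.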

In a welcome change, Graphcore included references to customers who already have access to the new platform, including several cloud service providers and Pacific Northwest National Laboratory (PNNL). “For instance, we are pursuing applications in computational chemistry and cybersecurity applications. This year, Graphcore systems have allowed us to significantly reduce both training and inference times from days to hours for these applications,” said Sutanay Choudhury, co-director of PNNL’s Computational and Theoretical Chemistry Institute.

What, and Who, is “Good”?

Back in 1965, the computer science pioneer Jack Good was the first person to describe a machine that would exceed the capability of the human brain, in his paper Speculations Concerning the First Ultraintelligent Machine. In honor of this pioneer, Graphcore has named its first planned exascale AI system the “Good Computer”, a system that would exceed the human brain in terms of the number of parameters, or synapses. Other companies, including NVIDIA and Cerebras, have outlined their plans for brain-scale computing as well, so this level of computation is already moving from fantasy to reality.

While the company didn’t disclose technology details, it did share that it expects this system to deliver over 10 ExaFLOPS of AI floating-point compute capacity, with 4 petabytes of memory, over 10 petabytes/second of memory bandwidth, and support for AI models of up to 500 trillion parameters. The system would cost about $120 million. Since there is no way Graphcore would build this speculatively, it is safe to assume it has several prospective clients lined up to buy and run one of these beasts. Otherwise, Graphcore would be foolish to announce these plans only to disappoint the market and investors in just two years.
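A few ratios help put these headline numbers in proportion. The sketch below is back-of-the-envelope arithmetic only: it assumes decimal units (1 petabyte = 10^15 bytes) and treats each "over", "up to" and "about" figure as an exact value, which Graphcore has not stated. All names are invented here for illustration.

```python
# Back-of-the-envelope ratios for the announced "Good Computer" targets.
# Assumptions: decimal SI units, and every quoted bound taken as exact.

COMPUTE_FLOPS = 10e18        # "over 10 ExaFLOPS" of AI compute
MEMORY_BYTES = 4e15          # 4 petabytes of memory
BANDWIDTH_BYTES_S = 10e15    # "over 10 petabytes/second"
MAX_PARAMETERS = 500e12      # models of up to 500 trillion parameters
PRICE_USD = 120e6            # about $120 million


def good_computer_ratios():
    """Derive a few illustrative ratios from the quoted targets."""
    return {
        # Memory available per parameter at the largest supported model size.
        "bytes_per_parameter": MEMORY_BYTES / MAX_PARAMETERS,
        # Time to stream the full memory once at the quoted bandwidth.
        "seconds_per_memory_sweep": MEMORY_BYTES / BANDWIDTH_BYTES_S,
        # Price per PetaFLOPS of peak AI compute.
        "usd_per_petaflops": PRICE_USD / (COMPUTE_FLOPS / 1e15),
    }


if __name__ == "__main__":
    for name, value in good_computer_ratios().items():
        print(f"{name}: {value:,.2f}")
```

On these assumptions the system would offer 8 bytes of memory per parameter at the 500-trillion-parameter limit, could stream its entire memory in roughly 0.4 seconds, and would cost about $12,000 per PetaFLOPS of peak compute.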

The Good Computer could deliver over 10 ExaFLOPS of floating-point performance and cost upwards of $120M. (Image: Graphcore)

What are the implications for the industry?

Surely, TSMC is already exploring other customers for the WoW technology, and we expect that effort will be successful. Getting 40% better performance without taping out a new architecture or moving to a new, smaller manufacturing node is astounding. Moreover, we believe that this wafer stacking technique offers many more opportunities beyond power delivery. One could imagine, for example, that I/O could be added to the second wafer, or perhaps in the longer term logic could be added in a third wafer for amazing density and performance, if one can figure out how to cool it. And the potential of combining the multi-wafer approach with full wafer-scale devices à la Cerebras makes our heads spin.


Clearly, the emergence and value of heterogeneous computation and acceleration have opened up a Pandora’s box of new approaches to silicon design and manufacturing. We believe that Graphcore has begun to cross the chasm, with customers now going on the record with significant results. Yes, NVIDIA is still the champion, especially in software and per-device performance. But Graphcore is steadily finding its own niches where it can shine brightly. And the WoW technology appears to have significant promise that will be fun to watch!