It is all about tighter integration with memory, CPUs, and accelerators for trillion-parameter AI models.
For 12 years, NVIDIA has used its Spring GPU Technology Conference (GTC) to amaze its customers and investors with new GPUs for graphics and application acceleration. But this year, Founder and CEO Jensen Huang surprised an audience of over 100,000 virtual attendees by announcing not a new GPU but an Arm-based CPU due in 2023. NVIDIA believes a custom-designed CPU-GPU platform is the only way it can enable the next generation of hyper-scale artificial intelligence and begin to approach the level of computer-based “general intelligence.” To this end, NVIDIA announced that it is developing an AI platform that couples a forthcoming NVIDIA Arm CPU, called Grace, to a next-generation GPU over a faster NVLink.
While the company provided few details about Grace itself, let’s look at the motivation underlying this significant move. There were many other exciting announcements from NVIDIA, such as the BlueField-3 DPU and a new platform for enterprise AI, but for now, let’s focus on the big news: Arm.
What did NVIDIA announce, and why?
Jensen says the goal is to create a tightly integrated computational foundation to pursue the next wave of AI innovation: a trillion-parameter computer intelligence. Today’s largest AI model, OpenAI’s GPT-3 for language processing, totals 175 billion parameters and requires over one thousand NVIDIA GPUs hosted on Microsoft Azure. The human brain has about 100 trillion synapses, roughly equivalent to deep neural network parameters. If successful, the NVIDIA system would be only 100 times smaller than the human brain. While this still does not approach general intelligence, it could be an entire order of magnitude larger than the Exascale systems currently planned by US DOE labs.
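A quick back-of-envelope sketch puts these scales side by side. The 175-billion figure is GPT-3’s published parameter count; the trillion-parameter target and 100-trillion-synapse estimate are the rough figures cited in the announcement.

```python
# Back-of-envelope scale comparison using the figures cited in this article.
GPT3_PARAMS = 175e9       # OpenAI GPT-3 parameter count
TARGET_PARAMS = 1e12      # trillion-parameter goal NVIDIA describes
BRAIN_SYNAPSES = 100e12   # rough estimate of human-brain synapses

# A trillion-parameter model is ~5.7x larger than GPT-3...
print(f"Target vs. GPT-3: {TARGET_PARAMS / GPT3_PARAMS:.1f}x larger")
# ...yet still ~100x smaller than the brain's synapse count.
print(f"Brain vs. target: {BRAIN_SYNAPSES / TARGET_PARAMS:.0f}x more synapses")
```

The arithmetic makes the point of the paragraph concrete: even the trillion-parameter goal leaves two orders of magnitude between today’s largest models and the brain’s synapse count.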
Ian Buck, GM and VP of accelerated computing at NVIDIA, made it clear that this is not about competing with other CPUs to solve today’s problems. “Grace will not outperform an x86 CPU and NVIDIA GPU running today’s MLPerf benchmarks,” said Buck. “Rather, the system design will allow us to solve AI problems that are orders of magnitude larger than today’s.” NVIDIA had to rethink the entire system architecture to run a trillion-parameter model efficiently with fast response time.
This is all about integrated system design for NVIDIA. The magic will come in part from a faster version of NVLink, delivering 500 GB/s of cache-coherent interconnect from Grace to the GPU, between GPUs, and between Grace CPUs. Since no CPU vendor is building such an interconnect to enable this level of outrageous performance, NVIDIA had no choice but to design it itself. NVIDIA also indicated that Grace will support LPDDR5x memory, used today in mobile platforms such as those from Qualcomm. In total, memory bandwidth will be about 2 TB/s, or 30x the bandwidth from today’s server memory to the GPU. The GPU will continue to depend on High Bandwidth Memory for local data storage.
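Those two figures, 2 TB/s of memory bandwidth and a claimed 30x advantage, imply a baseline for today’s server-memory-to-GPU path. A small sketch, using only the numbers cited above:

```python
# Bandwidth comparison implied by the article's figures (all approximate).
GRACE_MEM_BW_GBS = 2000   # ~2 TB/s aggregate LPDDR5x bandwidth, per the article
NVLINK_GBS = 500          # next-gen NVLink CPU-to-GPU link, per the article
SPEEDUP = 30              # article's claimed advantage over today's servers

# Working backward: 2000 GB/s / 30 suggests a baseline of roughly 67 GB/s,
# in the ballpark of a PCIe-attached GPU's host-memory path today.
implied_baseline = GRACE_MEM_BW_GBS / SPEEDUP
print(f"Implied x86 baseline: ~{implied_baseline:.0f} GB/s")  # ~67 GB/s
```

The implied ~67 GB/s baseline is my inference from the article’s 30x claim, not a figure NVIDIA published.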
Grace will ship on an HGX board and in a future server, with four Grace CPUs and GPUs directly interconnected over NVLink. A future NVIDIA DPU will provide the interconnect to combine these servers into a hyper-scale AI machine.
NVIDIA envisions a complete redesign of AI systems, with a dedicated Arm CPU attached to each GPU over a faster interconnect with higher memory bandwidth. Compared to existing x86 accelerated servers, one can see the dramatic improvements that NVIDIA envisions.
A 20 Exaflop Supercomputer
At least one customer likes the idea, and I expect many more will follow. The Swiss National Supercomputing Centre (CSCS) and ETH Zurich will deploy a massive Grace-based system, delivered by HPE Cray, that will provide 20 Exaflops of (16-bit) AI performance for the largest AI networks. That is ten times faster than the fastest DOE system currently envisioned, El Capitan, which is to use AMD CPUs and GPUs in 2023. It is now evident that the US DOE decision to exclude NVIDIA from the first Exascale systems did not foretell the end of NVIDIA leadership in HPC. Far from it; this announcement leapfrogs those competitors by 10X in roughly the same timeframe.
While NVIDIA went to great lengths to ensure that this announcement in no way portends a shift to compete directly with Intel or AMD CPUs, the Grace CPU will certainly displace those CPUs in systems built to handle massive AI models.
Jensen also shared the new company roadmap, which will deliver yearly advances in CPUs, GPUs, and DPUs.
NVIDIA constantly reminds me that its mission is to help customers solve their most challenging problems. This announcement demonstrates that NVIDIA does not measure itself against the competition, but against the speed of light: perfection. Yes, this zeal now requires the company to design its own Arm CPU to enable its vision. The upcoming Grace CPU is not about NVIDIA getting into the CPU business. It is all about re-imagining the data center for the trillion-parameter AI models Jensen and his partners envision.