NVIDIA Provides More Details On Selene Supercomputer

by | Sep 3, 2020 | AI and Machine Learning, In the News

Last May, when NVIDIA unveiled the Ampere GPU architecture, the company announced a new supercomputer named Selene that ranks #7 in the world in total performance. Selene is now the fastest industrial system in the USA and the second-most energy-efficient system ever built. The air-cooled Selene was constructed in a standard data center in only three weeks, compared to the 9-12 months typically required for a supercomputer installation. This fast deployment was made possible by NVIDIA’s plug-and-play DGX system, which houses AMD CPUs, A100 GPUs, and Mellanox HDR networking. Shortly afterward, the University of Florida announced that it had stood up its own supercomputer, also built on the DGX A100 platform. Consequently, NVIDIA has made good on CEO Jensen Huang’s claim that the company is not just in the GPU business, but has climbed up the stack to compete for end-to-end data center procurements.

NVIDIA’s Selene supercomputer provides a sandbox for NVIDIA hardware and software engineers and gives NVIDIA’s team hands-on experience with data center best practices and innovations. image: NVIDIA

What is Selene, and why should you care?

Selene is not NVIDIA’s first foray into DGX-based supercomputers, the first being Saturn V announced at the 2017 launch of the Volta GPU. By building its own private supercomputers, NVIDIA has learned many lessons that will help it sell into academia and large cloud infrastructures, while providing NVIDIA engineers with a world-class computing platform for product design and software optimization. Saturn V and Selene also act as reference architectures that potential clients can examine to determine whether the design would meet their needs, and they help instill confidence in NVIDIA as a tier-one vendor of high-performance infrastructure. Not only was the University of Florida impressed, but Argonne National Laboratory, Microsoft, and Lockheed Martin have also climbed on board with their own DGX SuperPODs. And the design is open-sourced in the HGX version, allowing any data center to build its own if it prefers.

The DGX A100 is the building block for DGX SuperPODs. image: NVIDIA

Many media outlets have described the configuration and installation process, which was carried out during a global pandemic with social distancing and teams of only two installers. ZDNet, for instance, has covered this quite well. I want to focus here on the business implications, which are significant for NVIDIA and its partners.

Armed with experience from the DGX and HGX reference architectures, NVIDIA and its partner network have deftly advanced from offering chips and modules to providing full-scale data centers, including the software, compute infrastructure, networking, and storage. Now a customer can go directly to an NVIDIA Partner Network reseller or a DGX-Ready Data Center Partner to install or gain cloud access to a DGX, DGX POD, or DGX SuperPOD. To be sure, a DGX A100 is not cheap, starting at $199K, but a customer can be up and running in a matter of weeks, not months, and can expect a smooth installation with low risk.

Conclusions

NVIDIA does not (yet) break out its system revenues from the rest of its business, but I would expect this product line to grow quickly into a significant source of income and profit. After all, NVIDIA has at least a dozen DGX customers so far, and Selene itself comprises 280 DGX A100s, amounting to a list price of roughly $56M if someone wants to buy one.
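For readers who want to check the arithmetic behind that figure, here is a minimal sketch. It assumes every one of the 280 systems is priced at the $199K starting price quoted above, with no volume discount and nothing added for networking, storage, or software; the function and constant names are mine, purely for illustration.

```python
# Back-of-the-envelope list price for a Selene-sized cluster.
# Assumption: each node costs the $199K starting price cited in the article;
# real quotes would differ (discounts, networking, storage, support).
DGX_A100_LIST_PRICE_USD = 199_000  # starting price per DGX A100
SELENE_NODE_COUNT = 280            # DGX A100 systems in Selene


def estimated_list_price(nodes: int, unit_price: int = DGX_A100_LIST_PRICE_USD) -> int:
    """Return the total list price in USD for `nodes` systems."""
    return nodes * unit_price


total = estimated_list_price(SELENE_NODE_COUNT)
print(f"${total / 1e6:.1f}M")  # prints "$55.7M"
```

The result, $55.7M, rounds to the $56M cited above, so the figure appears to cover the DGX systems alone.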

So, the bottom line is that NVIDIA has worked its way up the value chain and stands to earn revenues and profit margins that historically have gone to their OEM partners.

Note: Moor Insights & Strategy writers and editors may have contributed to this article.