2021 was another banner year for AI hardware platform innovations, as we predicted. Ok, not ALL of our prognostications came to pass, notably the stall (ok, death) of the NVIDIA/Arm deal. But others were pretty accurate. What does 2022 hold? We think the landscape evolves in two symbiotic dimensions: Optimized system design, and ever-larger models. In addition, AI startups (including Intel) need to stand up a robust ecosystem development program or prepare to fold up their tents and go home.
We certainly do not anticipate a slower pace of new hardware, but we see a shift from silicon to solutions that solve real-world problems. And a few (GPT-4 anyone?) that do not.
First, a quick stroll down memory lane to set the stage:
- Large Language Models (LLMs) continued to grab headlines in 2021, with GPT-3 and smaller open-source alternatives like GPT-J. GPT-J was created by EleutherAI, a group of researchers who seek to democratize artificial intelligence. Models even larger than GPT-3 were announced by Google, NVIDIA+Microsoft and the Beijing Academy of Artificial Intelligence (BAAI), whose Wu Dao 2.0 is currently the largest of the large at over 1 trillion parameters.
- To support LLMs that may grow to tens of trillions of parameters, some vendors have made or announced their first forays into custom system design, notably NVIDIA Grace, Cerebras CS-2 and Graphcore POD256, which we covered here in Forbes.
- NVIDIA is beginning to see some real competition, and not just from startups. AMD brought out its second-generation Instinct MI200 GPU for data center acceleration, sporting FP64 performance nearly five-fold faster than the NVIDIA A100. Low-precision AI performance remained modest, with AMD prioritizing traditional HPC to support the US DOE Exascale wins it has under its belt. AMD also announced the latest EPYC Milan-X CPU, which we expect will actually be launched this month, probably on January 25. But “it’s the ecosystem, stupid!”
- Intel also stepped up its game with the Habana Gaudi training platform at Amazon AWS and UCSD. MLPerf benchmarks still need a lot of tuning, however, and the ecosystem for this chip remains in its infancy; no OneAPI support yet.
- In our view, UK unicorn Graphcore gained the most traction in 2021, adding significant software, benchmarks, LLMs, performance improvements, higher scalability, and adopters of its open-source stack including VMware, Hugging Face, Alibaba, and Baidu.
- Cloud providers Google and AWS made progress with their hardware, especially the Google TPU v4. Each TPU v4 chip provides more than 2X the compute power of a TPU v3 chip – up to 275 peak TFLOPS – and a TPU v4 cloud scales to 4,096 chips on an ultra-fast interconnect that provides 10x the bandwidth per chip at scale compared to a typical GPU-based large-scale cluster. But for some reason, Google has not yet made these super-fast systems available to the public, even though the company has been releasing and winning MLPerf benchmarks for over a year.
Ok, those are the 2021 highlights; what do we see coming in 2022?
- Let’s start with the lay-up: Nobody will take a meaningful chunk of NVIDIA’s AI business. Not for years. Even if a few companies manage to produce faster chips like the TPU v4, it just won’t matter: NVIDIA enjoys an ecosystem that nobody can touch, and that moat will protect its revenue and margins for the foreseeable future. And nobody is going to walk away from a working solution for 20-40% better performance, if that is even achieved. Note that performance claims against NVIDIA are often based on unoptimized benchmark runs and are largely misleading. Don’t get me wrong, there are great challengers like Cerebras, Graphcore, and Habana-Intel, but getting an ecosystem in place will take time. Ok, now that I have given away the ending…
- Good news: Those custom systems I mentioned above will begin to hit the market this year. Soon, serious AI will not be running on servers with PCIe GPU cards. There are just too many bottlenecks. Cerebras will ship its Brain-Scale AI platform this year, Graphcore is already shipping its platform but has more tuning in store to get to a trillion parameters, and NVIDIA might ship Grace systems (Arm+DPU+GPU) this year. Ok, the latter is a stretch: competitively speaking, NVIDIA does not need the Hopper GPU until 2023, IMHO. It will be critical to see how NVIDIA enables its key partners, like HPE, Dell, Lenovo, and Supermicro, so they don’t get left out of the next inning in AI.
- Bad news: Let me be frank: there are a number of very well-funded startups that need to put up or shut up. Ok, they won’t fold this year, perhaps, as they were given a ton of venture capital and the world abhors a monopoly. Most are chasing NVIDIA in training, instead of joining the fray for edge AI. But they are taking too long to get to meaningful deployment traction, and most have not invested nearly enough, nor been bold enough, in their software/model ecosystem. Contact us if you want the list!
- Some startups are reaching the threshold of traction, notably Cerebras and Graphcore. Cerebras should be ready to focus on customer success stories in 2022, having built an accessible, if expensive, system-level solution that has been adopted in HPC and a few enterprises. Graphcore has reached critical mass in building an open-source development community. I expect both to focus more on large language models this year, where their architectures could provide significant differentiation to a small number of very large customers. Both will now need to develop an inference processing capability, which Graphcore began working on last year. There are significant advantages to running the models on the same hardware that trained them.
- When Intel spins Mobileye out, which is the right move, it will need to consider whether and how to position Habana Goya as an inference processor for the edge. If Intel wants Goya to survive, the chip will need a new spin (it is 3 years old) and a big ecosystem push. The company should support Goya and Habana Gaudi with OneAPI as part of its revitalized developer program. If it doesn’t, I’d consider Gaudi dead in the water and a waste of $2B.
- But the big AI story in 2022 will be software, and not just for LLMs. The Enterprise will begin to realize that their entire software stack, billions of lines of code, will need to adopt AI-enabled features such as NLP and voice recognition. NVIDIA just released NVIDIA AI Enterprise 1.1, providing a rich set of tools and models to ease the AI on-ramp. To our knowledge, nobody else is even thinking about this. NVIDIA Omniverse 1.1 is also now available, creating a 3D virtual world for content designers and engineering teams. The Metaverse for gaming is cool and will become a big deal, but content creators and engineers can use Omniverse now to improve productivity and quality. There’s gold in them there hills.
- We do not expect OpenAI to launch GPT-4 (500 times larger than GPT-3) this year, but other organizations will offer increasingly large and increasingly open LLMs for experimentation (see GPT-J). We might even see real-world applications!
Conclusions
2022 will represent a turning point for AI, from a cool tool to essential technology for every business. This creates opportunities for those companies with relatively mature ecosystems to ease adoption and speed time to value. Large Language Models will again generate a lot of buzz, but revenue only for a select few accelerator companies. Some AI startups will begin to fail or be acquired, as they and their boards realize they are on a suicide march. And NVIDIA will once again dazzle the world with real solutions and growth.