AMD Narrows The Gap With Nvidia In New MLPerf Benchmarks

by | Aug 28, 2024 | In the News

New benchmark results from AMD, Untether AI, Google, Intel, and Nvidia show that the AI silicon performance race is tightening. However, system design, networking, and software make AI sing and dance. And that’s where Nvidia excels.

Finally, I can stop whining about AMD’s lack of open AI benchmarks. AMD has published excellent MLPerf inference results for their MI300 GPU, which is competitive with the Nvidia H100, although only on a single benchmark. Canadian startup Untether.ai also published new inference benchmarks showing their power efficiency. Let’s take a look.

The MLPerf Inference 4.1 Benchmark Suite

The MLCommons industry consortium, which controls and publishes the MLPerf benchmarks, has extended the twice-annual inference benchmark suite with a new one for the increasingly popular mixture-of-experts (MoE) AI models. MoE models combine multiple models to improve accuracy and lower the training costs of huge LLMs, like OpenAI’s GPT-4. AMD did not publish an MoE benchmark, but now that they have broken the benchmarking ice, an AMD spokesperson indicated we could see more shortly.

MLCommons has added a mixture-of-experts model to its suite of AI benchmarks. NVIDIA and MLCOMMONS

It is certainly encouraging to see MLPerf submissions for new processors. Specifically, in addition to the Nvidia Blackwell and the first AMD submissions, we now have selected benchmarks for Untether.ai, AMD’s next-generation Turin CPU, Google’s Trillium TPUv6e accelerator, and Intel’s Granite Rapids Xeon CPU. We will focus here on Nvidia, AMD, and Untether.ai.

AMD is roughly on par with the Nvidia H100, while the H200 is 43% faster

While AMD has previously disclosed micro benchmarks that highlight raw theoretical performance, such as that of the math performance on the MI300, these do not reflect the complex world of AI stacks. The AMD marketing claims that the MI300 is the fastest AI GPU were not validated with this new benchmark, but it is in the ballpark of the H100 when running a real AI workload. The Nvidia H200, however, beat the MI300 by some 43% on the same benchmark.

AMD is, without a doubt, now competitive with the Nvidia H100. AMD

We note that the Llama 2 70B benchmark doesn’t really allow AMD to strut its stuff with respect to its larger HBM capacity, which can support larger models. Hopefully we will see them run the new Mixtral MoE in a future MLPerf release.

Nvidia published H200 benchmarks that top AMD, but only by a little. NVIDIA

Nvidia also published the first Blackwell benchmarks, demonstrating about four times the performance of the H100 on medium-sized models (Llama 2 70B). Nvidia recently shared more details on the Blackwell NVL72 at Hot Chips, in which the NVSwitch-interconnected infrastructure is supposed to deliver 30 times better inference performance than the H100. Can’t wait to see actual (MLPerf) benchmarks for the flagship NVL72.

Nvidia showed that the Blackwell is indeed 30X faster than the H100 for extremely large models. NVIDIA

Nvidia did publish results for the new MoE benchmark, which shows off the H100 and H200. Nvidia also showed a 10% to 27% performance improvement for the H200 across the MLPerf benchmark suite, which should help users as they await Blackwell’s arrival in volume.

For those worried about the delay of Blackwell volume shipments, the H100 and H200 keep getting faster with software improvements. NVIDIA

Untether.ai Demonstrates Power-efficient Inferencing

We have seen before that an ASIC can provide more efficient AI inference processing, as first demonstrated with the Qualcomm Cloud AI100. The challenge is that ASICs, unlike GPUs, are one-trick ponies. They can perhaps perform quite efficiently on, say, ResNet-50, but not so impressively on other models.

Untether.ai thinks they can break that mold, and have submitted ResNet-50 results with exceptional power efficiency: performance on par with the Nvidia H100-NV at a fraction of the power consumption.

Untether demonstrated excellent performance at low power. UNTETHER.AI

OK, so how does the Untether platform perform on LLMs? The engineers didn’t complete their optimization on the BERT benchmark in time for the MLPerf submission deadline, but they have since completed their work and shared the results with us. As you can see below, the company seems to have avoided the traps their predecessors fell into. They are showing performance comparable to an Nvidia H100-NVL, with an over 3X advantage in energy efficiency.

Untether missed the submission deadline for BERT-Large, but the results indicate the platform has exceptional energy efficiency for language models. UNTETHER.AI
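
For readers who want to check this kind of claim themselves, the energy-efficiency comparison comes down to simple arithmetic: divide throughput by power draw for each system, then take the ratio. Here is a minimal sketch in Python. The `Result` class, the `relative` helper, and every number in it are hypothetical placeholders chosen for illustration; they are not MLPerf results and not figures from Untether.ai or Nvidia.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    """One benchmark result. All values used below are hypothetical."""
    name: str
    samples_per_second: float  # throughput
    watts: float               # average power draw during the run

    @property
    def samples_per_joule(self) -> float:
        # Throughput per watt is the same thing as work done per joule.
        return self.samples_per_second / self.watts


def relative(a: float, b: float) -> float:
    """Return how many times larger a is than b."""
    return a / b


# Placeholder numbers for illustration only, NOT measured results.
accelerator_a = Result("accelerator A", samples_per_second=1000.0, watts=100.0)
accelerator_b = Result("accelerator B", samples_per_second=1000.0, watts=300.0)

throughput_ratio = relative(accelerator_a.samples_per_second,
                            accelerator_b.samples_per_second)
efficiency_ratio = relative(accelerator_a.samples_per_joule,
                            accelerator_b.samples_per_joule)

print(f"throughput: {throughput_ratio:.2f}x, efficiency: {efficiency_ratio:.2f}x")
```

With these made-up inputs, two parts with equal throughput differ in efficiency only because of their power draw, which is the shape of the "comparable performance, lower power" argument.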

Conclusions

Once again, as we have seen over the years, only Nvidia published results for every benchmark, and once again Nvidia demonstrated why they are the best AI infrastructure provider, thanks to their full-stack approach of custom CPU, GPU, software, system, and networking. But at the chip level, there is now legitimate competition from AMD, at least on a single benchmark. While we may enter a period of leapfrogging, similar to what we saw decades ago with RISC CPUs, these differentiators for Nvidia will be durable, and should keep Team Green in the lead for at least the next 2-3 years.