Nvidia Leapfrogs Google And AMD With Vera Rubin

Jan 7, 2026 | In the News

CEO and founder Jensen Huang at the CES keynote in Vegas on January 5, 2026. NVIDIA

It’s got to be tough being an Nvidia competitor. You think you are close to catching up, perhaps even taking the lead, and then, bang! Nvidia brings out yet another latest and greatest.

This year brings perhaps the biggest generational change ever witnessed in Nvidia, with a new CPU, a new GPU, new networking chips, an AI model for automated driving and new open models for agentic AI. (Disclosure: Like most AI semiconductor companies of note, Nvidia is a client of my firm, Cambrian-AI Research, LLC.)

We will focus here on Vera Rubin. Speed readers may want to jump to the end, where I draw some conclusions comparing Nvidia’s platform to Google’s, the clear No. 2.

The Nvidia Vera Rubin Platform

I recently published a summary of the new Google Ironwood AI Super Computer here on Forbes, where I said Google had turned AI up to 11. Well, Nvidia just proved that there are numbers above 11. A lot of them, in fact.

Typically, Nvidia CEO and founder Jensen Huang’s team operates on a simple system principle: change only one chip at a time, whether CPU, GPU, NPU, Ethernet or scale-up switch. That’s a lot of engineering, tape-outs and testing. This time around, Huang threw that rule out, and the teams co-designed six chips to create an entirely new platform while still using the Blackwell chassis. Specifically, Huang announced the Nvidia Vera CPU, Rubin GPU, NVLink 6 Switch, ConnectX-9 SuperNIC, BlueField-4 DPU and Spectrum-6 Ethernet Switch. Such an engineering feat has never been attempted before, and Huang was pleased as punch to announce that all six chips, and the supercomputer racks, are already in production, with customer shipments expected in the second half of 2026.

Training time is reduced by 4X, inference throughput is improved by 10X, and token cost is lowered by 10X. NVIDIA

Innovation across every element of the AI system has enabled a dramatic increase in performance and power efficiency and a dramatic reduction in cost. And the market is responding as one would expect: Microsoft’s next-generation Fairwater AI super factories will scale to hundreds of thousands of Nvidia Vera Rubin Superchips.

“Rubin arrives at exactly the right moment, as AI computing demand for both training and inference is going through the roof,” said Huang. “With our annual cadence of delivering a new generation of AI supercomputers — and extreme codesign across six new chips — Rubin takes a giant leap toward the next frontier of AI.”

I love this quote from Elon Musk, founder and CEO of Tesla and xAI: “💚🎉🚀 🤖Nvidia Rubin will be a rocket engine for AI. If you want to train and deploy frontier models at scale, this is the infrastructure you use — and Rubin will remind the world that Nvidia is the gold standard.💚🎉🚀 🤖” (Emojis are Musk’s, not mine.)

Vera Rubin

Vera Rubin Specs. NVIDIA

The star of the show: the Rubin GPU. NVIDIA

NVLink6

NVLink remains one of Nvidia’s most significant competitive advantages and has taken on a strategic role with the adoption of the open NVLink Fusion by Qualcomm, Amazon AWS, MediaTek, Marvell, Alchip Technologies and Astera Labs. The latest incarnation of NVLink provides a scale-up fabric at 3.6 TB/s per GPU and supports in-network all-to-all collectives. While many are pinning their hopes on the slow-moving UALink consortium, NVLink is the de facto industry standard and the best scale-up fabric currently available.
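
To put that 3.6 TB/s figure in perspective, here is a minimal back-of-envelope sketch of gradient synchronization time on a scale-up domain. The 72-GPU domain size, the ring all-reduce algorithm, the 70B-parameter model and the zero-latency, full-utilization assumptions are all my illustrative choices, not Nvidia’s published numbers.

```python
# Back-of-envelope: ring all-reduce time over an NVLink scale-up domain.
# Assumptions (mine, not Nvidia's): 72 GPUs, the full 3.6 TB/s per GPU is
# usable by the collective, and zero latency or protocol overhead.

def ring_allreduce_seconds(payload_bytes: float, n_gpus: int, bw_bytes_per_s: float) -> float:
    """A classic ring all-reduce moves 2*(n-1)/n of the payload per GPU."""
    traffic = 2 * (n_gpus - 1) / n_gpus * payload_bytes
    return traffic / bw_bytes_per_s

# Gradients for a hypothetical 70B-parameter model in BF16 (2 bytes/param).
payload = 70e9 * 2
t = ring_allreduce_seconds(payload, n_gpus=72, bw_bytes_per_s=3.6e12)
print(f"~{t * 1e3:.0f} ms per all-reduce")  # ~77 ms under these assumptions
```

Even under these idealized assumptions, the takeaway holds: scale-up bandwidth, not raw compute, often sets the floor on synchronization time, which is why Nvidia keeps pushing NVLink so hard.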

NVLink 6 Switch uses the fastest SerDes in the industry. NVIDIA

For completeness, I include slides from Nvidia on the other chips in this announcement.

Nvidia BlueField-4 is a combined SmartNIC and storage processor with six times the compute of its predecessor. NVIDIA

An interesting announcement from Huang was a BlueField-4-based storage server. The Nvidia Inference Context Memory Storage Platform boosts KV cache capacity and accelerates the sharing of context across clusters of rack-scale AI systems. Persistent context for multi-turn AI agents improves responsiveness, increases AI factory throughput and supports efficient scaling of long-context, multi-agent inference. The platform has received support from nearly every storage vendor.
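
Why does KV cache capacity matter enough to warrant a dedicated storage tier? A rough sizing sketch shows how quickly context state outgrows GPU memory. The model shape below (80 layers, 8 grouped-query KV heads, head dimension 128), the 128K-token context and the batch size are my illustrative assumptions for a 70B-class model, not figures from Nvidia’s announcement.

```python
# Rough KV-cache sizing, to show why offloading context to storage matters.
# All model-shape numbers below are illustrative assumptions, not Nvidia figures.

def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   seq_len: int, batch: int, bytes_per_elem: int = 2) -> int:
    # The factor of 2 covers the separate key and value tensors in each layer.
    return 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per_elem

# Hypothetical 70B-class model: 80 layers, 8 KV heads (GQA), head_dim 128,
# a 128K-token context, a batch of 8 and BF16 (2-byte) entries.
size = kv_cache_bytes(80, 8, 128, 128 * 1024, 8)
print(f"~{size / 2**30:.0f} GiB of KV cache")  # ~320 GiB for a single batch
```

At that scale, a single long-context batch already rivals the HBM capacity of several GPUs, which is exactly the pressure a BlueField-4-based context-memory tier is meant to relieve.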

The Nvidia ConnectX-9 Spectrum-X SuperNIC delivers 800 Gb/s Ethernet. NVIDIA

Let’s not forget that fast CPU performance is critical, hence Vera. NVIDIA

Spectrum-X will ship with co-packaged optics, one of the industry’s first. NVIDIA

The new Spectrum-X Ethernet Photonics switch systems deliver 5x better power efficiency and improved uptime for scale-out networking and are among the industry’s first co-packaged-optics switches in production. The chip supports 128 ports at 800 Gb/s.
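
Those port numbers imply the switch’s aggregate throughput directly; a quick sanity check:

```python
# Aggregate throughput implied by the stated port count and speed.
ports = 128
gbps_per_port = 800
print(f"{ports * gbps_per_port / 1000:.1f} Tb/s aggregate")  # 102.4 Tb/s
```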

Nvidia is justifiably proud of the Vera Rubin Compute tray: No Cables. No Hoses. No fans. NVIDIA

The compute tray is noteworthy, as the previous Grace Blackwell tray had wires to connect the various domains and hoses for cooling. Take a look at this beauty for Vera Rubin. Huang said the old Blackwell design required about two hours to assemble; the new tray takes only five minutes. He also noted that Vera Rubin takes in water at 45 degrees Celsius (113 degrees Fahrenheit), eliminating the need for coolers.

Comparing Nvidia Vera Rubin To Google TPUv7 and AMD MI450

OK, it will be difficult, if not impossible, to make an apples-to-apples comparison without MLPerf results, but let’s see what has been disclosed. (Sources are cited below the table.)

First, Google Ironwood is an “inference-first” machine, but Google does not (yet) support or publish the 4-bit numbers that would enable an equivalent comparison to Nvidia. NVFP4 “performance” is a hardware peak for 4-bit tensor-core-style GEMMs under Nvidia’s FP4 approach, but real training and inference throughput depends on how much of the model actually runs in FP4 versus higher precision (BF16/FP32), plus optimizer/state handling and other non-GEMM work.

Nvidia’s own NVFP4 training approach is explicitly mixed precision (e.g., GEMMs may consume FP4 inputs but produce BF16/FP32 outputs, and some layers/ops are retained in BF16/FP32 for stability), so there is no single constant conversion factor from NVFP4 FLOPs to BF16 FLOPs. You can’t just divide a 4-bit performance metric by four to get a 16-bit result to compare against Google and AMD.
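
To make that concrete, here is a minimal sketch of the Amdahl-style arithmetic. The 4x FP4-versus-BF16 throughput ratio and the FP4 coverage fractions below are illustrative assumptions on my part, not measured values.

```python
# Why you can't just divide NVFP4 peak by four: if only a fraction of the
# work runs at FP4, the overall speedup is Amdahl-limited. The 4x ratio and
# the coverage fractions are illustrative assumptions, not measurements.

def effective_speedup(fp4_fraction: float, fp4_vs_bf16: float = 4.0) -> float:
    """Overall speedup vs. pure BF16 when only part of the math runs in FP4."""
    return 1.0 / ((1 - fp4_fraction) + fp4_fraction / fp4_vs_bf16)

for frac in (0.5, 0.8, 0.95):
    print(f"{frac:.0%} of ops in FP4 -> {effective_speedup(frac):.2f}x overall")
# Roughly 1.6x, 2.5x and 3.5x: all well short of the 4x peak ratio.
```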

While some might say that Nvidia’s use of NVFP4 in training is somewhat specious, I am confident that Nvidia will soon publish the methodology behind its results at each precision.

I would note that if Google publishes MLPerf results in the next round from MLCommons, we will have a much better feel for relative real-world inference and training performance. Nvidia submits results for all benchmarks, AMD for a few models, and Google has done so selectively in the past.

The best we can do until MLPerf results are (hopefully) published soon. Note that the Nvidia performance results use NVFP4. THE AUTHOR AND PERPLEXITY USING GEMINI3

As you can see, Nvidia’s use of NVFP4 is paying huge dividends, while AMD maintains a memory-capacity advantage with the MI450, which is due to ship in the same window, the second half of 2026.

Key Takeaways

First, if you are worried that Nvidia’s competition is heating up, keep worrying; it is. That said, I think it is safe to say that most competitors are vying for second place in the data center and in physical AI, and that Nvidia’s leadership remains largely unchallenged. And we haven’t even touched on the rich software stack and ecosystem support that Nvidia has built over the years.

References:
[1] Nvidia launches Vera Rubin NVL72 AI supercomputer at CES
[2] Ironwood: The first Google TPU for the age of inference
[3] AMD Helios – AI Rack Built on Meta’s 2025 OCP Design
[4] Nvidia unpacks Vera Rubin rack system at CES – The Register
[5] Inside AMD’s Helios AI super rack powered by MI450 chips as Meta …

Disclosures: This article expresses the opinions of the author and is not to be taken as advice to purchase from or invest in the companies mentioned. My firm, Cambrian-AI Research, is fortunate to have many semiconductor firms as our clients, including Baya Systems, BrainChip, Cadence, Cerebras Systems, D-Matrix, Esperanto, Flex, Groq, IBM, Intel, Micron, NVIDIA, Qualcomm, Graphcore, SiMa.ai, Synopsys, Tenstorrent, Ventana Microsystems, and scores of investors. I have no investment positions in any of the companies mentioned in this article.