Agentic AI Reshapes Nvidia Strategy Beyond GPUs At GTC

Mar 23, 2026 | In the News

Nvidia’s updated Roadmap is based on GPUs, LPUs, CPUs, NPUs, and networking chips. NVIDIA

Donning his signature black leather jacket, Nvidia founder and CEO Jensen Huang once again wowed his fans, more than 30,000 of them in San Jose's SAP Center, with dazzling technology and extreme claims, including the company's first post-GPU product. Agentic AI, the next big thing taking the AI world by storm, took center stage.

A lot is riding on this event, which marks a major turning point for the world’s largest company. Many of Nvidia’s customers now offer competing AI solutions, and the industry shift from training to inference is helping many startups gain funding for inference-specific chips. The company that earned a $5 trillion (now only $4.3 trillion) market cap based on GPUs must now shift to a new style of computing championed by its competitors to handle agentic AI, and Jensen reveled in the challenge. And that future is a five-layer cake. Sounds trite, but I can assure you it is anything but. (Like many AI semiconductor firms, Nvidia is a client of Cambrian-AI Research, LLC.)

There is far too much to cover here, so I will concentrate on the why and the impact of three announcements.

The Shift From GPU to GPU+LPU For High-Value Agentic Inference

Clearly, Nvidia intends to lead the agentic AI revolution, and it believes the trend will deliver additional growth while demanding new technologies, from accelerators to new storage and new CPUs.

Why: GPUs Are Fast, But Not Fast Enough On Their Own For Agentic AI

A year ago, Nvidia announced the Vera Rubin CPX, an extension of the upcoming Vera Rubin generation built for inference over massive input prompts. CPX used a form of DRAM (GDDR7) to lower costs and increase the performance of context prefill, the first stage of LLM inference.

But with agentic AI exploding (the OpenClaw agent is experiencing the fastest growth of any product in history), Nvidia changed its strategy. Instead of spending time and money speeding up prefill, the team focused on the feed-forward final stage of decode. But it had to move fast, without depending on a unique GPU development. It had to acquire.
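To see why the two stages stress hardware so differently, here is a toy sketch. The dimensions and weights are illustrative assumptions, not any real model: prefill processes the whole prompt in one parallel matmul (compute-bound, a good fit for GPUs), while decode generates one token at a time, re-reading weights at every step (memory-bandwidth-bound, the stage Nvidia is offloading to the LPU).

```python
import numpy as np

# Toy single-layer sketch of the two LLM inference stages.
# Dimensions and weights are illustrative, not any real model.
d_model = 8
rng = np.random.default_rng(0)
W_qkv = rng.standard_normal((d_model, 3 * d_model))
W_ffn = rng.standard_normal((d_model, d_model))

def prefill(prompt_tokens):
    """Prefill: process ALL prompt tokens in one batched matmul.
    Compute-bound -- parallel over the whole prompt."""
    x = np.stack(prompt_tokens)        # (seq_len, d_model)
    qkv = x @ W_qkv                    # one big, parallel matmul
    kv_cache = qkv[:, d_model:]        # cache K and V for decode
    return kv_cache

def decode_step(last_token, kv_cache):
    """Decode: generate ONE token at a time, re-reading the weights
    at every step. Memory-bandwidth-bound. (A real decoder would also
    attend over kv_cache here; omitted for brevity.)"""
    h = last_token @ W_qkv[:, :d_model]  # small matmul, big memory traffic
    return h @ W_ffn                     # the feed-forward network (FFN)

prompt = [rng.standard_normal(d_model) for _ in range(16)]
cache = prefill(prompt)                  # one pass over 16 tokens
tok = prompt[-1]
for _ in range(4):                       # four strictly sequential decode steps
    tok = decode_step(tok, cache)
print(cache.shape)                       # K and V for the 16 prompt tokens
```

The sequential loop at the end is the point: decode cannot be parallelized across output tokens, so its speed is set by how fast the hardware can stream weights, which is exactly where SRAM beats HBM.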

The Groq 3 LPU is at the core of Nvidia’s Agentic AI Strategy. THE AUTHOR AND NVIDIA

That shift led Nvidia to enter into a non-exclusive licensing deal to acquire the assets and leadership team of the competitive startup Groq last December. Merry Christmas and Happy Hanukkah, Mr. Jonathan Ross and team! Here’s $20 billion. And the amazing thing is that Team Green has already designed a rack-scale Groq solution that the company will ship with Vera Rubin this year.

The Groq 3 LPX NVIDIA

Nvidia is not abandoning GPUs, but as Jensen has often repeated, Nvidia is a data center company, not a GPU company. The addition of the LPX from Groq confirms this assertion beyond any doubt. Also, note that LPX is based on Groq 3, a new-generation chip not previously announced. More details on this coming soon, I'm sure.

Impact:

Groq systems do not use HBM, or DRAM for that matter. The Groq language processing unit (LPU) uses on-chip SRAM, which can deliver roughly one to two orders of magnitude higher memory bandwidth and far lower latency than HBM4. With this fast and deterministic architecture, Groq won what is, effectively, the anchor AI inference hub for the Persian Gulf region, centered in Saudi Arabia and backed by a very large sovereign check. I'm sure that got Jensen's attention and narrowed the list of potential technologies he was considering.
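A back-of-envelope calculation shows why that bandwidth gap matters for decode. On a bandwidth-bound workload, decode throughput is roughly memory bandwidth divided by the bytes streamed per generated token. All numbers below are my illustrative assumptions, not vendor specifications:

```python
# Back-of-envelope: decode throughput is roughly memory bandwidth
# divided by the bytes that must be streamed per generated token.
# All figures below are illustrative assumptions, not vendor specs.

def tokens_per_sec(bandwidth_gbps: float, bytes_per_token_gb: float) -> float:
    """Upper bound on decode rate for a bandwidth-bound workload."""
    return bandwidth_gbps / bytes_per_token_gb

model_gb = 140.0        # e.g. a ~70B-parameter model at FP16 (assumed)
hbm_bw = 8_000.0        # an ~8 TB/s class HBM stack (assumed)
sram_bw = 80_000.0      # on-chip SRAM, ~10x higher aggregate (assumed)

print(f"HBM-bound:  {tokens_per_sec(hbm_bw, model_gb):,.0f} tok/s")
print(f"SRAM-bound: {tokens_per_sec(sram_bw, model_gb):,.0f} tok/s")
```

For a single low-batch stream, every 10x of bandwidth translates almost directly into 10x tokens per second, which is why the LPU's deterministic SRAM design wins on decode latency.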

Vera Rubin will now focus on prefill and attention, while the LPU will take over the decode feed-forward network (FFN). NVIDIA

But SRAM offers vastly lower capacity, demanding many more processors to hold an LLM. I’m sure Jensen will be just fine with that if this new platform takes off (and it will). In fact, a Nvidia executive told me he expects that some agentic workloads could require up to four racks of Groq LPUs for each rack of Vera Rubins. That’s impact. Nvidia also teased the next two generations of LPUs for 2027 and 2028.

Rubin + Groq = up to 35X performance efficiency. NVIDIA

The impact of Groq LPX on Nvidia revenues could be huge. While Jensen forecast a trillion dollars in revenue over the next ~3 years, that does NOT include LPX, nor China for that matter. Jensen wisely preferred to talk about the impact on a hypothetical model service provider, like OpenAI. The slide below outlines the service tiers, each demanding distinct solutions. On the left you see the free tier, like ChatGPT, then the medium, high, premium and ultra AI inferencing services. The last two are what drove Nvidia to acquire Groq's assets. Overall, Nvidia with Vera Rubin + LPUs will deliver roughly a ten-fold increase in revenue potential for the service provider. The LPU will enable the premium and ultra tiers, which is where service providers will secure the most revenue.
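To make the tier economics concrete, here is a hedged sketch. The tier names come from the keynote slide; all prices and token volumes below are invented placeholders, not Nvidia's figures:

```python
# A sketch of the tiered-service economics described above. Tier names
# are from the keynote slide; prices and token volumes are invented
# placeholders, NOT Nvidia's figures.

tiers = {
    # name:    (price per 1M tokens in $, monthly tokens served, in millions)
    "free":    (0.00,  500_000),
    "medium":  (0.50,  200_000),
    "high":    (2.00,   80_000),
    "premium": (8.00,   30_000),   # tier enabled by the LPU's low latency
    "ultra":   (25.00,  10_000),   # tier enabled by the LPU's low latency
}

revenue = {name: price * mtok for name, (price, mtok) in tiers.items()}
total = sum(revenue.values())
lpu_share = (revenue["premium"] + revenue["ultra"]) / total

print(f"monthly revenue: ${total:,.0f}")
print(f"share from LPU-enabled tiers: {lpu_share:.0%}")
```

Even with these made-up numbers, the shape of the argument holds: the low-volume, high-price tiers dominate revenue, and those are precisely the tiers that need the LPU's latency.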

Over $300B in additional revenue for a hypothetical model service company is possible over using Blackwell. NVIDIA

Note that Grace Blackwell GB300 NVL72 is certainly no slouch. Take a look at this slide, in which SemiAnalysis published inference results for GB NVL72 and declared Nvidia the "Inference King," an accolade Jensen clearly enjoyed: fifty times higher throughput per watt and 35 times lower cost per token. It is not clear who the "competition" is; my guess is it's AMD. And of course, Grace Blackwell will now be far surpassed by Vera Rubin.

Nvidia is declared the Inference King by SemiAnalysis with 50 times higher performance per watt and 35 times lower cost per token. NVIDIA

The Shift From Big Inference to Agentic AI Software

Why: Agents Need To Run Locally To Interact With Users And Orchestrate The Execution Of LLMs And Other Algorithms In The Cloud

Until recently, AI acceleration systems were designed for training and/or inference writ large. The traditional (if I can use that word) inference process was a one-shot query. Agents, however, are long-lived, and will typically be on the job for an hour, a day, a week or even months.

This led Nvidia to create a storage tier specifically targeting that sort of usage pattern. Nvidia itself is not entering the storage market. Instead, it is providing a blueprint for BlueField-4-connected STX racks and launching the platform with the support of 15 partners.

The BlueField-4 storage system for agentic AI. NVIDIA

The recent emergence of OpenClaw agentic AI jolted the market and suppliers into realizing that current inference offerings would not be ideal for agentic AI, which will ultimately run on personal devices like phones, desktops and laptops connected to cloud services. (Qualcomm and Apple were the exceptions to this rule.)

OpenClaw is a viral open-source AI agent you run locally, designed to connect to apps like ChatGPT and automate tasks for the user, like reading/writing files, running shell commands and interacting with other apps. It has been described as a self-hosted assistant with persistent memory and broad integrations rather than a simple consumer app download. But it is these very capabilities that make OpenClaw agents so dangerous if they are set up to inherit the user's file and application permissions. If not configured correctly, a single user could seriously damage their job, their company or even a government institution.

OpenClaw's adoption rate is the red vertical line on the right. NVIDIA

OpenClaw is enjoying explosive open-source adoption: reports say OpenClaw hit about 250,000 GitHub stars in roughly 60 days and reached around 1.5 million downloads weekly, with many people calling it the fastest-growing open-source project ever. Huang publicly praised its adoption, saying it surpassed Linux’s growth trajectory in a matter of weeks.

The Impact: Every Company Needs an OpenClaw Strategy; Nvidia Has One

Nvidia realized this growth could become a tipping point for AI, and while rearchitecting Groq 3 to run in an LPU 3x rack, the team built what Jensen believes is an enterprise-ready solution: NemoClaw and OpenShell. These tools improve security by isolating the agent inside a sandbox and enforcing policy-based controls on what it can read, send and execute. NemoClaw is the packaging layer, an Nvidia stack that installs onto OpenClaw with one command and adds enterprise-oriented security, governance and model/runtime options. OpenShell is the core security runtime inside that stack; it isolates the agent at the process or kernel level so the agent does not run with unrestricted host access.
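As a flavor of what policy-based command control can look like, here is a minimal sketch. The policy format and function name are my hypothetical illustrations, not Nvidia's actual NemoClaw/OpenShell API:

```python
# A minimal sketch of policy-based command control of the kind OpenShell
# is described as enforcing. The policy format and function name are
# hypothetical illustrations, not Nvidia's actual API.
import shlex

POLICY = {
    "allowed_commands": {"ls", "cat", "grep"},   # agent may only run these
    "blocked_paths": ("/etc", "~/.ssh"),         # agent must never touch these
}

def check_command(cmdline: str, policy: dict = POLICY) -> bool:
    """Return True only if the agent's shell command passes policy."""
    parts = shlex.split(cmdline)
    if not parts or parts[0] not in policy["allowed_commands"]:
        return False                             # command not on allowlist
    return not any(
        arg.startswith(p) for arg in parts[1:] for p in policy["blocked_paths"]
    )                                            # reject blocked path prefixes

print(check_command("cat notes.txt"))    # True: allowed command, safe path
print(check_command("rm -rf /"))         # False: rm is not on the allowlist
print(check_command("cat /etc/passwd"))  # False: blocked path
```

A real enforcement layer would sit at the process or kernel level rather than string-matching commands, but the principle is the same: deny by default, and let the agent do only what the policy explicitly allows.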

Nvidia envisions agentic cloud infrastructure will enable Agent-as-a-Service, which will probably get renamed. NVIDIA

The slide above should be a wake-up call to every company, especially software providers. Each and every one will need an “OpenClaw strategy” or suffer serious existential threats, as end users and companies are able to build custom software solutions, potentially threatening software companies as we know them. Why buy and adapt when you can build exactly what you need and no more?

The Shift From Proprietary to Open Models

Why: Agentic AI Needs Customized Models, Easily Developed From An Open Model Base

But beyond the battles for heavy-duty hardware, an entirely separate war rages in the AI industry: closed vs. open models. Jensen believes agentic AI will largely depend on smaller, open models, and Nvidia is now the industry's largest single contributor to open-source AI.

Nvidia is betting that open models will be favored in the agentic era. NVIDIA

Nvidia claims that Nemotron 3 Ultra is five times more efficient and achieves the highest accuracy on GB200 NVL72.

Nemotron 3 Ultra, the open base model that companies and sovereigns can customize to their own needs. NVIDIA

The Impact: Nvidia Will Now Compete And Collaborate With The Entire Open-Source Model Community

Until recently, Meta’s Llama was the preferred base model from which anyone can build customized post-trained models. Llama has fallen out of favor a bit, as Google’s Gemini has stolen the spotlight. I will anxiously look for early signs that Nemotron 3 Ultra could take Llama’s place.

My Key Takeaways

Huang has long said that Nvidia is not a GPU company; it's a data center company. Most of us thought he meant software and networking. But now we see that he meant the whole enchilada: accelerating meaningful workloads regardless of their GPU affinity, and partnering for components and services that do not merit his teams' engineering, like storage, cooling and power. This approach, vertically integrated yet horizontally open, allows Nvidia to maintain rapid growth while preserving its mid-70s operating margins.

7 chips, 5 rack systems, and a ton of software make the next generation AI Factory. NVIDIA

Life is good, but probably a little hectic at times, at Nvidia HQ.

Disclosures: This article expresses the opinions of the author and is not to be taken as advice to purchase from or invest in the companies mentioned. My firm, Cambrian-AI Research, is fortunate to have many semiconductor firms as our clients, including Baya Systems, BrainChip, Cadence, Cerebras Systems, D-Matrix, Esperanto, Flex, Groq, IBM, Infleqtion, Intel, Micron, NVIDIA, Qualcomm, SiMa.ai, Synopsys, Tenstorrent, Ventana Micro Systems, and scores of investors. I have no investment positions in any of the companies mentioned in this article.