Our Year-End Newsletter

Dec 19, 2024 | In the News

Trilobite Dicranurus monstrosus fossil from the Lower Devonian, 420 million years ago

With the end of the year at hand, we first want to thank you for being a follower of Cambrian-AI Research, and to share a few thoughts about the industry. Our community has grown tremendously thanks to you reading our many blogs on Forbes, EE Times, and this Cambrian-AI website. If you can’t get past the Forbes paywall, you can read our research and blogs here on Cambrian-AI.com after a five-day waiting period for published articles.

Here are a few thoughts to mull over as you pop some corks and prepare to enjoy the holiday season. I’ll keep this short, I promise.

In 2025, it’s not all about Nvidia anymore.

Well, at least it’s not ONLY about Nvidia. While Nvidia is still the undisputed king of the AI hill, we have seen significant improvements in the AI accelerators designed by the CSPs (AMZN, MSFT, and GOOG). These and other hyperscalers in China will help grow AI, but we doubt that Nvidia will cry too much. More AI chips mean more AI deployments, which helps Nvidia indirectly, and Nvidia is probably already sold out of Blackwells for all of 2025.

Nvidia

What the stock market has been missing of late is the “insane” demand for, and future profits from, Blackwell in all its forms. Nvidia has launched new AI models, new software, new Blackwell platforms, new partnerships, and a new corporate headquarters.

Amazon AWS Trainium 2 and 3

AWS Trainium 2 looks OK, but not great, now that it is finally available in preview on AWS. Think of it as a poor man’s Hopper at best. Trainium 3 should be out by the end of 2025.

But at least Trn2 was deemed scalable enough to create a massive cloud for Anthropic. This supercomputer, dubbed Project Rainier, will deliver five times the performance of the one Anthropic used for its current generation of AI models, with “hundreds of thousands” of Trainium 2 accelerators.


Now, the skeptic in me says that Amazon may have overestimated demand for Trainium, with Nvidia Blackwell in the wings. So instead of writing the chips off, Amazon gave (I mean invested) a second $4B so Anthropic could, among other things, buy hundreds of thousands of Trainium 2 chips, probably at a really good price.

Or perhaps this was the plan all along. Either way, AWS wins, and Anthropic wins. But Nvidia does not lose. They still win. The AI tide will lift (almost) all boats.

But AWS’s enthusiasm did get a little carried away, with David Brown, VP of compute and networking, saying that the “new Trn2 UltraServers offer the fastest training and inference performance on AWS.” That is not even remotely accurate (polite speak for “horseshit”). Nvidia H100 blows Trn2 away. If you disagree, AWS, how about publishing some benchmarks? Ah, yeah. That’s what we thought.

Where art thou, AMD?

AWS also dissed AMD, when Gadi Hutt, the Director of Product and Customer Engineering at Amazon’s Annapurna Labs, said AWS just isn’t seeing adequate demand for the MI300. (AMD has said that was BS.) But why would he say that? I’ve met Gadi, back when I was at Calxeda, and he sure isn’t dumb. Perhaps he realizes that a) they have Trainium, b) they have a good relationship with Nvidia, and c) they need a third choice because… Oh, right. They don’t.

Facing both Nvidia and CSP silicon as strong competitors, AMD’s prospects are not looking as great as they did a year ago. Sure, $5B in year one for its new GPU is certainly amazing, but I suspect that the CSPs’ self-consumed output will enjoy a greater share in 2025.

AMD has great technology, improved software, and now TIME magazine’s CEO of the Year! But it needs a market, a large market that isn’t the first-tier cloud providers. AMD snagged IBM and Oracle. That’s a good start! How about sovereign data centers? That’s where I would go.

Google’s latest TPU

Google has also upped its chip game, with the latest TPU, called Trillium, boasting great performance and scalability. Trillium (a much better name than TPUv6e) offers over a 4x improvement in training performance, up to 3x higher inference throughput, a 67% increase in energy efficiency, and twice the HBM and networking bandwidth of its fifth-generation TPU predecessor. Boom! Now if Google can just do a little better job of marketing it, it could further increase its cloud service market share. Recently we have seen renewed energy and talented work from Google marketing on Quantum, Gemini 2.0, and Trillium. Keep it up! Read more here.

Cerebras Inference

The Wafer-Scale Engine company has moved beyond training and into inference processing in a very big way, recently announcing benchmarks that show Cerebras is by far the fastest inference processor. Read our article here.

And Broadcom

Broadcom just breached the $1T valuation wall, albeit briefly, by selling silicon design services to the CSPs to build out their chip ambitions. Apple recently said it is partnering with Broadcom as it prepares for a 2026 entry into the AI accelerator market (Apple is once again late to the party). So, add Apple to the pile of gold that Broadcom is mining, and one can see why its stock is up so dramatically this year, especially since earnings.

And Finally, Intel

Poor Intel. They had a good CEO who knew the team, the business, the technology, and the market. Unlike their board members. If Pat can’t do it, nobody can. Fire the board, and bring Pat back.

Nvidia could punch back. Here’s how.

So, if the cloud providers can build AI hardware, why can’t Nvidia rent GPU services on the cloud? It could. And it is already building a multi-billion-dollar software business. Colette Kress, Nvidia CFO, said the company would surpass $2 billion in annualized software revenue by the end of 2024. That’s now. And that’s how.

A recent article in The Information explored an option Nvidia could exercise: increasing the hardware and software it sells directly to customers on its own DGX Cloud platform. Today, we would note, Nvidia sells access to DGX Cloud services through, not around, the CSPs. But that could change if and when Jensen feels threatened by CSP-designed silicon. That is not his worry now, we suspect.

If I were Jensen, I’d consider selling cloud services, perhaps by buying someone like CoreWeave. But I would not go down this road until I had to, probably not until 2026 or later. Nvidia can already sell more chips than it can make. And there is no benefit to be had by angering your largest customers, even if they are competing with you.

CSPs will continue to sell access to Nvidia GPUs because their customers demand it. Why kick the goose if it is still laying golden eggs?

What Else is on Our Mind?

Here are a few topics we are fleshing out for future research to be posted on Forbes, EE Times, LinkedIn and of course at cambrian-ai.com:

  • Quantum computing stocks just exploded. Why? There’s no revenue. Not yet.
  • The next big battle will be over power and cooling.
  • The implication of the Arm vs. Qualcomm litigation, now in court.

Happy Holidays!