
Cerebras, Groq and SambaNova logos. CEREBRAS, GROQ AND SAMBANOVA
While Nvidia gets most of the press and market volume, three startups have designed custom silicon and rack-scale infrastructure to compete with it head-on: Cerebras, Groq and SambaNova.
While Groq and SambaNova seem to be getting some traction, notably Groq’s $1.5 billion Saudi sovereign AI infrastructure deal and its partnership with IBM, and SambaNova’s wins in national labs, enterprises and financial services, Cerebras seems to have the most differentiated product and customer wins. The company has just raised one of the largest AI hardware startup funding rounds this year, at $1.1 billion.
Let’s do a deeper dive and see what all the excitement is about. (Disclosure: like many semiconductor companies, Cerebras and Nvidia are clients of Cambrian-AI Research.)
Let’s answer the hardest question first: Is there even room in the market for more than one of these guys? Research firm MarketsandMarkets predicts that the AI inference market will grow from $106.15 billion in 2025 to $254.98 billion by 2030, a CAGR of 19.2%.
Nvidia, AMD and the CSP-built ASICs will likely dominate the market with an 80-95% share. But even a 20% share split among the other ASIC providers could deliver over $50 billion in combined revenue by 2030. Of course, they will need to deliver extremely good business value to muscle their way in. That means affordable hardware, great performance and ease of adoption.
But what if, on the other hand, these companies don’t win big deals? In that case, the SAM could be only around $13 billion, a 5% share, by 2030. So there could be room for only one to survive.
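For readers who want to check the math, here is a quick back-of-the-envelope sketch using only the figures above:

```python
# Sanity check on the market math above, using MarketsandMarkets' figures.
market_2025, market_2030, years = 106.15, 254.98, 5  # $B, $B, 2025 -> 2030

cagr = (market_2030 / market_2025) ** (1 / years) - 1
print(f"Implied CAGR: {cagr:.1%}")  # ~19.2%, matching the forecast

# Revenue at the two share scenarios discussed above
for share in (0.20, 0.05):
    print(f"{share:.0%} of the 2030 market: ${market_2030 * share:.1f}B")
# 20% -> ~$51.0B combined; 5% -> ~$12.7B, the roughly $13B SAM case
```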
The Class Of Three
While there are many interesting startups focused on edge AI (such as SiMa.ai) and data center AI (e.g., D-Matrix) that sell PCIe cards with which one can build an AI server, I concentrate here on the three startups that build full-stack, rack-scale AI solutions. The table below outlines each company’s funding, valuation, customers and key performance achievements.

Cerebras has received the most funding and appears to have garnered more customer wins than Groq or SambaNova. THE AUTHOR
Finding legitimate performance comparisons is, however, surprisingly difficult, as these companies aim their outbound guns at Nvidia, not each other. SambaNova, in particular, makes very few performance claims at all, and since the Cerebras “chip” is a full wafer, it is hard to compare it to a single chip.
The chart below shows how Cerebras, SambaNova, Groq and GPUs compare on latency (lower is better) and throughput.

While Groq comes close to Cerebras in latency, Cerebras is far faster in tokens per second. CEREBRAS
From a strategic point of view, it looks to me like they are all going after essentially the same markets, where Nvidia’s hold is weaker.

Target markets, business models, and recent momentum. THE AUTHOR
As for architectural designs, they couldn’t be more distinct. Cerebras has a wafer-scale engine (WSE) with onboard SRAM (44GB per WSE-3), while Groq’s Language Processing Units (LPUs) focus on determinism and likewise rely on SRAM for memory; that choice could require additional chips to hold very large models. SambaNova employs a three-tiered memory architecture on its Reconfigurable Dataflow Unit (RDU), with SRAM, HBM and DRAM, so RDUs can hold much larger models, and more simultaneous models, in memory than its competitors.
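To make the memory tradeoff concrete, here is a rough sketch (my own illustration, not vendor math) of the weight footprint for two models discussed in this article, measured against the 44GB of SRAM on a WSE-3:

```python
# Rough illustration of why memory architecture matters: how much memory
# model weights alone need at 16-bit precision, versus on-chip SRAM.

WSE3_SRAM_GB = 44  # per-wafer SRAM figure cited above

def weight_footprint_gb(params_billions: float, bits_per_weight: int) -> float:
    """Approximate memory needed for model weights, in GB."""
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

for model, params_b in [("Llama 4 Maverick (400B)", 400), ("DeepSeek-R1 (671B)", 671)]:
    gb = weight_footprint_gb(params_b, 16)
    print(f"{model}: ~{gb:,.0f} GB at 16-bit, "
          f"~{gb / WSE3_SRAM_GB:.0f}x a WSE-3's SRAM for weights alone")
# SRAM-only designs must scale out (or quantize) to hold such models, while
# a tiered SRAM/HBM/DRAM design can keep them resident on one system.
```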

Architecture Comparisons. THE AUTHOR
Cerebras Systems
Founded in 2016 and led by CEO Andrew Feldman, Cerebras’ claim to fame is the Wafer Scale Engine, a frisbee-sized chip containing over 900,000 AI-optimized cores and fast on-chip SRAM. The idea is to improve chip-to-chip communication performance and eliminate most of the networking and infrastructure needed to connect GPUs into multi-rack AI supersystems. But beyond potential cost savings, Cerebras points to its inference benchmarks (throughput and latency), which are head and shoulders above all other players.
As noted above, Cerebras recently completed a $1.1 billion private funding round at an $8.1 billion valuation, led by Fidelity and Atreides Management, with participation from Tiger Global and Valor Equity Partners. Following this large investment, Cerebras withdrew its 2024 IPO filing, citing the need to update financial and strategic information to reflect rapid changes in the AI industry, and plans to re-file for a public offering “soon”.
Cerebras’ Recent Progress
Cerebras had previously announced it was setting up its own cloud infrastructure in North America and Europe, with six large data centers hosting customer workloads for Meta, Perplexity and others.

The eight Cerebras data centers. Five are now up and running. CEREBRAS SYSTEMS
Here are a few other highlights from Cerebras over the last year:
- Set a world record for LLM inference speed on Meta’s Llama 4 Maverick (400B parameters) at 2,500+ TPS, more than doubling Nvidia Blackwell’s performance.
- Delivered real-time reasoning for advanced models like Qwen3 32B, Alibaba’s code generation and reasoning LLM.
- Enabled leading applications for partners such as OpenAI (GPT-OSS-120B) and Mistral AI (Le Chat), boasting speeds up to 1,000 words per second.
- Won a $45 million DARPA contract, in partnership with Ranovus, to develop co-packaged optics, improving military chip interconnects.
- Expanded multicloud availability and launched new partnerships with Meta (Llama API 18x faster than OpenAI), Notion, Docker, Hugging Face, IBM and others.
- Launched the Supernova Startup Program to speed access to Cerebras compute and technical resources.
- Reported exponential customer growth and onboarding of dozens of new enterprise clients each quarter, targeting sectors like healthcare, pharmaceutical research and climate modeling.
The Performance Story Driving Cerebras’ Growth
As I mentioned above, direct comparisons are difficult given the size of Cerebras’ WSE and the rack-scale focus of these companies. Below is the best I could find; all are rack-scale measurements.

Cerebras’ performance is not in the same class as Groq’s and SambaNova’s, but to be fair, it’s a massive wafer compared to a single chip. AMAZON.COM
Groq
Not to be confused with Elon Musk’s Grok LLM, Groq’s LPUs are known for their ultra-low latency, determinism and high-speed inference on popular open-source models. Groq was founded in 2016 by a group of former Google engineers led by Jonathan Ross, one of the designers of Google’s Tensor Processing Unit.
AI Latency Determinism
The key differentiator for Groq is its deterministic architecture, which provides predictable, real-time performance, with latency as low as 0.8 seconds for 100 output tokens. Determinism guarantees an upper bound on response time for every request, which is critical for operational systems where unpredictability would lead to failures or a poor user experience. Determinism also allows reliable scaling for high-throughput workloads, with performance that does not degrade under higher loads or with long prompts, as seen in large language model and generative AI deployments. Groq’s design is also claimed to be 10 times more energy-efficient than GPUs for inference tasks.
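A quick back-of-the-envelope on that latency figure (my arithmetic, not a Groq benchmark) shows what a deterministic bound buys you:

```python
# Implied throughput from the figure above: 0.8s for 100 output tokens.
output_tokens, total_latency_s = 100, 0.8
print(f"Implied rate: {output_tokens / total_latency_s:.0f} tokens/s")  # 125

# The value of determinism is the *bound*, not the average: with a known
# worst-case per-token time, you can budget a hard deadline per request.
worst_case_per_token_s = total_latency_s / output_tokens  # ~8 ms/token
deadline_s = 1.0  # hypothetical application deadline
print(f"Tokens guaranteed within {deadline_s}s: "
      f"{deadline_s / worst_case_per_token_s:.0f}")  # 125
```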
Applications that benefit most from Groq’s deterministic latency are those requiring instant, reliable AI responses for real-time or latency-critical scenarios, which include:
- Financial Trading and Risk Analysis: Groq’s deterministic compute enables ultra-fast machine learning in financial services, such as high-frequency trading, real-time pricing and risk evaluation, where predictable sub-millisecond latency is vital.
- Real-Time Control Systems: Scientific applications like plasma control in fusion reactors leverage Groq for hard real-time requirements, ensuring operational safety and fast decision-making within strict time windows.
- Conversational AI and Voice Assistants: Interactive chatbots, customer service agents and virtual assistants need instant token generation and low-latency responses. Groq’s architecture provides consistently fast and predictable generation across variable prompt lengths, enhancing user experience.
- Industrial Automation and Robotics: Deterministic performance benefits autonomous platforms and industrial processes needing predictable inference for safety, reliability and agile control.
- Healthcare and Diagnostics: Inference platforms that support urgent medical decision-making, like real-time patient monitoring, benefit from ultra-fast, predictable inference windows.
Groq is the only company I am aware of that offers determinism, but that being said, it remains to be seen how much the market really cares about the feature.
Notable Groq Customers
- HUMAIN (Saudi Arabia): Groq powers sovereign AI initiatives and large-scale inference infrastructure for HUMAIN, which is part of a $1.5 billion partnership backed by the Saudi government for AI investments.
- Unifonic: A leading Middle Eastern enterprise AI and communications platform, using Groq for accelerating Arabic language generative AI customer solutions.
- ScreenApp: Video transcription and analysis platform leveraging Groq LPUs for ultra-fast, accurate video processing and speech-to-text tasks.
- Bell Canada: A major telecommunications provider deploying Groq for real-time AI services and as part of next-generation data center solutions.
- Willow: A digital twin and infrastructure services company, cited for leveraging Groq-powered AI for instant operational analytics and increased uptime.
SambaNova
SambaNova’s RDU excels at running very large, specialized models with high efficiency, thanks in part to the three-tier memory architecture mentioned previously. Independent benchmarks from Artificial Analysis measured SambaNova’s performance on the DeepSeek-R1 671B model at 198-255 output TPS. Note that this was on 16 RDU chips, while Nvidia generates 250 tokens per second with just eight GPUs, albeit using FP4 precision. SambaNova emphasizes its ability to run the full, unquantized (16-bit) version of these massive models on a single system, a feat it claims competitors achieve only by using smaller, less accurate model versions or significantly more hardware.
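A rough sketch of the memory arithmetic behind that claim (my illustration, using the model size and precisions cited above):

```python
# Why "full 16-bit DeepSeek-R1 on one system" is first a memory-capacity
# claim: approximate weight footprints at the two precisions mentioned above.
params = 671e9  # DeepSeek-R1 parameter count

for label, bits in [("16-bit (unquantized)", 16), ("FP4 (quantized)", 4)]:
    gb = params * bits / 8 / 1e9
    print(f"{label}: ~{gb:,.0f} GB for weights alone")
# ~1,342 GB at 16-bit vs. ~336 GB at FP4: the tiered SRAM/HBM/DRAM design
# is what lets a 16-RDU SambaNova system hold the larger footprint.
```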
SambaNova Systems lists a range of prominent enterprise, government and research customers. Notable names include major financial, cloud, scientific and technology organizations, as well as global consultancies and national laboratories.
Notable SambaNova Customers
SambaNova’s deployments span finance, healthcare, cloud, research and government sectors:
- U.S. Department of Energy National Laboratories: Oak Ridge National Laboratory (ORNL), Los Alamos National Laboratory, Argonne National Laboratory, Lawrence Livermore National Laboratory.
- Financial Sector: OTP Bank (Central and Eastern Europe), leveraging SambaNova for AI supercomputing and language models.
- Large Enterprises & Consultancies: Accenture, NetApp, Analog Devices, Aramco, SoftBank.
- Technology Partners & AI Platforms: Hugging Face, Blackbox.AI and Continue.
- Research & Academic: Texas Advanced Computing Center, RIKEN Center for Computational Science (Japan)
- Healthcare & Testing: Ascend (laboratory services and electronic info delivery.
Quick Cheat Sheet
Ok, if you’ve read this far, you get a gold star! Here’s the summary most of you probably skipped to ;-):
Cerebras holds the record for raw throughput on the largest publicly disclosed models, making it ideal for massive-scale AI supercomputing and research where model size is paramount. The price of a single server is very high, but it replaces racks of GPUs. If you are looking for clear differentiation, this is it.
Groq is the leader in low-latency, real-time inference for mainstream LLMs, positioning it as the go-to solution for interactive applications and services where deterministic response time is critical.
SambaNova has carved out a niche in running large, complex and specialized mixture-of-experts models like DeepSeek-R1 with high speed and efficiency, appealing to enterprises that require accuracy from full-sized models, thanks to SambaNova’s unique memory architecture.
In summary, all three have differentiated solutions that can target specific use cases, but competing with Nvidia takes more than good hardware technology. It takes a strong software story, great marketing, an appealing roadmap, and innovative pricing and Ts&Cs.
The next three years should determine who thrives, survives, or fails.
Disclosures: This article expresses the opinions of the author and is not to be taken as advice to purchase from or invest in the companies mentioned. My firm, Cambrian-AI Research, is fortunate to have many semiconductor firms as our clients, including Baya Systems, BrainChip, Cadence, Cerebras Systems, D-Matrix, Flex, Groq, IBM, Intel, Micron, NVIDIA, Qualcomm, SiMa.ai, Synopsys, Tenstorrent, Ventana Microsystems, and scores of investors. I have no investment positions in any of the companies mentioned in this article.