New Fabrics Enable Efficient AI Acceleration

Jul 11, 2025 | In the News

Photo by Marijan Murat/picture alliance via Getty Images

While GPU performance has been the focus in data centers over the last few years, fabric performance has become a key enabler, or bottleneck, in achieving the throughput and latency required to create and deliver artificial intelligence at scale. Nvidia’s prescient acquisition of Mellanox has been a critical component of its success, enabling scalable AI and HPC performance. However, it’s not just scale-up (in-rack) performance and scale-out (rack-to-rack) connectivity that matter; low latency in the scale-within network-on-chip (NoC) has also become essential for achieving high AI throughput and fast response times.

The Importance of Advanced Fabric Solutions

The computing landscape has undergone significant changes with the advent of artificial intelligence, evolving from a loosely coupled network of independent computers to a highly integrated fabric of collaborating, accelerated computing nodes. Three levels of scale require such interconnects: the chip/chiplet, rack, and data center. Each compute element must share data with its neighbors and beyond over a low-latency, high-bandwidth communication channel to maximize performance and minimize latency.

Scale-Within Fabrics

On-chip fabrics connect processor cores, accelerators, and cache memory within a single or multi-chip module. As SoCs become more complex, integrating tens or even hundreds of cores or IP blocks, a single NoC often cannot provide the required bandwidth and scalability. Multiple NoCs, or subnetworks, are used to manage traffic between chiplets, each potentially optimized for specific data types or communication patterns. For example, one NoC might handle high-bandwidth data transfers between compute chiplets, while another manages control signals or memory access. As chiplet-based designs gain wider adoption, these NoCs become the bottleneck of chiplet-to-chiplet communication and data sharing.
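To make the subnetwork idea concrete, here is a minimal Python sketch of class-based traffic steering. The traffic classes, subnetwork names, and routing policy are our own illustrative assumptions, not any vendor’s actual NoC configuration.

```python
# Toy model: steering on-chip traffic onto class-specific NoC subnetworks.
# Traffic classes and the steering policy are illustrative assumptions.
from dataclasses import dataclass
from enum import Enum, auto

class TrafficClass(Enum):
    BULK_DATA = auto()  # high-bandwidth chiplet-to-chiplet transfers
    CONTROL = auto()    # low-latency control/interrupt signaling
    MEMORY = auto()     # DRAM/cache access traffic

@dataclass
class Packet:
    src: str
    dst: str
    traffic_class: TrafficClass
    payload_bytes: int

# Each subnetwork is tuned for one pattern: wide links for bulk data,
# narrow low-latency links for control, and so on.
SUBNET_FOR_CLASS = {
    TrafficClass.BULK_DATA: "noc_bulk",
    TrafficClass.CONTROL: "noc_ctrl",
    TrafficClass.MEMORY: "noc_mem",
}

def route(pkt: Packet) -> str:
    """Pick the subnetwork a packet travels on based on its class."""
    return SUBNET_FOR_CLASS[pkt.traffic_class]

if __name__ == "__main__":
    pkt = Packet("compute0", "compute1", TrafficClass.BULK_DATA, 4096)
    print(route(pkt))  # -> noc_bulk
```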

A unified fabric significantly improves latency and bandwidth in chiplet-based systems by streamlining communication across fragmented networks-on-chip (NoCs) and optimizing physical interconnects. Such a fabric can help minimize hops, improve routing, enable a higher degree of scaling, and manage congestion. More importantly, it can improve performance while reducing footprint and power through the reuse of wires and logic, in a segment where every saving and every extra ounce of performance is treasured.

At the chip level, networks-on-chip (NoCs) tend to be isolated; each is designed to connect a specific domain on the chip, which works well until data must move to another domain, incurring a latency-inducing “hop” or the overhead of bridging different protocols. A unified NoC, such as that provided by Baya Systems (a client of Cambrian-AI Research), provides a single transport mechanism for the various protocols each fabric carries. Keeping transport separate from the protocol layers minimizes the wires and logic needed to build a unified fabric that supports coherent, non-coherent, and custom protocols for maximum efficiency, lowest cost, and reduced power consumption.
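The sketch below illustrates that layering in miniature: a transport-level flit routes on destination alone and treats the protocol payload as opaque, so coherent, non-coherent, and custom traffic can share one set of wires. The field layout and protocol IDs are hypothetical simplifications, not Baya’s actual format.

```python
# Minimal sketch of a unified transport carrying multiple protocols.
# Field layout and protocol IDs are hypothetical; a real fabric encodes
# far more (QoS, virtual channels, ordering, flow-control credits).
from dataclasses import dataclass

PROTOCOLS = {"coherent": 0, "non_coherent": 1, "custom": 2}

@dataclass
class Flit:
    protocol_id: int  # which protocol layer produced the payload
    dest_node: int    # routing input; transport is protocol-agnostic
    payload: bytes    # opaque to the transport layer

def transport_route(flit: Flit, routing_table: dict[int, str]) -> str:
    # The transport layer never inspects the payload; it routes purely
    # on the destination, so all protocols share the same wires.
    return routing_table[flit.dest_node]

table = {0: "port_a", 1: "port_b"}
f1 = Flit(PROTOCOLS["coherent"], dest_node=1, payload=b"\x01\x02")
f2 = Flit(PROTOCOLS["custom"], dest_node=1, payload=b"\xff")
assert transport_route(f1, table) == transport_route(f2, table)  # same path
```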

The various on-chip networks tend to be distinct, but could be unified with the right technologies. BAYA SYSTEMS

Scale-Up Fabrics

Scale-up fabrics connect accelerators (GPUs, AI processors) within a single rack or AI pod, prioritizing ultra-low latency and high bandwidth communication. Scaling up with NVLink has been the go-to standard, but the industry needs an open alternative, such as UALink, to interconnect accelerators from other vendors.

UALink and Ultra Ethernet solve different problems in the data center. SYNOPSYS

UALink is a memory-semantic interconnect standard led by the UALink Consortium that enables accelerators to share memory directly. Its four-layer protocol stack supports single-stage switching, reducing latency and congestion. UALink will deliver up to 200 Gbps per lane plus memory-sharing capabilities to scale (up) accelerator connectivity. The Consortium approved the UALink 1.0 specification in April 2025; the first silicon is expected later this year, with volume production scheduled for 2026.
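As a quick back-of-envelope check on those numbers, the sketch below multiplies the 200 Gbps per-lane figure by a few port widths. The lane counts are illustrative assumptions, since deployed port configurations will vary.

```python
# Back-of-envelope UALink port bandwidth from the 200 Gbps/lane figure.
# The lane counts below are assumptions for illustration only.
GBPS_PER_LANE = 200

def port_bandwidth_gbps(lanes: int) -> int:
    return GBPS_PER_LANE * lanes

for lanes in (1, 2, 4):
    print(f"{lanes}-lane port: {port_bandwidth_gbps(lanes)} Gbps per direction")
# e.g., a hypothetical 4-lane port yields 800 Gbps per direction
```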

Scale-Out Fabrics

Scale-out fabrics interconnect multiple racks or pods, enabling workloads to be distributed across larger clusters or, more often, allowing many more “copies” of a workload to run and serve more clients. Nvidia offers both Ethernet and InfiniBand networking to connect racks for east-west traffic. As an open scale-out alternative, the industry is standardizing Ultra Ethernet, a high-bandwidth networking protocol tailored for AI workloads spanning as many as one million heterogeneous nodes.

Ultra Ethernet IP solutions will enable 1.6 Tbps of bandwidth for scaling (out) massive AI networks.
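For a sense of what a million-node target implies, the sketch below applies the textbook fat-tree formula, in which a three-tier fat tree built from k-port switches connects k³/4 hosts. This is a generic Clos-topology calculation of ours, not an Ultra Ethernet-specified design.

```python
# Textbook fat-tree capacity: a 3-tier fat tree of k-port switches
# connects k^3 / 4 hosts. A generic topology formula, not an Ultra
# Ethernet requirement, but it shows the scale a million nodes implies.
def fat_tree_hosts(k: int) -> int:
    return k ** 3 // 4

for k in (64, 128, 160):
    print(f"radix {k}: up to {fat_tree_hosts(k):,} hosts")
# radix 160: up to 1,024,000 hosts; million-node territory needs
# high-radix switches or additional tiers.
```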

Companies in the Fabric IP Business

Historically, fabrics have been proprietary, coming from companies like Nvidia, AMD, and Intel. Arm provides the CoreLink NIC-301 and related interconnect IP, widely used in Arm-based SoCs for scalable, configurable on-chip interconnects. While Arm’s fabric is designed primarily for Arm CPU SoCs, Baya Systems and Arteris provide fabric IP for many implementations, including RISC-V and custom accelerators. Baya is unique in its chiplet-first focus and its ability to scale out and scale up.

Arteris

Arteris is recognized as a leader in providing what we have been referring to as scale-within fabric NoCs, along with SoC integration automation software that speeds the development of complex SoCs. Arteris went public in October 2021 (Nasdaq: AIP) and had a market cap of approximately $300 million as of mid-2025. Arteris has more than 200 customers, including Samsung, AMD, Qualcomm, Baidu, Mobileye, and NXP, with an installed base of nearly four billion devices. Arteris IP is broadly deployed across the automotive segment (notably ADAS, with >70% market share), communications, consumer electronics, enterprise computing, and industrial markets.

Arteris’ products include the FlexNoC interconnect, whose integrated physical awareness technology gives place-and-route teams a much better starting point while simultaneously reducing interconnect area and power consumption. Arteris claims FlexNoC delivers up to 5X shorter turnaround time versus manual physical iterations. Its Ncore IP is similar but is designed for multi-core cache-coherent designs.

Baya Systems

As we have noted, the AI transformation has driven the need for scale-up and scale-out fabrics and has also placed heavy demands on scale-within. In addition, the market perceived an emerging gap that wasn’t readily solved by off-the-shelf scale-within IP. The transition to chiplets, which promise greater scale and cost effectiveness, demands a more agile, data-driven design philosophy to handle the complexity of these new systems.

This is exactly what Baya Systems, a relatively new entrant, aims to solve, and it has gained a great deal of traction since coming out of stealth a year ago. Baya Systems (a client of Cambrian-AI Research) is a Silicon Valley startup with strong backing and leadership that has architected a semiconductor IP and software portfolio enabling designers of SoCs, systems, and data-center-scale infrastructure to build high-performance AI technology quickly and efficiently. Baya Systems’ chiplet-first fabrics are designed to address on-chip (scale-within), scale-up, and cross-system (scale-out) networking challenges. Their flexibility and modularity position them for broader applications, potentially integrating various processing units and accelerating communication in diverse, high-performance environments. The Baya Systems fabric supports multiple protocols, including AMBA, UCIe, UALink, and Ultra Ethernet.

Baya Systems has created a comprehensive fabric that supports popular protocols for scale-within, scale-out, and scale-up. BAYA SYSTEMS

Tenstorrent, an AI chipmaker considered an emerging challenger to Nvidia, recently released a white paper showing that Baya’s fabric boosts performance by up to 66% while reducing footprint by 50% compared to Tenstorrent’s home-grown, state-of-the-art custom fabric. Tenstorrent is led by legendary computer architect Jim Keller, who is also a backer of Baya Systems and sits on its board.

Beyond NoCs, Baya’s NeuraScale offers a scalable fabric solution based on the company’s WeaveIP technology: a non-blocking crossbar-replacement fabric designed to power switches for the UALink and Ultra Ethernet standards in emerging scale-up and scale-out systems. Its mesh-based, tileable architecture simplifies chiplet-based scaling and opens the path to much larger accelerator node counts than traditional crossbar switches, which are hitting reticle limits. This could enable 144-port or even 288-port racks, compared with today’s 72-port ones, substantially expanding scale.
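A toy cost model hints at why tiling wins at high port counts: a monolithic N x N crossbar needs on the order of N² crosspoints, while a tileable design adds roughly fixed-size tiles as ports grow. The numbers below, including the ports-per-tile figure, are our own illustrative assumptions, not Baya’s silicon data.

```python
# Rough scaling intuition: crossbar crosspoints grow quadratically with
# port count, while a tiled mesh grows roughly linearly in tile count.
# This simplified cost model (and the ports-per-tile value) is ours.
def crossbar_crosspoints(ports: int) -> int:
    # A full N x N crossbar needs a crosspoint per input/output pair.
    return ports * ports

def mesh_tiles(ports: int, ports_per_tile: int = 18) -> int:
    # A tileable mesh adds fixed-size tiles as the port count grows.
    return -(-ports // ports_per_tile)  # ceiling division

for n in (72, 144, 288):
    print(n, crossbar_crosspoints(n), mesh_tiles(n))
# From 72 to 288 ports, crosspoints grow 16x while tiles grow only 4x,
# which is why tiling sidesteps the reticle limits crossbars hit.
```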

Interestingly, the company claims the technology could enable even larger node counts once the industry adopts it. What makes NeuraScale additionally disruptive is that it can substantially reduce the resources, time, and cost required to build these high-performance switches, enabling smaller, nimbler entrants to broaden and scale the market.

The WeaveIP NeuraScale Fabric. BAYA SYSTEMS

Fabrics Will Enable The Future Of AI

The modern data center is evolving rapidly, both in its compute elements (chiplets, chips, CPUs, GPUs) and in its fabrics, enabling these systems to scale to hundreds of thousands of nodes in support of AI.

While Nvidia’s new NVLink Fusion will allow non-Nvidia CPUs and GPUs to participate in the Nvidia rack-scale architecture, hardware vendors and hyperscalers will continue to seek an open fabric alternative to an ecosystem controlled by a single firm. Consequently, we envision a significant increase in these heterogeneous fabric technologies as AMD, Intel, and hyperscalers adopt them to build out their own AI Factories, both with and without Nvidia hardware. Fabrics like that of Baya Systems represent a key enabler in that evolution.

We have a more in-depth report on Baya Systems here.

And more information about Arteris can be found on their website.