The Current State of the Art will be Disrupted
Today’s state-of-the-art Nvidia DGX system combines CPUs, GPUs, a GPU-Native Fabric (NVLink and NVSwitch), and PCIe switches and NICs to connect these compute devices to a network. While Nvidia enjoys significant revenue from all those networking devices, Enfabrica represents a disruptive technology that can replace or augment a server’s “Switching Tray” and memory subsystems.
Nvidia CEO Jensen Huang does not fear technologies that disrupt the status quo; he embraces them if they advance performance and lower costs. While Enfabrica could replace many of the NICs he sells, it is also a far more elegant solution.
Enfabrica’s Accelerated Compute Fabric Switch (ACF-S)
The Enfabrica ACF-S acts like Pac-Man, gobbling up existing silicon products and replacing them with an integrated solution that is far more cost-effective and lowers latencies. The diagram below is for an Nvidia-based accelerated server, but the ideas apply more generally to systems like Intel Gaudi and AMD’s MI300.
Instead of using industry-standard PCIe switches and Ethernet Network Interface Cards (NICs) with RDMA, the Enfabrica ACF-S provides the interconnectivity and network services for up to eight GPUs and even provides load distribution across them. This approach cuts network hops by more than half and provides memory access across the network to shared pools of slower, elastic CXL-attached memory that augment the GPUs’ HBM.
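To see where the hop savings come from, here is a minimal back-of-the-envelope sketch in Python. The path topologies and per-hop latency figures below are illustrative assumptions chosen for comparison, not measured Enfabrica or Nvidia numbers.

```python
# Back-of-the-envelope hop-count comparison for a GPU-to-network path.
# All latency figures are illustrative assumptions, not vendor specs.

# Conventional path: GPU -> PCIe switch -> NIC -> top-of-rack switch
conventional_path = {
    "PCIe switch": 0.5e-6,   # assumed per-hop latency, seconds
    "NIC":         1.0e-6,
    "ToR switch":  0.5e-6,
}

# ACF-S path: switching and NIC functions collapsed into one device
acf_s_path = {
    "ACF-S": 1.0e-6,         # assumed single-device latency, seconds
}

def summarize(name, path):
    total = sum(path.values())
    print(f"{name}: {len(path)} hops, ~{total * 1e6:.1f} us added latency")

summarize("Conventional", conventional_path)
summarize("ACF-S", acf_s_path)
```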
The image below shows that the ACF-S provides direct access to pools of CXL-attached DDR5 DRAM with only four microseconds of latency and 400 GB/s of aggregate bandwidth. In this vision, there are no more PCIe switches, NICs, or isolated CPU-attached DRAM. The system integrates HBM-equipped GPUs (optionally with integrated on-package CPUs) connected over NVLink, with the ACF-S interfacing to other nodes and shared memory.
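Taking the quoted four-microsecond latency and 400 GB/s figures at face value, a quick sketch shows how the cost of a fetch from the pool splits between latency and bandwidth; the transfer sizes are hypothetical examples.

```python
# Rough transfer-time model for fetching data from the CXL-attached pool,
# using the figures quoted above: ~4 us access latency and 400 GB/s
# aggregate bandwidth. Transfer sizes are hypothetical examples.

LATENCY_S = 4e-6            # quoted access latency, seconds
BANDWIDTH_BPS = 400e9       # quoted aggregate bandwidth, bytes/second

def transfer_time(num_bytes):
    """Simple model: fixed access latency plus bytes / bandwidth."""
    return LATENCY_S + num_bytes / BANDWIDTH_BPS

for size in (4 * 1024, 1e6, 1e9):  # 4 KB page, 1 MB block, 1 GB shard
    t = transfer_time(size)
    print(f"{size / 1e6:>9.3f} MB -> {t * 1e6:>9.1f} us")
```

At these numbers, latency dominates small accesses while bandwidth dominates bulk moves, which is why pooled CXL memory makes the most sense as a capacity tier behind HBM rather than a replacement for it.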
The Impact on System Design
Nvidia’s Grace Hopper Superchip and the upcoming AMD MI300 will consolidate the computational components by combining the CPU and GPU onto a single package, which is especially useful for inference processing. These systems won’t need separate CPU-attached DRAM, as the models all fit (sort of) into the HBM on the package. Training, with its discrete GPUs and CPUs, will also benefit from the Enfabrica approach. As the diagram below shows, an AI-optimized system design leaves a lot of soon-to-be-discarded components behind.
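To sanity-check the “fits (sort of)” claim, here is a rough capacity estimate; the per-device HBM capacity and model sizes are assumptions chosen for illustration, not vendor specifications.

```python
# Quick check of whether a model "fits (sort of)" in HBM. The parameter
# counts and HBM capacity are illustrative assumptions, not vendor specs.

BYTES_PER_PARAM_FP16 = 2     # FP16/BF16 weights
HBM_PER_DEVICE_GB = 96       # assumed HBM capacity per superchip-class device

def hbm_devices_needed(params_billion):
    weights_gb = params_billion * 1e9 * BYTES_PER_PARAM_FP16 / 1e9
    devices = -(-weights_gb // HBM_PER_DEVICE_GB)  # ceiling division
    return weights_gb, int(devices)

for params in (7, 70, 175):  # illustrative model sizes, billions of params
    gb, n = hbm_devices_needed(params)
    print(f"{params:>4}B params: ~{gb:>5.0f} GB of weights -> {n} device(s)")
```

Weights alone fit in a device or two of HBM; training adds optimizer state and activations on top, which is exactly where pooled CXL capacity behind the ACF-S earns its keep.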
Conclusions
It is pretty cool to imagine how this technology will impact system design. Everything is shared, everything is connected with lower power, and everything will be faster. You can even run your GPUs at a higher clock frequency, since you are no longer cooling the now-discarded components.
In the future, Nvidia and OEMs may ship more dense and streamlined systems, replacing NICs and PCIe switch revenue with Enfabrica switch revenue.