The strong performance enhancements announced today should appeal to multiple sectors: enterprise customers (with Cooper Lake); cloud service providers (later this year, with Ice Lake); and edge computing gear (with Movidius VPUs and Stratix FPGAs). Let’s take a closer look.
The 3rd Gen Xeon Scalable CPUs: Cooper Lake & Ice Lake
Since Cooper Lake and Ice Lake were pre-announced, there were no big surprises here. We did see a few performance validations from customers, including Tencent and Alibaba Cloud, showing that the bfloat16 format seems to live up to the earlier hype. Overall, customers report around 1.8X improvement in inference and training throughput, thanks to the new Google-inspired floating-point format. Bfloat16 provides the dynamic range of the traditional 32-bit single-precision format, while still retaining adequate precision for AI calculations. Intel emphasized the Xeon portfolio roadmap, which shows the prior Xeon Scalable line bifurcating from Cascade Lake into two products in 2020: Cooper Lake, available now for 4-8 socket servers, and Ice Lake, coming later this year for the high-volume 1-2 socket servers. Next year we should see the line reconsolidate into a single family of 1-8 socket parts.
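To see why bfloat16 keeps float32's dynamic range while giving up precision, note that it is simply the top 16 bits of a float32: the sign and the full 8-bit exponent survive, but only 7 mantissa bits remain. A minimal sketch of that truncation (illustrative only, not Intel's hardware implementation):

```python
import struct

def float32_to_bfloat16_bits(x: float) -> int:
    """Truncate a float32 to bfloat16 by keeping its top 16 bits.

    The sign bit and the full 8-bit exponent are preserved (same
    dynamic range as float32); only 7 mantissa bits remain.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return bits >> 16  # drop the low 16 mantissa bits

def bfloat16_bits_to_float32(b: int) -> float:
    """Expand bfloat16 bits back into a float32 value."""
    return struct.unpack("<f", struct.pack("<I", b << 16))[0]

# Pi survives, but with only ~2-3 decimal digits of precision:
print(bfloat16_bits_to_float32(float32_to_bfloat16_bits(3.14159265)))
# A huge magnitude still fits, because the exponent is untouched:
print(bfloat16_bits_to_float32(float32_to_bfloat16_bits(1e30)))
```

The coarse rounding is what makes bfloat16 cheap in silicon, and neural-network training tolerates it far better than it tolerates the narrower exponent of IEEE half precision.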
The upcoming Stratix 10 NX FPGA
Intel also announced an AI-optimized version of the Stratix 10 FPGA. The new offering has AI-specific enhancements, which should enable it to compete with the Xilinx ACAP architecture announced last year. The NX (available later this year) includes several eye-catching features that should appeal to customers building specialized edge devices that incorporate AI features. Specifically, the new NX includes support for tensor blocks with 8-bit integer operations, promising 15 times the performance of the current MX model. The NX also includes on-package support for HBM and high-bandwidth networking, enabling faster memory access and scale-out interconnects for larger deployments.
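The appeal of 8-bit integer tensor blocks is that many narrow multiplies can be summed into a wide accumulator, so throughput rises without losing precision during accumulation. A minimal sketch of that int8 multiply-accumulate pattern (the hardware does this in parallel; this is not Intel's tensor-block design):

```python
def int8_dot(a, b):
    """Dot product of two int8 vectors with a wide accumulator.

    Each product of two int8 values fits in 16 bits; summing them
    into a 32-bit-style accumulator avoids overflow and precision
    loss, which is why int8 inference can match wider formats.
    """
    acc = 0  # plays the role of an int32 accumulator
    for x, y in zip(a, b):
        assert -128 <= x <= 127 and -128 <= y <= 127, "values must be int8"
        acc += x * y
    return acc

print(int8_dot([1, -2, 3], [4, 5, -6]))  # 4 - 10 - 18 = -24
```

Quantizing weights and activations to int8 is what makes this pattern usable for inference; training generally still needs a floating-point format such as bfloat16.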
While the Xilinx Versal ACAP provides a dedicated AI engine and integrated networking, among other features, the Stratix 10 NX takes a more incremental approach: it pairs a computational Tensor Block with the software to drive it for AI applications. Instead of a significant architectural shift, Intel chose to dedicate more die area to AI, increasing the number of multipliers and accumulators (MACs) per block from 2 to 30 and extending them to support 4-, 8-, 12- and 16-bit numbers. Intel provided comparisons to the NVIDIA V100, claiming 2.3 to 9.5 times the performance, but of course the Ampere-based A100 is now the king of that hill. We will need to await more application performance benchmarks to make a valid comparison to that new design.
While I remain anxious to hear more details about the acquired Habana Labs chips, including performance, availability and market traction, I am encouraged to see Intel broaden its portfolio’s embrace of AI-specific features, from Xeon to Movidius to Stratix. Intel’s executives get it: AI is not a fad, nor merely a feature of GPUs. Rather, it forms the foundation for Software 2.0 and is already transforming compute in the cloud, mobile, edge and enterprise data centers. Stay tuned!