The Blaize Graph Streaming Processor
Architectural innovation forms the core of every AI hardware startup. Simply adding more multiply/accumulate units or on-die memory will be inadequate for most high-performance applications. Blaize’s team built a general-purpose graph processor, the Graph Streaming Processor (GSP), which can natively process graph-based applications, including, but not limited to, the Deep Neural Networks that lie at the heart of most modern AI work. While the company claims this architecture can deliver massive gains in efficiency, we will need to await production-ready silicon next year to evaluate how well it performs against other engines that are coming to market.
Inference processing is rapidly becoming more complex, often requiring multiple models to deliver accurate results. One of the key differentiators Blaize hopes to cultivate is the ability to deploy and stream multiple tasks onto the GSP simultaneously, thereby accelerating the entire application. NVIDIA addresses the same need by providing various engines on a single SoC, as seen on the NVIDIA Drive AGX Xavier SoC, and Xilinx is heading in a similar direction with its flexible Versal ACAP. Time will tell whether Blaize’s approach of task-level parallelism can deliver superior performance and power efficiency.
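To make "task-level parallelism" a little more concrete, here is a minimal sketch in Python. It is not Blaize code and says nothing about how the GSP schedules work internally. The application graph and its task names are hypothetical, invented for illustration. The sketch only shows the general idea: once an application is expressed as a dependency graph, tasks with no outstanding dependencies can be issued together.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical multi-model vision application (an assumption for illustration).
# Each task maps to the set of tasks whose output it needs.
APP_GRAPH = {
    "decode": set(),
    "detect": {"decode"},
    "segment": {"decode"},
    "track": {"detect"},
    "fuse": {"track", "segment"},
}


def parallel_waves(graph):
    """Group tasks into waves; tasks within a wave do not depend on each other."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every task whose inputs are complete
        waves.append(ready)
        sorter.done(*ready)  # mark the wave finished, unblocking successors
    return waves


waves = parallel_waves(APP_GRAPH)
for index, wave in enumerate(waves, start=1):
    print(f"wave {index}: {', '.join(wave)}")
```

In this toy graph, detection and segmentation land in the same wave because both depend only on the decoded frame. A real streaming processor would presumably go further and overlap work across waves as data becomes available, which this wave-by-wave sketch does not attempt to model.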
Blaize initially targets three markets: autonomous transportation, enterprise AI applications and smart vision devices. In fact, Blaize is pursuing a multi-application approach in each, in the hope that its generalized graph processing capability can widen the range of opportunities it can address in each of these markets. For example, in automotive, Blaize is tackling in-cabin monitoring, infotainment, intelligent telematics and vision pre- and post-processing, in addition to ADAS and safety. In smart vision, Blaize clients are evaluating the GSP for detection/classification, smart retail, smart city, factory automation and robotics.
Blaize Picasso software
An area worth noting is the Blaize software platform, called Picasso. To help optimize a neural network for deployment efficiency on the GSP, the Netdeploy tool can prune the net, quantize the layers of the network for optimal precision and generate the streaming data flow graph programs for execution. Most startups have to develop tools like these themselves, an effort that can require as many software engineers as there are hardware engineers designing the chip itself.
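For readers unfamiliar with pruning and quantization, the following Python sketch illustrates the two ideas in their simplest textbook form: magnitude pruning and symmetric int8 quantization. It is an assumption-laden illustration, not a description of what Netdeploy actually does. The function names, the threshold and the sample weights are all made up for this example.

```python
def prune_by_magnitude(weights, threshold):
    """Zero out weights whose absolute value falls below the threshold."""
    return [w if abs(w) >= threshold else 0.0 for w in weights]


def quantize_int8(weights):
    """Symmetric linear quantization to the int8 range; returns (codes, scale)."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to [-127, 127].
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate floating-point weights."""
    return [c * scale for c in codes]


# Hypothetical weights for a single layer (illustrative values only).
layer = [0.02, -0.75, 0.4, -0.01, 1.27]

pruned = prune_by_magnitude(layer, threshold=0.05)
codes, scale = quantize_int8(pruned)
restored = dequantize(codes, scale)

print("pruned:  ", pruned)
print("codes:   ", codes)
print("restored:", [round(w, 4) for w in restored])
```

The trade-off is the usual one: pruning removes small weights that contribute little, and quantization stores the rest in 8 bits instead of 32, at the cost of a rounding error bounded by half a quantization step per weight. Production tools typically choose precision per layer and validate accuracy afterward, which this sketch leaves out.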
Blaize customer traction
Early customer engagement is typically a weak point for AI startups until they are close to having a working production platform. Blaize, on the other hand, has been working with marquee clients in its target markets since the firm’s earliest days. This has two significant benefits: co-design of the hardware and software based on specific customer needs, and early traction once the hardware is production-ready. Some of these early engagements produced strategic capital investments as well as early prototyping, from the likes of Samsung, Denso, and Daimler. I see a lot of startup pitches, but very few, if any, can match the scope of Blaize’s early client engagement.
Conclusions
I don’t normally blog about pre-silicon startups, preferring to wait (and wait, and wait, and wait) for production silicon and benchmark tests. However, in Blaize I see a startup that has invested as much in software and in early customer co-design engagements as it has in innovative hardware design. This approach makes it unusual in the industry. Of course, it still needs to tape out its chip and support its extensive customer pilot program with the attention and field engineering needed to turn those pilots into design wins. That said, Blaize’s approach could help it stand out in the crowd and pave the way for success.