Graphcore’s open-source software has matured significantly, and the development community is now shouldering some of the load.
Graphcore has made significant advancements since we published the original version of our research paper on the company’s software in May 2020. We have an update in the works, but along the way, we discovered that the company is benefiting from its commitment to open-source software and optimized models. So, let’s take a peek. (We have also published an in-depth analysis here.)
In addition to supporting the second-generation IPU datacenter infrastructure at significant scale, the new Poplar SDK 2.4 stack, introduced in December, adds new capabilities, models, and community-contributed engineering. This indicates that the software ecosystem around the Graphcore IPU is maturing from a push to a pull model: the AI community is now actively contributing to and advancing the open-source software Graphcore has released. This transition, in our opinion, is shaping up to be the tipping point for the Graphcore ecosystem.
Poplar SDK 2.4
With a focus on the development community, Graphcore’s latest software stack makes it easier to build efficient models that scale out across IPU-Pods with up to 256 IPUs. New public examples, including ViT, UNet, GPT, RNN-T, FastSpeech2, and TGN, help customers experiment with the latest models. Compilation time has been reduced by up to 28%. TensorFlow users can now configure the gradient accumulation count dynamically at run time, and can now use the TensorFlow Addons package.
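To make the gradient accumulation feature concrete, here is a minimal sketch of the technique in stock TensorFlow 2. It illustrates the concept only; Graphcore’s IPU-specific mechanism for setting the accumulation count at run time is documented in the Poplar SDK, and the toy model, optimizer, and accum_steps value below are illustrative placeholders, not Graphcore code.

```python
import tensorflow as tf

# Toy model and optimizer; stand-ins for a real training setup.
model = tf.keras.Sequential([tf.keras.layers.Dense(10, input_shape=(8,))])
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)
loss_fn = tf.keras.losses.MeanSquaredError()
accum_steps = 4  # effective batch size = accum_steps x micro-batch size

# One accumulator variable per trainable weight.
accumulators = [tf.Variable(tf.zeros_like(v), trainable=False)
                for v in model.trainable_variables]

def train_step(micro_batches):
    # Accumulate averaged gradients over several micro-batches...
    for x, y in micro_batches:
        with tf.GradientTape() as tape:
            loss = loss_fn(y, model(x, training=True))
        grads = tape.gradient(loss, model.trainable_variables)
        for acc, g in zip(accumulators, grads):
            acc.assign_add(g / accum_steps)
    # ...then apply a single weight update and reset the accumulators.
    optimizer.apply_gradients(zip(accumulators, model.trainable_variables))
    for acc in accumulators:
        acc.assign(tf.zeros_like(acc))
```

Accumulating over several micro-batches gives the statistical effect of a larger batch without the memory cost; making that count adjustable at run time, as SDK 2.4 now allows, means developers can tune it without rebuilding the training graph.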
Some of the biggest advancements support the increased scalability of IPU-Pods: distributed multi-host scale-out through PopRun/PopDist for distributed TensorFlow 2, and IPU utilization reporting for PopART and PyTorch. The PopVision analysis tools, now with enhanced IPU utilization reporting, give developers a deeper understanding of how their applications are performing. A generic sketch of the multi-host pattern follows.
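To show the kind of coordination a multi-host launcher provides, here is a generic data-sharding sketch using plain mpi4py. This is not Graphcore’s PopDist API; PopRun plays an analogous launcher role on IPU-Pods, and the dataset and sharding scheme here are illustrative assumptions.

```python
from mpi4py import MPI
import numpy as np

# Each launched process (one per host instance) discovers its rank and
# takes a disjoint slice of the data; the launcher supplies the topology.
comm = MPI.COMM_WORLD
instance = comm.Get_rank()        # index of this host process
num_instances = comm.Get_size()   # total processes across all hosts

dataset = np.arange(1024)                  # stand-in for real training data
shard = dataset[instance::num_instances]   # strided, non-overlapping shard
print(f"instance {instance} of {num_instances} owns {shard.size} samples")
```

Launched with, say, `mpirun -np 4 python shard.py`, each instance reports a distinct quarter of the data; PopRun coordinates instances (and IPU replicas) across hosts in the same spirit.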
These software improvements alone have significantly boosted system performance on the second-generation IPUs.
Finally, one should note that Graphcore’s hardware is not AI-specific, and neither is its software stack. Fine-grained, parallelized HPC applications represent a significant acceleration opportunity for Graphcore, and many of its early customers are using the platform for HPC. And HPC is cool once again, with exascale deployments beginning this year and with massive contributions to COVID-19 vaccine development. Through a collaboration with the University of Bristol, Graphcore’s software stack and enablement have evolved to better support the development and optimization of HPC codes.
Conclusions
Graphcore has been developing its acceleration platform for over five years. The hardware is now in its second generation, and the system architecture has evolved into a scalable, dynamic, plug-and-play infrastructure. Now the software has matured considerably, but we believe the bigger story here is the extent to which the user community has engaged in ongoing development and optimization. Let’s be clear: no company can marshal the resources needed to catch up with NVIDIA on its own. Not even Intel. Consequently, the only viable path forward is to engage the development community through open-source software and support, as we see in the many courses, webinars, and the flood of documentation Graphcore regularly produces. Other AI hardware companies should take notice and reconsider their closed software approaches. On the open-source front, Intel and Tenstorrent have similar strategies, but Graphcore is out in front in realizing the benefits.