Software for a new processor design is critical both to deploying applications and to getting good performance out of the hardware. UK-based startup Graphcore, the unicorn provider of silicon for application acceleration, places significant emphasis on software, dedicating roughly half of its engineering staff to the challenge. Graphcore’s Intelligence Processing Unit (IPU) executes algorithms expressed as directed graphs, and the company’s Poplar software stack translates AI models and other algorithms into those graphs for execution. This simplifies adopting the chip for AI and for parallel computing in general. This blog is a short introduction to Poplar; our recent research paper provides more details.
The goal: attract developers of parallel applications
Graphcore aims to address two software-development challenges with its chip: 1) make it easy to optimize and run existing Machine Learning models, such as deep neural networks (DNNs), through high-level frameworks, and 2) enable research and development of entirely new fine-grained parallel workloads on IPU infrastructure. The second goal is central to Graphcore’s strategy and could let it address a larger set of market segments. The company also plans to release its libraries as open source later this year, which should help it engage a broader developer ecosystem.
What is Poplar?
Graphcore’s Poplar SDK addresses the programming challenges the IPU’s unique architecture presents by automating the compilation and optimization workflow. It exploits the fine-grained parallelism of a fabric of IPUs without requiring hand-tuned instruction-level programming, although direct hardware access is also supported.
As Figure 1 shows, Poplar consists of graph and element compilers, optimized libraries, and a graph engine for run-time management and scheduling. Machine Learning frameworks feed the Poplar stack through two paths: the Poplar Advanced Run Time (PopART) interface for Open Neural Network Exchange (ONNX) models, and the Accelerated Linear Algebra (XLA) compiler for TensorFlow-based models. According to Graphcore, direct support for PyTorch will be added by the end of 2020.
In addition to supporting popular DNN frameworks, Graphcore built a custom graph framework for the IPU. This framework is intended to let completely new parallel workloads run on its graph architecture.
The Graph Compiler lies at the heart of deployment and optimization and has been in development for more than five years. Graphcore designed the compiler to simplify programming the IPU, especially when work is spread across many IPUs. Instead of programming, say, 16 distinct ASICs or GPUs in a server, developers target a single “Multi-IPU” through the Poplar Graph Compiler, which lets them focus on their data and algorithms. This flexibility and ease of scaling could be a distinct advantage over most other accelerators.
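To make the “single Multi-IPU” idea more concrete, here is a minimal Python sketch. It assumes a simplified, hypothetical model in which each IPU is divided into independent compute tiles and a logical device is just the flat pool of all (IPU, tile) pairs; `MultiIPU`, `partition` and `vector_add` are illustrative names, not Poplar’s API. The operation is written once against the logical device, and only the partitioning step knows how many physical chips exist.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class MultiIPU:
    """Hypothetical logical device: several physical IPUs presented as one."""
    num_ipus: int
    tiles_per_ipu: int

    def units(self) -> List[Tuple[int, int]]:
        # Flatten every (ipu, tile) pair into a single pool of compute units.
        return [(ipu, tile)
                for ipu in range(self.num_ipus)
                for tile in range(self.tiles_per_ipu)]


def partition(n: int, device: MultiIPU) -> List[Tuple[Tuple[int, int], range]]:
    """Split indices 0..n-1 into contiguous, near-equal slices, one per unit."""
    units = device.units()
    base, extra = divmod(n, len(units))
    plan, start = [], 0
    for i, unit in enumerate(units):
        size = base + (1 if i < extra else 0)  # spread the remainder over the first units
        plan.append((unit, range(start, start + size)))
        start += size
    return plan


def vector_add(a: List[float], b: List[float], device: MultiIPU) -> List[float]:
    """Element-wise add, written once against the logical device."""
    out = [0.0] * len(a)
    for _unit, idx in partition(len(a), device):
        # On real hardware each slice would run in parallel on its own tile;
        # this sketch simply executes the slices one after another.
        for i in idx:
            out[i] = a[i] + b[i]
    return out
```

Scaling from one IPU to sixteen changes only the `MultiIPU(...)` arguments, while `vector_add` stays the same. A real graph compiler must also weigh memory placement and inter-chip communication, which this sketch ignores.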
The Poplar Element Compiler acts as the back end, compiling the code for each computation into elements that run at the graph’s vertices. The workflow therefore looks like this:
TensorFlow front end -> XLA -> Poplar Graph Compiler -> Poplar Element Compiler
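The core idea behind this pipeline is a program expressed as a directed graph whose vertices each run a small compiled element. It can be illustrated with a minimal Python sketch. The `Graph` and `Vertex` classes and the level-by-level scheduling below are hypothetical stand-ins, loosely modeled on the graph-of-vertices description above; they are not Poplar’s API or its actual scheduling algorithm. Vertices whose inputs are all available form one step, and a step is where a massively parallel chip could run many elements at once.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence


@dataclass
class Vertex:
    name: str
    fn: Callable[..., float]                         # the "element": a small unit of compute
    inputs: List[str] = field(default_factory=list)  # names of the tensors this vertex reads


class Graph:
    """Hypothetical dataflow graph; an edge runs from each input to its consumer."""

    def __init__(self) -> None:
        self.vertices: Dict[str, Vertex] = {}

    def add(self, name: str, fn: Callable[..., float], inputs: Sequence[str] = ()) -> None:
        self.vertices[name] = Vertex(name, fn, list(inputs))

    def schedule(self) -> List[List[str]]:
        """Topologically group vertices into steps. Every vertex in a step has all
        of its inputs available, so the whole step could run in parallel."""
        # Inputs that are not vertices are external tensors supplied at run time.
        indegree = {n: sum(s in self.vertices for s in v.inputs)
                    for n, v in self.vertices.items()}
        consumers: Dict[str, List[str]] = defaultdict(list)
        for n, v in self.vertices.items():
            for src in v.inputs:
                if src in self.vertices:
                    consumers[src].append(n)
        steps: List[List[str]] = []
        ready = sorted(n for n, d in indegree.items() if d == 0)
        while ready:
            steps.append(ready)
            nxt: List[str] = []
            for n in ready:
                for c in consumers[n]:
                    indegree[c] -= 1
                    if indegree[c] == 0:
                        nxt.append(c)
            ready = sorted(nxt)
        if sum(len(s) for s in steps) != len(self.vertices):
            raise ValueError("graph contains a cycle")
        return steps

    def run(self, feeds: Dict[str, float]) -> Dict[str, float]:
        """Evaluate every vertex, given values for the external input tensors."""
        values = dict(feeds)
        for step in self.schedule():
            for n in step:  # executed one by one here; in parallel on real hardware
                v = self.vertices[n]
                values[n] = v.fn(*(values[s] for s in v.inputs))
        return values
```

For example, building `y = a*b + (a+b)` from three vertices yields two steps: the multiply and the add run side by side, and `y` follows once both have finished.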
The Poplar Libraries contain more than 50 highly optimized primitives and building blocks for common operations such as linear algebra, neural network functions and other operations used in machine intelligence models. Developers can also build custom libraries to publish and run entirely new kernels and DNN layers for new applications on the IPU architecture.
Finally, the Poplar Graph Engine provides run-time support for the IPU. It connects software on the host CPU with software running on the IPU, manages data movement, and manages the IPU device itself for I/O, application loading, debugging and profiling. Optimizing data flow from the CPU is critical to achieving maximum throughput, so the Graph Engine orchestrates the data I/O pipelines to keep data flowing to the IPU without interruption. The Graph Engine is backed by custom hardware on the IPU that facilitates debugging and profiling.
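Keeping an accelerator continuously fed usually relies on overlapping host-side data loading with device compute. The following is a generic Python sketch of that technique (a bounded producer/consumer queue, where `depth=2` gives classic double buffering), assuming a simple threaded model. `stream`, `host_loader` and `device_step` are hypothetical placeholders; they are not Graph Engine calls and do not describe how Poplar implements its pipelines.

```python
import queue
import threading
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")

_DONE = object()  # sentinel marking the end of the input stream


def stream(batches: Iterable[T], device_step: Callable[[T], R], depth: int = 2) -> List[R]:
    """Overlap host-side loading with device compute.

    A background thread keeps up to `depth` batches staged in a bounded queue
    while the main thread processes them in order.
    """
    staged: "queue.Queue[object]" = queue.Queue(maxsize=depth)

    def host_loader() -> None:
        try:
            for batch in batches:
                staged.put(batch)  # blocks while all buffers are full
        finally:
            staged.put(_DONE)      # always signal the consumer, even on error

    loader = threading.Thread(target=host_loader, daemon=True)
    loader.start()

    results: List[R] = []
    while True:
        batch = staged.get()
        if batch is _DONE:
            break
        # Stand-in for work on the accelerator; the loader is meanwhile
        # staging the next batch, so I/O and compute overlap.
        results.append(device_step(batch))
    loader.join()
    return results
```

The bounded queue is the design choice that matters: it lets loading run ahead of compute by at most `depth` batches, so memory stays fixed while the consumer rarely waits. A production runtime would typically move data with dedicated hardware transfers rather than Python threads.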
Conclusions
Graphcore recognizes that its success will depend as much on software as on the performance of its IPU. Its software strategy stands out in several areas:
- The Poplar Graph Compiler enables “Multi-IPU” parallelism while implementing efficient memory use and data movement.
- The Graph Framework enables new workloads, especially new algorithms from domains outside the realm of ML frameworks.
- Beyond full support and optimization for open-source ML frameworks, users can extend the stack with custom “applets” that implement new layers and kernels.
- Graphcore’s open-source strategy should help the company extend its reach across a wide community of researchers in multiple disciplines.
Graphcore also provides management software for the containerization, orchestration, security and virtualization on which data centers have come to rely. Combined with its hardware design and development stack, these steps will ease adoption as more applications are deployed on the Graphcore platform. The Graphcore IPU is available now in the Dell DSS8440 IPU server and in Microsoft Azure cloud instances. For more information, see Graphcore’s website or our research paper at Moor Insights & Strategy, where it was originally published.