While the IBM hardware business today is limited to POWER and Mainframe chips and systems, the technology giant is quietly building its expertise and capabilities in AI hardware. Where this could end up is anybody’s guess, but here are a few thoughts about what IBM is doing and speculation as to why.
IBM research in AI hardware
IBM founded the IBM Research AI Hardware Center in early 2019 to conduct AI chip research in collaboration with New York State, the SUNY Polytechnic Institute, and technology companies including Mellanox, Samsung and Synopsys. The center takes a holistic, end-to-end approach to AI hardware, working toward its aggressive goal of delivering a 1000X increase in AI performance over the next 10 years. This starts with the reduced precision techniques discussed here. Meanwhile, the center is also developing new digital and analog AI cores to implement those innovations. The roadmap culminates in new cores made from materials that are exotic today and not yet used in semiconductor manufacturing.
IBM recently published two papers that describe its reduced precision approaches, which stand to improve AI processing efficiency while preserving the accuracy of the predictions. AI researchers have been exploiting reduced precision numbers and math operations for the last 4-5 years, led by NVIDIA and Google. If “Reduced Precision” sounds like a bad thing, keep in mind that higher precision comes at a significant cost, one that rises as the square of the bit-length of the numbers used. So, going from 32-bit to 16-bit speeds up calculations (or reduces costs) by a factor of 4. Google recently proposed a new 16-bit format called bfloat16, which uses more bits for the exponent and fewer bits for the mantissa than the IEEE 16-bit floating point standard. This preserves accuracy while using less power and chip space than the traditional 32-bit format. Intel, for one, has embraced this. However, researchers have struggled to preserve accuracy while working toward the next step: an 8-bit floating point number.
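To make the bit-level trade-off concrete, here is a minimal Python sketch. It takes the square-law rule of thumb quoted above at face value, and it emulates bfloat16 by simply keeping the top 16 bits of a 32-bit float (real hardware typically rounds rather than truncates). The function names are illustrative assumptions, not part of any Google or IBM library.

```python
import struct

# (exponent bits, mantissa bits); each format also has 1 sign bit.
FORMATS = {"float32": (8, 23), "float16": (5, 10), "bfloat16": (8, 7)}


def relative_cost(bits, baseline_bits=32):
    """Cost relative to the baseline width, assuming cost grows as bits squared."""
    return (bits / baseline_bits) ** 2


def to_bfloat16(x):
    """Emulate bfloat16 by truncating the float32 encoding of x to 16 bits.

    Assumption: plain truncation, no rounding. The result is returned as an
    ordinary Python float so it can be compared with the input.
    """
    (bits32,) = struct.unpack(">I", struct.pack(">f", x))
    top16 = bits32 >> 16  # sign + 8 exponent bits + 7 mantissa bits
    (y,) = struct.unpack(">f", struct.pack(">I", top16 << 16))
    return y


print(relative_cost(16))     # 0.25, i.e. the factor of 4 in the text
print(to_bfloat16(3.14159))  # 3.140625: only 7 mantissa bits survive
print(to_bfloat16(1e38))     # still finite: bfloat16 keeps float32's range
```

The last line is the point of the format: IEEE 16-bit floats top out at 65504, while bfloat16 gives up mantissa bits to keep the full 8-bit exponent of the 32-bit format.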
This week IBM proposed a “Hybrid 8-bit Floating Point” format that could improve performance or lower cost by up to 4X. That gain would materialize only if someone (IBM?) produces a chip that can perform those calculations for training, running DNNs that have been properly “quantized” to those 8-bit formats. The “Hybrid” nature of these operations stems from the different precision requirements of the forward versus the backward propagation calculations. By tailoring the number of bits used for the exponent and the mantissa to the forward and backward pass computations, IBM demonstrated that one can indeed train a neural network for vision, speech and language processing using only 8 bits for the weights and activations. Moreover, it can do this with accuracy comparable to the results obtained with 16-bit math. Additionally, these smaller numbers can be communicated across the chip more efficiently, further reducing training time by another 30-60%.
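The idea of spending the same 8 bits differently in each direction can be sketched in a few lines of Python. Everything specific here is an assumption for illustration: the 1-4-3 split (sign, exponent, mantissa) for the forward pass and the 1-5-2 split for the backward pass are my reading of IBM's published work and are not stated in this article, and the toy format below uses an IEEE-style bias, flushes tiny values to zero and saturates at its largest value, which a real chip may handle differently.

```python
import math

FORWARD = (4, 3)   # assumed exponent/mantissa bits for weights and activations
BACKWARD = (5, 2)  # assumed exponent/mantissa bits for gradients


def quantize_float(x, exp_bits, man_bits):
    """Round x to the nearest value of a toy float format with these field widths.

    Simplifications (assumptions): no subnormals (too-small values become 0),
    no inf/NaN encodings, and saturation at the largest representable value.
    """
    if x == 0.0:
        return 0.0
    bias = 2 ** (exp_bits - 1) - 1
    min_exp, max_exp = 1 - bias, bias
    sign = -1.0 if x < 0 else 1.0
    mag = abs(x)
    _, e = math.frexp(mag)      # mag = m * 2**e with 0.5 <= m < 1
    exp = e - 1                 # so mag = (1.xxx) * 2**exp
    if exp < min_exp:
        return 0.0              # underflow: flush to zero
    step = 2.0 ** (exp - man_bits)   # spacing of representable values here
    q = round(mag / step) * step
    max_val = (2.0 - 2.0 ** -man_bits) * 2.0 ** max_exp
    return sign * min(q, max_val)


print(quantize_float(0.7, *FORWARD))    # 0.6875: finer steps
print(quantize_float(0.7, *BACKWARD))   # 0.75: coarser steps
print(quantize_float(1e-4, *FORWARD))   # 0.0: too small for 4 exponent bits
print(quantize_float(1e-4, *BACKWARD))  # about 1.07e-4: survives
```

In this toy model the extra mantissa bit gives the forward pass finer resolution, while the extra exponent bit keeps small gradient-sized values from vanishing in the backward pass.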
If all this math sounds complex, the benefits are pretty simple. This approach could theoretically enable someone to build a chip for training deep neural networks that would use ¼ the chip area, or perhaps deliver 4 times the performance at the same cost.
On the inference side of AI, chip companies and model developers are moving to 8-bit integer math for the same reasons. But could one go even lower? To date, efforts to use lower precision have failed to match the accuracy that 8-bit models deliver. However, IBM recently published a paper that proposed using two techniques called parameterized clipping activation (PACT) and statistics-aware weight binning (SAWB) that, when used in conjunction, have demonstrated 2-bit inference processing that delivers comparable accuracy to 8-bit quantized models.
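For a feel of what 2-bit activations look like, here is a minimal Python sketch of the clipping-and-rounding step at the heart of a PACT-style quantizer. It is a simplification under stated assumptions: in PACT the clipping level (called `alpha` here) is learned during training, whereas this sketch fixes it by hand, and the SAWB weight quantizer is not shown at all. The function name is hypothetical.

```python
def pact_quantize(x, alpha, bits=2):
    """Clip x to [0, alpha], then snap it to one of 2**bits evenly spaced levels.

    Assumption: alpha is a fixed constant here; in PACT it is a trainable
    parameter tuned per layer.
    """
    levels = 2 ** bits - 1                 # 3 intervals, 4 values for 2 bits
    clipped = min(max(x, 0.0), alpha)      # clipped ReLU
    return round(clipped * levels / alpha) * alpha / levels


activations = [-2.0, 0.2, 1.4, 2.6, 5.0]
print([pact_quantize(a, alpha=3.0) for a in activations])
# [0.0, 0.0, 1.0, 3.0, 3.0]
```

With 2 bits every activation collapses to one of just four values, which is why the choice of clipping level matters so much for accuracy.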
So who cares?
The answer could be “everybody,” if and when someone uses these techniques in CPUs or AI chips. IBM’s roadmap certainly implies the company will do just that, but for now, this is all being done in IBM Research, not a product division. I spent 10 years at IBM, and there were often research projects that never became products; that’s the nature of research. Still, let’s think about how IBM could proceed.
I see three options for IBM to monetize AI hardware research results. First, it could license this technology to other semiconductor companies for mobile, edge and data center silicon platforms. Second, IBM could build this technology into IBM POWER and/or Mainframe processors, especially the 2-bit inference processing. Third, IBM could produce an accelerator for data center training and inference processing, competing directly with NVIDIA, Intel and others. This last option could address a larger market but comes with a business issue: IBM would need to build a channel to reach data center customers beyond its POWER and Mainframe installed bases. IBM could sell through Lenovo and to ODMs that are building data center servers, for example. Of course, these three options are not mutually exclusive.
Conclusions
I am quite impressed with what IBM has accomplished at the new IBM Research AI Hardware Center, especially since it has only been at it for a year. Starting with its ambitious vision of achieving a 1000-fold improvement in efficiency, the center embraces a holistic approach from algorithms to chips to materials. This gives it tremendous potential in this fast-moving market. The center will soon need to make some tough choices about its business model in order to monetize this research and reach its target markets. Management needs to be patient and stay the course, and I think the next CEO, Arvind Krishna, will do just that. That said, IBM has steadily moved away from the hardware business over the last decade; we will be watching closely to see what becomes of all these great ideas.