Naveen Rao, formerly of Nervana and Intel, leads a new company focused on improving the efficiency of AI training
Training a deep neural network takes a lot of computational horsepower. Billions of trillions of multiplications and additions compute the “weights” that are eventually used to apply the “learning” to new inputs and yield accurate predictions and classifications.
OpenAI used thousands of high-performance GPUs for over three months to train the 175-billion-parameter GPT-3 transformer model, at an estimated expense of nearly $12 million. So, while AI models are exploding in size, the costs are becoming prohibitive. GPT-4, which is probably still a couple of years out, is rumored to have 100 trillion parameters, more than 500 times the size of GPT-3. Now obviously, OpenAI, the research group leading the charge, cannot spend 500 x $12M = $6B to train the network. Hardware innovations designed for massive models, such as those invented by NVIDIA and Cerebras Systems, will be necessary but not sufficient: algorithmic optimization will play a critical role. Enter Naveen Rao and his new company, MosaicML.
Rao, who founded Nervana in 2014 and sold it to Intel in 2016 for some $408M, believes there is a better way to train larger AI models than brute computational force, and he has founded MosaicML to help the AI community reimagine the training process. Given the increasing cost and environmental impact of AI computation, his timing may be perfect once again.
What is MosaicML?
Rao believes he and his team can spread a revolution in training AI models by offering their expertise and techniques as a service to institutions developing complex Artificial Intelligence models. The company’s mission is to help clients and the AI community improve quality (prediction accuracy), lower costs, and save time. As a beginning, MosaicML is offering tools to make efficient training methods accessible to data scientists. The company expects to generate revenue by offering model optimization as a service. This is not the sort of optimization effected by low-level, close-to-the-hardware kernels, but rather algorithmic techniques such as sparsity and network pruning.
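To make the idea concrete, here is a minimal sketch of magnitude-based pruning, one common form of the sparsity techniques mentioned above. This is an illustration of the general technique, not MosaicML's implementation; the function name and list-of-lists weight representation are ours.

```python
# Illustrative magnitude pruning: zero out the smallest-magnitude
# weights so their multiplications can be skipped during training
# and inference. (Not MosaicML's actual code.)

def prune_by_magnitude(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of weights."""
    flat = sorted(abs(w) for row in weights for w in row)
    cutoff_index = int(len(flat) * sparsity)
    threshold = flat[cutoff_index] if cutoff_index < len(flat) else float("inf")
    return [[0.0 if abs(w) < threshold else w for w in row] for row in weights]

weights = [[0.9, -0.05, 0.3],
           [-0.02, 0.7, -0.1]]
pruned = prune_by_magnitude(weights, sparsity=0.5)
# Half the weights (the smallest in magnitude) are now exactly zero.
```

In practice, pruning is typically applied gradually during training and combined with fine-tuning, so the network recovers any lost accuracy while the compute per step shrinks.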
The first tool, MosaicML Explorer, helps developers explore and understand potential trade-offs between time, performance, and costs across different cloud services and hardware options. This could save a lot of time by sitting in front of the cloud providers to streamline the evaluation of implementation options.
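The trade-off involved is easy to sketch as back-of-the-envelope arithmetic. The instance names, prices, and speedups below are hypothetical, not Explorer's actual data; the point is simply that a pricier-per-hour option can still be cheaper end to end.

```python
# Hypothetical time/cost trade-off across two hardware options.
options = {
    # name: (hourly price in $, relative training speed)
    "8x GPU-A": (24.0, 1.0),
    "8x GPU-B": (33.0, 1.6),
}

baseline_hours = 100.0  # assumed training time on the 1.0x option

costs = {}
for name, (price, speed) in options.items():
    hours = baseline_hours / speed        # faster hardware finishes sooner
    costs[name] = hours * price           # total bill = hours x hourly rate
    print(f"{name}: {hours:.1f} h, ${costs[name]:.2f}")
# GPU-B costs more per hour, yet its 1.6x speedup makes the total
# bill lower -- the kind of trade-off worth surfacing up front.
```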
The second tool is an open source deep learning library, MosaicML Composer, designed to “make it easy to add algorithmic methods and compose them together into novel recipes that speed up model training and improve model quality.” This may sound like black magic, but Rao brings credibility to the message: he has studied and has experience in both neuroscience and computer science. The library ships with 20 methods for computer vision and natural language processing, along with models, data sets, and benchmarks.
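The idea of “composing” methods into a recipe can be illustrated with a few lines of plain Python. This is not Composer's real API; it is a sketch in which each method is a function that modifies a training configuration, and a recipe is just a list of such methods applied in order. Label smoothing and progressive image resizing are named here only as representative speedup techniques.

```python
# Hedged sketch of composing training-speedup methods into a recipe.
# Each "method" transforms a training configuration dict; a recipe
# stacks several of them. (Not Composer's actual interface.)

def label_smoothing(config):
    # Soften the training targets slightly to improve generalization.
    return dict(config, label_smoothing=0.1)

def progressive_resizing(config):
    # Start training on smaller images and grow them over time.
    return dict(config, initial_image_size=0.5 * config["image_size"])

def compose(methods, config):
    """Apply each method's modification to the training configuration."""
    for method in methods:
        config = method(config)
    return config

recipe = [label_smoothing, progressive_resizing]
config = compose(recipe, {"image_size": 224, "epochs": 90})
```

The appeal of this style is that each method stays independent and testable, while the interesting engineering lies in finding combinations that stack without interfering with one another.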
MosaicML has already raised $37M from Lux Capital, DCVC, Future Ventures, Playground Global, AME, Correlation, E14, and a few lucky angels. The startup is not alone: Codeplay and OctoML both offer optimization services and coding to hardware companies and model developers. However, Naveen Rao should bring ML optimization as a service to a new level of capability. If the team can consistently deliver the 5-10X performance improvements Rao believes are possible, they will quickly find a long line of customers at their door.