On-device training capability would enhance the user experience while protecting privacy.
We’ve all had those frustrating experiences with our mobile phones when the voice assistant seems to possess artificial stupidity instead of artificial intelligence. The real problem here is that these assistants require cloud connectivity (increasing latency) and do not continuously learn from interactions with you over time. Qualcomm Technologies AI Research is exploring how to change that through on-device learning for mobile and edge devices, enabling personalization while preserving privacy.
The Challenge and Opportunity of On-Device Learning
Current devices focus on inference rather than learning, which is a much more difficult computational challenge that currently demands large cloud services. Existing training techniques require trillions of floating-point calculations that can take days or weeks to complete on multiple racks of high-performance accelerators. They also require large, centrally collected data sets. Learning on a mobile phone will require entirely new approaches that consume less data, dramatically streamline the process, and include local private user data. Qualcomm AI Research has some promising ideas to address this and help achieve learning at the edge.
The technical problems are large but not insurmountable. Fundamentally, we need to be able to run smaller models that adapt to the target data while preserving accuracy and privacy. Training with far less labeled data would better fit the memory, power, and compute capacity of edge devices, and personalization tasks often only need to fine-tune a model on a handful of examples from the target user. In addition, federated learning could be harnessed to pool edge resources for training across multiple devices. Computationally, we could reduce the numerical precision (quantization) used for backpropagation during training to cut compute and memory requirements, but only if mechanisms can be developed to maintain model accuracy. Qualcomm AI Research is pursuing several of these areas and more. See more details of the research directions in this presentation by Qualcomm and in this video posted on our website.
Learning from limited labeled data will be crucial to achieving the required accuracy when training on edge devices. The algorithmic approaches for efficient on-device learning can hinge on adapting models to deal with fewer labeled data samples, such as in few-shot learning or using unlabeled data in unsupervised learning in certain data domains. One can also target models with user-labeled data with limited sample sizes. Getting users to help label samples can also improve accuracy by using cleaner data than is usually available from the environment. An excellent example of few-shot learning is improving keyword spotting (KWS). By adapting the model using data collected from user enrollment, one can personalize the model and significantly improve results.
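As a rough illustration of personalizing from a handful of enrollment samples, here is a minimal sketch of a nearest-prototype check. It is not the method used by Qualcomm AI Research. It assumes that each utterance has already been turned into a fixed-length embedding by some frozen feature extractor, and the function names (`enroll`, `detect`) and the similarity threshold are hypothetical.

```python
import math

def mean_vector(vectors):
    """Element-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def enroll(enrollment_embeddings):
    """Build a per-user keyword prototype from a few enrollment embeddings."""
    return mean_vector(enrollment_embeddings)

def detect(prototype, embedding, threshold=0.8):
    """Report a keyword hit when the embedding is close to the prototype.
    The threshold is an illustrative assumption, not a tuned value."""
    return cosine(prototype, embedding) >= threshold

# Three hypothetical enrollment embeddings from one user.
prototype = enroll([[1.0, 0.1], [0.9, 0.0], [1.1, -0.1]])
keyword_hit = detect(prototype, [0.95, 0.05])
other_speech = detect(prototype, [0.0, 1.0])
```

Only the small prototype vector changes per user here, which is one reason few-shot adaptation can fit within edge memory and compute budgets.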
Federated Learning (FL)
Much hope for on-device training has been pinned on a hybrid approach: pooling edge devices, aggregating their model updates in the cloud, and then distributing the updated model back to the devices. Each device adapts the model based on its local data, and the resulting updates are aggregated across many users so that the global model performs better across a broader range of observed data. One idea that also shows promise is to create embeddings (lower-dimensional representations of high-dimensional data) that are codewords of error-correcting codes, which avoids the need to share individual embeddings over the network. This approach, called FedUV, has been demonstrated to deliver results comparable to more common techniques that share embeddings. Another challenge of FL is having adequate communication bandwidth and latency. While 5G certainly helps, sufficient bandwidth depends on compression/decompression algorithms, with good on-chip support.
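The adapt-locally, aggregate-globally loop described above can be sketched with plain federated averaging on a toy model. This is a minimal illustration and not FedUV or any Qualcomm implementation. The tiny linear model, the learning rate, and the function names are assumptions chosen to keep the example self-contained.

```python
def local_update(weights, data, lr=0.1):
    """One pass of SGD on a toy model y = w*x + b, using only this device's data."""
    w, b = weights
    for x, y in data:
        err = (w * x + b) - y
        w -= lr * err * x
        b -= lr * err
    return [w, b]

def federated_average(client_weights, client_sizes):
    """Server step: average client models, weighted by local sample count."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]

def run_round(global_weights, client_datasets, lr=0.1):
    """One round: send the global model out, train locally, aggregate the results."""
    updates = [local_update(list(global_weights), data, lr) for data in client_datasets]
    sizes = [len(data) for data in client_datasets]
    return federated_average(updates, sizes)

# Two hypothetical devices whose private samples all follow y = 2x + 1.
clients = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0), (1.0, 3.0), (-1.0, -1.0)],
]
weights = [0.0, 0.0]
for _ in range(500):
    weights = run_round(weights, clients)
```

Only model weights cross the network in this sketch; the raw `(x, y)` samples never leave the device that holds them, which is the privacy property the hybrid approach relies on.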
Quantization, which uses reduced-precision integers and operators, is becoming more commonplace in inference because it avoids expensive floating-point calculations and reduces memory bandwidth requirements. Qualcomm AI Research is investigating two areas to reduce the complexity of training without reducing model accuracy. One promising technique is “In-Hindsight Range Estimation,” which predicts the level of precision required to maintain model accuracy when computing the gradients. The algorithm extracts statistics from the current tensor and applies that knowledge to calculate the quantization parameters for the next iteration. This lowers complexity and data movement by as much as 79%.
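The idea of quantizing the current tensor with a range chosen in advance, then using its statistics only for the next iteration, can be sketched as follows. This is a simplified illustration under stated assumptions and not the published algorithm: the exponential moving average of min/max, the `momentum` value, and the class name are all assumptions.

```python
class InHindsightQuantizer:
    """Quantize each tensor with a range estimated from earlier iterations."""

    def __init__(self, num_bits=8, momentum=0.9):
        self.levels = 2 ** num_bits - 1   # number of quantization steps
        self.momentum = momentum          # assumed smoothing factor
        self.lo = None                    # running range minimum
        self.hi = None                    # running range maximum

    def quantize(self, tensor):
        t_min, t_max = min(tensor), max(tensor)
        if self.lo is None:
            # No history on the first call, so fall back to the current statistics.
            self.lo, self.hi = t_min, t_max
        lo, hi = self.lo, self.hi
        scale = (hi - lo) / self.levels
        if scale == 0.0:
            scale = 1.0  # degenerate range: avoid division by zero
        out = []
        for value in tensor:
            clipped = min(max(value, lo), hi)
            step = round((clipped - lo) / scale)
            out.append(lo + step * scale)  # de-quantized value
        # In hindsight: fold the current statistics into the range used next time.
        m = self.momentum
        self.lo = m * self.lo + (1 - m) * t_min
        self.hi = m * self.hi + (1 - m) * t_max
        return out

quantizer = InHindsightQuantizer(num_bits=8, momentum=0.5)
first = quantizer.quantize([-1.0, 0.0, 1.0])   # range taken from this tensor
second = quantizer.quantize([-4.0, 4.0])       # clipped to the earlier range
```

Because the range is fixed before the tensor is produced, each value can be quantized as soon as it is computed, without a second pass over the full-precision tensor to find its min and max. The cost is visible in the second call: values outside the stale range are clipped until the running range catches up.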
Another approach is to reduce memory requirements. Instead of storing every layer’s activations to calculate the gradient of the loss function, Qualcomm AI Research is investigating “invertible layers,” whose inputs can be recomputed from their outputs during the backward pass, to significantly reduce the memory needed for training. For example, MobileNet-V2 training was accomplished with an 11x reduction in activation memory, from 43 MB to 3.7 MB, without a meaningful reduction in prediction accuracy or change to the number of inference FLOPs or model parameters.
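To show why an invertible layer removes the need to store activations, here is a minimal sketch of an additive coupling block, a well-known invertible design. The source does not say which invertible layer Qualcomm AI Research uses, so this construction and the toy sub-functions `f` and `g` are assumptions for illustration only.

```python
import math

def f(v):
    """Toy sub-network; any function works, it does not need to be invertible."""
    return [math.tanh(0.5 * a) for a in v]

def g(v):
    """Second toy sub-network, also arbitrary."""
    return [0.25 * a * a for a in v]

def forward(x1, x2):
    """Additive coupling: y1 = x1 + f(x2), y2 = x2 + g(y1)."""
    y1 = [a + b for a, b in zip(x1, f(x2))]
    y2 = [a + b for a, b in zip(x2, g(y1))]
    return y1, y2

def inverse(y1, y2):
    """Recover the inputs from the outputs by undoing the two additions in reverse."""
    x2 = [a - b for a, b in zip(y2, g(y1))]
    x1 = [a - b for a, b in zip(y1, f(x2))]
    return x1, x2

x1, x2 = [0.5, -1.0], [2.0, 0.25]
y1, y2 = forward(x1, x2)
r1, r2 = inverse(y1, y2)   # matches x1, x2 up to floating-point rounding
```

During backpropagation, a layer like this can rebuild its input from its output instead of reading it from memory, trading a small amount of extra compute for activation storage.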
On-device learning on a mobile or edge device could allow developers to clear hurdles preventing some AI applications from achieving their full potential. But training on battery-operated power-efficient devices is tough to accomplish. Qualcomm AI Research believes it has meaningful solutions in flight, using not one silver bullet but a range of techniques that can be applied in isolation or harmony. This holistic approach is already showing promise in the lab and could revolutionize edge AI applications in the not-so-distant future.