What’s Next For On-Device AI? Ask Qualcomm

Jul 1, 2024 | In the News

Qualcomm held a one-day analyst event in San Diego updating us all on its AI research. Pretty amazing stuff, but the big news is yet to come, with a new Oryon-based Snapdragon expected this fall and perhaps a new Cloud AI 100 next year.

Qualcomm was pretty confident last week that their AI edge in, well, the edge, is strong and getting stronger. Most of what they covered was known, and we covered a lot of it here and elsewhere. But there were a few hints of some big updates coming as well.

Qualcomm AI: Bigger is better if you can make it smaller.

While the rush is on to make AI ever larger, with mixture-of-experts models exceeding one trillion parameters, Qualcomm has been busy squeezing these massive models down so they can fit on a mobile device, a robot, or in a car. They say you can always fall back on the cloud if needed for larger AI, but the pot of gold is in your hand: your phone.


Qualcomm’s vision is to reduce cost and increase personalization using hybrid AI. QUALCOMM

Qualcomm has invested in five areas that enable these massive models to slim down. While most AI developers know about quantization and compression, distillation is newer and really cool: a smaller “student” model is trained to mimic a larger “teacher” model but runs on a phone. And speculative decoding is getting a lot of traction as well. Add it all up, and smaller models can be far more affordable than the massive ones while still yielding the quality needed.
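To make the distillation idea concrete, here is a minimal sketch of the classic soft-target loss: the student is trained to match the teacher's softened output distribution, not just the hard labels. This is a generic illustration of the technique, not Qualcomm's actual training pipeline; the logits, temperature, and function names are illustrative assumptions.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw logits to a probability distribution, softened by temperature."""
    exps = [math.exp(x / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between the teacher's soft targets and the student's
    predictions. Minimizing this pushes the student to mimic the teacher."""
    p = softmax(teacher_logits, temperature)  # teacher "soft targets"
    q = softmax(student_logits, temperature)  # student predictions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# A student that reproduces the teacher's logits incurs (near) zero loss;
# a mismatched student incurs a clearly positive loss.
teacher = [2.0, 0.5, -1.0]
print(distillation_loss([2.0, 0.5, -1.0], teacher))   # near zero
print(distillation_loss([-1.0, 2.0, 0.5], teacher))   # clearly positive
```

In practice the temperature softens the teacher's distribution so the student also learns the relative probabilities of wrong answers, which is much of what makes an 8B-parameter student punch above its weight.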


Qualcomm is pursuing five areas to enable generative AI on their Snapdragon mobile, and now PC, devices. QUALCOMM

Qualcomm showed off data indicating that an optimized 8B-parameter Llama 3 model can yield the same quality as the 175B-parameter GPT-3.5 Turbo model.


Smaller models today can deliver higher-quality results using only 8B parameters. Yes, you can run an 8B-parameter model on a Qualcomm Snapdragon-equipped smartphone. QUALCOMM

All this AI is available to developers on the Qualcomm AI Hub, which we covered here, and runs really fast on Snapdragon 8 Gen 3-powered phones. Spokespeople said this tiny chip, which runs on less electricity than an LED lightbulb, can generate AI imagery 30 times more efficiently than data center infrastructure in the cloud.

More interesting, Qualcomm confirmed that it will announce the next step in Snapdragon SoCs this fall, and that it would be based on the same Oryon cores that power its laptop offering, the Snapdragon X Elite. Stay tuned!


Snapdragon 8 Gen 3, available now in flagship smartphones around the world, can be up to 40 times more efficient than a server in a data center. QUALCOMM

The Data Center: Qualcomm is just getting started

The Cloud AI 100 Ultra has been getting a lot of wins of late, with nearly every server company providing support, as well as public clouds like AWS. Cerebras, the company that brought us the Wafer Scale Engine, is collaborating with Qualcomm, selecting the Cloud AI 100 Ultra as its preferred inference platform. And NeuReality also selected the Cloud AI 100 Ultra as the deep learning accelerator in its CPU-less inference appliance.

The reason for all this attention is simple: the Cloud AI 100 runs all the AI applications you might need at a tiny fraction of the power consumption. And the PCIe card can run models of up to 100B parameters, thanks in part to the larger on-card DRAM. Net-net: the Qualcomm Cloud AI 100 Ultra delivers two to five times the performance per dollar of competitors on generative AI, LLM, NLP, and computer vision workloads.

And for the first time we are aware of, a Qualcomm engineer confirmed that the company is working on a new version of the AI 100, probably using the same Oryon cores as the X Elite, technology Qualcomm acquired when it bought Nuvia. We expect this third generation of Qualcomm’s data center inference engine will focus on generative AI. The Ultra has established a strong foundation for Qualcomm, and the next-generation platform could be a material new business for the company.


The Cloud AI 100 Ultra has been getting a lot of traction of late. QUALCOMM

Automotive

Qualcomm recently said its automotive business “pipeline” had increased to US$30 billion, thanks to its Snapdragon Digital Chassis. Up more than US$10 billion since the company announced its third-quarter results last July, that figure is more than twice the size of Nvidia’s auto pipeline, which Nvidia disclosed to be some US$14 billion in 2023.


The Snapdragon prototype car was probably designed by BMW, one of Qualcomm’s partners in the automotive market. THE AUTHOR

Conclusions

We recently said that Qualcomm is becoming the juggernaut of AI at the edge, and the session with Qualcomm executives last week reinforced our position. The company has been researching AI for over a decade, having had the foresight to recognize that it could use AI to its advantage over Apple. Now it is productizing that research in silicon and software, and has made it all available to developers on the new AI Hub. The addition of automotive and data center inference processing grows revenue in new markets and diversifies the company beyond its roots in modems and the Snapdragon mobile business. Finally, the company has doubled down on its Nuvia bet, in spite of the two-year-old litigation with Arm over licensing.