Apple researchers make new ‘dense’ AI models capable of running on an iPhone


Text, images, or video – these small but dense AI models can understand them all, often better than larger models, the Cupertino giant's researchers say.

Apple researchers introduced a new family of multimodal large language models (MLLMs), called MM1.5, ranging in size from 1B to 30B parameters.

While smartphones can run models with a few billion parameters, AI models exceeding 10 billion parameters require a computer. The largest, most capable models ever produced have crossed the 1,000 billion (one trillion) parameter mark.


However, Apple’s researchers claim that careful data curation and training strategies can yield strong performance even at small scales (1B and 3B).

The new models can tackle tasks involving text-rich images, visual referring and grounding, and multi-image reasoning. The family also includes two specialized variants: MM1.5-Video, tailored for video understanding, and MM1.5-UI, designed for mobile UI understanding.

MM1.5 is built upon the MM1 architecture introduced in March 2024 and demonstrates significant performance improvements.

Dense models, available in 1B and 3B sizes, “are compact enough for easy deployment on mobile devices yet powerful enough to outperform larger open-source models,” the paper claims.

“The MM1.5 recipe exhibits strong scaling behavior all the way to 30B parameters, achieving competitive performance across a wide range of benchmarks.”

Apple researchers compared their 3-billion-parameter MM1.5 model to Microsoft's 4-billion-parameter Phi-3-Vision. There was no clear victor: Apple's model had the edge in text-rich image understanding, while Phi-3-Vision was better at certain knowledge-based tasks.

At the 1-billion-parameter scale, the researchers claim that MM1.5 is a 'state-of-the-art' model that clearly outperforms a curated list of competitors.

More often than not, the new models appear on par with similarly sized models.


Size still matters: the top AI models from Google or OpenAI clearly outperformed the much smaller 30-billion-parameter MM1.5.

It’s not clear from the paper whether the new models will be used on any devices. However, it’s a clear sign that Apple is developing and improving its generative AI capabilities.