Cupertino's crown jewel from 2022, the iPhone 14 Pro, isn’t getting Apple Intelligence despite being only marginally slower than the flagship from 2023, benchmarks reveal. Apple’s choice to skimp on RAM in its devices seems to have come back to bite them.
Apple Intelligence is launching this fall. It promises “powerful generative models right at the core of your iPhone, iPad, and Mac” to help you communicate, work, and express yourself.
Sadly, the iPhone 14 Pro isn’t getting this update. But is the device really a slouch in AI tasks?
Primate Labs, a developer of performance analysis tools for desktop and mobile platforms, has released Geekbench AI, a benchmark that measures performance in AI-centric workloads such as face detection, image classification, and image generation by style transfer.
This test reveals that the iPhone 14 Pro Max is only 11% slower than the 15 Pro Max in 8-bit (so-called “quantized”) calculations. These tasks are the least demanding, as signed 8-bit integers can only represent values from -128 to 127.
In half-precision (16-bit) workloads, the 14 Pro Max is only 5% slower. In single-precision (32-bit) calculations, the gap widens to 10%, according to our measurements of the neural engines on both devices.
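To make the precision tiers concrete, here is a minimal NumPy sketch – illustrative only, and not Geekbench’s actual methodology – showing the storage cost and range of each format, along with the common trick of mapping float weights onto the int8 range with a per-tensor scale:

```python
import numpy as np

# Storage cost and representable range of the three precision tiers
# Geekbench AI exercises: single, half, and quantized (8-bit).
for dtype in (np.float32, np.float16, np.int8):
    info = np.finfo(dtype) if np.issubdtype(dtype, np.floating) else np.iinfo(dtype)
    print(f"{np.dtype(dtype).name}: {np.dtype(dtype).itemsize} byte(s), "
          f"range {info.min} to {info.max}")

# Symmetric 8-bit quantization: the largest weight magnitude maps to 127.
weights = np.array([0.42, -1.37, 0.08, 0.95], dtype=np.float32)
scale = np.abs(weights).max() / 127
quantized = np.round(weights / scale).astype(np.int8)   # e.g. [39, -127, 7, 88]
restored = quantized.astype(np.float32) * scale

print("max round-trip error:", np.abs(weights - restored).max())
```

The round-trip error is small for well-behaved weights, which is why quantized models give up only a little accuracy in exchange for big savings in memory and compute.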
The generational improvements are incremental – not significant enough to be the deciding factor in which devices get specific AI features.
What’s even more fascinating is that the iPhone 14 Pro Max’s performance in AI tasks is comparable to, and sometimes even better than, the scores of Macs and iPads with M1 chips from 2020 and later. Yet Apple chose to infuse those four-year-old devices with Apple Intelligence while leaving 14 Pro Max owners behind.
So what gives? The answer lies in the amount of RAM – the temporary memory where computers and phones store running programs and their data.
The 15 Pro Max can load an 8-billion-parameter model, but the 14 Pro Max can’t
The main difference between the two devices is the amount of memory (RAM) installed. The 15 Pro Max has eight gigabytes (GB) of onboard RAM, while its predecessor has only 6GB. Those two gigs are the only thing the iPhone 14 Pro Max lacks – it can’t fit large language models with up to 8 billion parameters, as our next experiment shows.
The 14 Pro Max could load and run a small two-billion-parameter model, such as Gemma 2 with 4-bit quantization (a reduction in numerical precision), using the Local Chat app. Its speed was around 8-10 tokens (roughly words) per second, but the rate was unstable and declined over time. After half a minute of generating text, the device became considerably warmer and slowed down.
The 14 Pro Max could not load larger models in the 7-8 billion parameter range, such as Meta’s Llama 3.1 8B or Mistral 7B.
Meanwhile, the iPhone 15 Pro Max could run Gemma 2 at around 20 tokens per second. It could also load the more sizeable models but struggled with them. The performance averaged eight tokens per second when running Llama 8B, decreasing over time as the device exhausted its thermal headroom and started to throttle performance.
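For anyone who wants to replicate this kind of measurement, the sketch below shows one way to track generation speed in windows of ten tokens; `generate_next_token` is a hypothetical stand-in for whatever decode step an on-device chat app actually performs:

```python
import time

def tokens_per_second(generate_next_token, n_tokens=100, window=10):
    """Time token generation in windows; a steadily falling rate across
    windows is the thermal-throttling pattern described above."""
    rates = []
    start = time.perf_counter()
    for i in range(1, n_tokens + 1):
        generate_next_token()
        if i % window == 0:
            now = time.perf_counter()
            rates.append(window / (now - start))
            start = now
    return rates

# Demo with a dummy "decode step" that takes 50ms per token (~20 tokens/s):
print(tokens_per_second(lambda: time.sleep(0.05), n_tokens=30))
```

A flat list of rates means stable performance; the throttling we observed would show up as steadily shrinking numbers from one window to the next.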
The memory headroom available to the Local Chat app was 5.33GB on the iPhone 15 Pro Max, while it could access only 2.63GB on the 14 Pro Max. And RAM is also needed by other running applications and system tasks.
For comparison, the 8-billion-parameter Llama 3.1 8B model weighs in at 4.5GB, while Gemma 2 2B takes up only 1.5GB of space.
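A back-of-envelope calculation shows why these sizes collide with the headroom figures. Assuming 4-bit weights, as in the Gemma 2 setup above, the footprint is roughly parameters × bits ÷ 8; real model files add overhead for embeddings, quantization scales, and metadata, which accounts for the gap between the estimates and the actual file sizes:

```python
def model_footprint_gb(n_params, bits_per_weight):
    # Parameters times bits per weight, converted to gigabytes.
    return n_params * bits_per_weight / 8 / 1e9

print(model_footprint_gb(8e9, 4))  # ~4.0GB -> Llama 3.1 8B ships at ~4.5GB
print(model_footprint_gb(2e9, 4))  # ~1.0GB -> Gemma 2 2B at ~1.5GB
```

Set against the headroom numbers, the verdict is immediate: 4.5GB fits into the 15 Pro Max’s 5.33GB but has no chance against the 14 Pro Max’s 2.63GB.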
Apple execs confirmed that RAM is one of the “pieces”
In an interview with John Gruber of Daring Fireball, Craig Federighi, the senior vice president of software engineering at Apple, explained that running powerful models on an iPhone is “a pretty extraordinary thing” and that it requires “many dimensions of the system.”
“This is the hardware that it takes,” Federighi said. “Yeah, RAM is one of the pieces.”
Federighi also assured the audience that this isn’t a scheme by Apple to sell new iPhones. If that were the case, the company would have been “smart enough” to limit support to only the most recent iPads and Macs as well. Yet Apple has always tried to figure out how to bring new features back to as many older devices as it could.
John Giannandrea, Apple’s head of artificial intelligence, noted that the latest A17 Pro chip in the 15 Pro Max, while not the first chip with a neural engine, has a “much bigger neural engine than the chip that came before.”
“The inference of LLM is incredibly computationally expensive. And so it’s a combination,” Giannandrea said. “It's the oomph in the device to actually do these models fast enough to be useful. You could, in theory, run these models on a very old device, but it would be so slow that it would not be useful.”
A neural engine, or Neural Processing Unit (NPU), is an accelerator for AI computations. It’s optimized to handle calculations with 16-bit and smaller numbers more efficiently than traditional CPUs or graphics processors.
While the Geekbench AI benchmark does not reflect a huge difference in scores, Apple’s specifications say the A17 Pro chip’s neural engine has 16 cores capable of 35 tera operations per second (TOPS). The A16 Bionic, the previous highest-end chip, which powers the iPhone 14 Pro and iPhone 15, is capable of only 17 TOPS.
The A17 Pro contains 19 billion transistors, a 19% increase over the A16’s nearly 16 billion.
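Putting the spec sheet next to the benchmark makes the mismatch obvious: on paper, the A17 Pro’s neural engine has roughly double the peak throughput, yet the measured Geekbench AI gap was only 5-11%. A quick calculation with the figures above:

```python
a17_pro_tops = 35  # Apple's figure for the A17 Pro neural engine
a16_tops = 17      # A16 Bionic (iPhone 14 Pro, iPhone 15)

print(f"peak ratio: {a17_pro_tops / a16_tops:.2f}x")  # ~2.06x on paper
```

Peak TOPS rarely translates into proportional real-world throughput – memory bandwidth, the mix of precisions, and the software stack all get in the way – which again points to RAM, not raw compute, as the real gatekeeper.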
Users on the MacRumors forum and elsewhere shared disappointment over Apple’s failure to include more RAM in its devices in the past.
“This is not acceptable for a device one year old. Audience is really important at this point…” one user said.
“Looks like Apple's stinginess on RAM has come back to bite them in the butt,” another user stated.
Geekbench AI scores may be affected by the framework implementation
For now, Apple devices outperform Google Pixel and Samsung devices in Geekbench AI benchmark results. However, user-submitted results show significant variation and inconsistency, even for the same device model.
The Samsung Galaxy S24 and S24+ with the Exynos 2400 processor demonstrate significantly better performance in half-precision and quantized workloads than the Galaxy S24 Ultra with the Snapdragon 8 Gen 3, despite the latter chip having a very capable NPU. That may signal differences in software and framework implementation or optimization.
The latest Google Pixel 9 Pro XL demonstrates significantly lower NPU scores than the competition. Despite that, it runs “Gemini” AI features directly on the device. It has 16GB of RAM, twice as much as the iPhone 15 Pro Max.
Quantized precision and half-precision offer significantly higher speeds than the full 32-bit precision due to the neural processors' implementation. Quantized large language models also deliver much better performance with a slight loss in accuracy, making them likely candidates for use on low-power devices.
Despite all the acceleration improvements, phones, tablets, and laptops are still very computationally weak compared to the requirements for running the most advanced large language models. In the upcoming years, AI will heavily rely on the cloud, and users may be better off buying a subscription to ChatGPT, Gemini, Claude, or another service rather than buying a new, expensive device.