Running AI directly on your device isn’t worth your money: here’s why

If current on-device AI features aren’t compelling enough for you, there might be a reason for that. The largest AI model your next phone or laptop will be able to run on-device is roughly one percent the size of ChatGPT.

Big tech is on a quest to convince consumers that they need the next phone or laptop with AI features, citing improvements in computation power. However, they may be better off just downloading an AI app running in the cloud.

The latest consumer device can only process the weakest AI models with very limited capabilities.

While ChatGPT is estimated to have around one trillion parameters, the Gemini models built into Google’s new Pixel devices have just a few billion parameters – less than one percent of ChatGPT’s count.

Apple Intelligence has yet to arrive on current iPhones through future updates. Still, according to Cybernews experimentation, those devices won’t be able to run models beyond the ten-billion-parameter mark. All the heavy lifting will still be done in the cloud.

Even a powerful gaming computer can barely run models with around 10% of ChatGPT’s parameters.
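
To make these proportions concrete, here is a minimal Python sketch that expresses each model's size as a share of ChatGPT's estimated one trillion parameters, using the parameter counts cited in this article. The one-trillion figure is an estimate rather than a confirmed number, and the dictionary and function names are made up for illustration.

```python
# Parameter counts cited in this article, in billions.
# The ChatGPT figure is an estimate, not a confirmed number.
CHATGPT_PARAMS_B = 1000.0

MODELS_B = {
    "Nano-1": 1.8,
    "Nano-2": 3.25,
    "Llama 3.1 8B": 8.0,
    "Llama 70B": 70.0,
}


def share_of_chatgpt(params_b: float) -> float:
    """Return a model's size as a percentage of the estimated ChatGPT size."""
    return params_b / CHATGPT_PARAMS_B * 100


for name, params_b in MODELS_B.items():
    print(f"{name}: {share_of_chatgpt(params_b):.2f}% of ChatGPT's estimated size")
```

Under these assumptions, the phone-sized models land below one percent, and a 70-billion-parameter model sits at about seven percent.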

However, this doesn’t mean that small models are necessarily 10-100 times worse. Think of parameters as AI’s brain cells. These are weights and settings that determine the output given the input. Smaller “brains” may not be as accurate, but their capabilities do not scale proportionally.

To put this into perspective: on a scale of 0 to 10, where 10 represents perfect performance on current industry benchmarks, the top dogs (ChatGPT, Claude 3.5 Sonnet, or Gemini 1.5 Pro) score around 7/10. Any chatbot a phone or small laptop can handle would score below 3/10. A powerful computer can run an LLM scoring around 4/10.

Those are rough approximations based on Hugging Face’s open LLM leaderboard and self-reported scores for proprietary models.

Then the price comes into consideration – isn’t it better to pay 20 bucks a month for the best service that’s constantly updated and improving instead of splurging a grand on quickly aging hardware?
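
The trade-off can be put into numbers with a simple break-even calculation. The sketch below uses only the two figures mentioned above ($1,000 for hardware, $20 per month for a subscription). It ignores electricity, resale value, free tiers, and price changes, and the function name is hypothetical.

```python
def breakeven_months(hardware_cost: float, monthly_fee: float) -> float:
    """Months of subscription that cost as much as the hardware purchase."""
    return hardware_cost / monthly_fee


months = breakeven_months(1000, 20)
print(f"Break-even after {months:.0f} months (about {months / 12:.1f} years)")
```

On these figures, the subscription only becomes the more expensive option after roughly four years, by which point the hardware has aged considerably.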

“Right now, generative AI relies heavily on the cloud. Since it does require so much in terms of storage and computing capabilities, most personal devices are simply not equipped to handle all of that on-device,” said Edward Tian, founder and CEO of GPTZero, an AI text detector tool.

“I do think there are a lot of wrinkles to iron out before we can get to the point where it is effectively implemented directly on devices. Energy usage and storage capabilities are currently posing some of the biggest challenges, and those need to be addressed.”

The experiment: a powerful desktop computer is barely enough for a medium-sized model

I tried running large language models (LLMs) locally on an iPhone and a powerful desktop computer to see what’s possible. Using various generative AI models on the phone is as easy as downloading an app and choosing an open-source model.

The iPhone 15 Pro Max handled the smallest Meta model, Llama 3.1, with eight billion parameters, surprisingly well. It was able to output tokens at a decent rate of around eight tokens per second (not counting the 5-10 seconds required to load the model). Tokens are basic units of text that AI processes, typically a word or part of it.
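
To get a feel for what eight tokens per second means in practice, the sketch below estimates how long a reply takes to generate. The 200-token reply length is a hypothetical example, not a measurement from the experiment, and the function name is made up for illustration.

```python
def response_seconds(num_tokens: int, tokens_per_second: float,
                     load_seconds: float = 0.0) -> float:
    """Estimate the wait for a reply: model load time plus generation time."""
    return load_seconds + num_tokens / tokens_per_second


# Hypothetical 200-token reply at the measured rate of about 8 tokens/s.
generation_only = response_seconds(200, 8)
with_slow_load = response_seconds(200, 8, load_seconds=10)
print(f"Generation only: {generation_only:.0f} s")
print(f"Including a 10 s model load: {with_slow_load:.0f} s")
```

At that rate, a reply of a few paragraphs takes around half a minute, which is usable but noticeably slower than a cloud chatbot.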

However, while using the chatbot, the phone got very hot quickly, signaling that this may be too ambitious. I didn’t want to damage the device, so I ended the experiment. Larger models are out of the question as they wouldn’t fit into a small 8GB temporary memory (RAM).
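
The memory limit follows from simple arithmetic: the weights alone need roughly the number of parameters multiplied by the bytes used per parameter. The sketch below is a rough estimate under stated assumptions. It ignores the context cache, the app itself, and the operating system, and the article does not say which precision the phone app used, so the bit widths shown are illustrative.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: int) -> float:
    """Approximate memory for model weights alone, in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


PHONE_RAM_GB = 8  # RAM of the phone used in the experiment

for params_b in (8, 70):
    for bits in (16, 8, 4):  # assumed precisions, not confirmed by the article
        need = weight_memory_gb(params_b, bits)
        verdict = "might fit" if need < PHONE_RAM_GB else "does not fit"
        print(f"{params_b}B at {bits}-bit: {need:.0f} GB -> {verdict} in {PHONE_RAM_GB} GB")
```

Under these assumptions, an eight-billion-parameter model only squeezes into 8GB when compressed to around four bits per weight, and a 70-billion-parameter model does not fit at any of these precisions.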

Llama’s eight billion parameters make it roughly 2.5 to 4.5 times larger than Google’s Nano models, which were introduced with the newest Pixel devices. Nano-1 and Nano-2 have only 1.8B and 3.25B parameters, respectively.

While small models are fast, there’s only so much knowledge you can fit into a single DVD. Pocket-sized LLMs amplify the worst traits of their biggest cousins, such as hallucinations, inaccuracies, biases, and questionable reasoning.

We may need more power.

Geekbench’s new AI test reveals that the most powerful consumer graphics card, Nvidia RTX 4090, is ten times faster than the iPhone 15 Pro Max.

I had a PC equipped with 32GB of RAM and a dedicated AMD 7900XT graphics card with an additional 20GB of VRAM. It struggled to run models with 70 billion parameters or more. Llama 70B, Qwen2-72B, and other mid-sized models on the Ollama service required significantly more memory than I had. As a result, the output of a few tokens per second was painfully slow. And those weren’t the full models, but so-called quantized versions with reduced accuracy to save memory and resources.

Given these demands, no laptop or phone with 16GB of RAM or less will run such models effectively, regardless of the TOPS (tera operations per second) count their manufacturers might advertise. They also do not have half a kilowatt of power at their disposal.
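
One way to see why this is slow: when the weights do not fit into the graphics card's memory, the remainder has to be served from much slower system RAM. The sketch below models a naive GPU-first placement. The model sizes are assumptions (70 billion weights at 4 bits each is about 35GB, and at 8 bits about 70GB), the article does not state which quantization level was used, and the function name is hypothetical.

```python
def split_weights(model_gb: float, vram_gb: float, ram_gb: float):
    """Return (gb_on_gpu, gb_in_ram, fits) for a naive GPU-first placement.

    Ignores the operating system, context cache, and other running programs.
    """
    on_gpu = min(model_gb, vram_gb)
    in_ram = model_gb - on_gpu
    fits = in_ram <= ram_gb
    return on_gpu, in_ram, fits


VRAM_GB, RAM_GB = 20, 32  # the test PC described above

for label, model_gb in (("70B at 4-bit (assumed)", 35), ("70B at 8-bit (assumed)", 70)):
    on_gpu, in_ram, fits = split_weights(model_gb, VRAM_GB, RAM_GB)
    print(f"{label}: {on_gpu} GB on GPU, {in_ram} GB in RAM, fits={fits}")
```

In this simplified picture, even the 4-bit version leaves about 15GB of weights outside the graphics card, and the 8-bit version does not fit into the machine at all.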

Again, the best chatbots and other AI models are available as a service on even the weakest devices for around $20 per month.

[Figure: Some Geekbench AI scores. Larger is better.]

Small models have some use cases

Apple, Google, and Microsoft promote AI features on low-power consumer devices, saying they will run AI models locally. How capable will they be?

“In my opinion, small AI models under 10 billion parameters are generally not very effective. The most useful consumer-facing models that significantly enhance productivity are currently trained with hundreds of billions of parameters,” said Kartik Khosa, full-stack software engineer at Phoenix Bioinformatics, a non-profit biological database resource company. “Realistically, most AI tasks won’t be processed locally.”

Apple has mentioned that ChatGPT will handle complex Siri requests in the cloud.

Khosa believes local small AI models can effectively run small-scale tasks like text rewording, summarization, or basic text generation.

“I do appreciate some of the new AI features, like the image editing capabilities demonstrated at Google’s recent event. This seems to be one of the few genuinely useful AI applications on phones so far,” Khosa said.

However, those models can run even on older devices. According to Geekbench, devices from the iPhone 13 or 14 series deliver only marginally worse AI performance scores.

“Since most demanding tasks still run in the cloud, the benefit of these hardware improvements for everyday users remains questionable. Smartphones like the iPhone 13 or 14 already provide substantial processing power, and it’s unclear if higher benchmarks in newer models will translate into meaningful improvements for most users,” Khosa agrees.

“From a consumer standpoint, AI features in new devices often feel like marketing gimmicks. Personally, I don’t see the need for integrated AI features on my phone when I can use the OpenAI ChatGPT app to handle any AI tasks I require.”

In the future, advancing hardware may be able to run more capable local models, which are still being developed.

For now, according to Hugging Face data, the best pre-trained model, sized below 100 billion parameters, is Alibaba’s Qwen2-72B, which scores an average of 35 points out of 100 on various benchmarks.

One of the toughest tests for LLMs is GPQA (Google-Proof Q&A Benchmark), an “extremely hard knowledge dataset” that contains questions designed by domain experts. Its curator, David Rein, an AI researcher at New York University, noted that PhDs working outside their own field scored only 34% on the questions, even with internet access.

Claude 3 gets ~60% accuracy on GPQA. It's hard for me to understate how hard these questions are—literal PhDs (in different domains from the questions) with access to the internet get 34%.

PhDs *in the same domain* (also with internet access!) get 65% - 75% accuracy. https://t.co/ARAiCNXgU9 pic.twitter.com/PH8J13zIef
david rein (@idavidrein), March 4, 2024

Meta’s Llama was the best in this test among the smaller models, demonstrating a score of almost 20%. Large chatbots from OpenAI, Google, or Anthropic punch above 50%.

If you want to milk an old device and have all the bells and whistles – buy a subscription

Steven Athwal, CEO and Founder of The Big Phone Store, a UK refurbished phone retailer, believes that an AI subscription can help users milk aging devices for all they’re worth with the latest and greatest technology without buying a new device.

“By taking an online subscription, you’ll have the latest and best-performing AI models to play with at any given point in time. The models are updated as relevant new technologies come out. They will work seamlessly across all the screens you own today – your phone, tablet or computer – without needing to buy new hardware,” Athwal said. “And for a lot of shoppers, that is both more cost-effective and convenient.”

For him, some real benefits of AI on the device can be select photography models or augmented reality. However, for many users, the new features just don’t move the needle enough.

“If your current device is still solid and ticks all the boxes for you on a day-to-day basis, then a subscription-based online AI investment may be more practical and cheaper,” Athwal concluded. “You get all the benefits of improved AI performance without needing a new device.”

Business users want to access AI on computers instead of phones

An August survey conducted by CRM platform Pipedrive reveals that workers prefer using AI tools on desktops and laptops rather than on mobile devices in a corporate setting.

Regarding AI-powered customer support tools, AI tools to create text and content, AI tools to summarize content, or AI-assisted image recognition and analysis tools, more than 90% of respondents said they would use them on a laptop or desktop.

The only area in which smartphones beat laptops and desktops for AI tool use was voice assistance: 61.5% of respondents to that question said they use it on smartphones, while 38.5% said they use it on desktops and laptops.

The Pipedrive survey was conducted among 500 business professionals.
