
Artificial intelligence (AI) enthusiasts are buzzing about Apple's claims that its new Mac Studio can locally run 600-billion-parameter AI models, such as DeepSeek R1 or Llama. Even with a hefty $10,000+ price tag, it's far cheaper than six-figure multi-GPU setups. And chatbots are rapidly getting smaller and better.
Apple just unveiled the Mac Studio with the M3 Ultra chip and 512 gigabytes (GB) of unified memory, a single pool that the CPU, GPU, and Neural Engine can all access directly.
AI professionals are excited about the possibility of running some of the best open-source large language models (LLMs) from a single plug-and-play device.
Nvidia's H100, one of the recent AI powerhouse GPUs, costs around $30,000 yet carries only about 80 gigabytes of memory (VRAM). That is 6.4 times less than a single fully specced Mac Studio, which costs around $10,000.
While a compact Mac Studio will not match the raw compute of multi-GPU setups, inference (deploying and running a trained model) is often memory-bound rather than compute-bound.
It seems that Apple has just thrown a major curveball into the AI world, one that could disrupt demand for huge data centers.
“Starting at 96GB, it can be configured up to 512GB, or over half a terabyte. This outpaces the memory available in today’s most advanced workstation graphics cards, removing limitations for pro workloads that demand large amounts of graphics memory like 3D rendering, visual effects, and AI,” Apple confidently claimed during the announcement.
What do tech pros say?
The first Mac Studio devices will ship on March 12th, 2025, but tech YouTubers are already praising the machine's capabilities.
“Industry changing!” Linus Sebastian, a Canadian tech reviewer, said in reaction to Apple’s announcement. “The Mac Studio is clearly designed from the ground up to be an AI monster of a machine, where being able to load your entire model into memory can be hugely beneficial.”
For a while, developers on Reddit have been experimenting with older Mac Minis and Studios, often clustered, to run larger AI models as a cheaper alternative to buying expensive Nvidia GPUs, which are often limited to a few dozen gigabytes of VRAM.
NetworkChuck had to cluster five previous-generation Mac Studios over slow interconnects to fit and run the 405-billion-parameter Llama model.
“This new M3 Ultra Max studio, which you can order today, gets up to 512 gigs basically making it the only way in the world to get anywhere near 512 gigs of VRAM,” another YouTube review channel, MaxTech, noted.
M3 Ultra Mac Studio with 512GB will nicely fit DeepSeek R1 is 671B parameters. Quantized down to 4 bit and no KV cache. pic.twitter.com/YFMinZHaBO
Alex Ziskind (@digitalix), March 5, 2025
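The arithmetic behind that post is easy to sanity-check. Here is a rough, illustrative sketch in Python showing why a 4-bit build of a 671-billion-parameter model squeezes into 512GB of unified memory (back-of-the-envelope figures, not benchmarks):

```python
# Rough memory-footprint check: a 671B-parameter model stored at 4 bits per weight.
params = 671e9           # total parameters in DeepSeek R1
bytes_per_param = 0.5    # 4-bit quantization = 0.5 bytes per weight
weights_gb = params * bytes_per_param / 1e9
print(f"Quantized weights: ~{weights_gb:.0f} GB")   # ~336 GB

# Real quantized builds carry overhead (some tensors kept at higher precision,
# plus metadata), which is why a Q4_K_M GGUF lands around 400 GB, still comfortably
# under 512 GB, leaving headroom for the OS, activations, and a modest KV cache.
```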
On Hacker News, Y Combinator's forum, the Apple announcement was one of the most active topics recently. Napkin mathematicians hope the Mac Studio will deliver 20-30 tokens per second when running large LLMs, but that will only be confirmed after March 12th.
Should Nvidia be worried?
“Apple's M3 Ultra with 512GB unified memory represents a strategic flanking maneuver against Nvidia’s entrenched position. The ability to run 600B-parameter models locally creates a competitive alternative in specific high-value markets NVIDIA has traditionally dominated,” said Dev Nag, founder and CEO at QueryPal, an enterprise AI customer support automation platform.
Nag reminds us that the rivalry between Nvidia and Apple is longstanding and goes all the way back to the Steve Jobs era when the two giants clashed over alleged technology copying.
“This cold war intensified with the 2008 ‘Bumpgate’ scandal involving faulty NVIDIA chips in MacBooks and subsequent licensing disputes over mobile GPU technologies. Apple's strategy shifted toward vertical integration,” Nag said.
Apple later partnered with AMD, used Google’s hardware to train AI, and finally developed its own silicon.
Now, creative professionals working with video production, 3D rendering, and generative AI may find particular value in this approach.
“These workflows benefit from the low-latency, high-privacy solution Apple offers. The $10,000 price point – while steep for consumers – creates an accessible entry point for small studios and professionals at the ‘prosumer’ level who would otherwise require much costlier GPU clusters with comparable memory capacity,” Nag explains.
For AI training, Nvidia's H100 and newer GPU iterations still deliver superior raw computational performance, relying on their Tensor cores and much higher memory bandwidth (roughly 3TB/s for the H100 versus approximately 800GB/s for the M3 Ultra).
However, Apple's approach of unifying GPU and CPU memory may benefit inference workloads, which are often bottlenecked by memory capacity.
A Mac device also offers a superior developer experience and deployment simplicity for many inference applications. Some AI models can be run just by downloading an app.
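As an illustration of that simplicity, here is a minimal local-inference sketch using the open-source llama-cpp-python bindings, one common way to run quantized models on Apple silicon. The model file name and settings are placeholders and assume a quantized GGUF build has already been downloaded:

```python
# Minimal local-inference sketch with llama-cpp-python (pip install llama-cpp-python).
# The GGUF file name is a placeholder for whichever quantized model was downloaded.
from llama_cpp import Llama

llm = Llama(
    model_path="deepseek-r1-q4_k_m.gguf",  # hypothetical local file
    n_ctx=4096,                            # context window to allocate
)

result = llm("Explain unified memory in one sentence.", max_tokens=128)
print(result["choices"][0]["text"])
```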
“Apple's innovation will accelerate the bifurcation between training and inference markets, potentially pressuring NVIDIA's margins on lower-tier products while expanding the overall AI hardware market,” Nag concludes.
“The strategic impact on Nvidia will manifest primarily through market segmentation rather than displacement. Enterprise AI infrastructure, cloud providers, and large-scale deployments remain firmly in Nvidia’s control.”
Developers get an option to run AI locally
Volodymyr Kubytskyi, Head of AI at MacPaw, a software company creating utilities for macOS, sees a shift “worth watching,” calling the Mac Studio M3 Ultra with 512GB a turning point for on-device AI.
“Apple’s M3 Ultra isn’t a direct threat to Nvidia yet. Apple is optimizing for on-device AI inference, while Nvidia dominates datacenter-scale AI training and cloud inference. The real shift is in where AI workloads are processed,” Kubytskyi said.
Apple's move may drive the adoption of localized AI deployment, reducing cloud dependency. If developers embrace this shift, Apple will be well-positioned to emerge as the first-choice platform for local AI.
Local LLM tools offer privacy and low latency, and they can be deeply integrated into automation tools or creative apps without cloud costs. Certain workloads might move from the cloud back on premises, and this could disrupt part of Nvidia's market.
Kubytskyi believes Macs could handle personal AI assistants, medical AI, and professional apps without Nvidia-powered cloud services.
“Apple’s M3 Ultra is not a direct threat to Nvidia’s AI dominance yet, but it signals a shift,” agrees Kaveh Vahdat, founder and CEO at RiseOpp, a generative AI-powered game creation company.
Nvidia’s GPU clusters “remain king” for enterprises and researchers running inference on massive models. However, Apple is redefining what is possible on consumer and prosumer-grade hardware.
“Running a 600-billion-parameter model locally is a statement that AI is no longer just for the cloud. This could disrupt Nvidia’s stronghold in the inference space, particularly for edge AI and security-sensitive applications where data cannot leave the device,” Vahdat said.
“If Apple iterates aggressively and opens its AI stack, we might see the first real competitor to Nvidia’s hegemony coming not from a data center but from a desk.”
The bets are on: how many tokens per second?
AI pros on Hacker News (Y Combinator) are using “napkin math” to estimate how many tokens per second the Mac Studio M3 Ultra will be able to output from a large AI model such as DeepSeek R1, even when quantized (compressed at some cost to quality).
“You should be able to get usable performance. A Q4_K_M GGUF of DeepSeek-R1 is 404GB. This is a 671B MoE that ‘only’ has 37B activations per pass. You'd probably expect in the ballpark of 20-30 tok/s for text generation,” one user posted.
“DeepSeek-R1 only has 37B active parameters. A back-of-the-napkin calculation: 819GB/s / 37GB/tok = 22 tokens/sec,” another user said.
“~40 tokens/s on M3 Ultra 512GB by my calculation,” yet another user contributed.
“It does sound like the Mac with M3 Ultra will easily give 40 tokens/s,” a similar post reads.
The biggest optimist expects up to 50 tokens per second.
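The spread between those guesses largely comes down to how many bytes each generated token must pull through memory. Here is a rough sketch of the same napkin math, assuming decoding is purely bandwidth-bound and ignoring the KV cache and other overheads:

```python
# Back-of-the-envelope decode-speed estimate: if generation is memory-bandwidth-bound,
# every new token requires streaming the active weights through memory once.
bandwidth_gb_s = 819     # approximate M3 Ultra unified memory bandwidth
active_params = 37e9     # DeepSeek R1 is a mixture-of-experts model: ~37B active per token

for bits in (8, 4):      # precision the active weights are stored at
    gb_per_token = active_params * bits / 8 / 1e9
    print(f"{bits}-bit weights: ~{bandwidth_gb_s / gb_per_token:.0f} tokens/s")

# 8-bit: ~22 tokens/s (the "819GB/s / 37GB per token" estimate above)
# 4-bit: ~44 tokens/s (roughly where the 40+ tokens per second guesses come from)
```

Real-world throughput will also depend on prompt processing, KV-cache traffic, and expert-routing overhead, so these figures are best read as optimistic upper bounds.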