Engineer straps an RTX 5090 to a MacBook Air and actually gets it working


Many have tried and failed before – macOS simply can’t run external GPUs because it has no drivers. But one engineer has now found a way around it. With “enough elbow grease,” a Linux virtual machine on a Mac can drive the most powerful consumer GPU, the Nvidia RTX 5090, and play AAA games at 4K, though at a significant performance penalty.

This might be the coolest project of the year – software engineer Scott J. Goldman has just made the Nvidia RTX 5090 work as an external GPU on a MacBook Air M4.

While many before tried to plug a GPU into a Mac’s Thunderbolt port, macOS simply does not support any third-party GPUs on Apple Silicon. But Linux does, and Apple has significantly expanded virtualization capabilities in recent years.

“You can run Linux in a 64-bit ARM VM on a macOS host. MacOS supports Thunderbolt devices. Linux supports NVIDIA GPUs. Let’s put the pieces together and pass through the GPU into the Linux VM,” Goldman said in a blog post detailing the experiment.

The engineer demonstrated Cyberpunk 2077 – a highly demanding game – running at 4K RT Ultra settings and achieving 27 frames per second (fps), or 111 fps with frame generation enabled.

The result falls well short of the 100 fps (283 fps with frame generation) that the same GPU achieves in a desktop PC, but it is still a significant improvement over native performance. A MacBook Air running the game natively would achieve only 3 frames per second with the same settings.

Goldman also demonstrated that his Frankenstein setup improves Shadow of the Tomb Raider results from 8 fps when running natively to 40 fps with eGPU at 4K.

It also ran Crysis Remastered at the highest graphical preset (“Can it Run Crysis?”) at 1080p, achieving 23 fps. A PC with the same GPU would quadruple the result.
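As a quick sanity check on the figures quoted above, the speedups can be worked out in a few lines. This is only an illustrative Python sketch: the dictionary layout and the function name are made up here, and the only inputs are the frame rates reported in this article.

```python
# Frame rates quoted in the article: (native on the MacBook Air, with the RTX 5090 eGPU).
RESULTS_FPS = {
    "Cyberpunk 2077 (4K RT Ultra)": (3, 27),
    "Shadow of the Tomb Raider (4K)": (8, 40),
}


def speedup(native_fps: float, egpu_fps: float) -> float:
    """Return how many times faster the eGPU setup is than native rendering."""
    return egpu_fps / native_fps


for game, (native, egpu) in RESULTS_FPS.items():
    print(f"{game}: {speedup(native, egpu):.1f}x faster than native")

# The article says a desktop PC would roughly quadruple the 23 fps Crysis result.
crysis_egpu_fps = 23
crysis_desktop_estimate = 4 * crysis_egpu_fps
print(f"Crysis Remastered desktop estimate: about {crysis_desktop_estimate} fps")
```

In other words, the eGPU setup is roughly nine times faster than native in Cyberpunk 2077 and five times faster in Shadow of the Tomb Raider, while a desktop would land at around 92 fps in Crysis Remastered.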

The MacBook Air has two Thunderbolt 4 (USB-C) ports, which essentially wrap PCIe signals and transmit them over the cable. All the computer sees is a normal PCIe device. PCIe is a high-speed interface used to connect GPUs in standard PCs.

Thunderbolt 4 tunnels four PCIe lanes over a link with a maximum throughput of 40 gigabits per second, while a GPU in a desktop PC can typically use all 16 lanes of its slot.
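To put the lane gap in numbers, here is a rough back-of-the-envelope calculation in Python. It assumes the Thunderbolt 4 tunnel carries a PCIe 3.0 x4 link and that the desktop comparison is a PCIe 5.0 x16 slot; neither generation is stated in the article, so treat both as assumptions.

```python
# Assumed PCIe parameters (not from the article):
#   PCIe 3.0 runs at 8 GT/s per lane, PCIe 5.0 at 32 GT/s per lane,
#   and both use 128b/130b line encoding.
ENCODING_EFFICIENCY = 128 / 130


def pcie_gbps(gigatransfers_per_lane: float, lanes: int) -> float:
    """Usable link bandwidth in gigabits per second, before protocol overhead."""
    return gigatransfers_per_lane * lanes * ENCODING_EFFICIENCY


thunderbolt_tunnel = pcie_gbps(8, 4)   # assumed PCIe 3.0 x4 tunnelled over Thunderbolt 4
desktop_slot = pcie_gbps(32, 16)       # assumed PCIe 5.0 x16 desktop slot

print(f"Thunderbolt 4 PCIe tunnel: ~{thunderbolt_tunnel:.1f} Gbit/s")
print(f"PCIe 5.0 x16 slot:         ~{desktop_slot:.1f} Gbit/s")
print(f"Ratio: ~{desktop_slot / thunderbolt_tunnel:.0f}x")
```

Under these assumptions the desktop slot offers about 16 times the raw PCIe bandwidth of the Thunderbolt tunnel: four times the lanes, each four times faster.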

However, Thunderbolt bandwidth wasn’t even the main limitation. One of the challenges was Direct Memory Access (DMA). Apple Silicon imposes hard limits of roughly 1.5GB of mappable memory and a cap of ~64k total mappings, which is nowhere near enough for modern games.
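The two limits bite independently, which a small Python sketch can illustrate. The constants below are approximations of the figures quoted in the article, and the `Mapping` class and `fits_budget` function are hypothetical names invented for this example, not part of Goldman's code.

```python
from dataclasses import dataclass

# Approximate limits as quoted in the article; the exact values are assumptions.
MAX_MAPPED_BYTES = int(1.5 * 1024**3)   # "roughly 1.5GB" of mappable memory
MAX_MAPPINGS = 64 * 1024                # "~64k" total mappings


@dataclass
class Mapping:
    address: int  # start of the DMA-visible region
    size: int     # length in bytes


def fits_budget(mappings: list[Mapping]) -> bool:
    """Return True if the mappings respect both the byte and the count limit."""
    total_bytes = sum(m.size for m in mappings)
    return total_bytes <= MAX_MAPPED_BYTES and len(mappings) <= MAX_MAPPINGS


# 70,000 small 16 KiB buffers use only about 1.07 GiB but exceed the count limit.
many_small = [Mapping(address=i * 16384, size=16384) for i in range(70_000)]
# Two 1 GiB buffers stay under the count limit but exceed the byte limit.
two_large = [Mapping(address=i * 1024**3, size=1024**3) for i in range(2)]

print(fits_budget(many_small))  # too many mappings
print(fits_budget(two_large))   # too many bytes
```

A workload can therefore fail either by mapping too much memory in total or by splitting a modest amount of memory into too many pieces.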

The developer mentions plenty of “elbow grease” required for such a project, which makes it impractical for most users.

“I wish the answer were ‘download this thing, and you’re good to go,’ but, alas, it’s not that simple,” Goldman writes.

“A virtual DMA device, kprobes patching the NVIDIA driver, hardware TSO mode, a pretty big QEMU patch, a mapping coalescer to stay under DART’s 64k cap… and at the end of all that, a MacBook Air really does run Cyberpunk, Crysis, and Doom on an RTX 5090 in a Linux VM.”

So what did the engineer do?

Goldman used QEMU, open-source virtualization software, to create an Ubuntu for ARM virtual machine, and configured it to expose the RTX 5090 as a PCI device.

This was the easy part before “the long and winding road of getting this to actually work.”

The first challenge was to mirror PCI Base Address Registers (BAR) – reserved chunks of memory the computer can read and write to – into the VM so that Linux could talk to the GPU. But it couldn’t be done directly, requiring custom C code.
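For readers unfamiliar with BARs, the following Python sketch decodes a BAR value using the standard PCI register layout (flag bits in the low nibble, address in the remaining bits). It is not taken from Goldman's C code, and the register values are hypothetical examples rather than readings from a real RTX 5090.

```python
def decode_bar(low: int, high: int = 0) -> dict:
    """Decode a PCI Base Address Register value per the standard PCI layout.

    `low` is the 32-bit BAR value; `high` is the next BAR, used only
    when the BAR is 64-bit.
    """
    if low & 0x1:
        # Bit 0 set: the BAR describes an I/O port range.
        return {"kind": "io", "base": low & ~0x3}

    is_64bit = ((low >> 1) & 0x3) == 0b10
    base = low & ~0xF                 # low four bits are flags, not address
    if is_64bit:
        base |= high << 32            # upper half lives in the following BAR
    return {
        "kind": "memory",
        "is_64bit": is_64bit,
        "prefetchable": bool(low & 0x8),
        "base": base,
    }


# Hypothetical register values for illustration, not read from real hardware.
print(decode_bar(0xF000000C, 0x00000060))
```

Mirroring a BAR into a VM means making the guest's view of that address window reach the same device memory, which is the part that needed custom code here.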

“As soon as the VM touched the PCI BAR memory, the host kernel crashed,” the engineer said of his first attempts.

The next problem was how to let the GPU read and write RAM directly despite the roughly 1.5GB DMA limit mentioned above. The engineer built a fake virtual device in QEMU that acts as a middleman. A “hack, but it works.”

This led to the next problem, an alignment mismatch that broke compute workloads. The Nvidia driver was asking for memory in large chunks, but Apple’s hardware handed out pieces that didn’t fit, so he had to write another shim to match them. At this point, simple workloads worked.
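One common way to bridge this kind of mismatch is to over-allocate and then carve out an aligned span. The Python sketch below shows that general technique only; the article does not describe how Goldman's shim works, and the sizes, addresses and function names here are hypothetical.

```python
def align_up(value: int, alignment: int) -> int:
    """Round `value` up to the next multiple of `alignment` (a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)


def carve_aligned(raw_start: int, raw_size: int, size: int, alignment: int) -> int:
    """Return an aligned start address inside a raw allocation.

    Raises ValueError if the raw region is too small once padding is skipped.
    """
    start = align_up(raw_start, alignment)
    if start + size > raw_start + raw_size:
        raise ValueError("raw region too small for aligned request")
    return start


# Hypothetical numbers: the driver wants 2 MiB aligned to 2 MiB, while the
# allocator hands back regions aligned only to 16 KiB.
ALIGNMENT = 2 * 1024**2
REQUEST = 2 * 1024**2
raw_start = 0x1000_4000                 # 16 KiB aligned, not 2 MiB aligned
raw_size = REQUEST + ALIGNMENT          # over-allocate so an aligned span fits

aligned = carve_aligned(raw_start, raw_size, REQUEST, ALIGNMENT)
print(hex(aligned))
```

The cost of this approach is wasted padding on either side of the aligned span, which matters when mappable memory is as scarce as it is here.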

“Unfortunately, if you really crank up the settings in games, we start to create tons of tiny mappings that run over the total ~64k mapping count limit,” the engineer writes.

He solved the problem by grouping nearby small memory buffers into larger chunks, reducing the active mapping count by a factor of 4.
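The idea of grouping nearby buffers can be sketched as a classic interval merge. This Python example is a minimal illustration under assumed inputs: the `coalesce` function, the `max_gap` parameter and the buffer addresses are hypothetical and are not taken from Goldman's mapping coalescer.

```python
def coalesce(ranges: list[tuple[int, int]], max_gap: int) -> list[tuple[int, int]]:
    """Merge (start, size) ranges whose gaps are at most `max_gap` bytes.

    Gaps between merged ranges become part of the larger mapping, trading
    some extra mapped bytes for a lower mapping count.
    """
    merged: list[tuple[int, int]] = []
    for start, size in sorted(ranges):
        if merged:
            prev_start, prev_size = merged[-1]
            prev_end = prev_start + prev_size
            if start <= prev_end + max_gap:
                # Close enough: extend the previous mapping to cover this one.
                new_end = max(prev_end, start + size)
                merged[-1] = (prev_start, new_end - prev_start)
                continue
        merged.append((start, size))
    return merged


# Hypothetical buffers: four 4 KiB buffers close together, one far away.
buffers = [(0x0000, 0x1000), (0x2000, 0x1000), (0x4000, 0x1000),
           (0x6000, 0x1000), (0x100000, 0x1000)]
print(coalesce(buffers, max_gap=0x1000))
```

In this toy case five mappings collapse into two. The trade-off is that the gaps between buffers are now mapped too, so a coalescer of this kind has to balance the mapping count against the total mapped size.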

Other bugs appeared, such as the macOS scheduler treating the whole setup as a low-priority background task – Goldman had to set a higher priority for QEMU.

Another major task was reducing the cost of running x86 games on ARM through emulation layers – the engineer patched QEMU to make Apple’s CPU handle memory like an x86 processor.

Each step cost some performance, but the setup finally worked, although its stability was “not the greatest.” Goldman shared his fork of QEMU with added PCI passthrough on GitHub.

“If Linux could gain support for Thunderbolt on Apple Silicon, it would collapse a lot of the issues: no more BAR latency penalty, no more DMA limits, no more VM overhead, etc. Maybe that’ll happen at some point,” the engineer concluded.
