Why Apple’s new AI research could radically change the scene


It was probably a bit naive to assume Apple was behind its rivals in the AI race – the tech giant simply doesn’t need to shout. But when it speaks, it matters: the firm has just announced significant strides in AI research through two new papers.

Apple barely talks about AI and likes to work in the background rather than join the hype cycle – this way, the company can avoid the dangers of inflated expectations.

This doesn’t mean the tech behemoth isn’t working on machine learning – in July, for instance, Bloomberg News reported that Apple had quietly built its own generative AI tools to rival products from OpenAI and Google, but had yet to decide when to release the new technology into the wild.


That’s probably because Apple has always been a consumer-facing firm concerned first and foremost with the end product and what the software does for the user. In other words, the company wants to be as ready as possible to greenlight any new features.

Now, though, Apple is moving forward. The two new papers introduce techniques for generating 3D avatars and for running language model inference more efficiently.

These advances could enable more immersive visual experiences and, perhaps even more importantly, allow complex AI systems to run on ordinary consumer devices such as iPhones and iPads.

In the first research paper, Apple scientists propose HUGS (Human Gaussian Splats) to generate animated 3D avatars from short monocular videos (videos taken from a single camera).

According to the paper’s lead authors, the method automatically learns to disentangle the static scene and create a fully animatable human avatar within 30 minutes. Compared with previous avatar generation methods, HUGS is up to 100 times faster in training and rendering.
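For context, a “Gaussian splat” represents a scene as a cloud of tiny 3D Gaussians, each with its own position, size, orientation, color, and opacity. The toy Python sketch below only illustrates that underlying data structure – it is not Apple’s HUGS code, and the class and field names are purely illustrative.

```python
# Toy illustration of the data behind "Gaussian splatting".
# This is NOT Apple's HUGS implementation; names and shapes are illustrative only.
from dataclasses import dataclass
import numpy as np

@dataclass
class GaussianSplat:
    position: np.ndarray   # (3,) centre of the Gaussian in world space
    scale: np.ndarray      # (3,) per-axis extent of the Gaussian
    rotation: np.ndarray   # (4,) unit quaternion orienting the Gaussian
    color: np.ndarray      # (3,) RGB contribution when rendered
    opacity: float         # how strongly the splat blends into the image

def make_random_cloud(n: int, seed: int = 0) -> list[GaussianSplat]:
    """Create n random splats -- a stand-in for the millions a real scene uses."""
    rng = np.random.default_rng(seed)
    cloud = []
    for _ in range(n):
        q = rng.normal(size=4)
        cloud.append(GaussianSplat(
            position=rng.uniform(-1.0, 1.0, size=3),
            scale=rng.uniform(0.01, 0.1, size=3),
            rotation=q / np.linalg.norm(q),
            color=rng.uniform(0.0, 1.0, size=3),
            opacity=float(rng.uniform(0.1, 1.0)),
        ))
    return cloud

print(len(make_random_cloud(1000)), "splats")
```

Training then amounts to adjusting those parameters until the rendered splats reproduce the input video, which is why the representation can be both compact and fast to render.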

The new 3D modeling capability, demonstrated in the videos accompanying the paper, and the ability to create avatars from such simple footage make for an impressive achievement from Apple – one that could unlock new possibilities for virtual try-on, telepresence, and synthetic media in the near future.

The second paper tackles a key challenge in deploying large language models (LLMs) on devices with limited memory – think smartphones. Modern models such as GPT-4 contain hundreds of billions of parameters, which makes inference expensive on consumer hardware.
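A quick back-of-the-envelope calculation shows the scale of the problem. The numbers below are illustrative rather than taken from the paper: even a 70-billion-parameter model stored in 16-bit precision dwarfs the memory of any phone.

```python
# Illustrative memory estimate (not figures from Apple's paper).
params = 70e9              # a 70-billion-parameter model
bytes_per_param = 2        # 16-bit (fp16/bf16) weights
weights_gb = params * bytes_per_param / 1e9
print(f"Weights alone: ~{weights_gb:.0f} GB")   # ~140 GB
print("Typical smartphone RAM: ~6-8 GB")        # orders of magnitude short
```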

The proposed system minimizes data transfer from flash storage into scarce DRAM (dynamic random-access memory) during inference.

“Our method involves constructing an inference cost model that harmonizes with the flash memory behavior, guiding us to optimize in two critical areas: reducing the volume of data transferred from flash and reading data in larger, more contiguous chunks,” reads the paper.
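One way to picture the “larger, more contiguous chunks” idea is sketched below: instead of issuing many small reads, the weight rows needed for an inference step are merged into contiguous runs and fetched from storage in a few larger requests. This is a simplified illustration under an assumed file layout and row size, not the system described in the paper, which is considerably more sophisticated.

```python
# Illustrative sketch (not the paper's code): fetch only the weight rows an
# inference step needs, merging neighbouring rows into contiguous runs so the
# flash storage serves fewer, larger reads.
import numpy as np

ROW_VALUES = 4096               # hypothetical: values per weight row
ROW_BYTES = ROW_VALUES * 2      # fp16 -> 2 bytes per value

def merge_contiguous(rows: list[int]) -> list[tuple[int, int]]:
    """Collapse sorted row indices into (start_row, count) runs."""
    runs: list[tuple[int, int]] = []
    for r in sorted(rows):
        if runs and r == runs[-1][0] + runs[-1][1]:
            runs[-1] = (runs[-1][0], runs[-1][1] + 1)
        else:
            runs.append((r, 1))
    return runs

def load_rows(path: str, rows: list[int]) -> dict[int, np.ndarray]:
    """Read the requested rows from a flat weights file in as few reads as possible."""
    out: dict[int, np.ndarray] = {}
    with open(path, "rb") as f:
        for start, count in merge_contiguous(rows):
            f.seek(start * ROW_BYTES)
            chunk = np.frombuffer(f.read(count * ROW_BYTES), dtype=np.float16)
            for i in range(count):
                out[start + i] = chunk[i * ROW_VALUES:(i + 1) * ROW_VALUES]
    return out
```

Flash storage delivers far higher throughput on large sequential reads than on scattered small ones, which is why batching rows in this way pays off.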


Co-author of the study Mehrdad Farajtabar added: “This breakthrough is particularly crucial for deploying advanced LLMs in resource-limited environments, thereby expanding their applicability and accessibility.”

The bottom line is that these optimizations might soon allow complex AI assistants and chatbots to run smoothly on iPhones, iPads, and other mobile devices.