Apple experimenting with AI models that can “see”


Apple researchers say that they’re working on large language models that can understand context, with one “substantially outperforming” GPT-4.

In a new paper entitled ReALM: Reference Resolution As Language Modeling, Apple researchers detail their work on large language models (LLMs) that can understand both verbal and non-verbal contexts.

“Human speech typically contains ambiguous references such as ‘they’ or ‘that,’ whose meaning is obvious (to other humans) given the context. Being able to understand context, including references like these, is essential for a conversational assistant,” the paper reads.

Enabling users to issue queries about what they see on their screen was also a “crucial step in ensuring a truly hands-free experience in voice assistants,” the researchers said, hinting at what could be Apple’s plans for Siri.

While language models do “exceedingly well” on sequence-to-sequence tasks, getting them to “see” things users may refer to on the screen has proved challenging, according to the paper.

The researchers benchmarked their models against OpenAI’s GPT-3.5 and GPT-4. They said that their smallest model achieved performance comparable to GPT-4, with the larger ones “substantially outperforming” it.

The paper, which was published on the open-access platform arXiv, also noted the shortcomings of the research.

“While our approach is effective in encoding the position of entities on the screen, we find that it results in loss of information that may not be able to resolve complex user queries that rely on nuanced positional understanding,” researchers said.

However, exploring “more complex” approaches, such as splitting the screen into a grid and encoding relative spatial positions into text, “is a promising avenue of future exploration.”
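To make the grid idea concrete, here is a minimal sketch of how on-screen entities might be bucketed into grid cells and written out as text that a language model could read. It is an illustration only and is not taken from the paper: the `Entity` class, the `grid_cell` and `encode_screen` functions, the 3x3 default grid, and the `[row r, col c]` output format are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    """Hypothetical on-screen item: its text and its center point,
    with x and y normalized to the range 0..1 (origin at top-left)."""
    text: str
    x: float
    y: float


def grid_cell(entity, rows, cols):
    """Map an entity's center to a (row, col) cell of a rows x cols grid."""
    # Clamp so a coordinate of exactly 1.0 lands in the last cell.
    row = min(int(entity.y * rows), rows - 1)
    col = min(int(entity.x * cols), cols - 1)
    return row, col


def encode_screen(entities, rows=3, cols=3):
    """Render entities as text, ordered top-to-bottom, then left-to-right."""
    ordered = sorted(entities, key=lambda e: grid_cell(e, rows, cols))
    lines = []
    for entity in ordered:
        row, col = grid_cell(entity, rows, cols)
        lines.append(f"[row {row}, col {col}] {entity.text}")
    return "\n".join(lines)
```

For example, passing a “Contact us” heading near the top-left corner and a phone number near the bottom-right to `encode_screen` yields one line of text per entity, each tagged with its grid cell. A request such as “call the number at the bottom” could then, in principle, be resolved from text alone.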

It was reported last year that Apple had created its own generative AI tools to rival products from OpenAI and Google but had yet to decide when to release the technology to consumers.

