How much do AI models actually remember?


New research shows that AI language models store roughly 3.6 bits of information per parameter, learning patterns rather than exact words to generate text. This insight from researchers at Meta, Google, Nvidia, and Cornell helps explain how AI models handle language.


Are AI models simply parrots, regurgitating vast amounts of information in organized chaos?


Not quite. Large language models (LLMs) learn patterns rather than copying text wholesale – but how much do they actually memorize?

It’s been an open question for a while. Now, researchers at Meta, Google, Nvidia, and Cornell have looked into it and given us an answer.

AI reconstructs, not remembers

The researchers explain that an AI model’s rough equivalent of a brain cell is called a “parameter.”

Each parameter can hold about 3.6 bits of memory – enough capacity to pick from roughly 12 different options, like rolling a 12-sided die.

That capacity is tiny: a single letter of the alphabet needs about 5 bits, so a parameter can’t even “remember” one letter on its own, let alone store words or sentences in one place.
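A quick back-of-envelope check of those numbers in Python (the 3.6-bit figure comes from the study; the rest is standard information theory):

```python
import math

# Capacity of one parameter, per the study's estimate.
bits_per_parameter = 3.6

# 3.6 bits can distinguish 2**3.6 equally likely options.
options = 2 ** bits_per_parameter
print(f"Distinct options per parameter: {options:.1f}")   # ~12.1, hence the 12-sided die

# A single letter drawn from 26 possibilities needs log2(26) bits.
bits_per_letter = math.log2(26)
print(f"Bits needed for one letter:     {bits_per_letter:.1f}")  # ~4.7, i.e. about 5
```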

Instead, a word like “moon” is stored as a ripple effect across thousands or millions of these tiny 3.6-bit units.

It’s like 100 people each memorizing one small fact about the Moon: take away any one participant, and the full picture is lost.
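A toy sketch of that analogy (this is additive secret sharing, not how transformers actually store information, but it captures the “no single location” idea):

```python
import random

# A "fact" encoded as 100 shares that only reconstruct the value
# when *all* of them are combined.
fact = 1969  # e.g. the year of the Moon landing

shares = [random.uniform(-10, 10) for _ in range(99)]
shares.append(fact - sum(shares))  # final share makes the total come out right

print(round(sum(shares)))        # 1969 -- the full picture
print(round(sum(shares[:-1])))   # noise -- lose one participant, lose the fact
```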


So, when you ask an AI for a poem about the Moon, it isn’t retrieving one from memory – it rebuilds it from the tiny patterns it has learned.


The AI memory paradox

One ironic element of the findings: the more data a model trains on, the thinner its fixed memory budget is spread, meaning less chance of it memorizing any single piece of information.
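A rough way to see why: the total memorization budget is fixed at roughly 3.6 bits times the parameter count, so dividing it across a bigger training set leaves less per example. A sketch assuming that figure – the model and dataset sizes below are illustrative, not numbers from the study:

```python
BITS_PER_PARAM = 3.6  # capacity estimate from the study

def bits_per_example(num_params: float, num_examples: float) -> float:
    """Total memorization budget divided evenly across the training set."""
    return (num_params * BITS_PER_PARAM) / num_examples

# The same hypothetical 1B-parameter model, with increasingly large training sets:
for n_examples in (1e6, 1e9, 1e12):
    print(f"{n_examples:>8.0e} examples -> {bits_per_example(1e9, n_examples):10.4f} bits/example")
```

With a million examples, the model could in principle devote thousands of bits to each one; with a trillion, it has a tiny fraction of a bit per example – forcing it to generalize rather than memorize.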

The researchers also ran a test in which they fed models pure noise – random gibberish with no patterns to lean on.

Tellingly, the only way for the model to reproduce that data was to soak it up like a sponge – in other words, to memorize it outright.

Even in this extreme case, the models couldn’t cram in more than about 3.6 bits per parameter – underscoring that the distributed, ripple-effect storage isn’t a stylistic choice but a hard limit.


Why this matters

These findings can help researchers, policymakers, and even end users understand how AI models actually work.


A deeper understanding could lead to better and safer AI training practices, particularly around contentious issues such as training on an artist’s work.

If an AI model doesn’t memorize Elton John's content directly, could it be argued that it’s simply producing a pale imitation of the original?

There are privacy implications too – notably, highly unique data is more likely to be memorized than typical data, which blends into the model’s general patterns.

Bigger AI models, meanwhile, are less likely to “spit out” any single training example, making it harder for attackers to determine whether and how your data was used.