
New research shows that AI language models memorize only about 3.6 bits per parameter, learning patterns rather than exact words in order to generate text. This insight from researchers at Meta, Google, Nvidia, and Cornell helps explain how AI models handle language.
- AI models store fragments, not full words or sentences
- Training on more data lowers memorization of any one piece
- Findings could shape copyright law and data privacy policy
Are AI models simply parrots, repeating back a jumble of memorized information in organized chaos?
Not quite. They learn patterns, hence the name LLM, or large language model. But how much do they actually memorize?
It has been a curious question for a while. Now, the big thinkers at Meta, Google, Nvidia, and Cornell have looked into it and given us an answer.
AI reconstructs, not remembers
The researchers explained that AI models have an equivalent to a brain cell called a “parameter.”
Each parameter can hold about 3.6 bits of memory, enough capacity to pick from roughly 12 different options; think rolling a 12-sided die.
That means each of these cells, or parameters, is tiny: a single letter of the alphabet needs nearly 5 bits, so in effect the model doesn't "remember" words or sentences in any one place.
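To see where those numbers come from, here is a quick back-of-the-envelope check in Python (illustrative only; the 3.6-bit figure itself is the capacity the researchers measured, not something derived here):

```python
import math

BITS_PER_PARAMETER = 3.6  # per-parameter capacity estimate reported by the researchers

# Number of distinct values 3.6 bits can distinguish: 2 ** 3.6
options = 2 ** BITS_PER_PARAMETER
print(f"2^3.6   ~= {options:.1f} options")        # ~12.1, the "12-sided die"

# Bits needed to single out one letter from a 26-letter alphabet: log2(26)
bits_per_letter = math.log2(26)
print(f"log2(26) ~= {bits_per_letter:.2f} bits")  # ~4.70, i.e. nearly 5 bits
```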
Instead, a word like "moon" would be stored like a ripple effect across thousands or millions of these 3.6-bit units.
It's like 100 people each memorizing one small fact about the Moon: no single person holds the whole picture, and without all of the participants, the full picture would be lost.
So, when you ask an AI for a poem about the Moon, it isn't reciting something it has stored; it rebuilds the poem from the tiny patterns it has learned.

The AI memory paradox
One ironic element of these findings is that when models train on more data, they spread their fixed memory capacity more thinly, which means less chance of memorizing any single piece of information.
The researchers ran a test in which they fed the models pure noise: random gibberish with no patterns to lean on.
Tellingly, with nothing to generalize from, the only way for the AI to succeed was to act like a sponge and absorb, or memorize, the information outright.
Even in an extreme case like this, the models were unable to cram in more than about 3.6 bits per parameter, underlining why the ripple-effect approach is necessary.
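As a rough illustration of why more training data dilutes memorization, here is a hypothetical calculation; the model and dataset sizes below are made-up examples, not figures from the study:

```python
BITS_PER_PARAMETER = 3.6  # per-parameter capacity estimate from the study

def bits_available_per_example(num_parameters: int, num_examples: int) -> float:
    """Total memorization capacity spread evenly across the training examples."""
    total_capacity_bits = num_parameters * BITS_PER_PARAMETER
    return total_capacity_bits / num_examples

# Hypothetical 1-billion-parameter model (illustrative figure only)
params = 1_000_000_000

for examples in (1_000_000, 100_000_000, 10_000_000_000):
    bits = bits_available_per_example(params, examples)
    print(f"{examples:>14,} training examples -> ~{bits:,.2f} bits per example")
```

The fixed capacity stays the same, so the more examples the model sees, the fewer bits are left over for any single one of them.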
Why this matters
This can help researchers, lawmakers, policymakers, and even the end user understand how AI models actually work.
A deeper understanding can lead to better and safer AI training practices, particularly around questions such as the use of an artist's work.
If an AI model doesn’t memorize Elton John's content directly, could it be argued that it’s simply producing a pale imitation of the original?
There are privacy implications too. Notably, unusual or unique data is more likely to be memorized than information repeated widely across the training set.
On the other hand, bigger AI models are less likely to "spit out" any one training example, making it harder for attackers to figure out whether and how your data was used.