AI reads papers like a student pulling an all-nighter, and researchers know why




Artificial intelligence (AI) models like GPT-4 may seem smart, but new MIT research shows they read long documents with the same bad habits as a cramming student – by skimming the beginning and end, and missing what’s in the middle.
Remember how, back in school or college, most students had to read long papers for the next day's class? Every student knows the drill – you start with the abstract, jump to the conclusion so you grasp the essence of the paper, and only then work through the middle, in case you fall asleep.

Now, research from the Massachusetts Institute of Technology (MIT) shows that large language models (LLMs) tend to overemphasize information at the beginning and end of a document, and ignore what’s written in the middle.

“If a lawyer is using an LLM-powered virtual assistant to retrieve a certain phrase in a 30-page affidavit, the LLM is more likely to find the right text if it is on the initial or final pages,” the researchers write.


This is called “position bias,” and it’s a problem. When an AI skips through the middle, it might miss critical information, leading to incomplete answers, wrong conclusions, or biased results in tasks like legal document review, medical diagnosis, or code analysis.


This happens partly because of how transformers are built and how the AI was trained.

So, how are LLMs built and trained?

The MIT team studied how information flows through the machine-learning architecture. Turns out, this isn’t just lazy student behavior, but rather a built-in problem that arises from how these AI models are designed and trained.

The architecture of large language models actually encourages it. Certain design choices, like how the model pays attention to words and understands their position, naturally make it focus more on the start and end of any text.


That is because transformer models (the brains behind AIs like GPT-4 or Claude) first work out which words in a sentence matter most. They also typically use “causal masking” – a method that allows each word to only "look back" at the words that came before it, not ahead. This forces the model to build meaning step by step, based only on what’s already been said and not on what’s coming next.
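To make the idea concrete, here is a minimal sketch – not the MIT team's code, just a toy NumPy illustration – of how a causal mask restricts attention so each token can only attend to itself and earlier tokens:

```python
import numpy as np

def causal_attention(scores: np.ndarray) -> np.ndarray:
    """Apply a causal mask to raw attention scores, then softmax.

    scores: (seq_len, seq_len) matrix where scores[i, j] is how much
    token i "wants" to attend to token j.
    """
    seq_len = scores.shape[0]
    # Causal mask: token i may only look at tokens j <= i (the past).
    mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)
    # Row-wise softmax: masked (future) tokens get exactly zero weight.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)

# Toy example: 4 tokens with uniform raw scores.
weights = causal_attention(np.zeros((4, 4)))
print(weights.round(2))
# Row 0 can only see token 0; row 3 sees all four tokens.
```

In this toy example, the first token is visible from every position, while the last token is visible only to itself – one intuition for why attention mass tends to pile up toward the start of a sequence.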


Because of this setup, words in the middle don’t get “reused” or referenced as much as the ones at the beginning of the text. Therefore, the model gives them less attention. Over time, as this behaviour repeats itself, the bias toward the beginning gets stronger.

MIT’s researchers ran a test in which they “hid” important information in different parts of a text. The experiment showed that AI models found the information most reliably when it sat at the start of the text, struggled when it was buried in the middle, and recovered somewhat when it appeared near the end.
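The setup described is broadly similar to "needle in a haystack" retrieval probes. Below is a rough sketch of how such a probe can be wired up – the filler text, prompt wording, and the `ask_model` function are placeholders for illustration, not the MIT team's actual protocol:

```python
# Bury one fact ("the needle") at different depths in filler text and
# check whether the model can still retrieve it.
# `ask_model` is a stand-in for whatever LLM API is being tested.

FILLER = "The quick brown fox jumps over the lazy dog. " * 40
NEEDLE = "The secret code is 7391."
QUESTION = "What is the secret code mentioned in the document?"

def build_document(depth: float, n_chunks: int = 10) -> str:
    """Place the needle at a relative depth (0.0 = start, 1.0 = end)."""
    chunks = [FILLER] * n_chunks
    position = int(depth * (n_chunks - 1))
    chunks.insert(position, NEEDLE)
    return "\n".join(chunks)

def run_probe(ask_model):
    """Check retrieval at several needle positions via a substring match."""
    for depth in (0.0, 0.25, 0.5, 0.75, 1.0):
        document = build_document(depth)
        answer = ask_model(f"{document}\n\n{QUESTION}")
        hit = "7391" in answer
        print(f"needle at depth {depth:.2f}: {'found' if hit else 'missed'}")
```

Plotting hit rate against depth in a test like this typically produces the U-shaped curve the article describes: strong at the start, weak in the middle, somewhat better at the end.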

For boomers, the black hole of information on the internet is the second page on Google search. For LLMs, it is the middle of the text.

