A psycholinguist fed ChatGPT nonsense. Here’s what it spit back.


A Kansas researcher ran a three-stage experiment to test how well AI understands obsolete words, foreign sounds, and invented terms, revealing where it mimics humans and where it breaks down.

By now, most of us know about ChatGPT’s hallucinations – inventing an answer to get the task done, or scraping the wrong corner of the web and handing back a misleading response.

That’s not all, though. A researcher recently “talked nonsense” to ChatGPT, not as a prank but as a scientific probe.


Limits of language machines

Michael Vitevitch, a professor in the Department of Speech-Language-Hearing at the University of Kansas, designed a three-stage experiment to probe the limits of large language models (LLMs) like ChatGPT.

For the first stage, Vitevitch tested whether ChatGPT could define 52 obsolete English words, including real oddities like “upknocker” – a person employed in the 1800s to wake sleepers.

The results: 36 correct definitions, 11 “I don’t know” responses, three answers that substituted definitions from other languages, and two fabricated (hallucinated) explanations.

“It did hallucinate on a couple of things,” Vitevitch said.

“I guess it was trying to be helpful.”

This first stage shows the depth of ChatGPT’s linguistic memory, even for obsolete words – but also that it can’t be fully trusted not to make things up.
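
A probe like this is easy to replicate. Below is a minimal sketch using the OpenAI Python SDK; the model name, prompt wording, and word list are illustrative assumptions, not the study’s materials.

```python
# Hypothetical sketch of an obsolete-word probe – not the study's code.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

obsolete_words = ["upknocker"]  # extend with other obsolete words

for word in obsolete_words:
    response = client.chat.completions.create(
        model="gpt-4o",  # assumed model; the study used ChatGPT itself
        messages=[{
            "role": "user",
            "content": f'Define the English word "{word}". '
                       "If you do not know it, say so instead of guessing.",
        }],
    )
    print(word, "->", response.choices[0].message.content)
```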


Statistical instincts exposed

For the second stage, ChatGPT was given real Spanish words and asked for English words that “sound like” them – a classic psycholinguistics task.

Instead of complying, ChatGPT often returned other Spanish words, suggesting it was following cross-lingual statistical associations rather than human-like judgments of sound similarity.

This highlights the mismatch between statistical language modeling and human cognition.

An AI model is prone to switch languages when it gets confused – something a human wouldn’t do unless they spoke several languages – and hallucinations are already a known part of its personality.


How English is it?

For the third stage, Vitevitch created nonsense words with varying degrees of English-likeness, based on phonotactic probability – how often a word’s sound sequences occur in real English words.
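
One way to get an intuition for phonotactic probability is a toy calculation: score a string by how often its segments appear at each position in a reference word list. The sketch below is letter-based and uses a placeholder lexicon; real measures, like the phoneme-based calculator Vitevitch helped develop, work over a full lexicon.

```python
# Toy, letter-based illustration of phonotactic probability.
# Real measures use phonemes and a large reference lexicon;
# this tiny corpus is a placeholder.
from collections import defaultdict

lexicon = ["stumble", "lexicon", "table", "stop", "plop", "size"]

# Count how often each letter occurs at each position in the lexicon.
pos_counts = defaultdict(int)
pos_totals = defaultdict(int)
for word in lexicon:
    for i, ch in enumerate(word):
        pos_counts[(i, ch)] += 1
        pos_totals[i] += 1

def wordlikeness(word):
    """Average position-specific probability of the word's letters."""
    probs = [pos_counts[(i, ch)] / pos_totals[i]
             for i, ch in enumerate(word) if pos_totals[i]]
    return sum(probs) / len(probs)

# Higher scores mean more English-like letter patterns in this toy corpus.
for nonword in ["stumblop", "xzqrv"]:
    print(nonword, round(wordlikeness(nonword), 3))
```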

ChatGPT was asked to rate how English-sounding each nonword was and how marketable it might be as a product name.

ChatGPT rated each nonword from 1 (“bad English word”) to 7 (“good English word”), and its ratings were compared with those from human participants – the two matched closely.
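
To quantify how closely two sets of ratings agree, a standard choice is the Pearson correlation between the model’s scores and the human means. A minimal sketch, with made-up placeholder numbers rather than the study’s data:

```python
# Pearson correlation between model and human ratings.
# The numbers are made-up placeholders, not data from the study.
from statistics import mean

def pearson(xs, ys):
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

human_means = [6.1, 2.3, 4.8, 1.5]   # placeholder 1-7 wordlikeness ratings
model_scores = [5.9, 2.7, 4.4, 1.9]  # placeholder ChatGPT ratings
print(round(pearson(human_means, model_scores), 2))  # ~1.0 means a close match
```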


For the second part of this stage, Vitevitch prompted ChatGPT to invent new terms for concepts that have no English word. It frequently used blending and compounding, with some success.

“My favorite was ‘rousrage,’ for anger expressed upon being woken,” said Vitevitch.

Other interesting coinages included:

Prideify: taking pride in someone else’s achievement
Lexinize: when a nonword starts to take on meaning
Stumblop: to trip over your own feet

By talking nonsense to ChatGPT, Vitevitch hopes to better understand when AI might be able to help humans in certain language tasks rather than simply duplicate what humans can already do quite well.