New study unveils empirical proof that AI systems can pass the famous Turing test


After conducting a thorough three-party Turing test, researchers have found that AI models – if suitably prompted – can now pass as humans in short, casual text conversations. We’ve tried out the test ourselves.

A team of researchers from the University of California, San Diego, says its study provides the first empirical evidence that a modern AI system can pass the Turing test.

The Turing test is, of course, a famous benchmark that asks whether a machine can imitate human conversation so convincingly that people cannot reliably tell it apart from a real person.

Indeed, in a series of experiments, people were often unable to tell the difference between humans and advanced large language models.

In the experiments, participants chatted with four different AI systems: three LLMs (GPT-4.5, GPT-4o, and LLaMa-3.1-405B) and ELIZA, a classic 1960s rules-based chatbot.

Of the four systems, GPT-4.5 was judged to be human 73% of the time. In other words, the human “interrogators” picked it as the human significantly more often than they picked the real human participant.

LLaMa-3.1-405B, given the same prompt, was judged human 56% of the time – statistically indistinguishable from the humans it was compared against, the report said.
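To see why 73% counts as "significantly more often" while 56% is "statistically indistinguishable" from a coin flip, here is a minimal Python sketch of an exact two-sided binomial test against a 50% chance rate. It is an illustration only: the number of conversations (`HYPOTHETICAL_TRIALS = 100`) is an assumption made up for the example, not a figure from the study, and the choice of test is ours, not necessarily the one the researchers used.

```python
from math import comb


def binomial_two_sided_p(successes: int, trials: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value against a null 'judged human' rate p."""
    # Probability of every possible count of "judged human" verdicts under the null.
    pmf = [comb(trials, k) * p**k * (1 - p) ** (trials - k) for k in range(trials + 1)]
    observed = pmf[successes]
    # Add up all outcomes that are at least as unlikely as the observed one.
    # The small tolerance guards against floating-point ties.
    return min(1.0, sum(q for q in pmf if q <= observed * (1 + 1e-9)))


# ASSUMPTION: 100 conversations per model is a hypothetical figure for illustration,
# not the sample size reported by the researchers.
HYPOTHETICAL_TRIALS = 100

for label, rate in [("GPT-4.5 (73%)", 0.73), ("LLaMa-3.1-405B (56%)", 0.56)]:
    judged_human = round(rate * HYPOTHETICAL_TRIALS)
    p_value = binomial_two_sided_p(judged_human, HYPOTHETICAL_TRIALS)
    print(f"{label}: p-value vs. 50% chance = {p_value:.6f}")
```

Under that assumed sample size, the 73% result yields a very small p-value, while the 56% result stays well above the conventional 0.05 threshold. With other sample sizes the exact values would differ.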

“What we found is that if given the right prompts, advanced LLMs can exhibit the same tone, directness, humor, and fallibility as humans,” said the study’s corresponding author, Cameron Jones.

“While we know LLMs can easily produce knowledge on nearly every topic, this test showed that they can also convincingly display social behavioral traits, which has major implications for how we think of AI.”

In the test, a human “interrogator” chats simultaneously with two “witnesses,” one a human and the other an LLM, and must then decide which of the two is the human.

I’ve tried the test myself (it’s available online). I was assigned the role of a “witness” and was interrogated for 4 minutes and 19 seconds; the human on the other side asked me a couple of times whether I knew of Karlson (it’s an indie game) and then correctly decided I was a human.

Another “interrogator” also correctly decided, in 150 seconds, that I was a human after asking me whether I support Palestinians and receiving the response that it’s complicated.

According to the authors of the experiment, LLMs successfully fooled participants of the study by “making mistakes like a human would.”

Next, I tried this out as the “interrogator.” I chatted with two personas, both allegedly from Germany.

It took a while, but I eventually noticed that the language of one persona, supposedly a student still in school, was raw enough for me to judge it as human. The other tried hard to seem chirpy, but it didn’t seem natural at all.

I was right – but I needed nearly 8 minutes.

The results carry real-world implications for trust online – especially because the models that pass as human do so over the course of extended five- or 15-minute conversations.

“We need to be more alert – when you interact with strangers online, people should be much less confident that they know they’re talking to a human rather than an LLM,” said Jones.

“There are lots of people who would like to use bots to persuade people to share their social security numbers, and vote for their party, or buy their product.”

