AI detection tools biased against non-native English speakers, study shows


GPT detection tools "inherently discriminate" against non-native English speakers, which could hurt their education and career prospects, according to new research.

Tests of popular AI detectors show that they often wrongly flag English texts written by non-native speakers as AI-generated, even though the tools' makers claim 99% accuracy. That claim is "misleading at best," researchers from Stanford University said.

The study evaluated the performance of the seven most widely used AI detection tools on 91 English essays written by non-native speakers. The essays came from the Test of English as a Foreign Language, or TOEFL, a well-known English proficiency test.

Alarmingly, more than half of these essays were mistakenly labeled as AI-written by the detection programs, and at least one tool flagged 97.8% of the TOEFL essays as machine-generated. In contrast, when comparable essays written by US eighth graders were tested, over 90% were correctly classified as human-written by the same tools.

"GPT detectors exhibit significant bias against non-native English authors, as demonstrated by their high misclassification of TOEFL essays written by non-native speakers," researchers wrote in Patterns journal.

Most detectors rely on a text's "perplexity" – a measure of how hard it is for a language model to predict the next word in a sentence – to assess whether it is likely to be AI-generated: predictable, low-perplexity text is treated as a sign of machine authorship. This might "inadvertently penalize non-native writers who use a more limited range of linguistic expressions," the study's authors said.
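
In concrete terms, a detector of this kind scores text with a language model and flags whatever the model finds too predictable. Below is a minimal sketch of the idea, assuming the open GPT-2 model from Hugging Face's transformers library; the detectors tested in the study are proprietary, and the flagging threshold here is invented purely for illustration.

```python
# Minimal sketch of perplexity-based scoring, assuming GPT-2 via Hugging Face
# transformers. Real detectors are proprietary; the threshold below is made up.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Average per-token perplexity of `text` under GPT-2."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # Passing labels makes the model return the mean cross-entropy loss.
        loss = model(enc.input_ids, labels=enc.input_ids).loss
    return torch.exp(loss).item()

simple = "The book is good. I like the book. The story is nice."
varied = "Its labyrinthine plot rewards patience, though the prose stumbles."

for sample in (simple, varied):
    ppl = perplexity(sample)
    # Hypothetical rule: low perplexity (predictable wording) gets flagged.
    verdict = "flagged as AI-like" if ppl < 40 else "looks human-written"
    print(f"{ppl:6.1f}  {verdict}")
```

Simple, repetitive prose tends to score a low perplexity under such a model, which is precisely why writing drawn from a limited vocabulary can be mistaken for machine output.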

However, the research also showed that AI detectors can be easily fooled by prompting chatbots to produce more "literary" language. After the researchers used ChatGPT to rewrite the flagged essays in more sophisticated language, all of them were marked as human-written by the detection tools.
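
The rewrite step amounts to a single prompt. Here is a minimal sketch, assuming the OpenAI Python SDK (v1+) and an illustrative prompt wording – the study's exact instructions may have differed.

```python
# Sketch of the rewrite-to-evade step the researchers describe. The prompt
# wording and model name are assumptions, not the study's exact choices.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def elevate(essay: str) -> str:
    """Ask the model to rewrite an essay in more 'literary' language."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # assumed model; the study used ChatGPT
        messages=[{
            "role": "user",
            "content": "Rewrite the following essay using more sophisticated, "
                       "literary language, preserving its meaning:\n\n" + essay,
        }],
    )
    return response.choices[0].message.content
```

The enriched vocabulary raises the text's perplexity, pushing it back over the detectors' "human" threshold.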

"Paradoxically, GPT detectors might compel non-native writers to use GPT more to evade detection," the authors said.

Biased AI detection tools could marginalize non-native English speakers in academic and professional settings and "silence" their perspectives online, where platforms like Google downgrade content perceived as AI-generated, they warned.

"Even for native English speakers, linguistic variation across different socioeconomic backgrounds could potentially subject certain groups to a disproportionately higher risk of false accusations," researchers said.

The study cautions against using AI detection tools in schools – "arguably the most significant market for GPT detectors" – warning of the mental health implications for students wrongly accused of cheating.

The researchers suggest that AI detection programs be used as educational aids rather than assessment tools.

"Proficient at recognizing clichéd expressions and repetitive patterns, GPT detectors can serve as self-check mechanisms for students. By highlighting overused phrases or structures, they may encourage writers to be more original and creative," they said.