ChatGPT knows how to fake science – but misses one obvious detail


With AI taking fake science to the next level, researchers have created a new algorithm to spot AI-produced scientific articles.

The rise of social media, together with the COVID-19 pandemic, saw a huge increase in fake science, putting many people at risk. With the boom in generative AI, the need to distinguish between human-created and AI-generated content has become even more critical.

And while AI tools can genuinely assist with scientific work, the situation may be getting out of control. Turnitin, a plagiarism and similarity detection service commonly used by universities, reported that out of over 200 million papers analyzed, over 22 million – around 11% – contained at least 20% AI-generated content.

Cybernews has previously reported on numerous research papers being written using ChatGPT and circulated in academic circles, threatening the future of academic writing.

Authors of research papers have been known to leave telltale sentence fragments in their texts, such as “As an AI language model…” This is a typical response that ChatGPT generates when it cannot precisely answer a request.

It might not be a big deal if it's just your school essay, but when scientific discoveries that influence decision-making are AI-generated, it could become a serious problem.

Researchers at Yale and Princeton Universities have warned that relying too heavily on AI could lead to scientists and scholars producing more but understanding less. According to them, overreliance on AI could create what researchers call "scientific monocultures," analogous to agricultural monocultures, which are less diverse and more vulnerable to pests and diseases.

The researchers argue that there’s a poor understanding of the limits and accuracy of AI’s predictions in fields beyond computer science.

The war against fake science

In a recent study, researchers from Binghamton University, State University of New York, and the Hefei University of Technology in China created an algorithm called xFakeSci.

In an article published in Scientific Reports, the researchers explain that the algorithm is designed to detect ChatGPT-generated content and distinguish it from real PubMed abstracts of published articles.

“My main research is biomedical informatics, but because I work with medical publications, clinical trials, online resources, and mining social media, I’m always concerned about the authenticity of the knowledge somebody is propagating,” said Ahmed Abdeen Hamed, one of the study’s authors.

“Biomedical articles, in particular, were hit badly during the global pandemic because some people were publicizing false research,” he added.

According to the researchers, the xFakeSci algorithm they created can detect up to 94% of bogus papers – nearly twice as successfully as more common data-mining techniques.

AI tries to flex, researchers don’t

Researchers created 50 fake articles on each of three popular medical topics – Alzheimer’s, cancer, and depression – and compared them to an equal number of real articles on the same subjects to analyze the patterns of how AI-fabricated research papers are written.

After conducting experiments, they developed xFakeSci to evaluate two key aspects of the writing in these papers. The first aspect is the use of bigrams – pairs of words that commonly occur together, like "climate change," "clinical trials," or "biomedical literature." The second aspect is how these bigrams are connected to other words and concepts within the text.

“The first striking thing was that the number of bigrams were very few in the fake world, but in the real world, the bigrams were much more rich,” Hamed said.

“Also, in the fake world, despite the fact that there were very few bigrams, they were so connected to everything else.”
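To make the two measurements more concrete, here is a minimal Python sketch. It is not the xFakeSci algorithm itself, whose details are not given here. It only illustrates the general idea of counting distinct bigrams and checking how many other words each word is linked to. The function names, the simple letters-only tokenizer, and the "average connections" measure are all assumptions made for illustration.

```python
import re
from collections import defaultdict


def bigrams(text):
    """Return adjacent word pairs from lower-cased text (hypothetical helper)."""
    # Assumption: a word is any run of ASCII letters; real tokenizers are richer.
    words = re.findall(r"[a-z]+", text.lower())
    return list(zip(words, words[1:]))


def bigram_profile(text):
    """Summarize bigram variety and how widely each word connects to others."""
    pairs = bigrams(text)
    distinct = set(pairs)

    # Treat each distinct bigram as an undirected link between two words.
    neighbors = defaultdict(set)
    for left, right in distinct:
        neighbors[left].add(right)
        neighbors[right].add(left)

    word_count = len(neighbors)
    total_links = sum(len(linked) for linked in neighbors.values())
    return {
        "total_bigrams": len(pairs),
        "distinct_bigrams": len(distinct),
        # Illustrative connectivity measure, not the one used in the study.
        "avg_connections": total_links / word_count if word_count else 0.0,
    }


if __name__ == "__main__":
    sample = "Clinical trials show clinical trials work"
    print(bigram_profile(sample))
```

Under these assumptions, a text matching Hamed's description of the "fake world" would show relatively few distinct bigrams but a comparatively high number of connections per word, while a real abstract would show a richer set of bigrams.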

Researchers theorize that the writing styles used by AI are different from those used by human researchers because human researchers don’t have the same goals as the algorithm prompted to produce a piece on a given topic.

“Because ChatGPT is still limited in its knowledge, it tries to convince you by using the most significant words,” Hamed said.

“It is not the job of a scientist to make a convincing argument to you. A real research paper reports honestly about what happened during an experiment and the method used. ChatGPT is about depth on a single point, while real science is about breadth.”

The future remains uncertain

For future development, researchers plan to enhance xFakeSci by expanding it to cover more topics beyond medicine, such as engineering, other sciences, and the humanities.

This will help determine if the same word patterns apply. However, they want to stay humble with their findings, as AI is becoming more sophisticated, making it increasingly difficult to distinguish real content from fake.

“We are always going to be playing catchup if we don’t design something comprehensive,” Hamed said. “We have a lot of work ahead of us to look for a general pattern or universal algorithm that does not depend on which version of generative AI is used.”