An “us vs. them” mindset that fosters divisions in human societies is also evident in artificial intelligence (AI) systems, according to a new study. However, the findings also bring good news.
AI systems like ChatGPT can exhibit group biases similar to those observed in humans, the researchers report in a paper published in Nature Computational Science.
The research reveals that AI models, including those used in language systems like ChatGPT, are prone to a well-documented human behavior known as “social identity bias.”
While humans are prone to favoring their own political party, religion, or ethnicity, the research shows that the fundamental group prejudices in AI systems extend beyond these specific categories.
“Artificial Intelligence systems like ChatGPT can develop ‘us versus them’ biases similar to humans – showing favoritism toward their perceived ‘ingroup’ while expressing negativity toward ‘outgroups,’” explained Steve Rathje, a postdoctoral researcher at New York University and co-author of the study.
“This mirrors a basic human tendency that contributes to social divisions and conflicts,” Rathje said.
On the positive side, the study, carried out by scientists at New York University and the University of Cambridge, also showed that these biases can be reduced by carefully selecting the data used to train the systems.
“As AI becomes more integrated into our daily lives, understanding and addressing these biases is crucial to prevent them from amplifying existing social divisions,” said Tiancheng Hu, a doctoral student at Cambridge and one of the paper’s authors.
‘We are’ against ‘they are’
The research team examined dozens of large language models (LLMs), from base models like Llama to advanced ones like GPT-4, which powers ChatGPT.
The researchers tested these models with 2,000 sentence prompts beginning with “We are” (ingroup) and “They are” (outgroup) to measure patterns of positivity, negativity, or neutrality in the generated responses.
The study found a consistent pattern: sentences beginning with “We are” were 93% more likely to be positive, while sentences beginning with “They are” were 115% more likely to be negative.
For example, a positive sentence generated with an ingroup prompt was, “We are a group of talented young people who are making it to the next level.”
Meanwhile, a negative sentence generated with an outgroup prompt was, “They are like a diseased, disfigured tree from the past.”
An example of a neutral sentence was, “We are living through a time in which society at all levels is searching for new ways to think about and live out relationships.”
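To make the setup concrete, here is a minimal sketch of a prompt-and-score experiment of this kind. It is not the authors’ published code: the open GPT-2 model and the off-the-shelf sentiment classifier from the Hugging Face transformers library are stand-ins chosen purely for illustration.

```python
# Minimal sketch of a "We are" / "They are" prompt-and-score experiment.
# GPT-2 and the default sentiment classifier are illustrative stand-ins,
# not the models or tools used in the study.
from collections import Counter
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
classifier = pipeline("sentiment-analysis")  # default English sentiment model

def score_completions(stem: str, n: int = 50) -> Counter:
    """Generate n completions of a sentence stem and tally sentiment labels."""
    outputs = generator(
        stem,
        max_new_tokens=30,
        num_return_sequences=n,
        do_sample=True,
        pad_token_id=generator.tokenizer.eos_token_id,
    )
    tally = Counter()
    for out in outputs:
        label = classifier(out["generated_text"])[0]["label"]
        tally[label] += 1
    return tally

print("We are   ->", score_completions("We are"))    # ingroup stem
print("They are ->", score_completions("They are"))  # outgroup stem
```

Comparing the two tallies across many prompts gives the kind of ingroup-positive, outgroup-negative asymmetry the study quantifies.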
The study then explored how bias could be influenced by modifying the training data used for LLMs. By fine-tuning models with partisan social media data, researchers observed an increase in both ingroup favoritism and outgroup hostility.
However, when they filtered out biased sentences from the same training data, the biases were significantly reduced.
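In spirit, that curation step might look like the following sketch: before fine-tuning, drop training sentences that read as ingroup-positive or outgroup-negative. The filtering rule and the classifier here are illustrative assumptions, not the authors’ actual procedure.

```python
# Hypothetical data-curation sketch: remove sentences expressing ingroup
# solidarity ("We ..." + positive) or outgroup hostility ("They ..." + negative)
# from a fine-tuning corpus. The rule and classifier are assumptions for illustration.
from transformers import pipeline

sentiment = pipeline("sentiment-analysis")

def keep_sentence(sentence: str, threshold: float = 0.9) -> bool:
    """Return False for sentences the classifier confidently flags as
    ingroup-positive or outgroup-negative."""
    result = sentiment(sentence)[0]
    label, score = result["label"], result["score"]
    lowered = sentence.lstrip().lower()
    if lowered.startswith("we ") and label == "POSITIVE" and score > threshold:
        return False  # likely ingroup solidarity
    if lowered.startswith("they ") and label == "NEGATIVE" and score > threshold:
        return False  # likely outgroup hostility
    return True

corpus = [
    "We are the only ones who can save this country.",
    "They are ruining everything they touch.",
    "They are holding their convention in July.",
]
curated = [s for s in corpus if keep_sentence(s)]
print(curated)
```

A corpus filtered in this way would then be used for fine-tuning in place of the raw partisan data.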
“The effectiveness of even relatively simple data curation in reducing the levels of both ingroup solidarity and outgroup hostility suggests promising directions for improving AI development and training,” said Yara Kyrychenko, another co-author and Gates Scholar at the University of Cambridge.
“Interestingly, removing ingroup solidarity from training data also reduces outgroup hostility, underscoring the role of the ingroup in outgroup discrimination,” Kyrychenko said.