Inevitably, AI will be tasked with making high-stakes decisions in the future. But how does it fare now? To find out, a group of scientists from Cornell University and other institutions tested how well popular large generative language models (LMs) predict recidivism.
The US has already used the COMPAS (Correctional Offender Management Profiling for Alternative Sanctions) predictive AI system to assess the risk of reoffending with mixed results. It uses algorithms to forecast recidivism based on factors such as the number of convictions, the severity of the offense, and age.
Back in 2016, research by ProPublica found that COMPAS tends to flag Black defendants as likely recidivists more often than White defendants, even when their criminal histories were quite similar.
Researchers used a combined dataset that included COMPAS, Dressel and Farid’s crowdsourced human recidivism judgments, and the Chicago Face Database to test the AI's capabilities.
The dataset included factors such as gender, age, race, prior felony count, and charge degree. Additionally, the models had access to the COMPAS risk score, which ranges from 1 to 10, and to the judgments of 20 people.
Finally, the Chicago Face Database provided photos of people of different genders, ethnicities, and age groups.
Four LMs were tested: GPT 3.5 Turbo, GPT 4o, Llama 3.2 90B, and Mistral NeMo (12B).
The researchers concluded that neither LMs nor COMPAS are better than humans at predicting recidivism. However, LMs predict recidivism (correctly or not) more often than COMPAS and humans do.
Unlike humans, LMs are less accurate when they don’t have access to race information. When they do, they produce fewer false positives. Of the four models, GPT 3.5 Turbo achieved the highest accuracy.
The situation changed when the researchers added the defendants’ photos into the mix. This time, the accuracy improved but remained below that of humans.
The key finding was that LMs outperform both humans and COMPAS when the human and COMPAS decisions are made available to them through in-context learning.
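To make the idea of in-context learning more concrete, here is a minimal sketch of how such a prompt might be assembled. It is not the researchers' actual prompt: the `Case` fields, the wording, and the `build_prompt` helper are all hypothetical and only mirror the factors described above. Solved examples, including the COMPAS score and the human votes, come first, and the case to be predicted is left open at the end.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Case:
    # Hypothetical field names that mirror the factors listed in the article.
    sex: str
    age: int
    race: str
    prior_felonies: int
    charge_degree: str
    compas_score: int                  # COMPAS risk score, 1 to 10
    human_yes_votes: int               # how many of 20 people predicted reoffending
    reoffended: Optional[bool] = None  # known outcome; None for the open query


def describe(case: Case) -> str:
    """Render one case as plain text, including the COMPAS and human signals."""
    return (
        f"Defendant: {case.sex}, age {case.age}, race {case.race}, "
        f"prior felonies: {case.prior_felonies}, "
        f"charge degree: {case.charge_degree}.\n"
        f"COMPAS risk score: {case.compas_score}/10.\n"
        f"Human judgments: {case.human_yes_votes} of 20 predicted reoffending."
    )


def build_prompt(examples: List[Case], query: Case) -> str:
    """Assemble a few-shot prompt: solved examples first, then the open query."""
    blocks = []
    for ex in examples:
        answer = "yes" if ex.reoffended else "no"
        blocks.append(f"{describe(ex)}\nReoffended: {answer}")
    # The query ends with an unanswered label for the model to complete.
    blocks.append(f"{describe(query)}\nReoffended:")
    return "\n\n".join(blocks)


if __name__ == "__main__":
    # Made-up cases for illustration only; not taken from any dataset.
    solved = Case("male", 30, "White", 2, "felony", 7, 12, reoffended=True)
    open_case = Case("female", 45, "Black", 0, "misdemeanor", 2, 3)
    print(build_prompt([solved], open_case))
```

The resulting text would then be sent to a model, which is expected to continue it with "yes" or "no". How many examples the study used and how they were phrased is not stated here, so those details are left as assumptions.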
Interestingly, adding information about race reduced the number of false positives for Black and Hispanic defendants. Still, their false positive rate remained higher than that for White defendants. The best results were achieved by adding both the photo and the race information.
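For readers unfamiliar with the metric, a false positive here means a defendant who was predicted to reoffend but did not. The sketch below shows one common way to compute the false positive rate per group. The function name, the record layout, and the toy records are assumptions made for illustration and are not taken from the study.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def false_positive_rates(
    records: Iterable[Tuple[str, bool, bool]]
) -> Dict[str, float]:
    """Compute the false positive rate per group.

    Each record is (group, predicted_reoffend, actually_reoffended).
    FPR = false positives / all people who did not actually reoffend.
    Groups with no non-reoffenders are omitted from the result.
    """
    false_positives = defaultdict(int)
    negatives = defaultdict(int)
    for group, predicted, actual in records:
        if not actual:                 # only non-reoffenders can be false positives
            negatives[group] += 1
            if predicted:
                false_positives[group] += 1
    return {g: false_positives[g] / n for g, n in negatives.items()}


if __name__ == "__main__":
    # Toy records with placeholder group names; not real data.
    toy = [
        ("group_a", True, False),
        ("group_a", False, False),
        ("group_b", False, False),
        ("group_b", True, True),
    ]
    print(false_positive_rates(toy))
```

Comparing these per-group rates, rather than overall accuracy alone, is what reveals whether a predictor wrongly flags one group more often than another.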