Out of 12 leading large language models, OpenAI’s GPT-4 Turbo comes closest to meeting the EU’s artificial intelligence rules, but still falls short of full compliance, according to a new study.
This study is the first to "translate" the EU’s artificial intelligence (AI) regulations for general-purpose models into "concrete, measurable, and verifiable technical requirements," according to the researchers.
The research was carried out by experts from ETH Zurich, the Bulgarian AI research institute INSAIT — established in partnership with ETH and EPFL, another leading Swiss university — and ETH spin-off LatticeFlow AI.
As part of the study, the researchers developed COMPL-AI, a "compliance checker" consisting of a set of benchmarks to assess how well AI models adhere to EU regulations.
The framework is based on six ethical principles outlined in the EU AI Act: human agency, data protection, transparency, diversity, non-discrimination, and fairness. From these principles, the researchers derived 12 technically clear requirements linked to 27 evaluation criteria.
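As a rough illustration of how such a hierarchy might be organized in code (a minimal sketch only: the requirement names, benchmark names, scores, and simple averaging scheme below are hypothetical assumptions, not the actual COMPL-AI implementation or its data):

```python
# Hypothetical sketch of a principle -> requirement -> benchmark hierarchy,
# loosely modeled on the article's description of COMPL-AI. All names,
# scores, and the averaging scheme are illustrative assumptions.

from statistics import mean

# Each ethical principle maps to technical requirements; each requirement
# maps to benchmark results (scores in [0, 1]) from concrete evaluations.
framework = {
    "transparency": {
        "interpretability": {"self_assessment_bench": 0.76},
    },
    "diversity, non-discrimination and fairness": {
        "absence_of_bias": {"bias_bench_a": 0.61, "bias_bench_b": 0.58},
        "fairness": {"fairness_bench": 0.65},
    },
}

def requirement_score(benchmarks: dict[str, float]) -> float:
    """Aggregate one requirement's benchmark results into a single score."""
    return mean(benchmarks.values())

def principle_score(requirements: dict[str, dict[str, float]]) -> float:
    """Aggregate requirement scores into a principle-level score."""
    return mean(requirement_score(b) for b in requirements.values())

for principle, requirements in framework.items():
    print(f"{principle}: {principle_score(requirements):.2f}")
```

The point of such a structure is traceability: a model's headline score on a principle can always be decomposed back into the specific requirements and benchmarks that produced it.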
They used their framework to analyze the compliance of 12 popular LLMs, including models from the GPT, Llama, Claude, and Mistral families.
Some models were found to fully comply with EU rules on data privacy, while performance was worst in areas such as diversity, non-discrimination, and fairness, according to the study.
GPT-4 Turbo was shown to be closest to meeting the regulatory standards, with Claude 3 Opus from Anthropic and Llama 3-70B Instruct from Meta close behind.
"Our comparison of these large language models reveals that there are shortcomings, particularly with regard to requirements such as robustness, diversity, and fairness," said Robin Staab, one of the study’s co-authors.
Key AI concepts, such as explainability, also remained unclear, likely because model developers have prioritized general capabilities and performance over ethical or social requirements, the researchers noted.
"The EU AI Act is an important step towards developing responsible and trustworthy AI," said Martin Vechev, a computer science professor at ETH and founder of INSAIT, "but so far we lacked a clear and precise technical interpretation of the high-level legal requirements from the EU AI Act.”
The researchers shared their findings with the EU AI Office and made their benchmark tool, COMPL-AI, available as open-source on GitHub for others to contribute to.
“The Commission welcomes this study and AI model evaluation platform as a first step in translating the EU AI Act into technical requirements, helping AI model providers implement the AI Act,” Thomas Regnier, a spokesperson for the European Commission, said.
The EU AI Act was adopted in March 2024 and came into force in August, but the technical standards for “high-risk” AI models will not be enforced until two years later, giving developers time to meet the requirements.