
Could this be the end of AI hallucinations?
Wikimedia Deutschland has revealed a new project to make information on Wikipedia accessible to artificial intelligence (AI) models.
The project, called the Wikidata Embedding Project, is a “vector database for Wikidata.” It is based on a method that lets computers capture the meaning of words and the relationships between them.
According to Wikimedia, the project allows for “approximately 120 million open data points in the world's largest free knowledge database” to be used to train AI.
The idea behind the project is to give developers outside the leading tech companies the resources they need to build AI applications. It includes an embedding system that turns Wikidata's data into vectors, which DataStax stores in its vector database, Astra DB.
According to Wikimedia, Wikidata now includes 120 million entries. While conventional software can process this structured data, generative AI systems can’t readily do the same, since they are designed to work with natural language.
Translating Wikidata into vectors, or “numerical coordinates that show how different statements are related to each other,” helps AI systems learn how different terms are connected. For example, it can help AI understand that the words “dog” and “puppy” are closely related, while “dog” and “bank account” have little to do with each other.
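To make the “dog” and “puppy” example concrete, here is a minimal sketch of how closeness between vectors is commonly measured, using cosine similarity. The three-dimensional vectors and the names TOY_VECTORS, cosine_similarity and most_similar are invented for illustration; they are not taken from the Wikidata Embedding Project, whose real embeddings come from a trained model and have far more dimensions.

```python
import math

# Toy vectors invented for this example. Real embeddings are produced by a
# trained model and typically have hundreds of dimensions.
TOY_VECTORS = {
    "dog": [0.9, 0.8, 0.1],
    "puppy": [0.85, 0.75, 0.2],
    "bank account": [0.1, 0.05, 0.95],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(term, vectors=TOY_VECTORS):
    """Return the other term whose vector points in the closest direction."""
    candidates = [t for t in vectors if t != term]
    return max(candidates, key=lambda t: cosine_similarity(vectors[term], vectors[t]))


if __name__ == "__main__":
    for other in ("puppy", "bank account"):
        score = cosine_similarity(TOY_VECTORS["dog"], TOY_VECTORS[other])
        print(f"dog vs {other}: {score:.2f}")
    print("closest to dog:", most_similar("dog"))
```

A score near 1 means two vectors point in almost the same direction, while a score near 0 means they are largely unrelated. With these made-up numbers, “dog” scores far higher against “puppy” than against “bank account.”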
The project also supports the Model Context Protocol (MCP), a standard that lets AI systems and data sources such as vector databases communicate with each other. It works much like a USB connector: developers can plug Wikidata’s data into their AI applications through a common interface.
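As a rough illustration of what communicating over MCP looks like, the sketch below builds the kind of JSON-RPC 2.0 message an MCP client sends to ask a server to run a tool. The tool name "search_wikidata" and its "query" argument are hypothetical placeholders, not the project's actual interface, and the helper build_tool_call is made up for this example.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Build an MCP-style JSON-RPC 2.0 request that asks a server to run a tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# "search_wikidata" and "query" are hypothetical; a real server advertises
# its own tool names and argument schemas.
request = build_tool_call(1, "search_wikidata", {"query": "dog"})

if __name__ == "__main__":
    print(json.dumps(request, indent=2))
```

Because every MCP server accepts messages of this shape, an application written against the protocol can, in principle, switch data sources without changing how it talks to them.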
One of the issues that AI developers face is getting sources with correct, high-quality data.
The project not only gives generative AI models access to “reliable data from Wikidata,” but also keeps that data current. Earlier AI models may have been trained on outdated information.
This means that AI models could provide users with more accurate and reliable answers.
According to Wikimedia Deutschland, the new project will help create more trustworthy AI applications, since they draw on “human-reviewed” and “freely available data” and can therefore produce more transparent results.