Wikidata Embedding Project by Wikimedia Deutschland

Could this be the end of AI hallucinations?

Wikimedia Deutschland has revealed a new project to make information from Wikidata, Wikipedia's sister knowledge base, accessible to artificial intelligence (AI) models.

The project, called the Wikidata Embedding Project, is a “vector database for Wikidata.” It is built on vector embeddings, a method that lets computers capture the meaning of words and the relationships between them.

According to Wikimedia, the project allows for “approximately 120 million open data points in the world's largest free knowledge database” to be used to train AI.

The idea behind the project is to give developers outside of leading tech companies the resources they need to build AI applications. The project currently includes an embedding system that turns Wikidata's data into vectors, which DataStax stores in its vector database, Astra DB.

According to Wikimedia, Wikidata now includes 120 million entries. While this structured data can be processed by a computer, generative AI systems struggle to use it directly, since they are designed to work with natural language.

Translating Wikidata into vectors, or “numerical coordinates that show how different statements are related to each other,” helps AI systems learn how different terms are connected. For example, it can help AI understand that the words “dog” and “puppy” are closely related, while “dog” and “bank account” have little to do with each other.
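To make the “numerical coordinates” idea concrete, here is a minimal sketch in Python. The three-dimensional vectors are invented for illustration and do not come from the Wikidata Embedding Project; real embedding models produce vectors with hundreds or thousands of dimensions. Cosine similarity is one common way to compare such vectors, and the article does not say which measure the project itself uses.

```python
import math

# Hypothetical toy vectors, made up for this example only.
EMBEDDINGS = {
    "dog": [0.90, 0.80, 0.10],
    "puppy": [0.85, 0.75, 0.20],
    "bank account": [0.10, 0.05, 0.95],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(term, embeddings=EMBEDDINGS):
    """Return the other term whose vector is closest to the given term."""
    others = (name for name in embeddings if name != term)
    return max(
        others,
        key=lambda name: cosine_similarity(embeddings[term], embeddings[name]),
    )


if __name__ == "__main__":
    for other in ("puppy", "bank account"):
        score = cosine_similarity(EMBEDDINGS["dog"], EMBEDDINGS[other])
        print(f"dog vs {other}: {score:.2f}")
```

With these toy values, “dog” scores far closer to “puppy” than to “bank account,” which is the kind of relationship a vector database can search on.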

The project also supports the Model Context Protocol (MCP), a standard that lets AI systems and data sources such as vector databases communicate with each other. It works much like a USB connector: software developers only have to “plug in” for their AI to access Wikidata’s data.
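For readers curious what that communication looks like, MCP messages are JSON-RPC 2.0 requests, and a client asks a server to run a tool with the `tools/call` method. The sketch below only builds such a request in Python; the tool name `search_wikidata` and its `query` argument are hypothetical placeholders, not the project's actual interface.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Build an MCP-style JSON-RPC 2.0 request that invokes a server tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# "search_wikidata" and "query" are assumed names used for illustration.
request = build_tool_call(1, "search_wikidata", {"query": "puppy"})

if __name__ == "__main__":
    print(json.dumps(request, indent=2))
```

An MCP client library would normally assemble and send this message for the developer, which is what makes the “plug in” comparison apt.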

One of the issues that AI developers face is getting sources with correct, high-quality data.

The project not only lets generative AI models get “reliable data from Wikidata,” but also keeps that data current. Earlier AI models may have been trained on outdated information.

This means that AI models drawing on the project should give users more accurate and reliable answers.

According to Wikimedia Deutschland, the new project will help create more trustworthy AI applications, since they will be trained on “human-reviewed” and “freely available data,” which should make their results more transparent.
