Google creates a framework to stop AI from hallucinating


Google has trained a large language model to provide natural language explanations of its hidden representations.

The use of large language models (LLMs) has been skyrocketing. However, despite advances in the technology, these models still raise concerns about transparency and reliability. AI models have been known to make factual mistakes, which are known in the industry as hallucinations.

Now, Google researchers have unveiled a framework called "Patchscopes," which uses LLMs to provide natural language explanations of their internal hidden representations. Internal representations are the way an AI model encodes what it has learned.


According to Google scientists, exploring hidden representations could “unlock a deeper scientific understanding” of how these models work and provide control over their behavior.

“We are excited about its applications to detection and correction of model hallucinations, the exploration of image and text representations, and the investigation of how models build their predictions in more complex scenarios,” write Avi Caciularu and Asma Ghandeharioun, Research Scientists at Google Research.

Patchscopes injects hidden LLM representations into designated target prompts and analyzes the augmented input to generate comprehensible explanations of the model's internal understanding. The procedure involves four steps, Setup, Target, Patch, and Reveal, which together expose what the model has encoded about its context.
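
The four steps can be illustrated with a minimal activation-patching sketch using Hugging Face Transformers. This is not code from Google's paper; the model ("gpt2"), the layer index, the token positions, and the prompts are all placeholder choices made here for illustration.

```python
# Minimal sketch of the activation-patching idea behind Patchscopes.
# Assumptions: a Hugging Face causal LM ("gpt2" as a stand-in), and
# placeholder layer/position/prompt choices not taken from the paper.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

# Setup: run a source prompt and store one hidden representation.
source_prompt = "The Eiffel Tower is located in"
src_ids = tok(source_prompt, return_tensors="pt")
with torch.no_grad():
    src_out = model(**src_ids, output_hidden_states=True)
layer, src_pos = 6, -1  # placeholder layer / token position
# hidden_states[layer + 1] is the output of transformer block `layer`
hidden = src_out.hidden_states[layer + 1][0, src_pos, :].clone()

# Target: a generic inspection prompt whose final placeholder token ("x")
# will receive the patched representation.
target_prompt = "Paris: city, France: country, x"
tgt_ids = tok(target_prompt, return_tensors="pt")
tgt_pos = tgt_ids["input_ids"].shape[1] - 1  # position of "x"

# Patch: overwrite the target run's hidden state at that layer/position.
def patch_hook(module, inputs, output):
    hs = output[0]  # GPT2Block returns a tuple; first item is hidden states
    if hs.shape[1] > tgt_pos:  # only patch full-sequence passes
        hs[0, tgt_pos, :] = hidden

handle = model.transformer.h[layer].register_forward_hook(patch_hook)

# Reveal: generate from the patched run; the continuation verbalizes
# what the injected representation encodes.
with torch.no_grad():
    gen = model.generate(
        **tgt_ids, max_new_tokens=5, do_sample=False,
        use_cache=False, pad_token_id=tok.eos_token_id,
    )
handle.remove()
print(tok.decode(gen[0][tgt_ids["input_ids"].shape[1]:]))
```

In the full framework, the source model, target model, layer, and target prompt can all be varied, which, per the blog post, is what lets the same mechanism cover a broad range of interpretability tasks.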

According to a blog post by Google, the research unifies and extends a broad range of existing interpretability techniques and unlocks new insights into how an LLM's hidden representations capture nuances of meaning in its input, making it easier to fix certain types of reasoning errors.

“The Patchscopes framework is a breakthrough in understanding how language models work,” say the scientists. “This has intriguing implications for improving the reliability and transparency of the powerful language models we use every day.”
