Critical vulnerability affects Ollama: 300,000 servers exposed to attackers


Ollama accepts requests without authentication, and 300,000 servers are sitting ducks. A new critical vulnerability allows hackers to leak server memory storing API keys, environment variables, system prompts, and users’ conversation data.

Ollama released a new software version without even mentioning that it addressed the critical security issue.

Cyera, a cybersecurity firm, warns that the staple platform for running AI models locally is affected by a memory leak dubbed “Bleeding Llama.”


No authentication or sophisticated tools are required – just a few simple HTTP requests to any exposed Ollama server, and a hacker walks away with user conversations, API keys, environment variables, and other sensitive data.

The publicly accessible Ollama servers have already been mapped by public scanners, are visible to attackers, and are potentially vulnerable, especially if owners don’t prioritize the update, the researchers warn.


“Ollama, when launched, listens on all interfaces by default with no authentication. Today, there are roughly 300,000 exposed servers on the internet. This means threat actors can exploit this vulnerability without any credentials – using only three API calls, they can extract the entire heap memory of the Ollama process,” the report by Cyera Research reads.

Ollama’s documentation suggests users expose their local servers to the network with a single configuration line “OLLAMA_HOST=0.0.0.0” and provides no warning that Ollama will listen for commands coming from anywhere, not just the local machine.
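To make the difference between a local-only and a network-wide bind address concrete, here is a minimal Python sketch that classifies a value such as the one given to OLLAMA_HOST. The function name `classify_bind_host` and the accepted input forms (bare host, `host:port`, or bracketed IPv6) are assumptions made for illustration; this is not how Ollama itself parses the setting.

```python
import ipaddress


def classify_bind_host(value: str) -> str:
    """Classify a bind address as 'loopback', 'all-interfaces', 'specific' or 'unknown'.

    Hypothetical helper: it only understands a bare host, "host:port",
    or a bracketed IPv6 form such as "[::]:11434".
    """
    host = value.strip()
    if host.startswith("["):
        # Bracketed IPv6 with an optional port, e.g. "[::]:11434"
        host = host[1:].split("]", 1)[0]
    elif host.count(":") == 1:
        # IPv4 address or hostname followed by a port
        host = host.split(":", 1)[0]

    if host == "localhost":
        return "loopback"
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return "unknown"  # a hostname we cannot judge without resolving it

    if addr.is_unspecified:
        # 0.0.0.0 or :: means "listen on every interface"
        return "all-interfaces"
    if addr.is_loopback:
        return "loopback"
    return "specific"


for candidate in ("127.0.0.1", "0.0.0.0", "[::]:11434", "192.168.1.10"):
    print(candidate, "->", classify_bind_host(candidate))
```

A result of "all-interfaces" is the case the researchers warn about: the server accepts connections from any network the machine is attached to.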

The risk is greatest for instances exposed to the open internet, such as cloud deployments or systems with open ports on their routers. Attackers can discover them within minutes and run any API command.

“Ollama was designed as a localhost tool, which is why it doesn’t include authentication,” warns Echo, a third-party CVE Numbering Authority that helped report the vulnerability.

“In practice, teams deploy Ollama in containers, expose it to other services, or configure it to listen on all interfaces to support multi-client use.”


Ollama version 0.17.1 or later has eliminated the disclosed vulnerability, but the researchers still recommend restricting access to private networks or placing instances behind secure proxies.
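A quick way to act on that version threshold is to compare the version string your installation reports (for example, the output of `ollama --version`) against 0.17.1. The sketch below is an illustration only: the helper names `parse_version` and `is_patched` are hypothetical, and the parsing assumes a plain dotted version, ignoring any pre-release or build suffix.

```python
FIXED_IN = (0, 17, 1)  # first version reported as fixed, per the article


def parse_version(text: str) -> tuple[int, ...]:
    """Turn a string such as "v0.17.1-rc2" into (0, 17, 1).

    Assumption: anything after "-" or "+" is a suffix and is ignored.
    """
    core = text.strip().lstrip("v").split("-", 1)[0].split("+", 1)[0]
    return tuple(int(part) for part in core.split("."))


def is_patched(version: str) -> bool:
    """Return True if the version is 0.17.1 or later."""
    return parse_version(version) >= FIXED_IN


for v in ("0.16.3", "0.17.0", "0.17.1", "0.18.0"):
    print(v, "patched" if is_patched(v) else "VULNERABLE")
```

Because suffixes are ignored, a pre-release build of 0.17.1 would be reported as patched; treat such builds with caution and prefer a final release.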


How does the bug work?

Attackers with access to Ollama HTTP servers can already instruct them to pull or delete any AI model they want, and abuse them without limits.

However, the actual disclosed vulnerability lies in how Ollama handles AI model files in GGUF format.

GGUF files contain all data about the AI model, from its name to the actual parameters, known as weights, which are like the brain cells of the AI model. The weights are stored in tensors, which are basically long lists of numbers – multi-dimensional arrays.

The problem is that Ollama doesn’t validate whether the declared size of tensors matches the actual data.

The hacker can simply craft a malicious GGUF model, providing inflated tensor dimensions, and Ollama will try to read the data out of bounds, where it is not supposed to.

“GGUF is just a binary format – anyone can create one manually and set the tensor’s shape to whatever they want. There’s no validation that the number of elements we’re about to read actually matches the real size of the data,” Cyera’s report explains.

[Image: out-of-bounds read. Image by Cyera.]

“So if an attacker puts a very large number in the shape field, the loop will blindly read past the end of the buffer – that’s our out-of-bounds heap read.”
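The missing check described in the quote can be illustrated in a few lines of Python. This is a simplified sketch, not Ollama's actual code or its fix: the function `tensor_fits` and its parameters are hypothetical, and it assumes a fixed number of bytes per element, whereas quantized GGUF tensor types pack values into blocks.

```python
import math


def tensor_fits(shape: list[int], element_size: int, offset: int, buffer_len: int) -> bool:
    """Return True only if the declared tensor lies entirely inside the buffer.

    shape        -- dimensions declared in the file (attacker-controlled)
    element_size -- bytes per element (assumed fixed in this sketch)
    offset       -- where the tensor data starts inside the buffer
    buffer_len   -- number of bytes that were actually loaded
    """
    if element_size <= 0 or offset < 0 or any(dim < 0 for dim in shape):
        return False
    # Python integers never overflow; in a fixed-width language this
    # multiplication would itself need an overflow check.
    needed = math.prod(shape) * element_size
    return offset + needed <= buffer_len


# An honest 2x3 tensor of 4-byte values fits in a 24-byte buffer...
print(tensor_fits([2, 3], 4, 0, 24))
# ...while an inflated shape field is rejected before any read happens.
print(tensor_fits([10**12], 4, 0, 24))
```

Rejecting the file at this point means the conversion loop never starts, so there is nothing to read past the end of the buffer.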

The poisoned model can be uploaded with a simple HTTP request. The second request instructs Ollama to process it and create a new quantized GGUF model. Ollama will attempt to convert tensor data, but in this process will also read other data from RAM and store it all in the new file. This is how the user conversations, API keys, environment variables, and other data leak.

The attacker can then simply exploit another Ollama API call and push the new file with the data from memory to any server they control.

The critical heap out-of-bounds read vulnerability is tracked as CVE-2026-7482, and its severity rating is 9.1 out of 10, according to the National Vulnerability Database (NVD).

“The documented OLLAMA_HOST=0.0.0.0 configuration is widely used in practice (large public-internet exposure observed),” NVD warns.


Companies deploying Ollama are in danger

Ollama was designed as a tool to run on a local machine. Therefore, it doesn’t include authentication.

“Now imagine a large enterprise with 10,000+ employees using Ollama as their AI ‘chat.’ Think about how much sensitive data flows into the Ollama server. An attacker can learn basically anything about the organization from your AI inference – API keys, proprietary code, customer contracts, and much more,” Cyera said.

Developers also often connect Ollama to other AI tools, such as Claude Code or OpenClaw, which could expose all the data flowing from those tools to the Ollama server.


“What started as a local tool becomes an internet-facing service, without any of the protections that would normally be expected in that context. And at scale, that’s how teams wind up with hundreds of thousands of exposed instances,” Echo said.

Both Cyera and Echo criticized the Ollama team for releasing a fix without clearly communicating how serious the security issue was.

“Updates get prioritized based on perceived risk, and in this case, that signal wasn’t there,” Echo explained.

What to do if you run Ollama?

First, it’s paramount to update Ollama immediately to version 0.17.1 or later.

Second, make sure that your instance isn’t exposed to the open internet. Leaving an unauthenticated service publicly reachable is bad cybersecurity practice, no matter the service.

“Restrict access using network controls or place it behind a secured proxy. Reducing exposure is just as important as patching,” Echo explains.
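One rough way to verify that network controls are working is to try opening a TCP connection to the server from a machine that should not have access. The sketch below assumes Ollama's default port of 11434; the function name `port_is_reachable` and the example address are placeholders, and a successful connection only shows that the port is open, not that the instance is vulnerable.

```python
import socket


def port_is_reachable(host: str, port: int = 11434, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # Placeholder address: replace with the public IP or hostname of your own server.
    target = "127.0.0.1"
    state = "reachable" if port_is_reachable(target) else "not reachable"
    print(f"{target}:11434 is {state}")
```

Run it from outside the private network, and only against systems you own or are authorized to test. If the port is reachable from there, the instance is exposed regardless of its patch level.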

If your Ollama instance was publicly accessible before the patch, assume potential compromise – review what sensitive data might’ve been exposed, and rotate credentials.

