Hacking LLMs: how bots manipulate AI


Large Language Models (LLMs) like ChatGPT have transformed how we access and interact with information, bringing unprecedented convenience and efficiency. However, these powerful AI tools are not immune to manipulation.

As the race to automate knowledge accelerates, companies and malicious actors are finding clever ways to influence and exploit LLMs at scale. By subtly injecting biased, misleading, or crafted content into the vast data LLMs learn from, they can shape AI-generated responses in ways that serve their own interests. This raises urgent concerns about the reliability of AI-sourced information, the risks of overrelying on these models, and the vulnerability they still pose despite ongoing safeguards.

Understanding how LLMs work and the ways they can be manipulated is critical to using AI responsibly and maintaining trust in the information it provides. In this article, I explore these dynamics to help you navigate the complexities of the evolving AI landscape.

ADVERTISEMENT

Key takeaways

LLMs get poisoned through scraping

To understand how language models can be manipulated, you need to understand how they work first. Most LLMs rely on scraping vast quantities of online data, learning from news stories, forum posts, social media, and proprietary datasets.

The process starts with collecting these vast text datasets, which a model uses to find patterns in language – such as how words relate to each other, common phrases, facts, and general knowledge. The model trains by predicting what word comes next in a sentence based on the context it has seen during training. Over time, it internalizes broad language patterns and facts from the data, enabling it to generate human-like text in response to queries.

However, since these data sources are scraped from the open web, they can contain misinformation or biased viewpoints. As such, this process, while comprehensive, creates several vulnerabilities, and this is how it can unfold:​

  1. Malicious actors exploit this vulnerability by flooding online spaces with manipulated, biased, or false content.
  2. Bots automate this process, posting engineered text on forums, wikis, and social platforms to subtly shift the information landscape that LLMs learn from. For example, social media bots coordinated by companies can amplify corporate messages or drown out dissenting perspectives, skewing the mix of data used for training and fine-tuning.
  3. Models that use techniques like Retrieval Augmented Generation (RAG) to fetch real-time information can thus unknowingly amplify such manipulated content as credible.

Overall, while the approach LLMs use to gather knowledge enables them to generalize across topics and languages, it also exposes them to vulnerabilities from coordinated misinformation campaigns and bias embedded in the training data.

ADVERTISEMENT

Bots exploit LLMs by performing repetitive tasks online

Bots are automated software programs designed to perform repetitive tasks online, often mimicking human behavior. In the context of LLM manipulation, bots play a crucial role by executing various attack strategies to influence or deceive language models. These bots can flood online platforms with tailored content, manipulate discussions, and amplify specific messages to skew the data that LLMs learn from.

Here’s a quick overview of how different types of bots support inauthentic activity on social media and their impacts, according to the Cybersecurity and Infrastructure Security Agency (CISA):

Bot typeWhat they doImpact on online discourse
Click/like farming botsInflate popularity by liking, reposting, and engaging with content repeatedlyCreate a false perception of influence, distorting what matters
Hashtag hijacking botsExploit group hashtags to post spam or malicious linksSilence opposing opinions, chill open discussion
Repost network coordinated bots (Botnet)Instantly repost content from a “parent” bot, flooding social media channelsOverwhelm platforms with inauthentic content, sway public opinion
Sleepers Dormant for long periods, then suddenly launch thousands of postsGenerate a false sense of urgency with a sudden surge in attention
Astroturfing botsShare coordinated messages to mimic grassroots support or oppositionCreate an illusion of widespread genuine support/criticism, exaggerate the importance
Raids Swarm targeted accounts with spam and harassmentHarass users, silence dissenting voices

These methods enable malicious actors to manipulate social media conversations, amplify misinformation, and undermine authentic public discourse.

I found it particularly curious how advanced bots not only flood online spaces with manipulative content but also expertly exploit human psychology to deepen their influence. These bots employ techniques such as cognitive dissonance, where they initially agree with a user to establish trust and then gradually introduce conflicting information to create doubt. This subtle psychological manipulation makes users more susceptible to new, often misleading ideas.

Additionally, bots leverage emotional pressure and the fear of missing out (FOMO) to keep users engaged, ensuring that the manipulative messaging sticks. The sophistication of these methods illustrates that bots are not just automated spammers. They are strategic psychological actors that can shape opinions more effectively than many humans realize.

But the possible attacks don’t just end there, as there are more direct ways to manipulate LLMs.

It’s not just political actors or propagandists who can take advantage of this, either. Imagine a company spinning up thousands of cheap accounts and using bots to flood review sites, Reddit threads, and niche forums with glowing “user” stories about its product. Over time, that sea of positive chatter can start to look like a genuine trend, which means LLMs may start to treat those talking points as widely accepted facts.

The result: when someone later asks an AI which product to buy, the model might lean toward the brand that quietly gamed the conversation in the background, even if real customers never loved it that much.

ADVERTISEMENT

One of the most common attack methods that makes this possible is prompt injection, where bots feed LLMs with thousands of subtle or explicit instructions designed to bypass safety controls and produce harmful or biased outputs. This includes jailbreaking techniques, where the model is tricked into adopting a fictitious persona that ignores its programmed rules and generates unsafe content.

Prompt injection process

Bots also use social engineering tactics, such as flattery, peer pressure, and social proof ("everyone else is doing it"), to increase the likelihood that a model will break its own rules and respond in undesired ways.

Another significant threat is data poisoning, where bots manipulate the training datasets by inserting biased, misleading, or harmful information. This can introduce hidden backdoors into the model, amplify certain ideologies, or degrade the overall reliability of the AI. Companies with vested interests have used such tactics to bias models for marketing gains or to propagate extreme propaganda.

Real-world examples demonstrate that these threats are not hypothetical:

  • Researchers at Purdue University used a method called LLM Interrogation (LINT) to achieve a 98% success rate in prompting commercial and open-source LLMs to reveal prohibited content.
  • Another study documented coordinated bot networks amplifying polarizing political messages by rapidly sharing content, creating an illusion of widespread support, and influencing public opinion during key political events.
  • My colleague also explores why ChatGPT spews more Russian propaganda than other chatbots in a dedicated article.

In essence, bots exploit the way LLMs process information by embedding carefully crafted instructions or flooding the internet with manipulated data. This underscores the importance of continuous vigilance, layered defenses, and human oversight in mitigating these risks.

Are LLMs hopelessly vulnerable?

LLMs do come with built-in defenses such as alignment training, prompt filters, refusals to comply with unethical requests, and regular security updates. These safeguards make well-maintained models generally tougher to manipulate compared to older, open-source, or neglected ones.

However, attacks are constantly evolving and becoming more sophisticated. Bots and attackers automate techniques such as prompt engineering, jailbreaking, and data poisoning to identify and exploit gaps, even in robust systems. Indirect attacks – for example, hiding malicious triggers in images, documents, or links – now target multimodal models that handle more than just text.

ADVERTISEMENT

Have thoughts about this topic? Others do, too. Join them in the discussion.

A recent Harvard study found that only about 5% of LLM chatbot outputs repeated well-known disinformation, suggesting some genuine resistance. Still, the risk holds firm, especially for queries with ideological or niche content. As LLMs become more capable and widespread, detecting persistent, creative, bot-driven manipulation will only become increasingly challenging.

In addition to technical safeguards, preventing manipulation requires continuous vigilance, layered defenses, and critical human oversight to maintain trust in AI-generated information.

One uncomfortable takeaway

Trust but verify is more important than ever. Large Language Models have transformed how we access information, yet their responses can be subtly shaped by unseen actors deploying coordinated bots and manipulated data. Automation and convenience offer great promise but cannot replace human judgment, especially when accuracy, ethics, and real-world consequences are at stake.

As LLM adoption accelerates, defending against manipulation will require more than just better technology – it needs ongoing vigilance from all corners: society, industry, and government alike. This includes developing advanced detection tools, conducting regular audits, implementing transparency protocols, and educating users to critically evaluate AI outputs.

In this era of AI-driven knowledge, relentless oversight is the most effective safeguard for preserving the integrity and trustworthiness of the information that informs our decisions.