LLMs are now a lot cheaper than many believe: 1000x price drop


Generative artificial intelligence (AI) is now relatively cheap, with AI companies undercutting each other in an ongoing frenzy to offer better and cheaper large language models (LLMs). Some argue that LLMs are already 1,000 times cheaper than they were two years ago, to the point that a simple web search via API now costs more than an LLM query.

The belief that running LLMs is expensive is a misconception, and in fact the opposite is now true, argues Juho Snellman, a systems programmer living in Zürich, Switzerland, in a blog post.

“Inference has gotten cheaper even faster than models have gotten better, and nobody has an intuition for something becoming 1000x cheaper in two years,” the developer says.


Snellman backs the claim with data. The cost of LLM APIs is usually measured per million tokens. Some of the cheapest small but still capable models – such as Gemma 3, Qwen3, Gemini 2.0 Flash, and GPT-4.1 nano – can be used for less than a dollar per million tokens.

Only the most expensive models, such as o3, Claude 3.7 Sonnet, or Gemini 2.5 Pro, cost above $10 per million tokens, with the priciest charging $75 per million tokens. Prices change frequently, usually downward.

For comparison, the author lists search API costs: $35 per 1,000 queries for Google's “Grounding with Google Search” feature and $15 per 1,000 queries for Bing Web Search, with other search engines charging less.

Google Search can also be accessed via the cheaper Custom Search JSON API, at $5 per 1,000 queries.


An LLM-powered chatbot typically uses 500–1,000 tokens to answer a query, which makes the prices directly comparable.
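To make that comparison concrete, here is a minimal back-of-the-envelope sketch in Python, using the prices quoted above. The $0.50-per-million-token figure for a cheap capable model is an illustrative assumption, and real prices vary by provider and change often:

```python
# Back-of-the-envelope per-query cost comparison, using the illustrative
# prices quoted in this article. Real prices vary and change frequently.

LLM_PRICE_PER_MTOK = 0.50      # $/1M tokens for a cheap capable model (assumption)
TOKENS_PER_ANSWER = 1_000      # upper end of the 500-1,000 token estimate

GOOGLE_GROUNDING = 35 / 1_000  # $/query, "Grounding with Google Search"
BING_SEARCH = 15 / 1_000       # $/query, Bing Web Search
CUSTOM_SEARCH = 5 / 1_000      # $/query, Custom Search JSON API

llm_cost = LLM_PRICE_PER_MTOK * TOKENS_PER_ANSWER / 1_000_000

print(f"LLM answer:       ${llm_cost:.4f} per query")          # $0.0005
print(f"Custom Search:    ${CUSTOM_SEARCH:.4f} per query")     # $0.0050
print(f"Bing Web Search:  ${BING_SEARCH:.4f} per query")       # $0.0150
print(f"Google grounding: ${GOOGLE_GROUNDING:.4f} per query")  # $0.0350

# Even the cheapest search API is ~10x the cost of the LLM answer.
print(f"Cheapest search / LLM: {CUSTOM_SEARCH / llm_cost:.0f}x")
```

Under these assumptions, a full LLM answer costs about $0.0005, a tenth of even the cheapest search API query – the gap Snellman points to.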

“The low end of that spectrum is at least an order of magnitude cheaper than even the cheapest search API, and even the models at the low end are pretty capable,” Snellman said.

“A lot of people think LLMs are expensive to operate. That was true a couple of years ago, but these people haven't updated their views after an approximately 1000x reduction in prices over two years.”


The author also argues that these pricing levels are sustainable. Some AI companies, such as DeepSeek, publish their model weights for free, allowing anyone to run the LLM locally, and despite that, their paid APIs have wide profit margins.

Why does this matter?

Many people think running LLMs is expensive. In reality, shrinking costs are making new AI use cases viable.

“Running the AI is already cheap, will keep getting cheaper, and will always have a monetization model of some sort since it's what the end user is interacting with,” Snellman argues.

The blog post sparked a debate on Hacker News over whether AI companies will be able to reach profitability: they spend heavily on research and model training, and revenues may not cover those costs.

Many of the unprofitable AI labs have yet to start monetizing consumer traffic, and doing so may prove challenging as newer, cheaper models keep appearing and switching costs for users are low.

Many believe that ad-supported models may emerge as AI companies monetize free user access.

Still, cheap AI will be used in numerous emerging cases, from enhanced personal assistants and customer support chatbots to personal AI agents that constantly scrape the web, e.g., watching for concert tickets whenever a band plays in town. According to Snellman, however, this last use case would be very costly even at current LLM pricing.
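A rough, purely hypothetical calculation shows why an always-on agent adds up. The polling interval, token count per check, and model prices below are all assumptions for illustration:

```python
# Rough cost sketch for an always-on "ticket watcher" agent.
# All numbers are illustrative assumptions, not figures from the article.

CHECKS_PER_DAY = 24 * 60 // 5   # re-check the ticket page every 5 minutes
TOKENS_PER_CHECK = 2_000        # page text + instructions + short verdict

for price_per_mtok in (0.50, 10.0):  # cheap vs. premium model, $/1M tokens
    daily_cost = CHECKS_PER_DAY * TOKENS_PER_CHECK / 1_000_000 * price_per_mtok
    print(f"${price_per_mtok:.2f}/Mtok: ${daily_cost:.2f}/day, "
          f"${daily_cost * 365:.0f}/year per monitored page")
# -> $0.29/day (~$105/year) with the cheap model;
#    $5.76/day (~$2,100/year) with the premium one
```

Even with a cheap model, roughly a hundred dollars a year for a single monitored page multiplies quickly across many pages, bands, or users.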

Cheap AI will also bring scraping challenges for many service owners, who will need to find ways to stop AI agents from depleting their resources, potentially leading to more paywalls.
