What is LLM temperature?
Ever wonder why some AIs seem more creative than others? Why do some hallucinate more than their competitors? A big part of the answer is one setting: large language model (LLM) temperature.
Essentially, LLM temperature controls the unpredictability of an AI model's answers. At low temperatures, models are pretty rigid: they mostly stick to the most probable outputs. At high temperatures, on the other hand, models might surprise you with how creative and out there their answers can be. However, this creativity comes at the cost of more hallucinations.
This brings about many questions. Which temperature to choose? What do temperature values like 0, 0.3, 0.8, and 1.0 actually mean? What is LLM temperature’s potential business impact?
In this article, I explain what LLM temperature is, why it matters, when to adjust it, and give you a step-by-step guide to picking the perfect LLM temperature, along with real use cases and tips.
What is LLM temperature?
LLM temperature is essentially a control knob for your AI's creativity, changing the unpredictability of its answers. But to understand how LLM temperature works, we need to start by understanding what a large language model is and how it works.
LLMs operate on a predictive model. They don’t know in advance what to say; instead, they predict which word (more precisely, which token) makes the most sense based on their training. This means that LLMs generate their answers one token at a time, assigning a probability to every candidate and then picking the next one. Temperature is the setting that reshapes those probabilities: the model’s raw scores (logits) are divided by the temperature before being turned into probabilities, so a low temperature sharpens the distribution toward the top candidate and a high temperature flattens it.
In short, low temperature is close to deterministic, repetitive, and predictable. Medium temperature gives you relative consistency, with more variance in the answers. High temperature gives you a lot of creative wording choices, but also risks hallucinations.
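The rescaling described above is commonly written as softmax(logits / T). Here is a minimal Python sketch of it; the function name and the three logits are hypothetical, and inference engines and hosted APIs implement this internally, possibly with extra details.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw token scores (logits) into probabilities, rescaled by temperature."""
    if temperature <= 0:
        # Temperature 0 is usually treated as greedy decoding:
        # all probability goes to the single highest-scoring token.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for three candidate next tokens.
logits = [4.0, 2.0, 1.0]
for t in (0, 0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: {[round(p, 3) for p in probs]}")
```

Lower values push almost all of the probability onto the top token, while higher values spread it across the alternatives, which is the rigid-versus-creative trade-off described above.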
Three major temperature settings
I want to look at the three most typical temperature settings for LLMs. These ranges are rough estimates of which setting works best for which uses, and they assume a maximum temperature of 1.0. If your model’s maximum is 2.0, check how it scales the setting.
- Low temperature (0-0.2). Low temperature offers little variability, making it a great choice for coding, technical documents, or complex research. If your task requires precision rather than creativity, low temperature is an excellent setting.
- Medium temperature (0.3-0.7). At this setting, your LLM will get a little more creative. It will use synonyms and produce more varied output. This is a good range for writing articles or generating a presentation. Hallucinations can happen, but are rare.
- High temperature (0.8-1.0). High temperature is best for brainstorming and creative writing. If you’re looking for new ways to explain something, letting the LLM improvise with high temperature might get you on the right track. That said, its output is rarely production-ready and should be edited before use.
How does LLM temperature work?
Since LLM temperature is based on the probabilistic nature of an LLM, I’ll give you a simple example of how an LLM processes a prompt.
For example, if you ask “what is French toast made of?”, the LLM assigns a probability to each candidate token that could come next. In my example, the most probable answer (say, with 90% probability) will be “eggs and bread”. A higher-temperature LLM might use less obvious phrasing, because raising the temperature flattens those probabilities and gives less likely tokens a bigger share of the draw.
Perhaps it will say “brioche and egg custard”, or add some spices to the mix. As you push the temperature toward 2.0 (the maximum on some APIs), it’s likely to use weird synonyms, such as “sourdough product with chicken protein liquid”, until its probabilities are so flat that it starts spewing nonsense like “banana flavored rosin”.
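To put rough numbers on the French toast example, the sketch below treats each whole answer as a single choice (a simplification, since real models score one token at a time) and uses made-up probabilities. Reweighting each probability as p^(1/T) and renormalizing is equivalent to dividing log-probabilities by T before a softmax.

```python
# Hypothetical probabilities for whole answers (real LLMs score one token at a time).
answers = {
    "eggs and bread": 0.90,
    "brioche and egg custard": 0.07,
    "sourdough product with chicken protein liquid": 0.029,
    "banana flavored rosin": 0.001,
}

def reweight(probs, temperature):
    """Rescale probabilities as p ** (1 / T), then renormalize so they sum to 1."""
    powered = {k: p ** (1.0 / temperature) for k, p in probs.items()}
    total = sum(powered.values())
    return {k: v / total for k, v in powered.items()}

for t in (0.5, 1.0, 2.0):
    shares = reweight(answers, t)
    print(f"T={t}: " + ", ".join(f"{k!r}: {v:.3f}" for k, v in shares.items()))
```

At T=1.0 the probabilities are unchanged; below 1.0 “eggs and bread” takes nearly everything, and above 1.0 even the nonsense answer gets a noticeably larger share.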
Since LLMs are autoregressive, each word they generate changes the probabilities for the words that follow. This hurts quality at higher temperatures. For example, if you ask an extremely high-temperature AI to explain E=mc² and it randomly decides that E stands for enjoyment, it might go on to say that m is mimosas, c is celebrations, and the 2 is a multiplier rather than an exponent. A chain like this is still unlikely, because the model’s learned patterns strongly favor the correct reading, but if you run the prompt thousands of times, a big mistake like this becomes very likely to show up at some point.
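The “thousands of runs” point is simple compounding. The sketch below assumes, purely for illustration, that every token has the same tiny, independent chance of being an unlikely pick; real token choices are not independent, and the 0.01% rate, 200-token answer length, and 5,000 runs are made-up numbers.

```python
def chance_of_any_slip(per_token_slip, tokens_per_answer, runs):
    """Probability of at least one unlikely token, per answer and across all runs.

    Assumes each token independently has the same small chance of being an
    unlikely pick. This is a deliberate simplification for illustration.
    """
    clean_answer = (1.0 - per_token_slip) ** tokens_per_answer
    clean_all_runs = clean_answer ** runs
    return 1.0 - clean_answer, 1.0 - clean_all_runs

per_answer, across_runs = chance_of_any_slip(
    per_token_slip=0.0001, tokens_per_answer=200, runs=5000
)
print(f"per answer: {per_answer:.2%}, across all runs: {across_runs:.2%}")
```

A slip that is rare in any single answer (about 2% here) becomes close to certain somewhere across thousands of runs.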
Temperature isn’t the only factor that determines response quality. Top-P, or nucleus sampling, also helps maintain quality. It ranks candidate tokens from most to least likely and keeps only the smallest set whose combined probability reaches the threshold P; everything outside that set is discarded before a token is sampled. For example, a Top-P of 90% would likely cut off the most absurd continuations (say, “E refers to a barrister”), but an early bad token that makes it into the set would still steer the rest of your answer.
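The two controls are typically applied one after the other: temperature reshapes the probabilities, then top-p trims the unlikely tail. Below is a minimal sketch of that order of operations in plain Python; the function name, token list, and logits are hypothetical, and real inference libraries may combine these steps differently.

```python
import math
import random

def sample_next_token(logits, temperature=1.0, top_p=1.0, rng=random):
    """Pick a token index: temperature scaling first, then top-p (nucleus) filtering."""
    if temperature <= 0:
        raise ValueError("this sketch expects temperature > 0")
    # 1. Temperature: divide the logits by T, then softmax them into probabilities.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # 2. Top-p: keep the smallest set of most likely tokens whose
    #    cumulative probability reaches top_p, and drop the rest.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # 3. Sample among the kept tokens in proportion to their probabilities
    #    (random.choices does not require the weights to sum to 1).
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]

# Hypothetical candidates for what "E" in E=mc² stands for.
tokens = ["energy", "enjoyment", "barrister"]
logits = [5.0, 2.5, 0.0]
rng = random.Random(0)
picks = [tokens[sample_next_token(logits, temperature=1.5, top_p=0.9, rng=rng)]
         for _ in range(10_000)]
counts = {t: picks.count(t) for t in tokens}
print(counts)
```

With these numbers, “barrister” falls outside the 90% nucleus and is never picked, but “enjoyment” survives the cut, which is why top-p alone can’t prevent the kind of early misstep described above.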
Importance of fine-tuning LLM temperature
Fine-tuning your LLM’s temperature is critical to ensuring that your chatbot generates the results you need. To help you understand how this works in practice, here are a few examples.
Balancing creativity vs reliability
When setting LLM temperature, you have to think about what you need. Sometimes, you need a creative writer who will create text that doesn’t have that rigid AI feel. In that case, you raise the temperature. Other times, you may want concrete answers and don’t need the AI to be overly original. In that case, a lower temperature will suit you well.
Controlling variability across runs
With higher temperatures, LLM responses will also vary more across multiple attempts to complete the same prompt. This is great for ideation, brainstorming or mass rewrites, but not a great choice for automations that require predictability.
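You can see this run-to-run spread even in a toy setup. The sketch below samples short “responses” from a fixed, hypothetical set of token scores (ignoring context entirely) and counts how many distinct ones appear; at temperature 0 it uses greedy decoding, so every run is identical. Hosted APIs may still show small differences between runs at temperature 0.

```python
import math
import random

def sample(logits, temperature, rng):
    """Return a token index: greedy at temperature 0, otherwise temperature-scaled sampling."""
    if temperature <= 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]  # unnormalized softmax weights
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

def distinct_outputs(temperature, runs=20, length=5, seed=42):
    """Generate `runs` toy responses of `length` tokens each and count the unique ones."""
    rng = random.Random(seed)
    logits = [2.0, 1.5, 1.0, 0.5]  # hypothetical fixed scores for four tokens
    seen = set()
    for _ in range(runs):
        seen.add(tuple(sample(logits, temperature, rng) for _ in range(length)))
    return len(seen)

for t in (0, 0.3, 1.0, 1.5):
    print(f"T={t}: {distinct_outputs(t)} distinct responses out of 20 runs")
```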
Matching domain and user expectations
The domain you use an LLM in is crucial when deciding its temperature setting. Legal or medical LLMs have to be predictable, so a low temperature is definitely required. On the other hand, an advertising company or scriptwriter will benefit from a higher temperature.
Supporting A/B testing and prompt design
Setting the right temperature can be crucial for evaluating engineered prompts. At Cybernews, when testing an AI prompt, we always decide on the temperature first. This is because, depending on the setting, the same prompt can have wildly different results. Always A/B test prompts with the same temperature settings to ensure that your comparison is appropriate.
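One way to make that rule hard to break is a small A/B harness that passes one temperature to both prompts and averages a score over several runs. The `generate` callable, the stub model, and the brevity score below are hypothetical placeholders; swap in your own model call and evaluation criteria.

```python
from statistics import mean
from typing import Callable

# Hypothetical signature: send a prompt at a given temperature, get text back.
Generate = Callable[[str, float], str]

def ab_test(generate: Generate, prompt_a: str, prompt_b: str,
            score: Callable[[str], float], temperature: float, runs: int = 10) -> dict:
    """Compare two prompts at the SAME temperature, averaging a score over several runs."""
    results = {}
    for name, prompt in (("A", prompt_a), ("B", prompt_b)):
        scores = [score(generate(prompt, temperature)) for _ in range(runs)]
        results[name] = mean(scores)
    return results

# Stub model for illustration only: it just echoes the prompt in upper case.
def fake_generate(prompt: str, temperature: float) -> str:
    return prompt.upper()

# Illustrative criterion: shorter responses score higher.
def brevity_score(text: str) -> float:
    return 1.0 / (1 + len(text))

print(ab_test(fake_generate, "Summarize in one line.", "Please write a short summary.",
              brevity_score, temperature=0.3))
```

Because the temperature is a single argument shared by both arms, a difference in scores can’t come from mismatched settings.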
When should you adjust temperature?
Here are a few scenarios where you should modify your LLM’s temperature, and what the setting should be:
- Structured data extraction and formatting. In this case, you need a low LLM temperature (around 0.2). This will increase the likelihood that your data will be extracted accurately, and that the LLM won’t hallucinate information that isn’t there.
- Production code generation and refactoring. Go low-medium here, perhaps around 0.3. You need predictability, and the only part that can benefit from creativity is the comments in the code.
- Q&A or support assistant. Use a medium temperature (around 0.5) here. You need some creativity to ensure that the responses aren’t bland or robotic, but you also need a lot of adherence to facts to ensure that the responses are reliable.
- Brainstorming, copywriting, and ideation. Use medium-high temperature (around 0.6-0.7) for this. You’ll get creativity while also ensuring that you avoid the LLM generating nonsensical responses, which may affect the quality of your output or send you down the wrong path.
- Storytelling, fiction, or creative experimentation. Use a high temperature above 0.8 for this. As long as you’re aware that this is essentially a mad scientist creating crazy word combinations within its learned patterns, it can be very useful, especially if your work is more abstract.
Keep in mind, though, that temperature doesn’t replace good prompting, guardrails, or evaluation. It can improve your outputs, but without a solid prompting setup around it, it won’t change that much on its own.
How to choose the perfect temperature (step-by-step)
To help you pick the perfect temperature for your LLM, I created a step-by-step guide.
Step 1. Define the goal and risk tolerance
The first thing you have to consider is simply what you will use the LLM for. This will define how risk-tolerant you are. If you’re doing creative writing, hallucinations will be far less impactful than if you’re doing law or medicine. In short, decide what you want the model to generate, and how much a potential error can cost you.
Step 2. Pick an initial temperature band
Refer to my suggested temperature ranges and use them as a baseline. In case none of the examples apply to you, here are some broad ranges:
- 0–0.2 for strict, high-risk tasks
- 0.3–0.5 for balanced tasks
- 0.6–0.9 for creative work
Set this as your baseline before you start prompting.
Step 3. Run small experiments across a temperature sweep
Now, test the same prompt at several temperatures within that range. Don’t change the wording or any other settings. See which result is the most factually accurate and best adheres to your required format, while still being creative enough and free of hallucinations. If you want to double-check the setting, try a few other prompts as well.
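A sweep like this is easy to script. The minimal sketch below assumes a hypothetical `generate(prompt, temperature)` function that wraps whatever model API you use; it keeps the prompt fixed, spaces temperatures evenly across your chosen band, and collects a few samples at each step so you can compare them side by side.

```python
from typing import Callable, Dict, List

def temperature_sweep(generate: Callable[[str, float], str], prompt: str,
                      low: float, high: float, steps: int = 5,
                      samples_per_step: int = 3) -> Dict[float, List[str]]:
    """Run one fixed prompt across evenly spaced temperatures in [low, high].

    Only the temperature changes between calls; the prompt stays the same,
    so differences between outputs come from the temperature alone.
    """
    if steps < 2:
        raise ValueError("need at least two steps to sweep a range")
    step = (high - low) / (steps - 1)
    results = {}
    for i in range(steps):
        t = round(low + i * step, 2)  # round away floating-point noise like 0.39999...
        results[t] = [generate(prompt, t) for _ in range(samples_per_step)]
    return results

# Hypothetical stand-in for a real model call, just to show the shape of the results.
def fake_generate(prompt: str, temperature: float) -> str:
    return f"response to {prompt!r} at T={temperature}"

for t, outputs in temperature_sweep(fake_generate, "Explain top-p sampling.", 0.3, 0.5, steps=3).items():
    print(t, outputs[0])
```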
Step 4. Evaluate outputs against real criteria
Next, you should evaluate your outputs against concrete criteria. These should include:
- Did it answer the question fully?
- Did it respect constraints (length, style, format)?
- Did it stay safe and on-policy?
- Did it make anything up?
You can also add additional criteria based on your specific use case.
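Some of these checks can be automated. The sketch below assumes a hypothetical data-extraction task where the model must return a JSON object whose values are copied from a source text; it covers completeness, format, length, and a crude “did it make anything up?” test. Safety and policy checks usually need a separate classifier or human review, so they are left out.

```python
import json

def evaluate_output(output: str, source_text: str, required_keys, max_chars: int) -> dict:
    """Check one model output against simple, concrete criteria (hypothetical extraction task)."""
    checks = {"within_length": len(output) <= max_chars}
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        data = None
    is_obj = isinstance(data, dict)
    # Format: the answer must be a JSON object.
    checks["valid_format"] = is_obj
    # Completeness: every required field is present.
    checks["complete"] = is_obj and all(k in data for k in required_keys)
    # Crude hallucination check: every extracted value appears verbatim in the source.
    checks["nothing_made_up"] = is_obj and all(str(v) in source_text for v in data.values())
    return checks

source = "Invoice 1042 was issued to Acme Ltd on 2024-03-01."
print(evaluate_output('{"invoice": "1042", "customer": "Globex"}', source,
                      ["invoice", "customer"], max_chars=200))
```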
Step 5. Lock defaults and adjust per use case
Once you’ve found the right setting, set it as the default in your LLM API calls. If different use cases need different settings, set up easy-to-use overrides so you can quickly switch between them. Be sure to also document your process and chosen temperatures, so that other team members can understand everything clearly.
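One lightweight way to lock defaults, allow overrides, and keep the reasoning documented in one place is a small config object. This is a hypothetical structure rather than a feature of any particular LLM API; you would pass the resolved value as the temperature parameter of your own API calls.

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class TemperatureConfig:
    """A shared default temperature plus documented per-use-case overrides."""
    default: float = 0.5
    overrides: Dict[str, float] = field(default_factory=dict)
    notes: Dict[str, str] = field(default_factory=dict)  # why each value was chosen

    def for_use_case(self, use_case: str) -> float:
        """Return the override for a use case, or the default if none is set."""
        return self.overrides.get(use_case, self.default)

# Values based on the scenario suggestions earlier in this article (0-1 scale).
config = TemperatureConfig(
    default=0.5,
    overrides={
        "data_extraction": 0.2,
        "code_generation": 0.3,
        "support_assistant": 0.5,
        "brainstorming": 0.7,
        "fiction": 0.9,
    },
    notes={"data_extraction": "High risk: values must match the source exactly."},
)
print(config.for_use_case("code_generation"), config.for_use_case("newsletter"))
```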
Real-world use cases for LLM temperature modeling
Here are a few examples of how setting up LLM temperature for specific workflows can help you achieve your goals:
- Customer support tools. Generally require a lower temperature to avoid hallucinations, but in some scenarios may benefit from a slightly higher one to provide more empathy and a more sympathetic tone.
- Content marketing. A medium-temperature LLM will be optimal for conceptual work, but you may want to add a lower-temperature LLM to the pipeline for proofreading and fact-checking.
- Education and tutoring. A medium-temperature LLM will do a good job translating teachable information into interesting and engaging content.
- Coding assistants. Low-temperature LLMs should be used for coding to ensure high-quality code. However, the temperature can be raised to medium for more creative code explanations.
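The content marketing example above (a medium-temperature writer followed by a lower-temperature proofreader) can be sketched as a two-stage pipeline. `generate` is a hypothetical wrapper around your model API, and the prompts and temperatures are illustrative.

```python
from typing import Callable

# Hypothetical signature: (prompt, temperature) -> generated text.
Generate = Callable[[str, float], str]

def draft_then_proofread(generate: Generate, brief: str,
                         draft_temp: float = 0.5, review_temp: float = 0.2) -> str:
    """Medium-temperature draft, then a low-temperature proofreading pass."""
    # Stage 1: medium temperature for a varied, engaging first draft.
    draft = generate(f"Write a short marketing blurb about: {brief}", draft_temp)
    # Stage 2: low temperature so the reviewer corrects rather than rewrites.
    return generate(
        "Proofread the text below and flag any claims that need fact-checking. "
        f"Change as little as possible.\n\n{draft}",
        review_temp,
    )
```

Each stage gets the temperature that suits its job, so the creative step and the careful step don’t have to share one compromise setting.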
Best tips for setting the right LLM temperature
Picking an LLM temperature is easier if you follow a few best practices. I recommend keeping the following in mind when modifying your chatbot’s settings:
- Always set a low temperature for high-risk tasks. If you’re working with sensitive information or in a field that requires precision (e.g., law, medicine, engineering), I recommend a low temperature to reduce the risk of AI hallucinations causing serious negative consequences.
- Use medium temperature as a baseline for general assistants. I always use a medium temperature for general-purpose chatbots, as it gives me a good combination of creativity and reliability for non-critical tasks.
- Only turn up the temperature in controlled environments. Don’t run an AI that has access to your systems or automations at a high temperature, as its less predictable actions could damage your devices or data. Instead, I recommend using high temperatures in closed environments for a specific purpose.
- Experiment with fixed prompts while changing only the temperature. By using the same prompts across multiple temperatures, you’ll find a sweet spot between creativity and predictability that fits your exact needs.
- Document your choices and revisit them. I regularly write down the reasons for a certain temperature setting, and then revisit them when my needs change or I introduce a new model to my pipeline.
- Combine temperature with other controls. Temperature isn’t the only tool that can help you control your LLM’s output. I also use top-p, system prompts, and guardrails to fine-tune my models’ behavior.
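The controlled-environment advice can also be enforced in code rather than left to memory. This hypothetical helper caps the temperature for any model that can act on real systems; the 0.3 cap and the 1.0 maximum are assumptions to adjust for your own setup and provider.

```python
def effective_temperature(requested: float, has_tool_access: bool,
                          max_with_tools: float = 0.3, max_overall: float = 1.0) -> float:
    """Clamp a requested temperature according to simple, hypothetical policy rules.

    - Never go below 0 or above the assumed model maximum (max_overall).
    - If the model can act on real systems (tools, automations), cap it lower.
    """
    ceiling = max_with_tools if has_tool_access else max_overall
    return min(max(requested, 0.0), ceiling)

print(effective_temperature(0.9, has_tool_access=True))
print(effective_temperature(0.9, has_tool_access=False))
```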
Final thoughts
LLM temperature may seem daunting, but it’s actually quite a simple way to improve your LLM’s performance. You can treat it as a slider between predictability and creativity. Most of the issues with temperature aren’t the fault of the setting itself, but rather of a mismatch between the setting and your use case. A high temperature for a medical assistant is a bad idea, as is a low temperature for a poetry generator.
Treat the temperature setting as a knob you can adjust whenever you need to, and keep experimenting and re-evaluating your model’s performance. This way, you’ll be able to make the most out of both your model and your knowledge of LLM temperature.
As more LLM apps are created, it’s very likely that temperature will be less customizable for the end user. However, for general-purpose chatbots and APIs, adjusting temperature can still be an excellent way to customize your AI to your liking, so knowing how to handle AI temperature is important.
FAQ
What is LLM temperature in simple terms?
LLM temperature is a setting that controls the unpredictability and creativity of an AI's responses. A lower temperature makes the AI more repetitive and factual, while a higher temperature allows it to be more creative and varied.
What happens if I always set temperature to 0 for every task?
Setting the temperature to 0 makes the AI essentially deterministic and rigid, because it always picks the most likely next word (though some hosted APIs can still vary slightly between runs). While this is excellent for tasks requiring strict precision like coding or data extraction, it will make creative tasks like brainstorming or writing feel robotic and repetitive.
Is there a “best” default temperature that works for all use cases?
No, there is no single best temperature, but a medium setting (0.3-0.5) is a great baseline for balanced tasks. You should always adjust the setting based on whether you need strict reliability or varied creativity.
How does temperature interact with other settings like top-p or system prompts?
Temperature works alongside settings like Top-P to control the AI's output quality. While temperature adjusts the randomness of word selection, Top-P limits sampling to the smallest set of most likely words whose combined probability reaches a threshold, which keeps the AI from picking completely absurd words. System prompts provide the overarching guardrails that steer the model’s behavior.
Should I let end users change temperature themselves, or keep it fixed per application?
It is usually best to keep the temperature fixed based on the application's specific use case to prevent errors. However, for general-purpose chatbots, allowing users to modify the temperature can be an excellent way to let them customize the AI's creativity to their liking.