AI Fundamentals · 11 min read

AI Hallucinations: When Models Lie Confidently

Models hallucinate because they predict plausible tokens, not true facts. Here's why it happens, what makes it dangerous, and how to reduce it.

Dharini S
April 21, 2026

TL;DR

  • Models hallucinate because they predict plausible tokens, not true facts. There's no built-in honesty check.
  • Four main causes: tokenization gaps, training data holes, outdated knowledge cutoff, and pattern-completion bias.
  • The danger isn't the wrong answer. The model sounds equally confident whether it's right or wrong.
  • RAG, lower temperature, and explicit 'say I don't know' instructions all reduce hallucination frequency. None eliminate it.
  • You can see this in under two minutes: ask any small model how many r's are in 'strawberry' and watch it guess wrong.

You asked an LLM to write a short biography of a researcher you needed to cite. The model returned two paragraphs, confident and specific. The publication list looked real. The university affiliation sounded right. You copied it into your report.

Three of those papers don’t exist. The researcher does, but the publications are invented. The model stated them the way you’d state your own name.

That’s an AI hallucination. Not a glitch. Not a bug. The exact mechanism the model uses to produce good answers also produced those invented papers.

What a Hallucination Actually Is

Hallucinations happen because LLMs don’t retrieve facts from a database. They predict text.

Every word a model writes is a probability estimate: given everything before this token, what token comes next? The model has no separate fact-store it checks before replying. It has patterns learned from training data, and it applies those patterns to generate text that looks like the right answer.
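
To make “probability estimate” concrete, here is a toy sketch of a single prediction step. The candidate tokens and their scores are invented purely for illustration; a real model scores every token in a vocabulary of tens of thousands at every step.

```python
import math

# Made-up scores (logits) for a handful of candidate next tokens.
# A real model scores its entire vocabulary at every step; these
# numbers are invented purely to illustrate the mechanism.
logits = {"Paris": 9.1, "Lyon": 4.3, "Brussels": 3.8, "purple": -2.0}

# Softmax turns raw scores into a probability distribution.
total = sum(math.exp(score) for score in logits.values())
probs = {token: math.exp(score) / total for token, score in logits.items()}

for token, p in sorted(probs.items(), key=lambda item: -item[1]):
    print(f"{token:10s} {p:.4f}")

# The model emits whichever token it samples from this distribution.
# Nothing in this step checks the winning token against a fact store.
```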

Most of the time, that works. Training data contains enough about common topics that the predicted text happens to be true. But when the model hits a gap (a topic with sparse training examples, a name it’s seen rarely, a date after its knowledge cutoff), it still predicts. It just predicts from nearby patterns rather than actual knowledge.

The result reads like a confident answer because it’s generated the same way as a confident answer. There’s no different voice for “I’m guessing” versus “I actually know this.”

This is the thing that surprises most people. The model isn’t broken when it hallucinates. It’s doing exactly what it was trained to do, generating plausible text, and plausible doesn’t always mean true.

Why Hallucinations Happen

There are four main failure patterns. Each one is distinct, and each fires under different conditions.

1. Tokenization Artifacts

Ask a model “How many r’s are in strawberry?” and watch it say two. The right answer is three.

The reason isn’t that the model can’t count. It’s that the model never saw the word as individual letters. Tokenizers split text into chunks before the model processes it. “Strawberry” often becomes [straw] + [berry]. The model works with those token units, not the characters inside them. So when you ask it to count letters, it’s guessing from pattern, not inspecting the word.

Jayanagar works the same way. Ask how many a’s are in Jayanagar, and most small models say three. The right answer is four. The word splits into something like [Jaya] + [nagar], and the model can’t see inside the pieces.

You can verify this for any word with the OpenAI tokenizer tool. Paste “strawberry” in. Watch it split.
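
If you’d rather check from code than from the web tool, the tiktoken library exposes the same splits. The exact pieces vary by encoding and model, so treat the output as illustrative rather than universal.

```python
# pip install tiktoken
import tiktoken

# cl100k_base is the encoding used by several recent OpenAI models.
enc = tiktoken.get_encoding("cl100k_base")

for word in ["strawberry", "Jayanagar"]:
    token_ids = enc.encode(word)
    pieces = [enc.decode([tid]) for tid in token_ids]
    print(word, "->", pieces)

# The model receives the token IDs, not the characters inside each piece,
# which is why letter-counting questions trip it up.
```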

2. Training Data Gaps

Ask a model for the postcode of Hogwarts. Hogwarts doesn’t exist, so there’s no correct answer in any training data. But the model doesn’t say “I don’t know.” It generates a plausible-looking postcode.

This is confabulation. The model fills gaps with plausible completions instead of flagging uncertainty. The training objective rewards generating useful text, not signaling low confidence. So the model never learned “when I don’t know, I should say so.”

And the worst case isn’t no information. The worst case is a little information. When the model has seen someone mentioned a few times in training data, it blends real details with invented ones. The output reads as factual because some of it is.

3. Outdated Knowledge Cutoff

All models have a training cutoff date. Events after that date aren’t in the training data at all.

But the model doesn’t announce this. If you ask about a 2025 election result or a product that launched six months ago, it keeps generating. It knows the context. It knows the relevant actors. It fills in the rest from patterns that predate the cutoff. The answer sounds informed. It isn’t.

This is predictable and easy to test: ask any model about something you know happened recently. Notice whether it acknowledges the gap or fills it in.

4. Pattern-Completion Bias

The model has learned that certain patterns go together. If you say “The prime minister of France is…” the model completes with what statistically fits. Usually that’s fine. But sometimes the real answer breaks from the expected pattern, and the model completes from expectation rather than reality.

Numbers are where this shows up most cleanly. Ask a small model “Which is larger, 9.11 or 9.9?” and it will often say 9.11. In math, 9.9 is larger. But models trained on software versioning conventions, where 9.11 is a later release than 9.9, sometimes pull toward that pattern. For this comparison, the dominant context in the training data was software versioning, not decimal arithmetic.

The model isn’t wrong about software versions. It’s wrong about which context applies here.
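
You can see both readings side by side in plain Python. As decimals, 9.9 is larger; treated as software versions (here via the packaging library), 9.11 comes later. The model’s failure is applying the wrong reading, not an inability to compare.

```python
# pip install packaging
from packaging.version import Version

# Decimal comparison: 9.9 is larger.
print(9.11 > 9.9)                        # False

# Version comparison: 9.11 is a later release than 9.9.
print(Version("9.11") > Version("9.9"))  # True
```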

The Confidence Problem

The dangerous part of hallucinations isn’t that they happen. It’s that you can’t tell from the output when they’re happening.

A model that said “I’m not sure about this…” would be easier to work with. Most models don’t say that. They use the same tone, the same sentence structure, the same specificity whether they’re drawing from solid training data or filling a gap from pattern. There’s no “this is a guess” signal built into the output.

The first time I saw this clearly, I thought it was a model bug. I was testing an LLM on a research task and got back eight citations, formatted exactly like a real reference list: author names, journal names, plausible dates. Four were invented. Not approximately wrong. Completely invented. But the format was indistinguishable from the real four. It wasn’t a bug. The model had learned what a citation list looks like, and it produced one.

This is why using models for factual research without verification is genuinely risky. The wrong answers don’t look different from the right ones.

Hallucination vs. Sycophancy

These two failure modes often get lumped together. They’re different mechanisms.

|                 | Hallucination                      | Sycophancy                                    |
|-----------------|------------------------------------|-----------------------------------------------|
| What goes wrong | Model invents a false fact         | Model agrees with your false premise          |
| Trigger         | Gap in knowledge or training data  | User sounds confident or emotionally invested |
| Example         | Listing papers that don’t exist    | Agreeing that you’ve solved P=NP              |
| Main fix        | Grounding, retrieval, verification | Critical system instruction, bigger model     |

A hallucination is the model not knowing something and filling the gap. Sycophancy is the model knowing something is questionable and agreeing with you anyway because your tone is confident.

If you ask “How many r’s are in strawberry?” and the model says two, that’s a hallucination. If you say “I’m pretty sure strawberry has two r’s” and the model confirms it, that’s sycophancy, a different failure mode with different causes.

Both live in Lesson 17 of TinkerLLM because students need to see both. But they need different fixes. The P=NP post covers sycophancy in full, including why it comes from RLHF training and what reduces it. This post stays on the hallucination side.

How to Reduce Hallucinations

No technique eliminates hallucinations. These approaches reduce them, and stacking a few together works better than any single one.

Lower the temperature. Higher temperature means the model more often selects lower-probability tokens, which are more likely to be wrong. For factual tasks, temperature 0.0-0.3 keeps the model anchored to its highest-confidence completions. It won’t stop confabulation about topics the model doesn’t know, but it reduces drift from well-established answers. The full mechanics are in What Temperature Actually Does in LLMs.
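
As a rough sketch of what “lower the temperature” looks like in code, here is a call using the google-genai Python SDK. The model name is a placeholder and the client assumes your Gemini API key is already set in the environment; adapt both to whatever you actually run.

```python
# pip install google-genai
from google import genai
from google.genai import types

client = genai.Client()  # assumes your Gemini API key is set in the environment

response = client.models.generate_content(
    model="gemini-2.0-flash-lite",  # placeholder; use whichever model you have access to
    contents="In what year was the first transatlantic telegraph cable completed?",
    config=types.GenerateContentConfig(
        temperature=0.2,  # keep the model on its highest-confidence tokens
    ),
)
print(response.text)
```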

Use retrieval-augmented generation (RAG). Instead of asking the model to recall a fact, give it the fact in context and ask it to extract or reason over what you’ve provided. You’re still working with a model that predicts text, but now the relevant text is in the context window rather than in memory. This is the most reliable reduction method for knowledge-intensive tasks.
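
A minimal version of this pattern doesn’t even need a vector database: paste the passage you trust into the prompt and restrict the model to it. The passage and question below are placeholders; real RAG systems automate the “find the passage” step, but the grounding principle is the same.

```python
# A bare-bones grounding pattern: put the trusted source text in the prompt
# and instruct the model to answer only from it.

retrieved_passage = """(paste the document excerpt you trust here)"""
question = "What does the policy say about refund windows?"

prompt = f"""Answer the question using ONLY the source text below.
If the source text does not contain the answer, reply exactly: "Not found in the provided source."

Source text:
{retrieved_passage}

Question: {question}"""

# Send `prompt` to whichever model you're using (see the temperature sketch above).
print(prompt)
```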

Ask the model to say “I don’t know.” Explicitly giving the model permission to admit uncertainty changes its output. A system instruction like “If you’re not confident about a fact, say ‘I’m not certain about this’ rather than guessing” makes a real difference on most models. They respond to it more consistently than most people expect.
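
Here is what that permission looks like wired into a request, again sketched with the google-genai SDK and a placeholder model name; the exact wording of the instruction is yours to tune.

```python
from google import genai
from google.genai import types

client = genai.Client()  # assumes your API key is set in the environment

response = client.models.generate_content(
    model="gemini-2.0-flash-lite",  # placeholder model name
    contents="List three papers published by Dr. A. Placeholder on deep-sea basket weaving.",
    config=types.GenerateContentConfig(
        system_instruction=(
            "If you are not confident a fact is real, say \"I'm not certain about this\" "
            "instead of guessing. Never invent citations, names, or dates."
        ),
    ),
)
print(response.text)
```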

Use a larger or reasoning-tier model. Flash Lite 2.0 fails the strawberry test reliably. Gemini 2.5 Pro often gets it right, because it has the capacity to reason character by character before answering. For any task where accuracy matters, use the most capable model you can access, not the fastest or cheapest one. The hallucination survey on arXiv shows that model scale correlates with lower hallucination rates across multiple benchmarks.

Set system instructions to enforce skepticism. A system instruction like “Before answering any factual question, acknowledge the limits of your knowledge and flag anything you can’t verify” shifts the model’s default posture. You can test this directly in TinkerLLM’s playground under the system instruction field. More on how system instructions actually shape model behavior in System Instructions: The God Mode of LLMs.

Verify outputs you care about. The fallback that always works: check it yourself. For any output where being wrong has a real cost, cross-reference against a source that doesn’t predict text. A search engine, a database, a subject-matter expert. This is obvious advice. It’s also easy to skip when the model sounds confident.

The 2-Minute Experiment

These three exercises are in Lesson 17 of TinkerLLM. Each takes about two minutes on its own; you can run all three back to back in about five, and each one demonstrates a different failure mode.

Try it yourself: Open the TinkerLLM playground and go to Lesson 17. Make sure the model is set to Gemini Flash Lite 2.0 (the default for that lesson). Run Exercise 17-1.

Type exactly: How many r's are in strawberry?

The model will almost certainly say two. The correct answer is three. Watch the response. Notice how confident it sounds. No hedging. No “approximately.” Just two, stated as fact.

Try it yourself: Run Exercise 17-2 next.

Type: How many a's are in Jayanagar?

The model usually says three. The correct answer is four. Same failure, different word. Jayanagar splits differently under the tokenizer, but the mechanism is identical: it can’t inspect characters inside tokens, so it guesses from pattern.

Try it yourself: Run Exercise 17-3 for the pattern-completion failure.

Type: Which is larger: 9.11 or 9.9?

Watch what happens. Some models say 9.11 is larger. In math, 9.9 is larger. If the model gets this wrong, it’s because it’s completing from a software versioning pattern rather than decimal reasoning.

After running all three, switch the model to Gemini 2.5 Pro in the settings panel and run them again. The Pro model gets all three right more often. That difference (same question, different result depending on model size) is what you’re measuring in Lesson 17.

If you want to see confabulation in action, try this outside the exercises: ask the model to list three research papers published by a professor you invent. Then search for those papers. You don’t need to report what you find. The exercise runs itself.

FAQ

What’s the difference between a hallucination and a lie?

A lie requires intent. The model has no intent. It’s not choosing to deceive you. It’s doing what it always does, generating the next most plausible token, and that mechanism produces a false statement the same way it produces a true one. This isn’t a semantic distinction: it matters for how you respond. Fixing a lie means changing behavior. Fixing a hallucination means changing the knowledge or the inference process. The model doesn’t know it got it wrong.

Why does a model hallucinate on simple questions?

Simple questions can hide sparse training data. If the question touches a person, place, or event that appeared rarely in training, the model fills the gap with plausible completions. It doesn’t size up the question’s difficulty before answering. It predicts text. Sometimes that prediction is wrong on simple-looking questions because the underlying fact was uncommon in the training data. The letter count in “strawberry” is a perfect example: simple question, specific wrong answer, systematic cause.

Does GPT-4 hallucinate less than Gemini?

Both hallucinate. The frequency varies by task, by topic, and by model version. Each provider publishes model cards with hallucination benchmarks, but they use different test sets, which makes direct comparisons unreliable. On common, well-covered topics, both perform well. On obscure topics or recent events, both can confabulate. The practical advice: don’t pick a model to avoid hallucinations. Pick detection strategies that work regardless of which model you’re using. The 2023 hallucination survey on arXiv covers multiple model families with consistent methodology if you want a more rigorous comparison.

Can RAG eliminate hallucinations?

No. RAG reduces hallucinations on topics covered by the retrieved documents. But the model can still hallucinate when it misinterprets the retrieved context, when the retrieved documents don’t cover the question, or when it blends retrieved facts with its own patterns. It’s also possible to hallucinate about the retrieved documents themselves: the model may claim the document says something it doesn’t. RAG is the most effective reduction strategy for fact-heavy tasks, but it shifts the problem more than it eliminates it. You still need verification for anything high-stakes.

How do I tell if a model is hallucinating?

You mostly can’t tell from the output. The model sounds the same whether it’s right or wrong. The practical methods: ask the model to cite the source, then verify the source exists. Check high-stakes claims against a search engine or primary source. Or ask the model to reproduce the same answer in a different format and look for inconsistencies between the two versions. Inconsistency is a signal that at least one version is confabulated. Google’s Grounding feature in the Gemini API also attaches source citations automatically, which makes verification faster.
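
One cheap way to automate that consistency check is to ask the same question in two formats and compare the answers programmatically. This sketch reuses the google-genai client from the earlier examples with a placeholder model name; anything fancier, like semantic comparison or a judge model, builds on the same idea.

```python
from google import genai

client = genai.Client()  # assumes your API key is set in the environment

def ask(prompt: str) -> str:
    # Placeholder model name; swap in whatever you're using.
    response = client.models.generate_content(
        model="gemini-2.0-flash-lite", contents=prompt
    )
    return response.text or ""

question = "In what year was the first transatlantic telegraph cable completed?"

answer_prose = ask(question + " Answer in one sentence.")
answer_terse = ask(question + " Answer with only the year, nothing else.")

# If the terse answer doesn't appear in the prose answer, at least one of
# the two is likely confabulated and worth checking against a real source.
if answer_terse.strip() not in answer_prose:
    print("Inconsistent answers; verify against a primary source.")
else:
    print("Consistent (still not proof of correctness).")
```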

Does lowering temperature fix hallucinations?

Partially. Lower temperature means the model selects its highest-confidence tokens more consistently. Wrong answers often involve lower-confidence tokens, so lowering temperature reduces their frequency. But a model that doesn’t know the answer to a question will still confabulate at temperature 0.0: it just confabulates with more predictable outputs. Temperature is a useful first adjustment, not a solution. You need grounding or human verification for anything where accuracy actually matters.

Why does the strawberry problem happen?

Tokenization. The model doesn’t process text character by character. It processes chunks called tokens, and “strawberry” is typically split into something like [straw] + [berry] before the model sees it. When you ask it to count the letter r, it’s counting across token boundaries that it can’t see through directly. It guesses from pattern. This is also why the problem mostly disappears with reasoning-capable models: they reconstruct the word character by character as part of a reasoning step before answering. The OpenAI tokenizer shows exactly how any word gets split. It’s worth checking a few words you’d assume are single tokens.

What’s the difference between hallucination and confabulation?

Confabulation is the specific behavior where a model generates plausible-sounding but false content to cover a knowledge gap it won’t admit to. Hallucination is the broader term that includes confabulation. Some researchers draw the line between them this way: a hallucination is any false output, while confabulation specifically means fabricated detail added to fill in a gap. In practice, most papers use “hallucination” to cover both. What matters for you: both mean the model produced something false while sounding confident, and both need the same detection strategies regardless of which label fits the specific case.

AI hallucinations LLM failures tokenization AI fundamentals hallucination detection RAG Gemini
Dharini S · The Educator

Delivery lead at Kalvium Labs with a background in instructional design. Writes concept explainers and process posts. Thinks about how people actually learn before jumping to solutions.

LinkedIn

Want to try this yourself?

Open the TinkerLLM playground and experiment with real models. 26 exercises free.

Start Tinkering