"Don't Hallucinate" and Other Bad Prompts

A lot of people add unnecessary preambles to their LLM prompts—and that’s what this post is pushing back on.
Sometimes it's a simple—
"Please don't hallucinate"
or...
I knew someone who prepended every prompt with a 200-token manifesto:
"When responding, focus on delivering accurate, well-reasoned answers grounded in reliable sources or well-established principles. Be upfront about any limitations in the available information or your ability to provide a definitive answer. Distinguish clearly between widely accepted facts, informed interpretations, and areas where evidence is limited or inconclusive. Avoid overstating confidence, and don’t fill in gaps with assumptions unless clearly marked as such. Clarity, transparency, and intellectual honesty are more important than sounding persuasive."
Why you don't need to do this
They're already trained to do this
LLMs like GPT-4 (and especially newer variants like GPT-4 Turbo and GPT-4o) are already trained and fine-tuned with exactly these priorities in mind: factual accuracy, logical coherence, distinguishing evidence from speculation, etc.
These models have extensive instruction tuning and alignment reinforcement behind them, so factual, logically coherent responses are the default.
They are explicitly penalized during training for hallucinating or overstating certainty.
You cannot simply tell an LLM not to hallucinate; that's like telling a fish not to swim.
Or a small-town kid that he can't dance.

You're wasting your breath
These long preambles eat into context length.
In practice, every prompt has a token cost. Repeating a 100+ token preamble every time wastes valuable space, especially in longer chains or documents.
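If you want to see the cost for yourself, count it. Here's a minimal sketch using the tiktoken library; the encoding name is the one used by GPT-4-era models, and the preamble is the example quoted above (truncated):

```python
# Rough sketch: measure how many tokens a preamble burns on every request.
# Assumes the tiktoken package is installed; the preamble text is the
# manifesto quoted earlier in this post (truncated here).
import tiktoken

PREAMBLE = (
    "When responding, focus on delivering accurate, well-reasoned answers "
    "grounded in reliable sources or well-established principles. "
    "Be upfront about any limitations in the available information..."
)

enc = tiktoken.get_encoding("cl100k_base")  # encoding used by GPT-4-era models
print(f"This preamble costs {len(enc.encode(PREAMBLE))} tokens per request.")
```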
Even if you're not using a long preamble and you're simply saying "don't hallucinate," you're still wasting your precious time.
It doesn't significantly affect results
If you’re already writing clear prompts (e.g., “Summarize the research on X” or “Give evidence-based reasoning”), the model will usually respond with appropriate caution and transparency.
⚠️ When it might add value:
That being said, these prompts do have some impact, and that might be valuable in a different way than you're expecting.
Old or Bad Models
If you're working with a model that tends to be overconfident or speculative (some older or smaller models), such instructions can help steer behavior.
⭐ Adjusting prompt tone
If you include a long set of instructions in a system prompt (written once at the start of a conversation rather than repeated with every message), you can get a very custom, tailored, and persistent tone across everything the model says.
There are plenty of use cases where this can be helpful, especially in structured workflows where machines or agents are calling the LLM and acting on its output.
If the LLM is being embedded in a specific domain app—e.g., a psychology tool, legal assistant, or research assistant—the system prompt can help lock in a conservative epistemic stance across all outputs.
It's probably not going to improve accuracy, though. Just tone. But sometimes tone sort of is accuracy, if that's what matters most to your application.
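Mechanically, "set it once" just means putting the instructions in the system role at the top of the message list. Here's a minimal sketch with the OpenAI Python SDK; the model name, the prompt wording, and the `ask` helper are placeholders for illustration, not recommendations:

```python
# Sketch: a persistent, conservative tone set once via the system role,
# then reused for every turn of the conversation.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = (
    "You are a research assistant. Clearly separate established findings "
    "from interpretation, and say when the evidence is thin."
)

# The system prompt is written once and stays at the top of the message list.
messages = [{"role": "system", "content": SYSTEM_PROMPT}]

def ask(question: str) -> str:
    messages.append({"role": "user", "content": question})
    response = client.chat.completions.create(model="gpt-4o", messages=messages)
    answer = response.choices[0].message.content
    messages.append({"role": "assistant", "content": answer})
    return answer

print(ask("What does the literature say about spaced repetition?"))
```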
Tool Use Can Be Prompt-Guided
If your system prompt urges the LLM to use tools it has access to, like citations, that may make the results easier to verify, or maybe even more accurate, e.g., if it runs some generated code with the code interpreter and gets an error it can then correct.
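As a sketch of what that can look like: the system prompt below points the model at a declared search tool via OpenAI-style function calling. The `search_papers` tool and its schema are hypothetical, something your application would have to implement:

```python
# Sketch: nudge the model toward a (hypothetical) citation-producing tool.
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "search_papers",  # hypothetical tool your app would implement
        "description": "Search an academic index and return titles with URLs.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system",
         "content": "When a claim needs support, call search_papers and cite what it returns."},
        {"role": "user", "content": "Does caffeine improve long-term memory?"},
    ],
    tools=tools,
)

# If the model decided to use the tool, the call shows up here instead of prose.
print(response.choices[0].message.tool_calls)
```

You still have to execute the tool call and send the result back, but the point is that the system prompt is steering behavior (call the tool, cite it), not accuracy directly.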
(WARNING, SUBJECTIVITY INCOMING)
Personally, when it comes to LLMs, I'm team anonymity.
I don't want the LLM to know anything about me. I don't want memory across conversations, and 99% of the time, especially in a conversational use-case, I don't want a system prompt.
Any system prompt I craft will just bend the model toward me; I want vanilla.
I get that a lot of users love memory and system prompts—I won’t yuck your yum. That’s just not how I use these tools. I don’t want a relationship. I’m already happily married, thanks.