3 Prompt AI Series #5: The Three-Rule Framework in Cybersecurity, Blue, Red, and Purple Team Applications

The Three-Rule Framework in Cybersecurity: Blue, Red, and Purple Team Applications

This is the fifth post in a series about the three-rule framework — Force Blank, Penalize Guessing, Show the Source. So far, the framework has been applied to document extraction, worldbuilding, business scenario building, and implementation at scale.

This post applies it to a domain where the stakes are as high as they get: cybersecurity.

The structural problem is familiar. AI generates confident-sounding security assessments that blend verified findings with inferences and assumptions — and nobody can tell which is which. A vulnerability scan result sits next to an architectural inference sits next to an untested assumption about network segmentation, all presented with the same confidence. The output looks thorough. Some of it is grounded. Some of it was invented to make the narrative hold together.

In business scenario planning, that’s a waste of planning resources. In cybersecurity, it’s a breach waiting to happen.

Why Cybersecurity Is Especially Vulnerable

A 2025 survey paper on hallucinations in AI-driven cybersecurity systems identifies the core risk: AI models optimized for fluency can misidentify safe activities as threats or miss actual dangers, and they do so with the same confident tone regardless of whether their assessment is grounded in evidence. The paper notes that the design goal of fluency over accuracy in many LLMs reinforces this problem — models produce highly confident and coherent statements even when they lack factual grounding.

The OWASP Foundation’s 2025 guidance lists AI hallucination among the top risks to LLM-powered security tools. The concern isn’t just about false positives or false negatives in isolation — it’s about the blending of real findings and fabricated assessments into a single output that humans then treat as uniformly reliable.

This maps directly onto the problem the three-rule framework was built to solve. And it adapts cleanly to all three team perspectives.

The Cybersecurity Source Tags

In every previous domain, the framework distinguishes between what’s established, what’s derived, and what’s unknown. Cybersecurity needs the same distinction, but the specific tags change depending on which team perspective you’re operating from — because each team has a different “source of truth.”

The gap labels, however, are universal across all three teams:

[VISIBILITY GAP] — something the team cannot see or hasn’t instrumented: unmonitored network segments, missing log sources, shadow IT, cloud services without telemetry
[VALIDATION GAP] — a control or detection rule that exists on paper or in configuration but has never been tested against actual attack behavior

The distinction between these two is critical. A visibility gap means you don’t have the data. A validation gap means you have the data (or think you do) but haven’t proven the detection actually works. Both are dangerous; they require different responses.

Blue Team: Defense

The Blue Team’s source of truth is: what has been confirmed about the defensive posture through direct evidence — not what the policy says, not what the vendor promises, not what should be true.

This distinction matters more than it might seem. Research from Mitiga suggests that traditional SIEMs detect only about 21% of MITRE ATT&CK techniques on average — meaning roughly four out of five technique categories have no validated detection coverage. Yet many organizations report much higher coverage numbers, because they count configured rules rather than validated detections. The gap between “we have a rule for this” and “this rule has fired correctly against a real attack” is exactly the gap the three-rule framework surfaces.

AttackIQ’s March 2026 analysis captured this perfectly: coverage isn’t static, environments change, and a detection that worked last month may have silently broken. An attendee at one of their webinars put it bluntly: having detections means nothing if they don’t produce signal the team can act on.

Blue Team Source Tags

(VERIFIED) — confirmed through direct evidence: tested control, observed log entry, validated configuration, scan result with timestamp
(POLICY-STATED) — documented in policy, configuration guides, or vendor documentation but not independently validated. “Our policy says MFA is enforced” is not the same as “MFA is enforced”
(INFERRED) — derived from indirect indicators. “No alerts in 90 days” doesn’t mean “no intrusions in 90 days” — it might mean the detection isn’t working

Blue Team Prompt

You are a defensive security analyst reviewing our environment. For every security claim, tag it with its evidence basis:

• (VERIFIED) — confirmed through direct testing, log evidence, or scan results. Cite the specific evidence.

• (POLICY-STATED) — documented in policy or configuration guides but not independently validated. Flag what validation would look like.

• (INFERRED) — derived from indirect indicators. State the inference chain and what could invalidate it.

• When you identify a control or coverage area that hasn’t been tested or instrumented, flag it as [VISIBILITY GAP] or [VALIDATION GAP] with a brief explanation.

• A false sense of security is more dangerous than a known gap. A known gap gets resourced. False confidence gets breached. When in doubt, flag the gap.

Blue Team Example Output

Control	Status	Source	Note
MFA on VPN	Enabled	POLICY-STATED	Azure AD conditional access policy requires MFA. [VALIDATION GAP: no penetration test has verified enforcement on the Cisco AnyConnect endpoint]
EDR coverage	94% of endpoints	VERIFIED	CrowdStrike dashboard, pulled April 2026
Lateral movement detection	Active	INFERRED	SIEM has rules for pass-the-hash. No Purple Team exercise has tested whether they fire. [VALIDATION GAP]
DNS exfiltration detection	—	VISIBILITY GAP	No DNS logging configured on internal resolvers. Cannot detect DNS tunneling.

The crucial row is the first one. Without the framework, the MFA assessment would read “MFA: Enabled” — green on the dashboard, confidence high, nobody worries. With the framework, you see it’s POLICY-STATED, not VERIFIED, with a specific validation gap. That gap is exactly what an attacker would exploit.

Red Team: Offense

The Red Team’s source of truth is: what has been demonstrated through actual exploitation — not what should be vulnerable, not what Shodan shows, not what the CVE database suggests.

The risk of unflagged assumptions in Red Team reporting is that defensive resources get allocated to the wrong priorities. A hypothesized attack path reported as confirmed sends the Blue Team chasing a phantom while the actual exposure remains unaddressed.

Red Team Source Tags

(CONFIRMED) — vulnerability exploited, access achieved, data exfiltrated — with evidence (screenshot, hash, artifact, session log)
(INDICATED) — strong indicators of vulnerability but not yet exploited: version fingerprint matches a known CVE, default credentials detected but not tested, open port with a known-vulnerable service banner
(HYPOTHESIZED) — attack path exists in theory based on architecture review but no testing has been performed. States the assumptions the path depends on

Red Team Prompt

You are a red team operator analyzing potential attack paths. For every finding or attack path, tag it:

• (CONFIRMED) — vulnerability exploited and access demonstrated. Cite the specific evidence (tool output, artifact, hash).

• (INDICATED) — strong indicators suggest vulnerability but exploitation has not been attempted. State the indicators you’re relying on.

• (HYPOTHESIZED) — attack path is theoretically plausible based on architecture but untested. State the assumptions the path depends on.

• Flag untested attack surfaces as [UNTESTED SURFACE: description].

• Flag assumed network paths or trust relationships as [ASSUMED PATH: what connectivity or trust is assumed].

• A hypothesized vulnerability reported as confirmed wastes defensive resources on the wrong priority. When not yet tested, say so.

Red Team Example Output

Finding	Severity	Source	Detail
Jenkins RCE (CVE-2024-XXXX)	Critical	CONFIRMED	Exploited via Metasploit, reverse shell obtained. Artifact: session log ID 47
Lateral movement Jenkins → DB	High	HYPOTHESIZED	Jenkins on VLAN 10, DB on VLAN 20. [ASSUMED PATH: assumes no ACL between VLANs — not validated]
S3 bucket public access	Medium	INDICATED	Bucket policy allows s3:GetObject for *. [UNTESTED SURFACE: no attempt to download sensitive content]
Domain admin via Kerberoasting	High	INDICATED	SPN found on service account with weak encryption (RC4). Hash not yet cracked.

The lateral movement finding is where the framework earns its keep. Without tags, the report reads “Jenkins server can be used to pivot to the database” — which sounds confirmed. With the framework, you see it’s HYPOTHESIZED with an ASSUMED PATH. The Blue Team’s response shifts from “patch this immediately” to “first, verify whether the VLAN segmentation actually blocks this traffic.”

Purple Team: Collaboration

Purple Teaming is where the framework becomes most powerful, because the entire point is to map offensive capabilities against defensive capabilities — and the space between them is exactly what the gap labels surface.

A Purple Team exercise structured with the framework produces a matrix that answers four distinct questions for each technique: Did the Red Team succeed? Did the Blue Team detect it? Where did detection fail — at the visibility layer or the logic layer? And which techniques weren’t tested at all?

That last question — what wasn’t tested — is often the most important finding of a Purple Team engagement, and the one most likely to go unreported without explicit tagging.

Purple Team Source Tags

(DETECTED) — Blue Team successfully detected and alerted on the attack technique. Cites the specific alert, rule, or SIEM correlation
(PARTIALLY DETECTED) — some indicators were logged but not correlated into an actionable alert. The raw data exists; the detection logic doesn’t
(MISSED) — attack technique succeeded without triggering any detection. Highest-priority finding
(NOT TESTED) — this MITRE ATT&CK technique was not exercised in this engagement. Detection capability cannot be assessed

Purple Team Prompt

You are a purple team analyst mapping attack results against detection capabilities. For each technique tested, tag the outcome:

• (DETECTED) — Blue team detected and alerted. Cite the specific detection rule, alert, or correlation.

• (PARTIALLY DETECTED) — indicators were logged but not correlated into an alert. State what was logged and what detection logic is missing.

• (MISSED) — attack succeeded without detection. This is the highest-priority finding.

• (NOT TESTED) — technique was not exercised in this engagement. Do not assess detection capability for untested techniques.

• For every MISSED finding, flag the root cause: [VISIBILITY GAP] (data not collected) or [DETECTION GAP] (data collected but no rule or correlation exists).

• Do not assume a control works because it exists. A SIEM rule that has never fired against a real attack is a [VALIDATION GAP], not a (DETECTED).

• An untested technique reported as detected gives false confidence. When not tested, say NOT TESTED — never extrapolate.

Purple Team Example Output

MITRE Technique	Red Result	Blue Result	Source	Action
T1566.001 Spearphishing	Payload delivered	Alert fired in 12 min	DETECTED	Review: 12 min mean-time-to-detect acceptable?
T1003.001 LSASS dump	Credentials extracted	No alert	MISSED	[VISIBILITY GAP: no LSASS access monitoring configured]
T1071.001 Web C2	C2 channel established	Proxy logged traffic, no alert	PARTIALLY DETECTED	[DETECTION GAP: beaconing pattern recognition needed]
T1053.005 Scheduled Task	—	—	NOT TESTED	[VALIDATION GAP: persistence techniques not in scope]

Read the matrix from left to right, and it tells you exactly where you stand: one technique is covered, one is blind, one has raw data but no intelligence, and one hasn’t been evaluated. That’s a decision-ready assessment. No AI-generated optimism obscuring the picture.

The Cybersecurity Version of Rule 2

The “penalize guessing” rule takes on particular urgency in security contexts. In document extraction, a wrong answer wastes time. In cybersecurity, the costs compound differently:

A (HYPOTHESIZED) finding reported as (CONFIRMED) → defensive resources go to the wrong place
A (POLICY-STATED) control reported as (VERIFIED) → actual exposure stays unaddressed
A (NOT TESTED) technique reported as (DETECTED) → the team believes they’re protected when they’re blind
A [VISIBILITY GAP] that goes untagged → the adversary operates in the one area nobody is watching

The cybersecurity formulation of Rule 2 is:

False confidence in a security control is more dangerous than a known gap. A known gap gets resourced. False confidence gets breached.

This isn’t hypothetical. The pattern of “assumed coverage, actual blindness” is precisely what major breaches exploit. Attackers don’t target your strongest defenses — they find the areas where you think you’re covered but aren’t. Every [VALIDATION GAP] the framework surfaces is a place the adversary would probe. Every unflagged assumption about detection capability is an invitation.

The Cross-Team Pattern

Looking across all three team perspectives, the framework reveals a consistent pattern:

Team	Source of Truth	Source Tags	Gap Labels	Rule 2 Formulation
Blue	Confirmed defensive posture	VERIFIED / POLICY-STATED / INFERRED	VISIBILITY GAP / VALIDATION GAP	A policy-stated control treated as verified is an open door
Red	Demonstrated exploitation	CONFIRMED / INDICATED / HYPOTHESIZED	UNTESTED SURFACE / ASSUMED PATH	A hypothesized path treated as confirmed wastes defense resources
Purple	Validated detection capability	DETECTED / PARTIALLY DETECTED / MISSED / NOT TESTED	VISIBILITY GAP / DETECTION GAP / VALIDATION GAP	An untested technique treated as detected creates false confidence

The structure is the same in every case: distinguish between what’s been proven and what’s been assumed, make the boundary visible, and treat false confidence as worse than acknowledged uncertainty.

Or, to put it in terms any security professional will recognize: the framework turns “assumed coverage” into “validated coverage” — and makes everything in between explicit.

Integration with MITRE ATT&CK

The three-rule framework maps naturally onto MITRE ATT&CK assessments. The ATT&CK matrix gives you the what — which techniques exist. The framework gives you the how confident — which techniques you’ve actually validated coverage for.

A standard ATT&CK heatmap shows red (no coverage) and green (coverage). The three-rule framework adds the crucial middle layer: green that’s been validated vs. green that’s been assumed. As AttackIQ notes, the gap between a green box and actual defensive capability can be enormous. Coverage for one procedure is not coverage for a technique. A detection that worked last month may have silently broken this month. Without continuous validation, your heatmap shows what you deployed, not what works.

Adding the three-rule tags to your ATT&CK coverage assessment turns it from a deployment map into a confidence map — and confidence maps are what security decisions should be based on.

Sources and Further Reading

CAEE Journal (April 2025): “The Paradigm of Hallucinations in AI-driven Cybersecurity Systems.” Taxonomy of hallucination impacts on cybersecurity tools.
AttackIQ (March 2026): “What Does MITRE ATT&CK Coverage Really Mean?” On the gap between claimed coverage and validated detection, including the distinction between procedure-level and technique-level coverage.
Mitiga (2025): “Measurements That Matter.” Reports ~21% average ATT&CK detection rate for traditional SIEMs.
Kroll Cyber Risk (2023): “MITRE ATT&CK Detection Maturity Assessment Guide.” Template-based approach to identifying coverage gaps.
OWASP / Hacken (2025): “LLM Security Frameworks: A CISO’s Guide.” On NIST AI RMF, ISO 42001, and hallucination monitoring requirements.
MITRE ATT&CK: attack.mitre.org. The knowledge base of adversary tactics and techniques.
Previous posts in this series:
Post 1: Three Prompt Rules That Stop AI From Guessing — And the Science Behind Them
Post 2: From Document Extraction to Alternate History
Post 3: The Three-Rule Framework for Scenario Building
Post 4: Implementing the Framework: Calibration, Governance, and Trade-offs

April 20, 2026

3 Prompt AI Series #4: Framework: Calibration, Governance, and Trade-offs

Implementing the Three-Rule Framework: Calibration, Governance, and Trade-offs

The previous post in this series introduced a general framework for AI-assisted scenario building: Force Blank, Penalize Guessing, Show the Source. The framework produces output where every claim is tagged as VERIFIED, ASSUMED, or PROJECTED, and where gaps are explicitly labeled instead of silently filled.

That’s the what. This post is about the how — three practical challenges that anyone implementing the framework will encounter:

Calibration: You’ve tagged something as ASSUMED. How do you check whether the assumption is reasonable?
Governance: How do organizations enforce tagging in actual workflows — not just in one person’s prompt?
Trade-offs: Doesn’t all this tagging create cognitive overload? How do non-experts read a document full of provenance labels?

1. Calibrating Assumptions: From “Tagged” to “Tested”

Tagging an assumption is necessary but not sufficient. (ASSUMED: market grows 15% annually) is better than an unlabeled 15% baked into the projection — but it still doesn’t tell you whether 15% is defensible. The framework surfaces assumptions; calibration tests them.

Four calibration methods work well with the tagged output:

Reference Class Forecasting: The Outside View

Daniel Kahneman and Amos Tversky’s distinction between the “inside view” (planning based on the specifics of this project) and the “outside view” (what happened in similar projects historically) is the single most useful concept for calibrating assumptions. The planning fallacy — systematically underestimating costs and timelines — is so well-documented that the American Planning Association officially endorsed reference class forecasting in 2005 as a corrective.

In practice, this means: for every ASSUMED tag, ask the model (or yourself) to identify 3–5 comparable situations and their actual outcomes. If you assume 15% growth, what growth did similar products in similar markets actually achieve? If you assume a 6-month regulatory timeline, how long did comparable approvals actually take? The tagged format makes this step natural — you have a list of assumptions; now walk down it with an outside view on each one.

You can even build this into the prompt:

For every ASSUMED tag, add a “Calibration” note: identify 2–3 comparable historical cases and their actual outcomes. If no comparable data exists, note [NO REFERENCE CLASS].

Sensitivity Testing: What Breaks If This Is Wrong?

Not all assumptions are equally important. RAND’s Assumption-Based Planning calls this “criticality” — an assumption is critical if its failure would require fundamental changes to the plan. In practice, this means testing: what happens to the conclusion if this assumption is 50% wrong? If the answer is “not much,” the assumption is low-priority. If the answer is “the entire business case collapses,” that’s your highest-priority validation target.

The tagged format enables this directly. You can ask the model:

Take the three ASSUMED items with the highest downstream impact on the final projection. For each, recalculate the projection with the assumption at 50% of stated value and at 150%. Show me which assumptions the conclusion is most sensitive to.

Pre-Mortem: Imagine It Failed

Gary Klein’s pre-mortem technique inverts the question: instead of asking “will this work?”, you start from “it failed — why?” This is particularly effective for ASSUMED tags, because it surfaces failure modes that optimism hides. Ask the model:

Assume this scenario failed after 12 months. Which of the ASSUMED items were most likely the point of failure? For each, describe a plausible narrative of how that assumption broke down.

Temporal Decay: When Does the Assumption Expire?

Assumptions have shelf lives. A market size estimate from a 2025 Gartner report is still reasonable in 2026. A competitive landscape assumption from 2024 may already be wrong. Adding a temporal dimension to ASSUMED tags helps:

For each ASSUMED tag, add an expiry estimate: how long is this assumption likely to remain valid? Mark anything older than 12 months or based on pre-2025 data as [STALE ASSUMPTION].

2. Governance: Making the Framework Stick Beyond One Person’s Prompt

The framework works well when one person uses it in one chat session. The governance question is: how does it survive contact with an organization — multiple people, multiple AI tools, multiple documents, over months?

The Problem: Tags Die in Translation

What typically happens: an analyst generates a beautifully tagged scenario. They copy it into a slide deck. The tags disappear. A manager reads the deck, sees “Year 1 revenue: €310K” with no indication that the number is PROJECTED from two unvalidated ASSUMED inputs. The ghost scenario lives again.

This is a knowledge management problem, not an AI problem. And it has knowledge management solutions.

Level 1: Template Enforcement

The simplest governance mechanism is a template. If your organization uses AI for scenario planning, the output template should have provenance columns built in. Not optional, not “add if useful” — structurally required. A scenario document without source tags should be treated the same way as a financial report without citations: incomplete.

Concretely: create a standard table format for all AI-assisted scenario outputs:

Variable	Value	Source	Basis / If Wrong	Validated By	Date
(All AI-generated scenario outputs must use this format)

The “Validated By” and “Date” columns are the governance additions. They turn a prompt technique into an audit trail. Someone must sign off on each ASSUMED item before it enters planning.

Level 2: Review Workflow

For organizations with more structured processes, integrate tagging into the review cycle:

Step 1 — Generation: AI produces tagged output using the three-rule prompt.
Step 2 — Assumption Review: A domain expert reviews all ASSUMED and PROJECTED items. Each gets one of three dispositions: confirmed (reclassified to VERIFIED), challenged (sent for calibration), or accepted with risk (kept as ASSUMED with a documented rationale).
Step 3 — Gap Triage: All DATA GAP and ASSUMPTION GAP items are triaged: resolvable (assign someone to find the data), irreducible (the uncertainty is inherent — document it and plan around it), or deferred (not needed for this decision stage).
Step 4 — Decision Package: The final document separates “what we know” (VERIFIED), “what we believe” (ASSUMED, with calibration notes), and “what we don’t know” (remaining gaps). Decision-makers see all three.

Level 3: System Prompt Standardization

If your organization uses AI across multiple teams, standardize the system prompt. Don’t rely on individual analysts remembering to apply the three rules. Embed the framework into every AI access point — whether that’s a shared Claude project, a custom GPT, an API wrapper, or an n8n workflow. The prompt becomes infrastructure, not personal practice.

For teams using Claude Projects or custom GPTs, the three-rule prompt goes into the project instructions or system message — it’s active for every conversation in that workspace without anyone needing to remember to include it.

The Cultural Challenge

The hardest governance problem isn’t technical. It’s that tagging uncertainty feels like weakness. Presenting a scenario full of ASSUMED and DATA GAP labels to a board looks less confident than presenting clean numbers. The organizational response to this must be explicit: a tagged scenario is not an incomplete scenario — it’s an honest one. The clean numbers were never clean; they just hid where the guesses were.

This is exactly what Bent Flyvbjerg’s decades of research on megaproject failures shows: the projects that went most catastrophically over budget weren’t the ones with the most uncertainty — they were the ones where the uncertainty was hidden. Transparency about assumptions is a risk reduction strategy, not an admission of weakness.

3. Trade-offs: When Tags Become Noise

A document where every sentence carries a provenance label is exhausting to read. The framework creates real cognitive overhead, and pretending otherwise is dishonest. The question isn’t whether there’s a cost — there is — but how to manage it.

The Overload Problem

Consider a 20-variable scenario with source tags, calibration notes, and “if wrong” annotations on every ASSUMED item. For the analyst who built it, this is valuable — they can see exactly where to direct attention. For the executive who needs to make a decision based on it, it’s a wall of qualifications that obscures the bottom line.

Both perspectives are legitimate. The solution isn’t to choose one over the other — it’s to serve both with different views of the same underlying data.

Solution: Layered Presentation

The tagged scenario should exist in at least two layers:

Layer 1 — Decision Summary: One page. Key conclusions, key numbers, key risks. No tags in the running text. Instead, a single “Confidence Profile” section at the bottom:

This scenario rests on 14 verified data points, 6 stated assumptions, and 3 projections. Two data gaps remain unresolved (market-specific CAC, regulatory timeline). The assumption with the highest downstream impact is [X] — if wrong by 50%, projected revenue shifts from €310K to €180K.

That’s the executive view: how much of this is solid, how much is uncertain, and what specifically could break it.

Layer 2 — Full Tagged Analysis: The complete output with all provenance tags, calibration notes, gap labels, and sensitivity analysis. This is the working document. It’s what the analyst uses, what the reviewer signs off on, and what gets archived. It’s the audit trail.

The relationship between the layers is like the relationship between a financial statement and its footnotes. The statement tells you the numbers; the footnotes tell you what the numbers rest on. Both exist. Different readers use different layers.

How Non-Experts Read Tags

For teams where not everyone is fluent in the tagging system, simplify the visual language. Three colors work better than three acronyms:

VERIFIED → presented as normal text (no special marking needed — it’s the baseline)
ASSUMED → highlighted or marked with a distinct visual cue (e.g., italic, a colored sidebar, or a simple ⚠ symbol)
DATA GAP → presented as an explicit blank with a brief note

The core message non-experts need to internalize is simple: unmarked text is grounded; marked text is uncertain; blanks are honest. That’s a ten-second briefing. If someone can read a weather forecast that distinguishes “current temperature” from “tomorrow’s forecast,” they can read a tagged scenario.

When to Reduce Tagging

Not every use case needs full provenance. The right level of tagging depends on the stakes:

Stakes	Tagging Level	Example
Low	Tag only gaps	Internal brainstorming, early-stage ideation
Medium	Tag gaps + assumptions	Project proposals, budget drafts, team planning
High	Full tagging + calibration	Board presentations, investment decisions, regulatory submissions

For a casual strategy brainstorm, requiring VERIFIED/ASSUMED/PROJECTED on every line would kill the creative flow. For a €2M investment decision going to the board, anything less than full tagging is irresponsible. Match the framework’s intensity to the decision’s consequences.

The Framework Maturity Model

Putting it all together, organizations adopting the three-rule framework can think of implementation in three stages:

Stage 1 — Individual Practice: One person uses the three-rule prompt in their own AI conversations. Tagged output stays in their workspace. Value: personal quality control. Cost: near zero.

Stage 2 — Team Standard: The prompt is embedded in shared AI workspaces (Claude Projects, custom GPTs). Templates enforce the table format. Assumptions get informal peer review. Value: consistent quality across a team. Cost: template creation, brief training.

Stage 3 — Organizational Governance: The framework is integrated into planning processes. Assumption review is a formal workflow step. Calibration (reference class, sensitivity, pre-mortem) is standard practice. Decision packages separate confidence layers. Value: systematic risk reduction. Cost: process change, cultural shift.

Most teams should start at Stage 1 and see results immediately. Whether to progress to Stage 2 or 3 depends on how much is at stake when AI-generated scenarios inform real decisions. The higher the stakes, the more the governance investment pays for itself.

Limitations and Known Gaps

The three-rule framework is a practitioner pattern, not a peer-reviewed method. It deserves the same critical scrutiny it asks users to apply to AI output. Here are the things it doesn’t solve — and the ways it can be misused.

1. Not empirically validated

There are no controlled experiments, before/after error-rate measurements, or user studies behind this framework. Research shows that provenance tagging and structured prompting can reduce hallucinations — sometimes significantly — but this has been demonstrated for specific tagging schemes under controlled conditions, not for the exact VERIFIED / ASSUMED / PROJECTED pattern proposed here. Treat the framework as an engineering heuristic that probably helps in many cases, not as something whose effectiveness you can assume without measuring on your own use cases. If you adopt it, track whether it actually improves your outputs.

2. The prompt is one lever, not the only lever

The framework leans heavily on prompt design as the primary mechanism for controlling model behavior. In practice, prompts can reduce hallucinations, but models still violate instructions under pressure — especially when optimization, reward models, or fine-tuning push toward fluency and completeness. For production systems, prompt-level rules should be complemented by architecture-level controls: retrieval-augmented generation (RAG) to ground outputs in actual data, rule-based filters to catch unsupported claims, abstention mechanisms that refuse to generate when confidence is low, and human review workflows. The prompt is the user-accessible lever. It is not the only lever, and in high-stakes deployments, relying on it alone is fragile.

3. VERIFIED means “sourced,” not “infallible”

The framework’s tag hierarchy implies a confidence gradient: VERIFIED = solid, ASSUMED = fragile, PROJECTED = derived. But “verified” data can itself embed significant problems. Historical figures can reflect measurement error. Market data can encode vendor assumptions or sampling bias. Financial actuals can be non-stationary — a Q4 2024 revenue figure may be misleading for Q4 2026 projections in a post-shock market. The framework tracks provenance (where did this number come from?) but not quality (is this number still a reliable guide?). Users should resist the temptation to treat VERIFIED as “settled.” Data fundamentalism — assuming that sourced data is correct data — is a different failure mode than hallucination, but it can drive equally bad decisions.

4. Tags expose inputs, not structural validity

A scenario can be perfectly tagged — every number sourced, every assumption labeled, every gap flagged — and still be fundamentally misleading because the underlying causal model is wrong. Treating customer churn as independent of pricing. Ignoring feedback loops between marketing spend and brand perception. Assuming linear scaling where the real dynamics are nonlinear. The framework catches factual hallucinations (wrong inputs) but not structural errors (wrong model of how the inputs relate). The calibration methods described earlier — sensitivity testing, pre-mortem — partially help by stress-testing individual assumptions, but they test assumptions in isolation, not the relationships between them. ABP and scenario planning literature emphasize structural thinking, exploration of alternative logics, and the “world of no broken assumptions” as a reference scenario. This framework focuses on tagging and gap flagging, not on the quality of the mental model. A well-tagged bad model is still a bad model.

5. Labels don’t expose whose assumptions are being encoded

The categories VERIFIED / ASSUMED / PROJECTED can give a veneer of objectivity that hides power dynamics. Management may encode optimistic growth targets as ASSUMED without revealing the political pressure behind the number. A vendor’s market-size estimate tagged as VERIFIED may embed that vendor’s commercial interests. An analyst’s PROJECTED calculation may use a model that reflects institutional bias toward certain outcomes. The framework does not require the model (or the human) to reveal whose assumptions are being encoded or how they were generated. In organizational contexts, this matters: the question isn’t just “is this sourced or assumed?” but “whose interests shaped this assumption?” The framework doesn’t answer that question — and claiming it does would be a form of the same false confidence it’s designed to prevent.

6. Too many gaps can paralyze decisions

The framework explicitly penalizes guessing and encourages the model to flag [DATA GAP] and [ASSUMPTION GAP] at every opportunity. In high-uncertainty domains — which is most strategic planning — this can produce outputs dominated by gaps and caveats. ABP literature stresses that some assumptions must be made “for planning purposes” or planning cannot proceed. The stakes-based scaling table earlier in this post partially addresses this (brainstorming gets light tagging, board decisions get full tagging), but the underlying tension remains: the framework promotes a norm where “silent invention is worse than flagged uncertainty” without explicitly discussing when too much uncertainty signaling undermines decision-making. In a corporate context, if every plan is filled with prominent warnings, managers may either ignore the warnings as boilerplate or become overly cautious and delay needed decisions. Match the framework’s intensity not only to the decision’s stakes but also to the organization’s risk appetite and decision timeline.

7. Domain-specific adaptation required

The series claims the framework is portable across domains — document extraction, worldbuilding, business scenarios, cybersecurity, scientific writing. But those domains have very different stakes, epistemic structures, and regulatory environments. In medicine, tagging something as ASSUMED is far from sufficient to make it safe — existing guidance requires retrieval-augmented generation, external verification, and human oversight. In legal work, a custom label scheme might conflict with established citation standards or be misinterpreted by courts. In regulated industries, compliance frameworks may have their own provenance requirements that the three-rule labels don’t map onto. The general pattern provides a starting structure; domain-specific adaptation and validation are required before relying on it in regulated or high-stakes environments. The domain-specific posts in this series (cybersecurity, scientific writing) are first steps in that adaptation, not finished products.

These limitations don’t invalidate the framework — they bound it. The three rules are a significant improvement over the default (no provenance, no gap flagging, no penalty for guessing), but they are not a complete solution. They’re the beginning of a practice, not the end of one.

Sources and Further Reading

Kahneman, D. & Tversky, A. (1979): “Prospect Theory: An Analysis of Decision under Risk.” The foundational work on cognitive biases in decision-making, including the inside view vs. outside view distinction.
Flyvbjerg, B. (2008): “Curbing Optimism Bias and Strategic Misrepresentation in Planning: Reference Class Forecasting in Practice.” The definitive paper on using the outside view to correct planning forecasts.
Cantarelli, C.C. et al. (November 2025): “Reference Class Forecasting: Promises, Problems, and a Research Agenda Moving Forward.” Systematic review of RCF covering 2001–2025.
Klein, G. (2007): “Performing a Project Premortem.” Harvard Business Review. The pre-mortem technique for surfacing failure modes before they occur.
Dewar, J.A. (2002): “Assumption-Based Planning: A Tool for Reducing Avoidable Surprises.” Cambridge University Press / RAND.
Lambdin, C. (2024): “Assumption-Based Planning.” On the “ghost scenario” and load-bearing assumptions.
Ramírez, R. et al. (December 2025): “A Faster Way to Build Future Scenarios.” MIT Sloan. On AI-assisted scenario planning and surfacing unexamined assumptions.
Previous posts in this series:
Post 1: AI Honesty
Post 2: Worldbuilding
Post 3: Scenario Building

April 17, 2026

The Three-Rule Framework for Scenario Building

The Three-Rule Framework: From Document Extraction to Business Scenario Building

This is the third post in a series about a small set of prompt rules with a surprisingly wide reach.

In the first post, I showed how three rules — Force Blank, Penalize Guessing, Show the Source — stop AI from silently guessing when extracting data from contracts and invoices. In the second post, I adapted them for alternate history worldbuilding, where the same rules keep lore consistent and real history accurate.

This post takes the final step: generalizing the three rules into a framework that works for business scenario building — strategic planning, KPI development, project risk assessment, financial modeling, market entry analysis, and any other context where you’re using AI to think about the future.

Why Scenario Building Is Vulnerable to the Same Problem

Scenario planning has a long intellectual history. RAND developed Assumption-Based Planning (ABP) for the U.S. Army in the 1990s. Shell pioneered corporate scenario planning in the 1980s under Peter Schwartz. The Oxford Scenario Planning Approach, described in a December 2025 MIT Sloan article, now integrates generative AI into the process itself.

All of these methodologies share a core principle: make your assumptions explicit. RAND defines an assumption as “an assertion about some characteristic of the future that underlies the current operations or plans of an organization.” Every plan has them. Most are invisible. The ones that stay invisible are the ones that cause failures.

Now consider what happens when you hand your business data to an AI and ask it to build a scenario. The model does exactly what it does with contracts and fiction: it fills gaps. Revenue growth for Q4? The model picks a plausible number. Competitive response to your market entry? The model invents one. Timeline for regulatory approval? The model estimates. Customer churn under the new pricing? The model generates a figure.

Every one of these is an assumption. None of them are labeled as such. The scenario reads like a coherent analysis, backed by data — but some of the “data” is real, some is derived, and some was fabricated to make the narrative hold together. You can’t tell which is which.

This is the same problem in its third incarnation. And it responds to the same three rules.

The General Pattern

Across three domains, the same structure repeats:

Domain	Canon (Source of Truth)	Source Tags	Gap Labels
Document extraction	The document	EXTRACTED / INFERRED	BLANK
Worldbuilding	Real history + Your lore	HISTORY / LORE-ESTABLISHED / LORE-INFERRED	HISTORICAL GAP / LORE GAP
Scenario building	Verified data + Established constraints	VERIFIED / ASSUMED / PROJECTED	DATA GAP / ASSUMPTION GAP

The underlying logic is always the same: distinguish what is known from what is invented, and make the boundary visible.

For business scenarios, the “canon” has two layers — just like worldbuilding:

Verified data — things you know from actual measurements: last year’s revenue, current headcount, signed contracts, measured KPIs, market data from credible sources
Established constraints — things that are decided, not speculated: budget limits, regulatory requirements, contractual deadlines, board-approved targets

Everything else — market growth estimates, competitive behavior, customer adoption rates, technology readiness timelines — is an assumption. And assumptions come in two flavors: ones you’ve thought about and can defend (even if uncertain), and ones the AI just made up because the scenario needed a number.

The three rules exist to separate these categories.

The Three Rules for Business Scenario Building

Rule 1: Force Blank → Flag Unknown Variables

When the AI encounters a variable it doesn’t have data for, it should say so — not invent a plausible value.

The gap labels for business scenarios split into two types:

[DATA GAP] — a factual input the scenario needs but that hasn’t been provided or isn’t available. Example: “This projection requires customer acquisition cost (CAC) for the DACH region; no data was provided.”
[ASSUMPTION GAP] — a strategic or behavioral assumption the scenario relies on but that hasn’t been explicitly stated. Example: “This scenario assumes competitor X will not lower prices in response. This assumption has not been validated.”

This is where RAND’s ABP framework and the three rules converge most directly. Dewar and his colleagues at RAND argue that every plan has a “ghost scenario” — the implicit, unstated set of assumptions about the future to which the plan is suited. The most dangerous assumptions are the ones nobody realized they were making. Forcing the AI to flag gaps is a practical way to surface the ghost scenario.

Rule 2: Penalize Guessing → A Silent Assumption Is Worse Than a Known Unknown

The business version of “a wrong answer is 3× worse than a blank” is this:

A hidden assumption baked into the scenario is worse than an explicitly flagged uncertainty. When you don’t have data, flag the gap — don’t fill it with a plausible number.

Why is this more dangerous in scenarios than in document extraction? Because scenarios compound. A single unflagged assumption about market growth feeds into revenue projections, which feed into headcount planning, which feeds into budget allocation, which feeds into board presentations. By the time the assumption fails, six months of planning has been built on top of it.

ABP calls these “load-bearing assumptions” — the ones whose failure would require fundamental changes to the plan. The three-rule framework surfaces them before they bear load.

Rule 3: Show the Source → VERIFIED / ASSUMED / PROJECTED

Every number, every trend, every behavioral claim in the scenario gets one of three tags:

(VERIFIED) — based on actual data you’ve provided: financial reports, signed contracts, measured KPIs, credible third-party research with a citation
(ASSUMED) — a belief about the future that the scenario relies on but that could be wrong. The model must state the assumption explicitly: “Assumes 15% annual growth in segment X, consistent with 2023–2025 trend”
(PROJECTED) — a value derived or calculated from verified data and stated assumptions. The model must show the derivation: “Projected from Q1–Q3 actuals at current run rate”

The critical distinction between ASSUMED and PROJECTED: an assumption is a belief you bring to the scenario; a projection is a calculation the model performs using your data and assumptions as inputs. Assumptions can be challenged (“what if growth is 5% instead of 15%?”). Projections can be audited (“show me the calculation”).

This maps directly onto what scenario planning practitioners call “sensitivity analysis”: identifying which assumptions the scenario’s conclusions are most sensitive to. With source tags in place, you can immediately see which conclusions rest on verified data (stable) and which rest on assumptions (fragile). That’s where your attention should go.

The Combined Prompt

Here is the full framework as a system prompt. Replace the bracketed placeholders with your specific context.

You are my scenario planning analyst. We are building a [TYPE: business plan / market analysis / project risk assessment / KPI framework / budget scenario] for [CONTEXT: company, project, product, market].

Your task is to produce analysis that is transparent about what it knows, what it assumes, and what it doesn’t know. Follow these rules strictly:

Rule 1 — Flag unknown variables:
• If the scenario requires data that has not been provided, do not invent a plausible value. Use [DATA GAP: description of what data is missing and why it matters].
• If the scenario relies on a strategic or behavioral assumption that has not been explicitly validated, flag it with [ASSUMPTION GAP: description of the unstated assumption].

Rule 2 — Do not fill gaps silently:
• A hidden assumption baked into the analysis is worse than an explicitly flagged uncertainty.
• When data is missing, flag the gap. Do not generate a plausible-sounding number.
• When an outcome depends on assumptions about competitor behavior, market dynamics, regulatory decisions, or customer response, state the assumption explicitly rather than embedding it as fact.

Rule 3 — Source labeling:
Tag every significant claim, number, or conclusion with its source:
• (VERIFIED) — based on actual data I provided or credible, cited third-party data
• (ASSUMED) — a belief about the future that the scenario depends on. State the assumption and what it is based on.
• (PROJECTED) — calculated or derived from verified data and stated assumptions. Show or describe the derivation.
• For every ASSUMED tag, briefly state what would change if the assumption is wrong.

Example: Market Entry Scenario

Here’s what the output looks like when the rules are active. Imagine asking the AI to assess a SaaS product launch in a new market:

Variable	Value	Source	Note
Current ARR	€2.4M	VERIFIED	Q4 2025 financial report
Target market size (DACH)	€340M	VERIFIED	Gartner 2025 report, cited
Year 1 market share	—	DATA GAP	No comparable launch data provided for this segment
CAC (DACH region)	—	DATA GAP	Current CAC is for US market only; DACH acquisition costs not provided
Pricing model	€49/seat/month	VERIFIED	Board-approved pricing decision, March 2026
Competitor response	No price reduction	ASSUMED	Assumes incumbent competitor maintains current pricing. If they discount 20%, projected margin drops from 68% to ~51%
Year 1 revenue projection	€180K–€420K	PROJECTED	Range based on 30–70 enterprise seats at stated pricing. Lower bound assumes no channel partners; upper bound assumes 2 reseller agreements (ASSUMPTION GAP: no reseller discussions confirmed)

Compare this to what the same model would produce without the rules: a single confident revenue projection of €310K, a specific market share percentage, an assumed CAC that looks like data, and no indication of which numbers are real and which are invented.

The tagged version takes thirty seconds longer to read. It saves weeks of planning on false foundations.

Applications Beyond Strategy

The same framework adapts to any structured planning context:

KPI Development: When defining KPIs for a new initiative, tag each target as VERIFIED (based on historical baseline), ASSUMED (based on industry benchmarks or management expectations), or PROJECTED (calculated from verified inputs). Flag any KPI that lacks a reliable baseline with [DATA GAP]. This prevents the common failure mode where AI-generated KPI dashboards contain a mix of real metrics and invented benchmarks with no way to tell them apart.

Project Risk Assessment: For each identified risk, tag the probability and impact as VERIFIED (based on historical incident data), ASSUMED (based on expert judgment or analogy), or PROJECTED (derived from a model). Flag risks where neither data nor expert input exists with [ASSUMPTION GAP]. The result is a risk register that honestly distinguishes evidence-based risks from plausible-sounding guesses.

Budget Scenarios: Tag every line item. Fixed costs from signed contracts are VERIFIED. Headcount-dependent costs using planned hiring are PROJECTED (with the hiring plan as a stated assumption). Revenue-dependent items are ASSUMED if revenue targets haven’t been validated against pipeline data. The budget becomes a map of its own confidence levels.

Competitive Analysis: Every claim about a competitor’s strategy, pricing, or market position should be tagged. Public financial data is VERIFIED. Inferences from job postings or patent filings are PROJECTED. Assumptions about their future moves are ASSUMED — with an explicit “if wrong” note. This prevents the common scenario planning failure where competitive intelligence is a blend of hard data and conjecture, presented uniformly as fact.

The Framework as a Pattern

Looking across all three posts, the general framework can be stated in one paragraph:

When using AI in any domain where fidelity to sources matters, apply three rules: (1) give the model explicit permission to not-know, with labeled gaps; (2) make the cost of silent invention higher than the cost of flagged uncertainty; (3) require every claim to carry a provenance tag showing whether it comes from verified source material, from stated assumptions, or from the model’s own inference. The specific labels change by domain, but the structure is universal.

	Rule 1: Force Blank	Rule 2: Penalize Guessing	Rule 3: Show Source
Extraction	BLANK + Reason	Wrong answer 3× worse	EXTRACTED / INFERRED
Worldbuilding	HISTORICAL GAP / LORE GAP	False invention worse than gap	HISTORY / LORE-ESTABLISHED / LORE-INFERRED
Scenarios	DATA GAP / ASSUMPTION GAP	Hidden assumption worse than known unknown	VERIFIED / ASSUMED / PROJECTED
General	[GAP: type + explanation]	Silent invention > flagged uncertainty	SOURCE / DERIVED / INFERRED

That bottom row is the portable version. It works for legal research, medical summarization, academic literature review, code refactoring, translation — any task where you need the model to be useful without being dishonest.

RAND’s James Dewar wrote in 2002 that every plan has a “ghost scenario” — the unstated set of assumptions about the future to which the plan is unconsciously suited. The three-rule framework is, in essence, a ghost-scenario detector. It forces the invisible to become visible, whether the plan is a vendor contract, a fictional universe, or a five-year business strategy.

The models are getting smarter every quarter. Making them honest is still up to us.

Sources and Further Reading

Dewar, J.A. et al. (1993/2002): “Assumption-Based Planning.” RAND Corporation. The foundational methodology for identifying, testing, and planning around critical assumptions. Overview at MindTools.
Lambdin, C. (2024): “Assumption-Based Planning.” Excellent deep-dive into ABP with the “ghost scenario” concept.
Ramírez, R. et al. (December 2025): “A Faster Way to Build Future Scenarios.” MIT Sloan Management Review. On integrating generative AI into the Oxford Scenario Planning Approach.
Schwartz, P. (1991): “The Art of the Long View: Planning for the Future in an Uncertain World.” The foundational text on corporate scenario planning.
Previous posts in this series:
ChatGPT and Claude Got Smarter. Not More Honest. — The original three rules for document extraction.
From Contract Extraction to Alternate History — Adapting the rules for worldbuilding.

April 11, 2026

From Document Extraction to Alternate History: Why the Three Honesty Rules Work for Worldbuilding Too

A few weeks ago I wrote about three prompt rules that stop AI from guessing when extracting data from documents. The rules — Force Blank, Penalize Guessing, Show the Source — were designed for mundane business problems: contracts with contradictory clauses, meeting notes with ambiguous commitments, invoices with missing fields.

But the more I used them, the more I noticed something: the same rules solve an entirely different problem — one that has nothing to do with business documents.

They solve worldbuilding.

The Problem: AI as a Continuity Editor

Anyone who has tried to use an LLM for sustained creative work knows the pattern. You’re building an alternate history, a fantasy setting, a science fiction universe, a tabletop RPG campaign. You’ve written hundreds of pages of lore. You hand it to Claude or ChatGPT and ask a question about how your fictional world works.

And the model invents something.

It creates a faction that doesn’t exist. It attributes a technology to the wrong era. It “remembers” a character who was never in your notes. It confidently places a fictional event in a real historical period and gets the real history wrong while doing so. The output sounds plausible, internally consistent, beautifully written — and it contradicts everything you’ve built.

This is the same structural problem I described in the earlier post, just in a different domain. The model is trained to produce complete, coherent output. When your lore has a gap, the model fills it — because filling gaps is what it was optimized to do. Whether the gap is “what are the payment terms in section 4” or “what happened in the Imperial Senate after the divergence point,” the instinct is identical: make something up that sounds right.

Researchers have a term for this in the fiction context: “character hallucination” (Wu et al., 2024) — when an AI playing a role violates the established identity of that role. The IJCAI 2025 tutorial on LLM role-playing calls the broader challenge “controlled hallucination”: the model must invent creatively within the established rules of a fictional world, while rigorously refusing to invent things that contradict those rules. The line between productive creativity and lore-breaking confabulation is exactly the line the three rules are designed to draw.

The Adaptation: Worldbuilding Has Two Canons, Not One

In contract extraction there’s one source of truth: the document. Extract what’s there, flag what isn’t, don’t invent.

In alternate history, there are two sources of truth operating simultaneously:

Real history — everything that happened in our world before the story diverges from it
Your lore — everything you’ve established about what happens after the divergence

Both are canonical. Both are places the AI must not invent. And the boundary between them is sharp: the “point of divergence” (POD), the moment at which your fictional timeline breaks from real history.

Before the POD, the AI must be a historian. It can reference real people, real technologies, real battles, real events — but only things that actually happened. Inventing a battle that didn’t happen or a person who didn’t exist is as bad as making up a contract clause.

After the POD, the AI must be a continuity editor. Only the things established in your lore exist. Everything else is a gap — and gaps should be labeled, not filled.

This is where the three rules come in, almost unchanged.

The Three Rules, Adapted

Rule 1: Force Blank → Label the Gaps

In document extraction, the model leaves a field BLANK when the data is missing and explains why. In worldbuilding, the same principle applies with two labels instead of one — because there are two types of gap:

[HISTORICAL GAP] — for events before the point of divergence that the model isn’t certain about. Don’t invent a Roman consul’s biography; flag the gap.
[LORE GAP: no established specification] — for developments after the point of divergence that your lore hasn’t addressed yet. Don’t invent a new faction, technology, or major event; flag the gap.

The crucial move is the same as before: give the model explicit permission to not-know. Without this permission, the model’s completion instinct will override its uncertainty detection, and you’ll get confidently-written hallucinations that feel like canon but aren’t.

Rule 2: Penalize Guessing → A False Invention Is Worse Than a Gap

The business version of this rule says: “A wrong answer is 3× worse than a blank. When in doubt, leave it blank.”

The worldbuilding version is even more forceful, because the consequences are worse. A wrong payment term on a spreadsheet gets corrected. A wrong lore detail, accepted into your canon because it sounded right, can poison hundreds of hours of subsequent writing. Every future reference builds on it. Every character interacts with it. By the time you catch it, it’s woven through your world.

So the rule becomes:

A false invention is worse than acknowledging a gap in the worldbuilding.

No multiplier needed. The asymmetry is total. In creative work, a gap is a prompt to expand your lore on your own terms. A bad invention is a bug that ships.

Rule 3: Show the Source → Three Provenance Tags Instead of Two

In document extraction, every value is either EXTRACTED (directly from the source) or INFERRED (calculated or derived). In worldbuilding, you need three tags because you have two canonical sources plus your own extrapolation:

(HISTORY) — real historical fact from before the point of divergence
(LORE-ESTABLISHED) — stated exactly this way in your source lore
(LORE-INFERRED) — a logical consequence the model is drawing from your lore, with a one-sentence justification

The third tag is where the magic happens. You want the model to extrapolate — that’s what makes it useful for worldbuilding. An established technology must have consequences; an established faction must interact with other factions; an established event must have ripple effects. But you want those extrapolations flagged, so you can review them and decide whether they fit your vision. A flagged inference you disagree with takes thirty seconds to correct. An unflagged inference that quietly becomes canon takes hours to untangle three sessions later.

The Combined Prompt

Here is the full adaptation, structured as a system prompt you can paste into any long-running chat about your fictional world. Replace the bracketed placeholders with your own setting.

We are building an alternate timeline that begins in [YEAR] with [CHANGE / POINT OF DIVERGENCE]. You are my historian and continuity editor for this alternate-history universe. Your task is to produce texts, responses, and lore concepts that are absolutely free of contradiction.

The primary rule (the Point of Divergence): The year of divergence is [YEAR].

Rule 1 — BEFORE the Point of Divergence (strict history):
• Everything that happened before this date must correspond 100% to real, verifiable Earth history.
• Do not invent historical persons, technologies, battles, or events.
• If you are not certain of a historical detail, do not invent one. Use the placeholder [HISTORICAL GAP] instead.

Rule 2 — AFTER the Point of Divergence (strict lore canon):
• Everything that happens after this date must be based exclusively on lore texts I provide.
• Do not invent new factions, main characters, major events, or fundamental technologies that are not established in my texts.
• If asked about developments my lore does not specify, respond with [LORE GAP: no established specification]. A false invention is worse than acknowledging a gap in the worldbuilding.

Rule 3 — Source and logic labeling:
To keep the worldbuilding clean, mark in parentheses at the end of each paragraph or for each significant claim where the information comes from:
• (HISTORY) for real historical facts before the point of divergence
• (LORE-ESTABLISHED) for facts stated exactly this way in my texts
• (LORE-INFERRED) for logical conclusions drawn from my lore (e.g., how an established technology affects daily life). When inferring, briefly explain what you are drawing the inference from.

Plug in the year, plug in the divergence event, attach your lore documents, and you have a continuity editor that actively refuses to lie to you.

What This Enables

The workflow change is significant. Without these rules, every AI-generated paragraph needs to be cross-checked against both real history and your own notes — which nobody actually does, which means errors accumulate silently. With the rules, your attention goes exactly where it should: to the gaps (where you get to decide what your world does next) and to the inferences (where you get to approve or override the model’s extrapolation).

A few observations from applying this in practice:

The gaps are often the most interesting output. When the model flags [LORE GAP] for something, that’s the moment you realize your lore has a hole — and often, that hole is exactly the next thing you should develop. The model isn’t failing to answer; it’s telling you where your world needs more work.

Inferences reveal your lore’s implications. A well-labeled (LORE-INFERRED) paragraph often surfaces consequences you hadn’t thought through. “You established that faction X controls the trade route in Y; inferring, this would mean port city Z becomes economically dependent, which suggests tension with neighbor W.” That’s useful even if you reject the specific extrapolation — it shows you a logical consequence of your own setup.

Real history keeps the fiction grounded. Alternate history works best when the “before” is accurate. If your timeline diverges in 1914 and the model gets the pre-1914 world wrong, the whole divergence loses meaning. Forcing (HISTORY) labels — and forcing the model to flag [HISTORICAL GAP] when it’s uncertain — keeps the foundation solid.

The Deeper Pattern

What I find striking is that the same three rules work across two domains that seem to have nothing in common. Business document extraction and creative worldbuilding share no vocabulary, no audience, no workflow. But they share a structure: in both cases, the user needs the AI to distinguish between what is established and what is invented, and to flag the boundary clearly.

That structural similarity is worth taking seriously. It suggests the three rules aren’t really about contracts or fiction specifically — they’re about the general problem of using AI in any context where fidelity to a source matters more than fluency of output. Legal research. Code refactoring against a style guide. Historical research. Medical summarization. Translation against a glossary. Technical writing against a spec. Academic literature review.

In each of these, the AI’s default behavior — produce a confident, complete, coherent answer — works against the user’s actual need, which is to know which parts of the output are grounded and which are the model’s own contribution. Force Blank gives it permission to not-know. Penalize Guessing changes the calculus in favor of honesty. Show the Source makes the boundary between source and invention visible.

Three rules. Two sentences each. Apply everywhere fidelity matters.

The alternate history version is just one adaptation. I’d be curious what other domains this pattern fits — if you find one, I’d love to hear about it.

Sources and Further Reading

Wu et al. (2024): “RoleBreak: Character Hallucination as a Jailbreak Attack in Role-Playing Systems.” Paper defining character hallucination as violation of role identity.
IJCAI 2025 Tutorial: “LLM-based Role-Playing from the Perspective of Hallucinations.” Introduces the concept of “controlled hallucination” — creative invention constrained by scenario-specific rules.
Previous post: “ChatGPT and Claude Got Smarter. Not More Honest.” The original three rules for document extraction.
Panickssery, N. (2025): “Why do LLMs hallucinate?” On why hallucination is the default behavior of base models and requires active training or prompting to suppress.
Bicking, I. (2023–2025): “Creating Worlds with LLMs.” Series of essays on worldbuilding with LLMs, including the tension between consistency and surprise.
DiGRA (2025): “Reconceptualizing LLM-Induced Hallucinations as Game Design Features.” On when hallucinations enhance versus break i

March 31, 2026

Three Prompt Rules That Stop AI From Guessing — And the Science Behind Them

Every new model generation arrives with fanfare: better benchmarks, higher accuracy scores, more impressive demos. GPT-5 reasons through complex problems. Claude plans ahead when writing poetry. Gemini processes images and video with startling fluency. The intelligence curve keeps climbing.

But there’s a second curve that rarely makes the keynote slides — the honesty curve. And it’s barely moved.

This isn’t a vague philosophical complaint. It’s a structural problem baked into how these models are trained, evaluated, and deployed. And it’s one that hits hardest in exactly the kind of work where people increasingly rely on AI: extracting data from contracts, parsing invoices, summarizing meeting notes, building CRM records from messy inputs.

This post unpacks why the intelligence-honesty gap exists, what the latest research tells us about its causes, and — most practically — three prompt rules you can apply today to force AI to be honest about what it doesn’t know.

The Gap: Intelligence vs. Honesty

When we say a model “got smarter,” we usually mean it scores higher on benchmarks — math competitions, coding challenges, multi-step reasoning tasks. These are real improvements. But benchmark scores measure a model’s ability to produce correct answers. They don’t measure a model’s willingness to say “I don’t know.”

In fact, the incentive structure actively punishes honesty.

In September 2025, OpenAI published a research paper that made this problem precise. The team — including researchers from Georgia Tech — examined major AI benchmarks and found that the vast majority use binary grading: either the answer is correct and gets a point, or it’s wrong and gets zero. Crucially, abstaining — saying “I don’t know” — also gets zero. The mathematical consequence is straightforward: guessing always has a higher expected score than abstaining. A model that bluffs on every uncertain question will rank higher than one that honestly declines.

OpenAI’s own blog post put it plainly: the situation is like a multiple-choice test where leaving an answer blank guarantees a zero, but guessing at least gives you a chance. Under those rules, the rational strategy is to always guess — even when you have no idea. And that’s exactly what the models learn to do.

The paper demonstrated this with a striking example: when asked for the PhD dissertation title of one of its own co-authors, a widely-used model confidently produced three different titles across three attempts. All three were wrong. It did the same with his birthday — three dates, all incorrect, all delivered with unwavering confidence.

This isn’t a bug that can be patched. It’s the natural outcome of optimizing for accuracy-only metrics. As the OpenAI researchers argue, the mainstream benchmarks and leaderboards need to be redesigned to penalize confident errors more heavily than uncertainty. Until that happens, every model that climbs the leaderboard does so in part by learning to bluff better.

Why Models Confabulate: Insights from Interpretability Research

The OpenAI paper explains the incentive problem. But what happens mechanically inside the model when it makes something up?

Anthropic’s interpretability research — published in March 2025 under the title “Tracing the Thoughts of a Large Language Model” — provides some of the most detailed answers we have. Using what they describe as a “microscope” for AI, Anthropic’s team traced the internal circuits that activate when Claude processes a question. It’s worth noting that these findings are specific to Claude 3.5 Haiku — other model families may handle uncertainty through different internal mechanisms — but the patterns are likely general enough to be instructive.

One of their most revealing discoveries involves what we might call a default refusal mechanism. In Claude, refusing to answer is actually the default behavior: the researchers found a circuit that is “on” by default and causes the model to state it has insufficient information. But when the model recognizes a “known entity” — say, Michael Jordan the basketball player — a competing set of features fires up and suppresses this default circuit, allowing the model to respond.

The problem arises when this mechanism misfires. If the model recognizes a name but doesn’t actually know the relevant facts, the “known entity” signal can still override the “I don’t know” circuit. The result: a confident, detailed, completely fabricated answer. In one experiment, the researchers used a person named Michael Batkin — someone unknown to the model, who by default triggered a refusal. But when they artificially activated the “known entity” features or inhibited the “can’t answer” features, Claude promptly — and consistently — hallucinated that Batkin was famous for playing chess.

Even more unsettling: Anthropic found evidence that when Claude can’t easily compute an answer (say, the cosine of a large number), it sometimes engages in what philosopher Harry Frankfurt would call bullshitting — producing an answer without any internal evidence of the calculation actually occurring. Despite claiming to have run the math, the interpretability tools revealed no trace of any computation. When given a hint about what the answer should be, Claude worked backwards, constructing plausible-looking intermediate steps that lead to the hinted answer — a textbook case of motivated reasoning.

These findings matter because they show that the honesty problem isn’t just about training incentives. The models have internal mechanisms that are supposed to catch uncertainty — but those mechanisms can be overridden by other pressures, including the drive toward grammatical coherence and the pattern-matching instinct to fill in gaps.

Automation Bias: Why This Matters More Than You Think

All of this would be merely academic if people treated AI output with appropriate skepticism. They don’t.

Automation bias — the tendency to over-rely on automated recommendations — is one of the most thoroughly documented phenomena in human-computer interaction research. A 2025 systematic review published in AI & Society analyzed 35 peer-reviewed studies spanning healthcare, finance, national security, and public administration. The pattern was consistent across domains: when an AI system delivers a confident answer, people accept it. They check less. They override their own judgment.

A randomized clinical trial conducted with AI-trained physicians in Pakistan (published as a preprint in August 2025) made the dynamic especially clear. Even doctors who had completed 20 hours of AI-literacy training — including instruction on how to critically evaluate AI output — were vulnerable to automation bias when exposed to erroneous LLM recommendations. The training helped, but it didn’t eliminate the problem. Confident-sounding AI output has a gravitational pull that’s difficult to resist, even when you know to look for errors.

The real-world consequences are already visible. In February 2024, Air Canada was ordered to pay damages to a customer after a support chatbot — not a large language model, but an AI system nonetheless — hallucinated a bereavement fare policy that didn’t exist. The chatbot confidently told the customer they could retroactively request a discount within 90 days of purchase. The actual policy allowed no such thing. But the system stated it with such authority that the customer relied on it to make a financial decision. The underlying technology differed from today’s LLMs, but the dynamic was identical: confident AI output, uncritical human acceptance.

In an operations context, the failure modes are subtler but no less damaging. Consider a contract with payment terms mentioned on page 8 and page 14 — and the two pages say different things. A human reviewer might catch the discrepancy. An AI, asked to extract the payment terms, will pick one and move on. It won’t mention the conflict. It won’t flag the ambiguity. It will fill the cell in your spreadsheet with “Net 30” and give you no indication that page 14 says “Net 45.”

Meeting notes are another minefield. “Let’s circle back next week” becomes a specific date and a named owner in the AI’s summary — details that nobody actually stated, but that the model invented to produce a clean, actionable output.

The pattern is the same across invoices, insurance documents, lease agreements, vendor scoring, CRM data entry: wherever AI is used to extract structured information from messy sources, the model’s instinct to fill every field works directly against the user’s need to know which fields are uncertain.

Three Prompt Rules That Change the Incentive

These three problems — training incentives that reward guessing, internal mechanisms that can override uncertainty detection, and human psychology that accepts confident output at face value — come from different research streams. But they converge on the same practical conclusion: by default, AI will guess rather than admit ignorance, and people will trust the guess.

You can’t fix the training pipeline. You can’t redesign the benchmarks. But you can change the local incentive structure inside the conversation. The following three rules — adapted from a practical framework by D-Squared — do exactly that. They work because they explicitly reverse the default dynamic: instead of rewarding completeness, they reward honesty about uncertainty. Note that the effectiveness of these techniques may vary across model families — they’ve been tested primarily with ChatGPT and Claude, and other models may respond differently.

Rule 1: Force Blank + Explain

The single most effective change you can make is to explicitly instruct the model to leave fields blank when the data is ambiguous, missing, or unclear — and to explain why.

Without this rule, every field gets filled. With this rule, the model produces output like:

Field	Value	Reason
Payment Terms	— BLANK	Pages 8 and 14 state different terms — net 30 vs net 45
Renewal Date	Jan 15, 2027	—
Liability Cap	— BLANK	References “Exhibit B” — not included in document

The blank fields are where the value is. They tell you exactly where to focus your attention. They’re the model admitting “I’m not sure” — something it would never do without explicit instruction.

The prompt language:

Extract the following fields from this document into a table. Rules: Only extract values that are explicitly stated in the document. When a value is ambiguous, missing, or unclear, leave the field BLANK. Add a column labeled “Reason.” Next to every blank field, include a one-sentence explanation of why you left it blank. Base every value on what the document actually says. Quote or reference the specific section you pulled it from.

One way to think about why this works is through the lens of Anthropic’s interpretability findings. The model has internal mechanisms for recognizing uncertainty — the default refusal behavior described above. But those mechanisms get overridden by the pressure to produce complete, coherent output. The “Force Blank” instruction may effectively give the uncertainty pathway permission to activate, rather than being suppressed by the completion instinct. We don’t know for certain that this is the internal mechanism at work — but the practical result is consistent and reliable.

Rule 2: Penalize Guessing

By default, from the model’s perspective, a wrong answer and a blank answer carry equal weight — neither earns praise, neither triggers correction. The model has no reason to prefer one over the other, so it defaults to guessing (which at least has a chance of being right).

Rule 2 changes this calculus with a single sentence:

A wrong answer is 3× worse than a blank. When in doubt, leave it blank.

This mirrors the scoring reform that OpenAI’s September 2025 paper advocates at the benchmark level. The researchers propose that evaluation systems should award points for correct answers, penalize wrong answers more heavily than abstentions, and give partial credit for appropriate expressions of uncertainty. They note that some standardized human exams have used this approach for decades — penalizing wrong guesses more heavily than skipped questions — precisely to discourage blind guessing.

You can’t change the benchmark. But you can embed the same incentive structure in your prompt. The 3× multiplier is arbitrary — pick any number that makes the model understand that silence is preferable to fabrication. The key insight is that you need to say it explicitly. The model won’t infer this preference on its own.

Rule 3: Show the Source

Even models that are told to “extract only” will drift toward inference. They’ll compute a renewal date from a start date and term length. They’ll estimate a total from line items. They’ll infer a contact person from an email signature. These aren’t necessarily wrong — but they’re not extraction, and the user needs to know the difference.

Rule 3 requires the model to label every value as EXTRACTED (directly stated in the document) or INFERRED (derived, calculated, or interpreted), with an explanation for every inferred value.

The prompt language:

For each field, add a column called “Source.” Mark each value as one of: EXTRACTED — directly stated in the document, exact match. INFERRED — derived from context, calculated, or interpreted. For every INFERRED field, include a one-sentence explanation of what you based it on.

The output looks like this:

Field	Value	Source	Evidence
Start Date	Jan 15, 2025	EXTRACTED	Section 2.1, paragraph 1
Term Length	24 months	EXTRACTED	Section 2.1, paragraph 2
Renewal Date	Jan 15, 2027	INFERRED	Calculated 24 months from start date. Check Section 8 — early termination clause may alter this.

The EXTRACTED/INFERRED distinction is a practical implementation of what hallucination researchers call “provenance tracking” — tying every claim back to its source. The model is perfectly capable of making this distinction; it just doesn’t bother unless you ask.

The Combined Prompt

All three rules work together. Here’s the complete version:

Extract the following fields from this document into a table.

Rules:

– Only extract values explicitly stated in the document.

– When a value is ambiguous, missing, or unclear, leave the field BLANK.

– A wrong answer is 3× worse than a blank. When in doubt, leave it blank.

– For each field with a value, add a “Source” column: EXTRACTED = directly stated, exact match. INFERRED = derived, calculated, or interpreted.

– For every INFERRED field, add a one-sentence explanation.

– For every BLANK field, add a row to a separate “Flags” table explaining why the value could not be extracted.

The workflow change this enables is significant. Instead of reviewing every extracted value (which nobody actually does), you review only the blanks and the inferred fields. Everything marked EXTRACTED with a section reference can be trusted at a higher confidence level. Your attention goes where it matters.

The Bigger Picture

These three rules are a stopgap. They work — sometimes remarkably well — but they’re fighting against the grain of how models are trained. The deeper fix requires changes at the infrastructure level.

OpenAI’s hallucination paper calls for benchmark reform: scoring systems that reward calibrated uncertainty instead of confident guessing. Anthropic’s interpretability work points toward architectural insights — understanding the internal circuits well enough to strengthen the “I don’t know” pathway rather than relying on prompt-level patches.

Perhaps the most structurally promising direction is OpenAI’s “Confessions” research (2025). Instead of relying on users to prompt honesty, the Confessions approach separates the honesty objective from the performance objective during training itself. After producing a main answer — optimized for all the usual factors like correctness, style, and helpfulness — the model generates a separate “confession” report. This report is scored exclusively on honesty: Did the model flag its uncertainties? Did it acknowledge where it took shortcuts? Crucially, nothing in the confession is held against the main answer’s score, so the model has no incentive to hide its doubts. If this approach scales, it could move the honesty problem from something users have to prompt-engineer around to something the model handles natively.

These are promising directions, but none of them are available to you today. What is available is the ability to change the local incentive structure in your prompts. Force blanks. Penalize guessing. Require source labels. These three rules won’t make AI honest by nature, but they create an environment where honesty is the path of least resistance — and that turns out to be surprisingly effective.

The models are smart enough to know when they’re guessing. They just need permission to say so.

Sources and Further Reading

OpenAI (September 2025): “Why Language Models Hallucinate.” Research paper arguing that standard training and evaluation procedures reward guessing over acknowledging uncertainty.
OpenAI (2025): “How Confessions Can Keep Language Models Honest.” Research on training models to produce separate honesty reports, scored independently from main responses.
Anthropic (March 2025): “Tracing the Thoughts of a Large Language Model.” Interpretability research revealing internal circuits for refusal, known-entity recognition, and hallucination in Claude 3.5 Haiku.
Anthropic (March 2025): “On the Biology of a Large Language Model.” Companion paper on circuit tracing and attribution graphs.
Carnat, I. (November 2024): “Human, All Too Human: Accounting for Automation Bias in Generative Large Language Models.” International Data Privacy Law, Vol. 14, Issue 4, pp. 299–314.
Qazi, I.A. et al. (August 2025): “Automation Bias in LLM Assisted Diagnostic Reasoning Among AI-Trained Physicians.” Randomized clinical trial, medRxiv preprint.
AI & Society (July 2025): “Exploring Automation Bias in Human–AI Collaboration.” Systematic review of 35 studies.
D-Squared (2025): “ChatGPT and Claude Got Smarter. Not More Honest.” Original slide deck presenting the three prompt rules.

December 10, 2016

Linkedin Pulse Replay: Random Thought of the Night: One Smartphone to secure them all! RTOTN

My preciousss! Most people treat their smart phone as if it were one of the rings of power. Why? The few non-initiates have been asking this question for a long time. The answer is simple. A smartphone is the most
versatile and most often used item I own. I read news, books, comics, listen to podcasts, watch videos, navigate, communicate. I also create content, do picture, videos, write text. I am stating the obvious. But a smartphone also boosts security and helps me become invisible!

Security
Of course we use a different password for every service. But there is something even better. Two factor authentication is the answer to protect your online persona. The easiest way to do this is to sign up via your cell phone number. You get a text message with a one time code to login. This is an easy solution to all those hacked social media accounts. Even if the hacker gets his hands on your password, as long as he does not also get the text messages to your cell number you are fine. Since your precious phone is living with you 24/7 it makes a lot of sense to authenticate this way.

The cell phone option is available from all the big services such as gmail, yahoo, outlook.com, etc. It can be even used with a dumbphone. Another pretty common type of two factor authentication is RSA tags, these small keychain authenticators. But the token might be forgotten, and carrying one for every service is a logistical nightmare. RSA key tags have been around a long time. Most gamers use them to protect their MMO accounts from being hacked. By now there are smartphone app versions which add the benefit of the ‘one ring’ to the security of RSA. If you use these you just have one device you take care off. And no text messages needed. This can be a big help while travelling in foreign countries.
The most versatile app is the Google authenticator. It helps you to easily two factor authenticate with your Googlemail, Evernote, Facebook etc. as you can easily assign the app to do more than one authentication. Mine is producing revolving keys to 5 different services right now. Just make sure you keep a backup of your smartphone setup and some printed out spare keys in a locker for emergencies!

Invisibility:
This sounds counterintuitive. Everyone is clamoring about how a smartphone logs everything you do and everyone you contact. True, but that is ON YOU! If you decide to use life logging or a location service on your facebook picture upload, of course you leave a lot of traces. One way to prevent this is not using a smart phone, but you can also avoid this by using your smart phone more smartly.

Think about why those services for counting your videos, tracking your fitness, displaying your pictures, collecting information are offered for free. Most services which offer to track stuff for you also track your stuff for themselves. Now imagine logging in to every service you use via e.g. your Facebook or Gmail. You do not only give the company a lot of aggregated data. Sign up with your main account to everything and the cloud owns all your data in one neat package for companies to profile you on! And worse, once you get hacked, the hacker controls everything, not just the one service.

But there is an easy solution: Spread out the data! Don’t use your main account for every service you use. Why not sign up for different services with different email accounts? Convenience, you say? Too many passwords and logins to remember?
There as ways to mitigate this: One is the password safe. Aside from being a very helpful tool to not have the same account/password for everything, you can use really weird account/passwords that are difficult to hack. Just make sure your password safe and your smartphone are secure.

If you want to keep the password in your head, make sure you add some specialty for every service so the checksum of the password will be different. MyPassword46 and mYpassword64 produce very different hashes (In the end, the more you spread out your web use, the more difficult it becomes for companies to gather data on you, or for hackers to use your data against you. Of course this is not perfect. If you use all of your accounts on the same smartphone there is always a way to combine your data if one service is registered to myname@gmail.com and one to sunset82@yahoo.com.

P.S: And with non-standard email apps such as Aquamail for Android or Thunderbird you can easily keep all the different accounts at your fingertips. More on useful apps later.

September 9, 2015

Read a Survey: Today’s State of Work: The Productivity Drain

http://www.servicenow.com/content/dam/servicenow/documents/whitepapers/sn-state-of-work-report.pdf

• Managers say that they spend 2 days a week, on average, on administrative tasks, and that this prevents them from doing strategic work.

• 9 in 10 managers say their productivity depends on the efficiency of routine work processes provided by other departments – such as delivering marketing services, IT support, issuing purchase orders, and onboarding employees.

July 27, 2015

The need of Europe (and the world) for IT skills

Found this on via a friend on Facebook yesterday and thought it a very good starter on the topic of knowledge, IT and managing both in the modern day and age.

While I am pretty amazed by how education in my older daughters school has been digitized I see the need of the WHOLE populace to get more into contact with how IT works.

She is e.g. preparing power points in school now for topics she has to report on and all is presented on a smartboard. The smartboard is also used to watch youtube in free hours :-). On the other hand a friend of mine who is a highschool teacher tells me that when he talked to colleagues in a education class about online media he was the only one in a big room full of people who not only new what Facebook was but also had his own account. He even KNEW people doing online gaming. Oh the devil!

The EC poster shows how it is.

Young People, growing up with tech just use it naturally but have no idea how it works beyond their immediate need. They all use whats app, Facebook, Skype etc. but have no idea what that implies for them in the long run. In the future I will write about media competencies because of that.

Working Age People, often are like young people but are also not interested in how things work. They just want to use and consume but not many want to become ICTs. Who can this be made more attractive is another topic I want to discuss. Knowledge management and knowledge education could also be an answer here.

And finally, with the world becoming more and more digitized Older People, will need at least basic skills in IT operations to keep up with development. If you never used the internet, how can you get you banking data, if the bank only sends out digital reports? My in-laws just told me that this just happened to them. For now they still get paper copies in a kind of “old age legacy program”. But this will stop too at one point. If by then the next generation has not gotten the skills to operate IT then they will be lost. Bringing everybody on board of IT is also just important to not leave older people behind.