AI, RAG and Agents
This category covers retrieval mistakes, hallucination confidence, tool-calling authority confusion, weak evaluations, prompt-to-production drift, stale knowledge, and AI workflows that look useful in demos but fail under real organizational constraints.
"The AI was not wrong because it lacked confidence. It was wrong because it had too much of it."
Common Failure Patterns
- hallucination confidence
- retrieval mistakes
- tool-calling authority confusion
- weak evaluations
- prompt-to-production drift
Prevention Checklist
- Gate every prompt change behind a strict evaluation framework before it reaches production.
- Limit tool execution permissions for autonomous agents.
- Monitor retrieval relevance, not just generation fluency.
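One way to picture the tool-permission item is an allowlist with an approval hold for destructive tools. The sketch below is illustrative only: `ToolPolicy`, `authorize`, and the tool names are hypothetical and not taken from any particular agent framework.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ToolPolicy:
    """Hypothetical policy: tools an agent may call, and which need sign-off."""
    allowed: frozenset
    needs_approval: frozenset = field(default_factory=frozenset)


def authorize(policy: ToolPolicy, tool: str, approved_by_human: bool = False) -> str:
    """Return 'allow', 'hold', or 'deny' for a requested tool call."""
    if tool not in policy.allowed:
        return "deny"  # not on the allowlist: never execute
    if tool in policy.needs_approval and not approved_by_human:
        return "hold"  # destructive tool: wait for a human decision
    return "allow"


# Assumed example policy: reads and pull requests run freely,
# branch deletion is held until a person approves it.
policy = ToolPolicy(
    allowed=frozenset({"search_docs", "open_pull_request", "delete_branch"}),
    needs_approval=frozenset({"delete_branch"}),
)
```

Denying by default keeps a hallucinated or newly added tool from running just because nobody thought to block it.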
Detection Signals
- Agents executing destructive actions without human oversight.
- Answers grounded in stale or deprecated context.
- High confidence scores attached to demonstrably false outputs.
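The stale-context signal can be approximated by checking the metadata of retrieved chunks before they reach the model. This is a rough sketch under assumptions: the `Chunk` fields, the `stale_chunks` helper, and the 180-day threshold are all hypothetical, and a real pipeline would use whatever freshness metadata its index actually stores.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Chunk:
    """Hypothetical retrieved chunk with freshness metadata."""
    doc_id: str
    updated_at: datetime
    deprecated: bool = False


def stale_chunks(chunks, now, max_age=timedelta(days=180)):
    """Return ids of chunks that are deprecated or older than max_age."""
    return [
        c.doc_id
        for c in chunks
        if c.deprecated or (now - c.updated_at) > max_age
    ]


# Assumed example data with a fixed clock so the result is reproducible.
now = datetime(2024, 6, 1, tzinfo=timezone.utc)
retrieved = [
    Chunk("runbook-v3", datetime(2024, 5, 1, tzinfo=timezone.utc)),
    Chunk("runbook-v1", datetime(2023, 1, 1, tzinfo=timezone.utc)),
    Chunk("old-policy", datetime(2024, 5, 20, tzinfo=timezone.utc), deprecated=True),
]
flagged = stale_chunks(retrieved, now)
```

A non-empty result is a reason to alert or to drop those chunks from the prompt, depending on how strict the workflow needs to be.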
Incidents in AI, RAG and Agents
Agent Followed Prompt Literally
"The chaos was predictable."
The Agent Opened a Pull Request
"The chaos was predictable."
The Pull Request Opened a Question
"The chaos was predictable."
The Whiteboard Lied Beautifully
"The chaos was predictable."
The Model Hallucinated Confidence
"The chaos was predictable."
The Prompt Was Approved by Procurement
"The chaos was predictable."
The Demo Worked in the Recording
"The chaos was predictable."
The Governance Board Approved the Risk
"The chaos was predictable."
The AI Strategy Was a Slide Deck
"The chaos was predictable."
The Slide Deck Asked for a Platform
"The chaos was predictable."
The Platform Asked for Ownership
"The chaos was predictable."
The Agent Followed the Prompt Literally
"The core technical takeaway from 'The Agent Followed the Prompt Literally' is that isolated decisions scale poorly."
The Agent Opened a Pull Request
"The core technical takeaway from 'The Agent Opened a Pull Request' is that isolated decisions scale poorly."
The Pull Request Opened a Question
"The core technical takeaway from 'The Pull Request Opened a Question' is that isolated decisions scale poorly."
The Model Hallucinated Confidence
"The core technical takeaway from 'The Model Hallucinated Confidence' is that isolated decisions scale poorly."
The Prompt Was Approved by Procurement
"The core technical takeaway from 'The Prompt Was Approved by Procurement' is that isolated decisions scale poorly."
Frequently Asked Questions
What kinds of incidents belong here?
Failures involving LLMs, RAG, agents, and autonomous AI systems in production.
Why do AI and RAG systems fail in production?
Because demo-grade prompts and unstructured retrieval rarely survive the edge cases of real organizational data and user behavior.
How are agent incidents different from normal automation incidents?
Agents make non-deterministic decisions based on fluid context, so they can chain several bad actions together before anyone detects the problem.
What should teams detect early?
Prompt drift, hallucinated tool calls, context window saturation, and grounding failures.
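Of these signals, hallucinated tool calls are the easiest to check mechanically: compare each proposed call against a registry of known tools and their argument names. The sketch below assumes a simple dictionary shape for calls and a hypothetical `validate_tool_call` helper; real frameworks define their own call formats and schemas.

```python
def validate_tool_call(call: dict, registry: dict) -> list:
    """Return a list of problems; an empty list means the call matches the registry."""
    name = call.get("name")
    if name not in registry:
        return [f"unknown tool: {name}"]
    expected = registry[name]  # set of argument names the tool accepts
    given = set(call.get("arguments", {}))
    problems = [f"missing argument: {a}" for a in sorted(expected - given)]
    problems += [f"unexpected argument: {a}" for a in sorted(given - expected)]
    return problems


# Assumed registry: tool name mapped to its required argument names.
registry = {
    "search_docs": {"query"},
    "open_pull_request": {"title", "branch"},
}
```

Logging every non-empty result gives a running count of hallucinated calls, which can be tracked across prompt or model changes.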
Which stacks are related?
The Hype Stack, Answer Engine Stack, and Agentic Operations Stack.
AI Summary
Incidents where AI systems, RAG pipelines, agents, prompts, tools, and model confidence collide with production reality.
