> Stack
The Observability Stack
Incidents where dashboards show green while users scream on social media.
"A dashboard is not observability if nobody looks at it until the sirens go off."
What this stack means
This stack explores the gap between collecting telemetry and actually understanding system health.
Why this stack exists
Because it is easy to measure infrastructure metrics like CPU, but hard to measure user experience.
▶ Common Failure Patterns
- •dashboard blindness
- •alert fatigue
- •missing telemetry
- •watermelon metrics
- •tool sprawl
Prevention Checklist
- Alert on Service Level Objectives (SLOs) tied to user experience.
- Consolidate observability tools to provide a unified view.
- Regularly test alerts to ensure they are actionable and routed correctly.
Detection Signals
- Hundreds of alerts firing during a routine deployment.
- Engineers ignoring the monitoring channel because it's too noisy.
- Discovering outages via Twitter rather than internal alerts.
Related Categories
Related Stacks
Incidents in The Observability Stack
Cache Guy Delivers a Fast Answer
"Caching is not a substitute for an optimized database query; it is a complex distributed state problem."
Agent A Takes Initiative
"AI capability is not approval; autonomous agents require strict API boundaries and blast-radius limits."
Dashboard Green Nobody Asked
"The chaos was predictable."
Cache Expired During Demo
"The chaos was predictable."
Monitoring Tool Had Feelings
"The chaos was predictable."
CTO Asked for One Number
"The chaos was predictable."
Number Was Not Real
"The chaos was predictable."
The Cache Was Correct Yesterday
"The chaos was predictable."
The CDN Solved the Wrong Problem
"The chaos was predictable."
The Edge Case Lived at the Edge
"The chaos was predictable."
Mono Remembers Everything
"Legacy code is often the only reliable documentation of historical business rules and edge cases."
The Cache Expired During the Demo
"The core technical takeaway from 'The Cache Expired During the Demo' is that isolated decisions scale poorly."
The Cache Was Correct Yesterday
"The core technical takeaway from 'The Cache Was Correct Yesterday' is that isolated decisions scale poorly."
The Observability Stack - Frequently Asked Questions
What is this stack?
Dashboards that are green while the system burns.
AI Summary
Incidents where dashboards show green while users scream on social media.
