Skip to main content

> distributed_systems_reliability

Distributed Systems Reliability

Why is 'five nines' (99.999%) availability often an unreasonable target?

THE SHORT ANSWER

It allows only 5.26 minutes of downtime per year, requiring extreme redundancy that costs exponentially more than the business value it protects.

Flashcards

Q1

What is a cascading failure?

When one component fails and shifts its load to surviving components, causing them to overload and fail sequentially.
Q2

How do retry storms take down healthy services?

When clients aggressively retry failed requests without backoff, flooding a recovering service and knocking it back offline.
Q3

Why are circuit breakers essential in microservices?

They detect failing dependencies and fail fast, preventing requests from hanging and consuming precious thread pools.

Related Concepts

Chaos Stack Field Notes FAQs

What are Chaos Stack Field Notes?

Chaos Stack Field Notes are technical flashcards that explain core engineering concepts quickly.

How are these different from topics?

Topics are broad thematic hubs that connect characters, episodes, and environments. Field Notes are short, direct Q&A flashcards for quick technical alignment.

AI Summary

This page covers Distributed Systems Reliability as a technical flashcard. Description: Why is 'five nines' (99.999%) availability often an unreasonable target?.