In my previous article I introduced you to postmortems: what they are, why you should conduct them, and how to get started writing your own postmortem documents.
As a quick recap, a postmortem is a written record of an incident documenting its impact, what caused it, the actions taken to mitigate or fix it, and how to prevent it from happening again. In a broader sense, postmortems are a great tool for a team or organization to learn from failure.
One of the big questions a postmortem has to address is: What caused the incident? What’s the reason the system failed the way it did?
At first glance, finding the root cause (the initiating cause that led to an outage or degradation in performance) seems to be the rational thing to do. For system owners, knowing who or what is responsible for an incident appears to be a desirable goal. Otherwise, how else should they implement appropriate countermeasures?
In reality, however, trying to attribute an incident to a root cause in hindsight is not only impossible, it is fundamentally wrong.
This and most of the lessons that follow have their origin in Richard Cook’s paper “How Complex Systems Fail”. I already devoted a two-part series to his seminal work, but there’s so much more to learn from it, especially when it comes to postmortems. I think you’ll agree.
In complex systems, such as web systems, there is no root cause. Single point failures alone are not enough to trigger an incident. Instead, incidents require multiple contributors, each necessary but only jointly sufficient. It is the combination of these causes, often small and innocuous failures, that is the prerequisite for an incident.
As a consequence, we can’t isolate a single root cause.
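To make “each necessary but only jointly sufficient” a bit more tangible, here is a deliberately simplified sketch. The three contributor names are made up for illustration, and real incidents are rarely a clean boolean AND, so treat this as a toy model rather than a description of how failures actually combine.

```python
from itertools import combinations

# Hypothetical contributing factors for an imaginary outage.
CONTRIBUTORS = frozenset({
    "alert_threshold_too_high",
    "deploy_during_peak_traffic",
    "stale_runbook",
})


def incident_occurs(active):
    """In this toy model, the incident happens only if every contributor is present."""
    return CONTRIBUTORS <= set(active)


def sufficient_subsets():
    """Return every subset of contributors that triggers the incident on its own."""
    return [
        set(combo)
        for size in range(len(CONTRIBUTORS) + 1)
        for combo in combinations(sorted(CONTRIBUTORS), size)
        if incident_occurs(combo)
    ]
```

In this model, removing any one contributor prevents the incident, and no contributor triggers it alone, which is why singling out one of them as “the” root cause would be arbitrary.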
One reason we tend to look for a single, simple cause of an outcome is that the failure is too complex to keep in our heads. Thus we oversimplify without really understanding the failure’s nature and then blame particular, local forces or events for outcomes.
One of the things I like about the postmortem template I mentioned last time is that it says “Root Causes”, not “Root Cause”. For me, that’s a testament to the fact that you need to look deeper if you only have a single root cause.
But even “Root Causes” might not be the best term, as Andy Fleener has pointed out to me on Twitter:
That’s a good point that made me consider modifying our own postmortem template as well.
Hindsight bias continues to be the main obstacle to incident investigation. This cognitive bias, also known as the knew-it-all-along effect, describes the tendency of people to overestimate their ability to have predicted an event, despite the lack of objective evidence.
Indeed, hindsight bias makes it impossible to accurately assess human performance after an incident.
A similar but different cognitive error is outcome bias, which refers to the tendency to judge a decision by its eventual outcome. It’s important to understand that every outcome, successful or not, is the result of a gamble. The overall complexity of our web systems always poses unknowns. We can’t eliminate uncertainty.
After an incident has occurred, a postmortem might find that the system has a history of “almost incidents” and that operators should have recognized the degradation in system performance before it was too late. That’s an oversimplified view though. System operations are dynamic. Failing components and human beings are being replaced all the time. Attribution is not that simple.
We therefore need to be cautious of hindsight bias and its friends, and never ignore other driving forces, especially production pressure, when looking for root causes after an incident has occurred.
It’s the easiest thing in the world to point the finger at others when things go wrong. And unfortunately, many companies still blame people for mistakes when they should really blame, and fix, their broken processes.
Blameless postmortems only work if we assume that everyone involved in an incident had good intentions. This ties in with the Retrospective Prime Directive (a postmortem is a special form of a retrospective), which says:
Human error is NOT a root cause.
We should rather look for flaws in systems and processes, the causes contributing to failure, and implement measures so that the same issues don’t happen again. It requires systems thinking, which focuses on cyclical rather than linear cause and effect, to view the system as a whole in order to find out how it drifted into failure, both at a technical and an organizational level.
Here’s one of my favorite passages on the topic, taken from the SRE book:
Does this mean that operators are off the hook? No, not at all. They’re the ones with the most knowledge surrounding the incident. For example, they know firsthand how the system failed in surprising ways. Hence, they’re responsible for finding ways to make the system more resilient, including writing a postmortem.
Let’s wrap this up with an example from an actual postmortem.
Last time I told you a story about a recent outage at Jimdo. In a nutshell: To fix a broken deployment of our API service, I wanted to delete the corresponding ECS service in our AWS staging account. Unfortunately, I actually removed the service in our production account, causing our API to be down for half an hour. Oops!
In the postmortem that followed, we identified two root causes:
I’m sure that if we had looked closer, we would have found more root causes contributing to the failure, but we stopped here, did our homework, and moved on.
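As a purely illustrative example of what fixing the system rather than the person could look like for a staging/production mix-up like this one, here is a minimal sketch of a guard around a destructive action. It is not taken from the actual postmortem: every name in it (`guarded_delete`, `WrongAccountError`, the account labels) is hypothetical, and `delete_fn` simply stands in for whatever call really removes the service.

```python
class WrongAccountError(Exception):
    """Raised when a destructive action targets an unexpected account."""


def guarded_delete(service, current_account, expected_account, delete_fn):
    """Run delete_fn(service) only if the active account is the intended one.

    All names are hypothetical; delete_fn is a placeholder for the real
    deletion call, which this sketch does not try to model.
    """
    if current_account != expected_account:
        raise WrongAccountError(
            f"refusing to delete {service!r}: active account is "
            f"{current_account!r}, expected {expected_account!r}"
        )
    return delete_fn(service)
```

The point of such a guard is that the operator has to state their intent (“I mean staging”) explicitly, so a mismatch between intent and reality is caught by the tooling instead of depending on someone being careful.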
One more time: You can’t fix people, but you can fix systems and processes to better support them.
Keep this in mind when you write your next postmortem.
P.S. This article first appeared on my Production Ready mailing list.