Post-Mortem & Incident Review
Learning from failures and incidents without blame.
Purpose
A post-mortem is a structured review after an incident or project failure. The goal is to learn and prevent recurrence, not to assign blame.
Post-mortem process
- Timeline: Reconstruct what happened, when, and who was involved
- Root cause analysis: Use the 5 Whys to dig deeper
- Impact assessment: What was affected? Users, revenue, reputation?
- What went well: What worked during the response?
- What went wrong: Where did things break down?
- Action items: Specific, assigned, time-bound improvements
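The "specific, assigned, time-bound" criteria for action items can be checked mechanically. The following is a minimal sketch, not a prescribed tool: the `ActionItem` class, its field names, and the three-word threshold used as a rough proxy for "specific" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class ActionItem:
    # Hypothetical structure; field names are illustrative only.
    description: str
    owner: Optional[str] = None
    due: Optional[date] = None


def missing_fields(item: ActionItem) -> List[str]:
    """Return which of the three criteria the action item fails."""
    problems = []
    # Crude, assumed proxy for "specific": at least three words.
    if len(item.description.split()) < 3:
        problems.append("specific")
    if not item.owner:
        problems.append("assigned")
    if item.due is None:
        problems.append("time-bound")
    return problems


# Example data is made up for illustration.
good = ActionItem(
    "Add mandatory review step to deploy checklist",
    owner="alice",
    due=date(2024, 1, 31),
)
vague = ActionItem("Improve deploys")

print(missing_fields(good))   # []
print(missing_fields(vague))  # ['specific', 'assigned', 'time-bound']
```

A check like this could run when the review is filed, so that vague items are sent back before the meeting ends rather than discovered later.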
5 Whys example
Problem: Deployment caused 2h outage
Why? → Config file was wrong
Why? → It wasn't reviewed before deploy
Why? → No review step in the deploy process
Why? → Deploy process was never formalized
Why? → Team grew fast, process didn't scale
→ Action: Formalize deploy checklist with mandatory review
Post-mortem template
| Field | Content |
|---|---|
| Date | ... |
| Severity | P1 / P2 / P3 |
| Duration | ... |
| Summary | One paragraph |
| Timeline | Chronological events |
| Root cause | 5 Whys result |
| Impact | Users, systems, business |
| Action items | What, who, by when |
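If post-mortems are archived in a structured form, the template rows map naturally onto a record type. The sketch below is one possible shape, assuming Python dataclasses; the `PostMortem` class, its field types, and the timestamps in the example are hypothetical. It derives the duration from the timeline and takes the root cause to be the last answer in the 5 Whys chain.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Tuple


@dataclass
class PostMortem:
    # Fields mirror the template rows; the types are assumptions.
    date: str
    severity: str                         # "P1", "P2", or "P3"
    summary: str
    timeline: List[Tuple[datetime, str]]  # (timestamp, event)
    whys: List[str]                       # answers to successive "Why?" questions
    impact: str
    action_items: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.severity not in {"P1", "P2", "P3"}:
            raise ValueError(f"unknown severity: {self.severity}")
        # Keep the timeline chronological regardless of input order.
        self.timeline = sorted(self.timeline, key=lambda event: event[0])

    @property
    def duration(self) -> timedelta:
        """Time between the first and last timeline events."""
        if not self.timeline:
            return timedelta(0)
        return self.timeline[-1][0] - self.timeline[0][0]

    @property
    def root_cause(self) -> str:
        """The last answer in the 5 Whys chain, or '' if none was recorded."""
        return self.whys[-1] if self.whys else ""


# Example values are invented to match the 2h outage scenario.
pm = PostMortem(
    date="2024-01-15",
    severity="P1",
    summary="Deployment caused 2h outage",
    timeline=[
        (datetime(2024, 1, 15, 12, 0), "Service restored"),
        (datetime(2024, 1, 15, 10, 0), "Deploy with wrong config"),
    ],
    whys=[
        "Config file was wrong",
        "It wasn't reviewed before deploy",
        "No review step in the deploy process",
        "Deploy process was never formalized",
        "Team grew fast, process didn't scale",
    ],
    impact="Users could not reach the service",
    action_items=["Formalize deploy checklist with mandatory review"],
)

print(pm.duration)    # 2:00:00
print(pm.root_cause)  # Team grew fast, process didn't scale
```

Computing the duration from the timeline instead of entering it by hand keeps the two fields from drifting apart when the timeline is corrected after the review.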
Best practices
- Hold the post-mortem within 48 hours of the incident, while details are still fresh
- Blameless culture: focus on systems, not individuals
- Share findings broadly to prevent similar issues elsewhere
- Follow up on action items in the next sprint
- Archive post-mortems as organizational knowledge