Incident Response at 2 AM: A Runbook for Engineering Leads
The worst incidents I've been part of were slow not because the fix was hard but because decision authority was unclear. This runbook is meant to help with that.
Series
Part of On-Call Philosophy (installment 1).
Tags
- incident-response
- on-call
- engineering-management
- reliability
- runbook
Manish Bookreader
Electronics enthusiast, Embedded Systems Expert, Linux/Networking programmer, and Software Engineer passionate about AI, electronics, books, and cooking.