How to handle downtime:

Based on Lenny Rachitsky's "Upside of Downtime" talk at Velocity 2010

Before: Prepare

  1. Communication Channel
    • Must be easy to find
    • Must be hosted off-site
    • Must be real-time
  2. Process
    • Give people the authority to communicate
    • Set an MTTC (Mean Time To Communicate) target
    • Have a process to respond to & escalate issues
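
One way to make the MTTC item concrete is to measure it from past incidents. The sketch below is only an illustration: it assumes MTTC means the average delay between detecting an issue and posting the first public update, and the function names, sample timestamps, and 15-minute target are all hypothetical rather than taken from the talk.

```python
from datetime import datetime, timedelta


def mean_time_to_communicate(incidents):
    """Average delay between detection and first public update.

    `incidents` is a list of (detected_at, first_update_at) datetime pairs.
    This definition of MTTC is an assumption made for this sketch.
    """
    delays = [first_update - detected for detected, first_update in incidents]
    if not delays:
        raise ValueError("need at least one incident")
    return sum(delays, timedelta()) / len(delays)


def breaches(incidents, target):
    """Return the incidents whose first update came later than the target."""
    return [pair for pair in incidents if pair[1] - pair[0] > target]


# Hypothetical incident log: (detected_at, first_update_at)
incidents = [
    (datetime(2010, 6, 1, 9, 0), datetime(2010, 6, 1, 9, 10)),
    (datetime(2010, 6, 3, 14, 30), datetime(2010, 6, 3, 14, 50)),
    (datetime(2010, 6, 7, 22, 5), datetime(2010, 6, 7, 22, 35)),
]

target = timedelta(minutes=15)  # hypothetical MTTC policy
print("MTTC:", mean_time_to_communicate(incidents))
print("Incidents over target:", len(breaches(incidents, target)))
```

Tracking the number of breaches alongside the mean helps because a single very slow response can hide behind an otherwise acceptable average.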

During: Communicate

  1. Communicate
    • Use the communication channel
    • Adhere to your MTTC policy
    • Describe who/what is affected
    • State when the issue started
    • Give an ETA for a resolution
    • Update regularly
  2. Fix it!
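
The items under "Communicate" can be treated as required fields of every status update, so nothing gets left out under pressure. The sketch below is a minimal illustration of that idea; the `StatusUpdate` class, its field names, and the output wording are hypothetical and not part of the talk.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class StatusUpdate:
    """One post to the communication channel (hypothetical structure)."""

    affected: str                # who/what is affected
    started_at: datetime         # when the issue started (UTC assumed)
    eta: Optional[datetime]      # expected resolution time, if known
    next_update_minutes: int     # promise for the next regular update

    def render(self) -> str:
        # Say so plainly when there is no ETA yet instead of omitting the line.
        if self.eta is not None:
            eta_text = self.eta.strftime("%H:%M UTC")
        else:
            eta_text = "unknown, still investigating"
        return "\n".join([
            f"Affected: {self.affected}",
            f"Started: {self.started_at.strftime('%Y-%m-%d %H:%M UTC')}",
            f"ETA for resolution: {eta_text}",
            f"Next update in: {self.next_update_minutes} minutes",
        ])


# Hypothetical example values
update = StatusUpdate(
    affected="API logins (all regions)",
    started_at=datetime(2010, 6, 1, 9, 0),
    eta=None,
    next_update_minutes=30,
)
print(update.render())
```

Making every field mandatory at construction time means an update cannot be posted without stating who is affected and when the issue began.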

After: Explain

  1. Post-mortem
    • Admit failure
    • Sound like a human
    • Give the start & end time of the issue
    • State who/what was impacted
    • Describe what went wrong
    • Share the lessons you learned
  2. Learn and improve

