Every app is going to go down at some point. No matter how good your team is, you are going to have a major downtime. It’s not a question of “if”, it’s a question of “when”. Even Google goes down from time to time. And when downtimes do happen, things get ugly quickly. These are going to be the hardest times for you and your customer support department.
Every serious outage causes losses. How much you lose depends directly on how your customer support agents handle it.
Groove, one of our competitors, had a major downtime a couple of months back. Eleven hours passed between the start of the downtime and the moment they noticed it. I'm not trying to make fun of a competitor – I really feel for them. But 11 hours is freaking crazy, sorry.
You don't want to be in their shoes. To prevent this, you can use tools like Pingdom and PagerDuty, which monitor server uptime and notify you as soon as your app goes down. When something goes wrong, our phones start blowing up with automated text messages and calls. The chances that the entire team will sleep through these alerts are pretty slim (hey, a side-benefit of having your team spread across different time zones).
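If you are curious what such a tool does under the hood, here is a rough Python sketch of the idea. It is not how Pingdom or PagerDuty work internally and it does not use their APIs. The names `is_up`, `UptimeMonitor` and `failures_before_alert` are made up for illustration, and the `alert` callback stands in for whatever actually sends the text message or makes the call.

```python
import urllib.error
import urllib.request


def is_up(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return True if the URL answers with a 2xx/3xx status within the timeout."""
    try:
        with opener(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # Covers connection errors, timeouts and HTTP 4xx/5xx responses.
        return False


class UptimeMonitor:
    """Hypothetical monitor: alerts once after N failed checks in a row."""

    def __init__(self, check, alert, failures_before_alert=3):
        self.check = check                  # callable returning True when the app is up
        self.alert = alert                  # callable taking a message (assumed to page someone)
        self.failures_before_alert = failures_before_alert
        self.consecutive_failures = 0

    def tick(self):
        """Run one check; fire the alert when the failure streak hits the threshold."""
        if self.check():
            self.consecutive_failures = 0   # a recovery resets the streak
            return True
        self.consecutive_failures += 1
        if self.consecutive_failures == self.failures_before_alert:
            self.alert(f"App has failed {self.consecutive_failures} checks in a row")
        return False
```

In practice you would call `tick()` on a schedule, say once a minute, with something like `lambda: is_up("https://your-app.example")` as the check. Waiting for a few failures in a row before alerting is one way to avoid waking everyone up over a single dropped request.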
You need to have a central place where you can continuously communicate with your customers throughout the downtime. Like most companies, we use Twitter for this. Some big companies have dedicated status pages (you can use StatusPage.io, if you want one).
When tickets start pouring in, you need to apologize, let customers know you're handling it and send everyone to that place (hey, another side-benefit: downtimes are great for getting new Twitter followers). And post regular updates afterwards. Here is what customers want to know:
After everything is settled, write a blog post. Users need to know what happened, how it affects them and what measures you have taken to prevent this from happening in the future. If the outage affected all of your users, you may also want to send an email to all customers.
Here are some general guidelines for the blog post: