During a recent Twitter conversation about disaster recovery and business continuity testing, I began to consider how we communicate during a disaster. We do so not with normal communication methods, but more often than not with an interrupting form of communication, one in which constant requests for updates, criticisms, and outright demands for attention are directed at those who are doing the work of recovering a system. During a disaster recovery effort, communication breaks down. Why? Generally, because not enough testing has been performed to expose and document communication issues, or issues of any other kind. How can we improve this communication, or even get the proper people involved, when six feet of snow, water, or mud surrounds our place of work?
Now, granted, this is a good reason to use a cloud provider; but what if the disaster affects our provider? We need to learn from the failures of others to improve our overall disaster recovery. We need to plan for all forms of failure, including the failure to communicate. Hearken back to Sandy, Katrina, Malibu mudslides, forest fires, and similar disasters. Would you even be able to get to your place of work to retrieve a copy of your disaster recovery plan? Do you have an out-of-band mechanism for contacting all who need to be involved in a disaster recovery? What is your current communication plan?
During Sandy and Katrina, it was physically impossible to get to various data centers. Because of this, many companies were impacted negatively. Some, including companies with petabytes of data, lost everything that was not in waterproof safes. Yet, most disaster recovery plans are within similarly located safes, on people’s desks, or within storage systems that may become inaccessible during a disaster. This implies a need for predisaster planning for recovery. At minimum, you need to know the following:
- Do all interested parties have the full disaster recovery plan? Specifically, do those who will be implementing any section of the plan have a copy of it, and not just at their place of work? Do their alternates?
- Do you have a call tree for use in case of disaster? If someone does not respond due to lack of communication capability, who is the alternate? Never assume that cell towers, satellite, network, and phone lines will be working. How will you handle cases in which a person in the call tree can receive information but cannot respond?
- What communication paths are available, and how will you handle cases in which those paths fail, for whatever reason?
- Are some of the people who will be involved in recovery located outside of the hundred-year disaster radius? Are they primaries or alternates?
- When was the last time you had a fire drill purely to determine whether all communication paths work?
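The call tree questions above can be made concrete with a small sketch. What follows is only an illustration, not a prescribed tool: the names `Contact`, `Role`, `Result`, and `notify_role` are hypothetical, and the `send` function stands in for whatever paging, SMS, or phone mechanism you actually have. The idea is to try each person's channels in order, fall through to alternates, and keep track of anyone who could receive the message but could not respond.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Optional


class Result(Enum):
    ACKNOWLEDGED = "acknowledged"  # message received and a reply came back
    DELIVERED = "delivered"        # message got through, but no reply came back
    FAILED = "failed"              # channel down or person unreachable


@dataclass
class Contact:
    name: str
    channels: list[str]            # ordered by preference, e.g. ["sms", "landline"]


@dataclass
class Role:
    title: str
    primary: Contact
    alternates: list[Contact] = field(default_factory=list)


# Hypothetical hook: wraps whatever real mechanism sends the message.
Send = Callable[[Contact, str], Result]


def notify_role(role: Role, send: Send) -> tuple[Optional[Contact], list[Contact]]:
    """Try the primary, then each alternate, over every channel in order.

    Returns the first person who acknowledged (or None if nobody did),
    plus the people tried before them who received the message but
    could not respond.
    """
    received_only: list[Contact] = []
    for person in [role.primary, *role.alternates]:
        delivered = False
        for channel in person.channels:
            outcome = send(person, channel)
            if outcome is Result.ACKNOWLEDGED:
                return person, received_only
            if outcome is Result.DELIVERED:
                delivered = True
        if delivered:
            received_only.append(person)
    return None, received_only
```

In a communication fire drill, `send` could be replaced with a function that simulates outages (for example, every cell-based channel failing) to see which roles end up with nobody who can acknowledge. A `None` result for any role is a gap in the plan worth fixing before a real disaster.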
We have plenty of tools to help us with disaster recovery of technical resources, but have we thoroughly thought through communication? What will happen if disaster strikes, and you lose connectivity to the Internet? Access to phone books? Access to phone numbers for your cloud and other providers? Ideally, a recovery should involve just a page (SMS) and a push of the big red button. But we all know the ideal never happens.
We need communication. And this requires planning, documentation, and education. When a disaster hits, politics often come to the fore based on who responds first and what is crucial at that moment. The plan, unless practiced, tends to fall apart immediately. No plan survives the first encounter with the enemy. This is why we have to construct contingency plans. Have you extended your plans and contingency plans to cover communication? Do you maintain external repositories for your disaster recovery plan (which should include a disaster communication and preparation plan)? Perhaps you do so using Google Docs, Salesforce, Box.net, Dropbox, or other locations?
Assume the worst happens; how do you even communicate to invoke a recovery?
Have you not only read your disaster recovery plan lately, but also ensured that you have adequate capability to communicate with your recovery team members during a disaster?