Over the last few weeks, I have been taking a hard look at various data protection tools to determine whether they meet the goals for the next generation of tools. Those goals are quite interesting. The main one is application-centric backup with increased visibility into our methodologies. We need to know not only how well any backup, replica, and recovery operation meets our SLAs, but also whether all our data is actually available. This includes determining whether an application has any dependencies, as well as taking a comprehensive look at all the different forms of data protection. The other major goal for the next generation of tools is to remove the need for a human element: in essence, we need data protection that does not require a human to set it up for us.
During a recent Twitter conversation about disaster recovery and business continuity testing, I began to consider how we communicate during a disaster. We do so not with normal communication methods, but more often than not with an interrupting form of communication, one in which constant requests for updates, criticisms, and outright demands for attention are directed at those who are doing the work of recovering a system. During a disaster recovery effort, communication breaks down. Why? Generally, not enough testing has been performed to uncover and document communication issues, or any other types of issues. How can we improve this communication, or even get the proper people involved, when six feet of snow, water, or mud surrounds our place of work?
Attending Gigaom Structure was an exercise in getting fire-hosed with the leading-edge innovation that public cloud providers are bringing to their customers worldwide. These innovations will not only have a profound effect on public cloud computing, but will also ultimately impact data center architectures, costs, and benefits worldwide.
Backup, disaster recovery, and business continuity have changed quite a bit over the years, and they will continue to change as more capability, analytics, and functionality are added to the general family of data protection tools. As we launch ourselves into the clouds, we perhaps need to rethink how we do data protection, what tools are available for data protection, and how to use our older tools to accomplish the same goals. We need an integrated data protection plan that accounts not only for cloud or data center failures but also for the need to run within the cloud. There is always the need to get your data there and back again.
During the last two Virtualization Security Podcasts, the panel discussed backups as well as scripting, both for backups and in general. We went on to discuss the security implications surrounding backups, including whether or not a recovery is required when a site is hacked. The latter raises an important question: what constitutes a disaster that requires recovery? Is recovery needed only for catastrophic failure (which TVP has experienced)? Is it required in response to malfeasance from a disgruntled employee? To an external cyber-attack? Do you classify cyber-attacks as disasters requiring restoration from known-good sources and restoration of data from a backup, or do you use some other means to recover?
Recently, we experienced a fairly catastrophic SAN failure: we lost two drives of a RAID-5 array. Needless to say, recovery was time-consuming, but it also pointed out some general issues with many of the disaster recovery, business continuity, and general architectures used in virtual environments. Luckily, we were able to start one of the drives, let the hot-spare take over for the second failure, and recover the vast majority of our data. Yes, there was corruption, which is where our backups, and the ultimate dependencies for restoration, came in. How do you recover from a catastrophic failure? Do you fail over automatically to a hot-site or cloud environment? Even if you fail over, how do you recover from a catastrophic failure?