During the last two Virtualization Security Podcasts, the panel discussed backups, as well as scripting, both for backups and in general. We also discussed the security implications of backups, including whether a recovery is required when a site is hacked. That raises an important question: what constitutes a disaster that requires recovery? Is recovery needed only after a catastrophic failure (which TVP has experienced)? Is it required in response to malfeasance by a disgruntled employee? After an external cyber-attack? Do you classify cyber-attacks as disasters that require reinstalling from known-good sources and restoring data from a backup, or do you recover by some other means?
With these thoughts in mind, we discussed not the reasons for backing up, but rather the importance of testing those backups regularly. We also discussed where scripting fits into restoration, how to avoid overscripting, and how all of this relates to business continuity. The real “gotcha” in any restoration made necessary by malfeasance or disaster is solving the dependency problem.
A dependency problem arises when backup administrators, virtualization administrators, and perhaps even application administrators and developers simply do not know what depends on what within their environments. Some dependencies we can guess at with confidence, but others are much harder to determine. For example, we can probably say that our entire environment depends on Active Directory (AD) and the Domain Name System (DNS), but is that 100% accurate? Is it even required? If AD depends on DNS, and DNS depends on AD, in which order should they be installed? Should some things be foundational, depending on nothing else? And how do you know what is foundational within your environment?
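One way to make the ordering question concrete is to treat dependencies as a directed graph and sort it topologically. The following is a minimal Python sketch (Python 3.9 or later, using the standard library's graphlib module), assuming a hand-written, hypothetical dependency map; in practice the map would come from a discovery tool. A circular dependency such as AD-integrated DNS surfaces as a CycleError rather than as a silently wrong order, which is exactly the signal that one of the two must be treated as foundational.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical dependency map: each service lists the services it depends on.
DEPENDENCIES = {
    "dns": set(),          # foundational: depends on nothing
    "ad": {"dns"},         # AD uses DNS to locate domain controllers
    "sql": {"ad", "dns"},
    "web": {"sql", "ad"},
}


def restore_order(deps):
    """Return an order in which every service follows its dependencies."""
    return list(TopologicalSorter(deps).static_order())


def find_cycle(deps):
    """Return the services forming a dependency cycle, or None if there is none."""
    try:
        TopologicalSorter(deps).prepare()
    except CycleError as err:
        return err.args[1]  # graphlib stores the detected cycle in args[1]
    return None


if __name__ == "__main__":
    print(restore_order(DEPENDENCIES))
    # AD-integrated DNS: each depends on the other, so no valid order exists.
    print(find_cycle({"dns": {"ad"}, "ad": {"dns"}}))
```

For the sample map, restore_order returns dns, ad, sql, web. For the circular map, find_cycle reports both services.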
We need tools to help answer these questions, along with enough analytics to determine whether we have backed up everything associated with an application, including its dependencies, and whether our SLAs for retention, backup time, restoration testing, and so on are being met. All of these checks can be scripted, and they probably should be. The dependency data already exists; what we need is a way to get it into a usable form that a backup tool can consume. Here are two diagrams taken from VMware vCenter Infrastructure Navigator (VIN). If this information could somehow make it into a backup tool, all of my dependencies would be known. Many other tools can gather this information as well; practically all Application Performance Management (APM) tools, for instance, can assist.
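Once the dependency data is available, checking backup coverage and SLAs is a straightforward script. The sketch below assumes a hypothetical backup catalog keyed by system name and hypothetical SLA thresholds: 24 hours between backups, 30 days of retention, and a restore test within 90 days. A real version would pull the catalog from the backup tool's API and the dependency map from a discovery or APM tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Set

# Hypothetical SLA targets; substitute your own.
MAX_BACKUP_AGE = timedelta(hours=24)
MIN_RETENTION_DAYS = 30
MAX_RESTORE_TEST_AGE = timedelta(days=90)


@dataclass
class BackupRecord:
    """One entry in a hypothetical backup catalog."""
    last_backup: datetime
    retention_days: int
    last_restore_test: Optional[datetime] = None


def all_dependencies(app: str, deps: Dict[str, Set[str]]) -> Set[str]:
    """Return the app plus everything it depends on, directly or indirectly."""
    seen: Set[str] = set()
    stack = [app]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(deps.get(node, ()))
    return seen


def audit(app: str, deps: Dict[str, Set[str]],
          catalog: Dict[str, BackupRecord], now: datetime) -> List[str]:
    """List every coverage or SLA problem for an application and its dependencies."""
    problems: List[str] = []
    for system in sorted(all_dependencies(app, deps)):
        rec = catalog.get(system)
        if rec is None:
            problems.append(f"{system}: no backup found")
            continue
        if now - rec.last_backup > MAX_BACKUP_AGE:
            problems.append(f"{system}: last backup too old")
        if rec.retention_days < MIN_RETENTION_DAYS:
            problems.append(f"{system}: retention below {MIN_RETENTION_DAYS} days")
        if rec.last_restore_test is None or now - rec.last_restore_test > MAX_RESTORE_TEST_AGE:
            problems.append(f"{system}: restore not tested recently")
    return problems
```

Because all_dependencies walks the map transitively, a system that the application needs only indirectly, such as DNS behind AD, is still reported as "no backup found" when the catalog omits it.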
I have even more complex applications (according to VIN) within my environment. Using a tool like this as the basis for backup and business continuity scripts and tools would allow us to automatically gather information about our environments (including physical hosts), determine dependencies, and include those dependencies as part of our backup plans. Such a tool could then be used to automatically generate the restoration scripts needed to fully restore an environment.
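Given the same kind of dependency map, the restore plan itself can be generated rather than written by hand. This Python sketch groups systems into waves, where every system in a wave depends only on systems in earlier waves, and then renders a simple shell script. The restore-system command is a hypothetical placeholder for whatever your backup tool actually exposes. The script runs commands sequentially under set -e so that it stops at the first failure; the wave comments only mark where parallelism would be safe.

```python
from graphlib import TopologicalSorter


def restore_waves(deps):
    """Group systems into waves; each wave depends only on earlier waves."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


def render_script(waves):
    """Render a sequential shell script; 'restore-system' is a hypothetical command."""
    lines = ["#!/bin/sh", "set -e  # stop at the first failed restore"]
    for number, wave in enumerate(waves, start=1):
        lines.append(f"# wave {number}: these systems could safely run in parallel")
        lines.extend(f"restore-system {system}" for system in wave)
    return "\n".join(lines)


if __name__ == "__main__":
    deps = {"ad": {"dns"}, "sql": {"ad"}, "files": {"ad"}, "web": {"sql", "files"}}
    print(render_script(restore_waves(deps)))
```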
However, when to restore will always be a human decision, and the security team should therefore be involved. The main premise of security and backups is that you must restore the operating system and applications from known-good sources, which generally means installation media. To know whether that media is good, however, you need to know your supply chain. That calls for another tool: one that inspects your supply chain. Several are available, including a system hosted by the federal government. Only once everything is installed properly should you restore your data.
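Inspecting a supply chain is broader than any single check, but one small, scriptable piece is confirming that installation media matches the digest its publisher lists. This is a minimal sketch assuming the publisher provides a SHA-256 digest. It says nothing about whether that digest source is itself trustworthy, which is the supply-chain question proper.

```python
import hashlib
import hmac


def sha256_of(path, chunk_size=1 << 20):
    """Hash a (possibly multi-gigabyte) image in chunks, 1 MiB by default."""
    digest = hashlib.sha256()
    with open(path, "rb") as media:
        for chunk in iter(lambda: media.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def media_is_known_good(path, published_hex):
    """Compare against the publisher's digest, ignoring case and surrounding whitespace."""
    return hmac.compare_digest(sha256_of(path), published_hex.strip().lower())
```

For example, media_is_known_good("/path/to/installer.iso", published_digest) returns False if even one byte of the image differs from what was published. The path and digest here are placeholders.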
Ah, but there's the rub: how do you script installation and data restoration? And do you even know whether your data is good? We can help with the first question, but the second is more difficult. The real goal of automation, whether for security or for business continuity and recovery, is a repeatable process. Every backup tool I see is effectively a large scripting engine with an API, wrapped in a GUI. These tools can therefore be treated as one component of a third-generation application for restoration: use a script to install from media, apply patches, and then restore the data using the backup tool itself, accounting for all dependencies. Security needs to be part of this scripting process, because patches, patch repositories, and security policies still need to be applied, even when recovering from a disaster.
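The install, patch, secure, and restore sequence can be captured as an ordered pipeline that halts at the first failure. In this sketch every step name is a hypothetical placeholder, and each step is a callable you would wire to your installer, patch repository, policy tooling, and backup tool's API. The verify step is where a check of whether the restored data is good would go. Passing stub callables that simply return True turns the pipeline into a dry run of the plan.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical step names, run in this order for every system.
STEPS = (
    "install_from_media",     # known-good installation media
    "apply_patches",          # from the patch repository
    "apply_security_policy",
    "restore_data",           # via the backup tool's own API
    "verify",                 # is the restored data actually good?
)


@dataclass
class RestoreRun:
    actions: Dict[str, Callable[[str], bool]]  # step name -> callable(system) -> success
    log: List[str] = field(default_factory=list)

    def restore_system(self, system: str) -> bool:
        for step in STEPS:
            ok = self.actions[step](system)
            self.log.append(f"{system}: {step} {'ok' if ok else 'FAILED'}")
            if not ok:
                # Stop at the first failed step, e.g. never restore data onto an unpatched system.
                return False
        return True

    def restore_all(self, ordered_systems: List[str]) -> bool:
        """Restore systems in dependency order; halt the whole run at the first failure."""
        for system in ordered_systems:
            if not self.restore_system(system):
                return False
        return True
```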
The goal here is to have a repeatable process that can restore your data without human intervention, except to start and monitor the process. The decision regarding when to restore may be based on any number of reasons, including employee or third-party malfeasance, natural disaster, or human error. This repeatable process must include security controls and may be prompted by security issues.
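Because the decision to restore stays with people, the automation can still enforce that the right people made it. This sketch records why a restore was triggered and refuses to start until both operations and security have signed off. The role names and the two-role policy are assumptions for illustration, not a prescribed process. Once may_start() returns True, the unattended pipeline can take over, with humans only starting and monitoring it.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Set


class Trigger(Enum):
    """Reasons a restore may be started, as listed in the text."""
    MALFEASANCE = "employee or third-party malfeasance"
    NATURAL_DISASTER = "natural disaster"
    HUMAN_ERROR = "human error"


@dataclass
class RestoreRequest:
    trigger: Trigger
    approvals: Set[str] = field(default_factory=set)  # roles that have signed off
    # Hypothetical policy: operations and security must both approve.
    required_roles: frozenset = frozenset({"operations", "security"})

    def approve(self, role: str) -> None:
        self.approvals.add(role)

    def may_start(self) -> bool:
        """True only once every required role has approved."""
        return self.required_roles <= self.approvals
```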
Do you include security as part of your backups and as part of your disaster recovery and business continuity plans?