Data Protection for the Hybrid Cloud

When we mention Data Protection for the Hybrid Cloud, we are usually talking about backing up to the cloud. The cloud becomes a repository for our backup images, and in some cases those backup images can be launched within clouds that use the same technology. Being able to send data to the cloud is becoming table stakes for Infrastructure as a Service (IaaS) data protection. However, once we move outside the realm of IaaS to Platform or Software as a Service (PaaS or SaaS), data protection is hit or miss.

Data Protection is a piece of the larger hybrid cloud security story, as we discussed in Securing the Hybrid Cloud and as seen in Figure 1. In many cases Data Protection is used to transition virtual machines and data to the cloud, and in some cases back again. For this to work, the data protection tool must use an identity to contact the cloud, authenticate and authorize itself, then speak the appropriate API to move the data to the cloud. In most cases, the API used is the one provided by the data protection tool. Yet the identity is that of a user with rights to access both the data center and the cloud, most likely an administrative user in both. That identity is protected by nothing more than a user name, a password, and perhaps a secondary password (or authentication key).

Figure 1: Data Protection in Secure Hybrid Cloud

However, you also need to be able to get your data back out of each cloud into which it is placed. This is the ‘There and Back Again’ requirement for data protection. For IaaS and storage clouds, the traditional ways to move data into and out of those clouds work as expected. For IaaS, the traditional way is to set up a data protection server within the cloud and move the data out to another location, perhaps your data center or even another cloud. For SaaS and PaaS, however, data protection is lacking. Specifically, there are three ways to get data from a PaaS or SaaS cloud solution:

  • Protect data as it enters the PaaS or SaaS using some sort of data protection proxy
  • Use a data protection mechanism built into the PaaS or SaaS, such as the one available from Salesforce (for an extra fee)
  • Use the APIs to retrieve data from the PaaS or SaaS at regular intervals
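The third option, retrieving data through the provider's API at regular intervals, can be sketched roughly as follows. This is a minimal illustration, not any vendor's actual API: the `FetchPage` signature, the cursor-based paging, and `fake_fetch_page` are all assumptions made for the example. A real exporter would wrap the specific SaaS API and run from a scheduler.

```python
import hashlib
import json
from typing import Callable, Iterator, List, Optional, Tuple

# Hypothetical page fetcher: takes a cursor (None for the first page) and
# returns (records, next_cursor). A real one would wrap the provider's API.
FetchPage = Callable[[Optional[str]], Tuple[List[dict], Optional[str]]]


def iter_records(fetch_page: FetchPage) -> Iterator[dict]:
    """Walk every page until the provider reports no further cursor."""
    cursor = None
    while True:
        records, cursor = fetch_page(cursor)
        yield from records
        if cursor is None:
            break


def export_snapshot(fetch_page: FetchPage) -> Tuple[bytes, str]:
    """Serialize all records as JSON lines; return (payload, SHA-256 hex digest)."""
    lines = [json.dumps(r, sort_keys=True) for r in iter_records(fetch_page)]
    payload = ("\n".join(lines) + "\n").encode("utf-8") if lines else b""
    return payload, hashlib.sha256(payload).hexdigest()


def fake_fetch_page(cursor: Optional[str]) -> Tuple[List[dict], Optional[str]]:
    """Stand-in for a SaaS API that serves two pages of records."""
    pages = {None: ([{"id": 1}, {"id": 2}], "p2"), "p2": ([{"id": 3}], None)}
    return pages[cursor]


payload, digest = export_snapshot(fake_fetch_page)
```

Storing the digest next to each snapshot gives a cheap way to confirm, at restore time, that the exported copy is the one that was written.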

Hybrid Cloud Data Protection Tools

There are many data protection solutions, and some work not only in the data center but also in all the clouds in question. So which ones can back up into and out of clouds?

Veeam – Veeam replicates and backs up data between clouds, data center to data center, and data center to clouds (replication receiver clouds) as long as Veeam is running in all areas. In addition, Veeam will back up multiple hypervisors. In many ways Veeam has set the table stakes for this type of backup and recovery: multi-hypervisor support, agentless backup and recovery, backup to disk, backup to tape, backup from storage snapshot, recovery testing, source-side deduplication, target deduplication, changed block tracking, zero block tracking, cross-VM deduplication, VMware vApp awareness, and recovery/boot from backup store.
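To make two of these features concrete, here is a small sketch of the idea behind changed block tracking combined with zero block tracking. It is an illustration only and does not reflect how Veeam or any hypervisor implements it: real changed block tracking is reported by the hypervisor as writes happen, whereas this example substitutes a hash comparison against the previous image, and the 4-byte block size is chosen purely for readability.

```python
import hashlib
from typing import Dict, List

BLOCK_SIZE = 4  # tiny block size so the example is easy to follow


def split_blocks(image: bytes, size: int = BLOCK_SIZE) -> List[bytes]:
    """Cut a disk image into fixed-size blocks."""
    return [image[i:i + size] for i in range(0, len(image), size)]


def blocks_to_send(previous: bytes, current: bytes) -> Dict[int, bytes]:
    """Return {block index: data} for blocks that changed and are not all zeros."""
    old_hashes = [hashlib.sha256(b).digest() for b in split_blocks(previous)]
    changed: Dict[int, bytes] = {}
    for index, block in enumerate(split_blocks(current)):
        if not any(block):
            # Zero block tracking: an all-zero block carries no data to transfer.
            # A real tool would still record it as metadata for restore.
            continue
        is_new = index >= len(old_hashes)
        if is_new or hashlib.sha256(block).digest() != old_hashes[index]:
            changed[index] = block
    return changed


previous = b"AAAA" + b"BBBB" + b"\x00" * 4
current = b"AAAA" + b"CCCC" + b"\x00" * 4 + b"DDDD"
delta = blocks_to_send(previous, current)
```

Here only the rewritten second block and the newly appended fourth block are selected for transfer; the unchanged first block and the empty third block are skipped.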

Symantec – Symantec has three products within the data protection space for data centers and clouds, and its approach is different from that of traditional virtualization and cloud backup tools: Symantec is looking at the entire data center, including those systems that are not virtualized yet. For those systems, and even for virtual machines, Symantec can use agents to back up data to a backup server, and from that server it can move the data to other data centers, tape, or even a cloud. Symantec has also built its own replication receiver cloud for this purpose. As with Veeam, you need to have Symantec in the cloud in question to receive the data. Symantec has multi-hypervisor support (via agents), agentless backup and recovery (for vSphere), backup to disk, backup to tape, changed block tracking (for agentless backup), and target deduplication.

Quantum – Quantum VMpro hooks into the hypervisor much like the others, but its focus is on deduplicating the data as it is being written (target-side cross-VM deduplication) while using changed block and zero block tracking to reduce the amount of data transferred between the source and the target. Quantum has also built its own purpose-built replication receiver cloud using a Xerox cloud, and it has concentrated on a restore-anywhere mechanism: it can restore to a cloud, a data center, a data center in a box (laptop), or wherever the data is needed.
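Target-side cross-VM deduplication can be pictured as a store that keeps one copy of each unique block, no matter which virtual machine it came from. The sketch below is a generic content-addressed store written for illustration; the `DedupStore` class, its method names, and the 4-byte block size are assumptions of this example and say nothing about how Quantum's product is built.

```python
import hashlib
from typing import Dict, List


class DedupStore:
    """Target-side store that keeps one copy of each unique block."""

    def __init__(self) -> None:
        self.blocks: Dict[str, bytes] = {}       # fingerprint -> block data
        self.recipes: Dict[str, List[str]] = {}  # VM name -> ordered fingerprints

    def write_backup(self, vm: str, image: bytes, block_size: int = 4) -> int:
        """Store a VM image; return how many new blocks were actually written."""
        new_blocks = 0
        recipe: List[str] = []
        for i in range(0, len(image), block_size):
            block = image[i:i + block_size]
            key = hashlib.sha256(block).hexdigest()
            if key not in self.blocks:
                # First time this content is seen on the target: keep it.
                self.blocks[key] = block
                new_blocks += 1
            recipe.append(key)
        self.recipes[vm] = recipe
        return new_blocks

    def restore(self, vm: str) -> bytes:
        """Rebuild a VM image from its recipe of block fingerprints."""
        return b"".join(self.blocks[key] for key in self.recipes[vm])


store = DedupStore()
written_a = store.write_backup("vm-a", b"OSOSDATA")
written_b = store.write_backup("vm-b", b"OSOSLOGS")
```

Because both images share the block `OSOS`, the second backup writes only one new block, yet each image can still be restored in full from its own recipe.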

Zerto – Zerto is not a backup tool but a replication tool that replicates data between VMware vSphere and a cloud or another data center running VMware vSphere. Zerto’s replication is based on tying into a storage introspection layer, and as such it will only run within VMware vSphere, which is required on both sides. Zerto can replicate to many different clouds that support VMware vCloud and is partnered with a wide variety of clouds. Zerto's features include replication from data center to data center, cloud to cloud, cloud to data center, and data center to cloud, as well as changed block tracking, vSphere vApp awareness, and boot from replication store.

Asigra – Asigra provides a different type of backup. Unlike the others, which concentrate on virtual machines and data going to a cloud that has the same data protection tools in use, Asigra uses the cloud APIs to back up data to a cloud but also from the cloud. Asigra fills the gap for PaaS and SaaS solutions that the other types of tools leave open. By using the SaaS APIs directly, Asigra does not require the cloud provider to run Asigra's software within the cloud. This approach limits Asigra’s footprint while providing access to data not normally exported from the SaaS provider. Once the APIs for a PaaS cloud are known, it is also possible for Asigra to automate the export of PaaS cloud data.

EMC Avamar – EMC Avamar is the basis for VMware VDP and as such has recently added some new but important functionality. Avamar’s approach is similar to Symantec's and is designed to back up the entire data center. However, it recently improved its existing functionality for VMware vSphere (agentless backup and recovery, backup to disk, backup to tape, target deduplication, changed block tracking, cross-VM deduplication) by adding the following features: inheritance of backup policy within the vCenter folder hierarchy, and recovery/boot from backup store. The same set of features has also been added to EMC NetWorker.

Closing Thoughts

The list of data protection tools grows daily, with some being developed for nearly every cloud and virtualization service available. However, the goal of data protection is to ensure the availability of your data when you need it and how you need it, not to keep your data locked within an unreachable cloud or data center. Disasters happen, and disaster recovery is a part of doing business, but there is a growing need to maintain data in multiple places for business continuity reasons. Can we trust the clouds to maintain our data, or should we be the ones in charge of that maintenance? That depends on your contract and SLA. Data Protection is the responsibility of the data owner, and where that data lives needs to be well understood as you work through any data protection scheme.

I was recently asked whether a cloud is a good place to put data. The answer is that you first need to understand your data and how it is classified, then determine how that data should be stored to meet those classification policies. You may find that you need a gateway that encrypts your data before placing it in your cloud, but you always need to be able to get your data out of a cloud. Always think, “there and back again”.

How do you protect your data? If using a cloud, how do you get your data “there and back again”?