Copy data software is becoming far more prevalent, and some see it as a replacement for many data protection products. But is it one? Do copy data solutions provide data protection, or do they just move data around the cloud? That is really the crux of the discussion. Is having multiple copies of data out in the cloud sufficient for data protection, or do we need more?
What Is Copy Data?
First we need a definition. What is copy data? Some equate it with automated data management (Catalogic), others with a data platform (Teradata). In reality, it is both. To qualify as a copy data platform, a product needs:
- Automation, as in automating data movement between like or unlike targets
- Data target, as in being a target for data uses such as data protection
- Data management, as in ensuring that the proper number of copies exists and that security policy is followed, including encrypting data and reporting compliance up to governance, risk, and compliance (GRC) platforms
- Versioning of the data, so that a single corrupted file cannot take the data out of use
- Above all, knowing where all data is at any given time
Being a target puts a copy data platform in the path of the data being written. That target could be a transparent proxy, a physical or virtual device, or any other mechanism that sits between the writer and the storage device. In many cases, this allows data to be replicated to multiple targets, such as the cloud. Most importantly, it allows the data to be stored properly (for example, encrypted) according to security policy.
Copy data platforms also include a large amount of automation, because they must automate the movement of data between multiple locations (on-premises and off) and take part in every copy of the data that is made, whether to Dropbox, cloud storage, or other storage within the data center. Because the platform is involved in every copy, it can, and must, be able to tell anyone where the data is at any given time.
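Here is a minimal sketch of the "in the path of the data" idea: a hypothetical in-path target that fans each write out to several backends and records every copy in a catalog, so the question "where is this data?" can be answered from the catalog alone. The class names, the target names, and the use of plain dicts as stand-ins for real storage are all illustrative assumptions; a real platform would also apply policy such as encryption before each write, which this sketch deliberately omits.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class CopyRecord:
    """One known copy of a data object (hypothetical record layout)."""
    location: str      # hypothetical target name, e.g. "onprem-nas"
    sha256: str        # checksum of the bytes written, for later verification
    written_at: datetime


@dataclass
class CopyCatalog:
    """Hypothetical catalog: object ID -> {location -> latest copy record}."""
    records: dict[str, dict[str, CopyRecord]] = field(default_factory=dict)

    def record(self, object_id: str, rec: CopyRecord) -> None:
        # A rewrite to the same location replaces the older record.
        self.records.setdefault(object_id, {})[rec.location] = rec

    def where_is(self, object_id: str) -> list[str]:
        """Answer 'where is this data right now?' without touching storage."""
        return list(self.records.get(object_id, {}))


class InPathTarget:
    """Sits in the write path and replicates every write to all backends.

    Backends are plain dicts standing in for real storage; no policy
    (such as encryption) is applied in this sketch.
    """

    def __init__(self, backends: dict[str, dict[str, bytes]], catalog: CopyCatalog):
        self.backends = backends
        self.catalog = catalog

    def write(self, object_id: str, data: bytes) -> None:
        digest = hashlib.sha256(data).hexdigest()
        for name, store in self.backends.items():
            store[object_id] = data  # one copy per configured target
            self.catalog.record(
                object_id, CopyRecord(name, digest, datetime.now(timezone.utc))
            )


catalog = CopyCatalog()
target = InPathTarget({"onprem-nas": {}, "cloud-bucket": {}}, catalog)
target.write("payroll.db", b"example bytes")
print(catalog.where_is("payroll.db"))  # ['onprem-nas', 'cloud-bucket']
```

Keeping the catalog keyed by location means it reflects where copies currently live rather than a log of every write; a platform that also needs an audit trail would keep both.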
Is Copy Data Actually Data Protection?
Let us think about this for a moment. Copy data lets us automatically copy or move data from one location to another while adhering to a predefined policy and reporting policy failures. Compare that with the 3-2-1 backup rule promoted by Veeam, which requires:
- 3 Copies of Your Data
- 2 Different Media
- 1 Offsite Copy
Does copy data meet these requirements?
- 3 Copies of Your Data: Absolutely. Copy data platforms inherently copy data.
- 2 Different Media: I’m not sure about this one. It depends on how you define media, but to me, disk is disk, regardless of type. This is not about using spinning rust versus SSD; it is about using media that is hot-ready versus media that needs to be loaded, such as Blu-ray or tape.
- 1 Offsite Copy: Absolutely. You can have as many offsite, in-the-cloud, or in-a-hot-site copies as you wish.
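To make the media question concrete, here is a minimal sketch of a 3-2-1 policy check of the kind a copy data platform could report on. It classifies media by whether they are hot-ready or must be loaded, so two disk types count as one medium. The `MEDIA_CLASS` mapping, the `Copy` type, and `check_321` are hypothetical illustrations, not part of any product.

```python
from dataclasses import dataclass

# Illustrative assumption: "different media" means hot-ready versus
# media that must be loaded first, so all disk types share one class.
MEDIA_CLASS = {
    "hdd": "online",
    "ssd": "online",
    "cloud-object": "online",
    "tape": "offline",
    "blu-ray": "offline",
}


@dataclass(frozen=True)
class Copy:
    medium: str    # key into MEDIA_CLASS; unknown media raise KeyError
    offsite: bool  # True if the copy lives outside the primary site


def check_321(copies: list[Copy]) -> list[str]:
    """Return 3-2-1 policy failures; an empty list means the rule is met."""
    failures = []
    if len(copies) < 3:
        failures.append(f"only {len(copies)} copies; need 3")
    classes = {MEDIA_CLASS[c.medium] for c in copies}
    if len(classes) < 2:
        failures.append(f"copies span {len(classes)} media class(es); need 2")
    if not any(c.offsite for c in copies):
        failures.append("no offsite copy")
    return failures


disk_only = [Copy("ssd", False), Copy("hdd", False), Copy("cloud-object", True)]
print(check_321(disk_only))  # ['copies span 1 media class(es); need 2']

with_tape = [Copy("ssd", False), Copy("cloud-object", True), Copy("tape", True)]
print(check_321(with_tape))  # []
```

Under this definition, three disk-backed copies (one of them in the cloud) still fail the rule, which is exactly the gap between copying data and protecting it.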
The real issue here is the two different media. Copy data can use two different media if you configure it that way, but nothing in copy data requires it. Data protection does require it. Why? Because data protection is about being able to recover data, not just copying it around.
If you use something like Facebook’s Blu-ray Cold Storage subsystem, then you have both hard-disk-based media and Blu-ray media. Copy data could drive such a setup, but copy data does not intrinsically require it.
To be frank, most data protection software does not require it either. However, the key is recoverability. What would we need to change in copy data so that it could recover virtual or physical machines, databases, and so on?
Manage Your Data
In the end, copy data solutions provide new and interesting ways to manage your data and to know where it is. They allow you to apply policy to the data outside of the application. Data protection is about recoverability: the ability to recover your data so it can be used again. If your copy data solution teams up with a data protection technology, the combination could be a powerful one. Such a combination could deliver multisite distributed systems with hot failover, placement of data in cold storage on tape or Blu-ray, and many other key capabilities.
Recoverability, however, requires that data be available not just for instant-on use, but also historically, in case of a major disaster. Do copy data solutions keep multiple recovery points for files if all they do is copy a corrupted file around the copy data network?
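The difference between copying and keeping recovery points fits in a few lines. This minimal sketch, with a hypothetical `mirror` function and a hypothetical `VersionedStore` class, shows a plain mirror faithfully propagating a corrupted file while a store that retains the last few versions can still return the good one. The retention count and file name are arbitrary illustrations.

```python
from dataclasses import dataclass, field


def mirror(source: dict[str, bytes], replica: dict[str, bytes]) -> None:
    """Plain copy: the replica always reflects the source's latest state."""
    replica.update(source)


@dataclass
class VersionedStore:
    """Keeps the newest `keep` versions of each object as recovery points."""
    keep: int = 3
    versions: dict[str, list[bytes]] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if self.keep < 1:
            raise ValueError("keep must be at least 1")

    def put(self, name: str, data: bytes) -> None:
        history = self.versions.setdefault(name, [])
        history.append(data)
        del history[:-self.keep]  # drop points older than the retention limit

    def restore(self, name: str, points_back: int = 0) -> bytes:
        """Return the version `points_back` steps before the newest one."""
        return self.versions[name][-1 - points_back]


source = {"ledger.csv": b""}
replica: dict[str, bytes] = {}
store = VersionedStore(keep=3)

for state in (b"good data", b"\x00corrupted"):
    source["ledger.csv"] = state  # second pass simulates corruption at the source
    mirror(source, replica)
    store.put("ledger.csv", source["ledger.csv"])

print(replica["ledger.csv"])                       # b'\x00corrupted'
print(store.restore("ledger.csv", points_back=1))  # b'good data'
```

The mirror is not wrong; it did exactly what it was told. Recoverability comes from retaining history, which is why versioning appears among the copy data requirements at all.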
How do you use copy data today? Do you make it the target for point-in-time recovery points of your data, or do you try to ensure that data is copied on a schedule with versioning?