Data protection of the future: What will it look like? Will we have huge amounts of storage in just one place, or will we have myriad copies of data everywhere, on the theory that the more copies, the better? Or are we moving toward a combination of the two? Can what we are doing today actually be used for data protection in the future? Think about how hybrid clouds are used today: do they grant us new forms of data protection? Within any hybrid cloud today, data proliferates everywhere, even into places we do not yet comprehend. Is this a form of data protection? Let us look at a simple example: the company phone book.
We know that it is located on-site in one specific area and that, through traditional backup mechanisms, the phone book is also located within some form of offline storage (such as tape).
Now, where else is the phone book?
- Partly within individual users’ smartphones and tablets
- Partly within a SaaS CRM system
- Partly within myriad emails (and therefore email systems)
- Perhaps within a hot-site or hot-ready cloud instance
- Perhaps within a ready-to-use cloud database, document repository, etc.
- Perhaps within a Storage as a Service instance
It is pretty much anywhere within the secure hybrid cloud, I would say, as shown in Figure 1. But the question then becomes, “Can we ever get our data back again so that we have it in full somewhere usable?”
If we could get our data from any cloud or transition service regardless of type, then we would end up with a myriad of backup locations, each containing a shard of data that is important to an individual contributor. Use of these shards for data protection requires us first to know where our data is and then to understand how to recover it. It would be very difficult, for example, to request that all the smart devices from a large organization recover all of the phone book entries in the company phone book.
Actually, such a request, unless covered under a data sharing policy (law and jurisdiction) that protects individual privacy, would be in violation of many privacy statutes. The legal issues surrounding such a request make it practically impossible to implement. Does this imply that this bit of data is not properly protected?
Perhaps end-user computing (EUC) devices are off-limits due to the legal ramifications. Does this mean that other places within our secure hybrid cloud also cannot store the data? What about Desktop as a Service: is the phone book not also split up among the contact lists of a myriad of desktop email programs? Or perhaps stored within myriad SaaS applications? Can we not recover this as necessary as well?
Data Protection: Know Where Your Data Is
The first part of data protection is to know where your data resides. Assume it has already left the barn; the question is where it went. We could probably assume that some of it is out of our hands (legally and jurisdictionally) but nonetheless could be available to us. For example, our phone book could be in any one of these places (and probably many more):
- SaaS CRM
- SaaS email
- SaaS document sharing/repository
- Cloud-based storage facility
- IaaS instance of groupware
- Individual contributors’ desktops
Once we find out where our data resides, we need to understand if all the data is available to be reconstituted or if we need another mechanism to store the data.
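To make that check concrete, here is a minimal sketch of a coverage report. Everything in it is hypothetical: the location names, the entry IDs, and the `coverage_report` helper are my own illustrations, and it assumes we already have some way to list which phone book entries each location holds.

```python
# Hypothetical inventory: location name -> phone book entry IDs found there.
inventory = {
    "saas_crm": {"alice", "bob"},
    "saas_email": {"bob", "carol"},
    "cloud_storage": {"dave"},
}

# Hypothetical list of entry IDs the complete phone book should contain.
expected = {"alice", "bob", "carol", "dave", "erin"}


def coverage_report(inventory, expected):
    """Return (missing entries, map of each found entry to its locations)."""
    found = set().union(*inventory.values())
    missing = expected - found
    locations = {
        entry: sorted(loc for loc, ids in inventory.items() if entry in ids)
        for entry in sorted(expected & found)
    }
    return missing, locations


missing, locations = coverage_report(inventory, expected)
print("Not recoverable from any known location:", sorted(missing))
for entry, locs in locations.items():
    print(entry, "is held in", locs)
```

Any entry that lands in `missing` is one that cannot be reconstituted from the known locations, which is exactly the case where we need another mechanism to store the data.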
Data Protection: Is Our Data Protected?
In many cases, our data may be so split apart that we cannot tell whether it is all there, and if that is the case, then the data is not entirely protected. To ensure this does not happen, traditional data protection tools use checksums for comparison. This type of integrity check is crucial to determining if the data is actually protected. But more to the point, our definition of “protected” is also changing. Just storing data for restoration is no longer sufficient; we are now required to have that data someplace where it is actually usable within moments, or perhaps kept in sync with the primary location. This way, the data is always available for use regardless of state. If we had to reclaim all the bits of data stored around the users’ devices, this reclamation would take time, there would be holes in our data, and so on. That is, unless the data protection tool incorporates all these bits into the mix and the recovery time is acceptably low.
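As a rough illustration of the checksum comparison mentioned above, the sketch below hashes a primary copy and a recovered copy and compares the digests. The choice of SHA-256 and the sample records are my assumptions; a given backup product may use a different algorithm and will certainly work on larger units than this.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def is_intact(original: bytes, restored: bytes) -> bool:
    """True when the restored copy matches the original byte for byte."""
    return checksum(original) == checksum(restored)


# Hypothetical phone book contents, for illustration only.
primary = b"Alice,555-0100\nBob,555-0101\n"
good_copy = b"Alice,555-0100\nBob,555-0101\n"
partial_copy = b"Alice,555-0100\n"  # a copy with a hole in it

print(is_intact(primary, good_copy))
print(is_intact(primary, partial_copy))
```

A mismatch only tells us that something differs, not what is missing, which is why a copy scattered across many devices is hard to verify this way.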
What is missing is an understanding of whether our data is actually protected, not just a few assumptions based on where our data could have possibly ended up. A new breed of analytics tools has launched into this gap; these tools tie into traditional backup tools to determine a) if data has been backed up and b) whether recovery will meet our goals. I would like to see such tools also cover the data sharding that happens on a regular basis. Data sharding happens within large-scale databases to ensure quick response time, but can we recover from these shards to the full set of data? Today, I do not see that in use.
Data sharding is a major component of the secure hybrid cloud today: not a great component, but one that happens whether we know it or not. We need tools that can incorporate this to determine if the shards can be captured once more to recover the critical data of the organization. If they can, then we have one more tool in our data protection toolbox. But we also need to determine where each shard resides, what is in each shard, and how those shards can be reconstituted.
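Here is one way those three questions (where each shard resides, what is in it, and how the shards are reconstituted) might be sketched. The `Shard` type, the `reconstitute` function, and the newest-record-wins rule are all assumptions of mine for illustration; no existing tool is being described.

```python
from dataclasses import dataclass


@dataclass
class Shard:
    location: str   # where the shard resides
    records: dict   # what is in it: name -> (phone, ISO date last updated)


def reconstitute(shards):
    """Merge shards into one phone book; the most recently updated record wins."""
    merged = {}
    source = {}  # which location supplied each winning record
    for shard in shards:
        for name, (phone, updated) in shard.records.items():
            # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
            if name not in merged or updated > merged[name][1]:
                merged[name] = (phone, updated)
                source[name] = shard.location
    return merged, source


# Hypothetical shards, for illustration only.
shards = [
    Shard("smartphone", {"alice": ("555-0100", "2014-01-10")}),
    Shard("saas_crm", {"alice": ("555-0199", "2014-03-02"),
                       "bob": ("555-0101", "2014-02-01")}),
]

book, source = reconstitute(shards)
for name in sorted(book):
    print(name, book[name][0], "from", source[name])
```

Keeping the `source` map alongside the merged result records where each recovered entry came from, which matters when some locations are legally off-limits.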
This sounds like a big data problem to me. Neverfail has a start on this with their IT Continuity Architect, which determines what data has been backed up using traditional virtual and physical data protection tools. Employing such a tool will give you a clear picture, in the traditional sense, of whether all your data is protected. The future of data protection and such tools should include the concept of data sharding and the recovery of all those shards to reconstitute the whole. This could in effect reuse existing storage instead of adding even more storage purely for data protection.
Do you know where all the shards of data are within your organization? Can you recover them?