Recently, an xkcd comic (https://xkcd.com/1909/) touched on digital lifespan with a pithy comment about how quickly digital resources disappear. The problem is evident in projects to restore NASA records from Apollo missions, such as LOIRP. NASA both fell into unplanned obsolescence and misjudged the value of its data. It picked data formats that were not common and overwrote tapes with data from newer projects. The digital lifespan of the data was far shorter than anyone had expected.
I used to have an archive of programs on 5.25 inch floppies, then 3.5 inch floppies. Often, I would find the data on them corrupt or missing, even though I took extreme care. I also have two mag tapes I have not been able to read in decades. Even finding the equipment to read them would be difficult. How much other digital media exists that we no longer have the equipment to read? Many tape devices fall into this category, but so do other removable media. The xkcd comic hit the nail on the head: without upkeep, reading our older digital media will just not be possible. So, is the data lost? Not if we capture it in time, but we often overlook keeping spare systems or neglect to copy the data to other media, and then recovery is a struggle.
Is the cloud the solution to this problem? Perhaps, but the more important thing is noticing that keeping data available often requires migrating that data between disparate systems, such as from tape to disk and then to the cloud. Then, once data is in the cloud, will the tools that can read the data still exist in a few years? Don’t we need to bundle those tools with the data? For some data protection formats, this is a requirement. This is also the basis for Digital Vellum.
Digital Vellum, in short, means bundling data with the application that reads it, and then keeping that application patched and updated so it can continue to read the data. The ideal way to make this work is to store our data in executable containers that look surprisingly like tools from VMware (ThinApp), Microsoft (App-V), and perhaps even Docker. A tool plus data, stored within the cloud and then elsewhere, would make it possible to recover data quickly, with no need for the reverse engineering the LOIRP project required.
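To make the bundling idea concrete, here is a minimal sketch, not a real Digital Vellum implementation. It assumes a plain tar.gz archive stands in for the executable container, and the names `build_bundle`, `data/`, `reader/`, and `manifest.json` are hypothetical choices made for illustration. The manifest records checksums so that a later reader can tell whether the data and its tool survived intact.

```python
import hashlib
import io
import json
import tarfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_bundle(data_file: Path, reader_file: Path, bundle_path: Path) -> dict:
    """Pack a data file, its reader tool, and a manifest into one archive.

    The layout (data/, reader/, manifest.json) is an assumption of this
    sketch, not a defined Digital Vellum format.
    """
    manifest = {
        "data": {"name": data_file.name, "sha256": sha256_of(data_file)},
        "reader": {"name": reader_file.name, "sha256": sha256_of(reader_file)},
    }
    payload = json.dumps(manifest, indent=2).encode("utf-8")

    with tarfile.open(bundle_path, "w:gz") as archive:
        archive.add(data_file, arcname=f"data/{data_file.name}")
        archive.add(reader_file, arcname=f"reader/{reader_file.name}")
        # Write the manifest straight from memory, without a temporary file.
        info = tarfile.TarInfo("manifest.json")
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))
    return manifest
```

A real container would also need the runtime the reader depends on, which is the part that tools such as ThinApp, App-V, or Docker address and that this sketch leaves out.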
How does this impact us today? As we think through the process of continuous integration and continuous delivery, we should also consider continuous data protection: using the same process that builds our applications to also build data protection bundles. As we roll out more and more functionality, the tools to read and manipulate our data also change, so creating Digital Vellum containers at the same time, as a form of data protection, saves quite a bit of struggle during disaster recovery.
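One check a delivery pipeline could run alongside each release is whether the current reader tool still opens the stored data. The sketch below is only an illustration under stated assumptions: the reader is a Python script that takes the data path as its only argument and exits with status 0 when it can read the file. The name `reader_can_open` is hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def reader_can_open(reader_script: Path, data_file: Path, timeout: int = 60) -> bool:
    """Run the reader against the data and report whether it succeeded.

    Assumes (for this sketch) that exit status 0 means the data was readable.
    """
    try:
        result = subprocess.run(
            [sys.executable, str(reader_script), str(data_file)],
            capture_output=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # A reader that hangs is treated the same as one that fails.
        return False
    return result.returncode == 0
```

If a check like this fails after a release, that is the signal to build a fresh bundle with the updated tool before the old one drifts out of reach.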
This would transform the data protection industry, as we would no longer care about the operating system, but solely about the data. We may even put an identity access manager around our data to keep it from those who should not view it. We need to concentrate on our data, as it is ending up in myriad locations that are outside our control. The time to capture things is as we deploy, as a sort of branch off the normal CI/CD process.
Digital lifespan is a major concern for any data you wish to keep for long periods of time, such as your own lifetime. It ties directly into your digital life and potentially all the legal issues around said digital life. Digital Vellum tied to continuous delivery would provide availability of your data over a longer time frame. Our data protection tools should recognize the data and wrap the data in Digital Vellum to preserve it, thereby defeating unplanned obsolescence.
We seem to be far away from that goal, but software-defined storage solutions are tackling some of the same problems. Data transformation is crucial, and this type of data transformation would spare many people and companies major issues. Just make sure your Digital Vellum copies are not stored in the same place where the data originally lives. It is a perfect job for multicloud scenarios.
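The rule about keeping copies away from the original can be sketched in a few lines. This is a rough illustration only: local directories stand in for separate clouds, and `replicate_bundle` is a hypothetical helper, not part of any product. It refuses to copy into the directory where the original lives and verifies each copy by checksum.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file (fine for small bundles)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def replicate_bundle(bundle: Path, destinations: list[Path]) -> list[Path]:
    """Copy a bundle to each destination directory and verify every copy."""
    source_dir = bundle.resolve().parent
    expected = file_digest(bundle)
    copies = []
    for destination in destinations:
        # A copy next to the original protects against very little.
        if destination.resolve() == source_dir:
            raise ValueError(f"{destination} is where the original lives")
        destination.mkdir(parents=True, exist_ok=True)
        copy = destination / bundle.name
        shutil.copy2(bundle, copy)
        if file_digest(copy) != expected:
            raise IOError(f"copy at {copy} does not match the original")
        copies.append(copy)
    return copies
```

In a multicloud setup, each destination would be a different provider or region, and the checksum comparison would run against whatever each provider reports for the stored object.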
What are your data protection goals? Would Digital Vellum save your personal data and your business data?