Many times we virtualization experts push for agentless backups, as these backups tend, in our opinion, to be cleaner and faster. But what if you could keep the benefits of your existing backup tools (such as Tivoli) while gaining the power and advantages of everything the virtual environment makes possible? For VMware vSphere, this is possible using the Pancetera backup tools.
Current backup agents must be installed inside each VM you wish to back up. For full backups, they back up the entire used disk space of the VM, making use of Microsoft VSS (Volume Shadow Copy Service) to shadow copy important application data such as MSSQL, Exchange, etc. VSS makes it easier to back up in-use application data without the nasty side effect of crash-consistent data. Crash-consistent data may or may not be complete, because not all of the data may have been written from memory to disk at the moment it was captured. VSS ensures such data is flushed to disk so that the backed-up data is safe. So the steps are:
- Initiate VSS for the specific application
- Transmit all used disk data, meaning all used blocks of a file or disk, to the backup server (for incremental backups, only the files changed since the last backup)
- Stop VSS for the specific application
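To make the agent-based sequence concrete, here is a minimal Python sketch. It is an illustration, not any vendor's agent: `vss_session` is a hypothetical stand-in that only logs where a real agent would start and stop VSS for the application, `send` is a hypothetical transport callback, and the incremental case is assumed to select files by modification time.

```python
import os
from contextlib import contextmanager


@contextmanager
def vss_session(application):
    """Hypothetical stand-in for initiating and stopping VSS for one application.

    A real agent would ask the application's VSS writer to flush in-memory data
    to disk and create a shadow copy; this sketch only logs the two steps.
    """
    print(f"VSS started for {application}")
    try:
        yield
    finally:
        # VSS is always stopped, even if the transfer fails.
        print(f"VSS stopped for {application}")


def files_to_back_up(root, last_backup_time=None):
    """Return the files an agent would send, sorted by path.

    Full backup (last_backup_time is None): every file under root.
    Incremental backup: only files modified after last_backup_time (epoch seconds).
    """
    selected = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if last_backup_time is None or os.path.getmtime(path) > last_backup_time:
                selected.append(path)
    return sorted(selected)


def agent_backup(application, root, send, last_backup_time=None):
    """Run the three agent steps; `send(path, data)` is a hypothetical transport callback."""
    with vss_session(application):                              # Step 1: initiate VSS
        for path in files_to_back_up(root, last_backup_time):   # Step 2: transmit used data
            # A real agent reads from the VSS shadow copy; this sketch reads the live file.
            with open(path, "rb") as f:
                send(path, f.read())
    # Step 3: leaving the `with` block stops VSS.
```

Note that every used byte of every selected file crosses the network; nothing in this approach knows which blocks inside a file actually changed.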
Virtualization backup tools, on the other hand, use the following steps:
- Communicate with the VM to initiate VSS if necessary
- Create a virtual disk snapshot (now all new block changes are written to the snapshot file)
- Mount the virtual disk to a VM using the vStorage API (if using VMware ESX)
- Use virtualization-aware backup software to transmit the blocks of the virtual disk to the backup server, taking into account change block tracking, active block management, and source deduplication.
- Unmount the virtual disk
- Commit the snapshot (merge the changes recorded in the snapshot file back into the virtual disk)
- Communicate with the VM to stop VSS if necessary
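The order of these steps matters: whatever happens during the transfer, the disk must be unmounted, the snapshot committed, and VSS stopped, or the VM keeps writing into an ever-growing snapshot file. Here is a minimal Python sketch of that ordering. The `hv` object and its method names (`start_vss`, `create_snapshot`, `mount_disk`, and so on) are hypothetical placeholders, not the vStorage API; `RecordingHypervisor` simply records the calls so the order can be checked.

```python
def virtualization_backup(hv, vm, disk, transmit, needs_vss=True):
    """Run the virtualization backup steps in order against a hypothetical client `hv`."""
    if needs_vss:
        hv.start_vss(vm)                          # Step 1: quiesce applications in the guest
    try:
        snapshot = hv.create_snapshot(vm)         # Step 2: new writes now go to the snapshot file
        try:
            mount = hv.mount_disk(disk, snapshot)  # Step 3: attach the now-static virtual disk
            try:
                transmit(mount)                   # Step 4: CBT / ABM / deduplication-aware transfer
            finally:
                hv.unmount_disk(mount)            # Step 5
        finally:
            hv.commit_snapshot(snapshot)          # Step 6: merge snapshot changes into the disk
    finally:
        if needs_vss:
            hv.stop_vss(vm)                       # Step 7


class RecordingHypervisor:
    """Hypothetical stand-in that only records the order of calls."""

    def __init__(self):
        self.calls = []

    def start_vss(self, vm):
        self.calls.append("start_vss")

    def create_snapshot(self, vm):
        self.calls.append("create_snapshot")
        return f"{vm}-snap"

    def mount_disk(self, disk, snapshot):
        self.calls.append("mount_disk")
        return f"/mnt/{disk}"

    def unmount_disk(self, mount):
        self.calls.append("unmount_disk")

    def commit_snapshot(self, snapshot):
        self.calls.append("commit_snapshot")

    def stop_vss(self, vm):
        self.calls.append("stop_vss")
```

Nesting the `try`/`finally` blocks means each cleanup step runs only if its matching setup step succeeded, in the reverse order of the setup, so a failed transfer still leaves the VM without a dangling snapshot.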
A virtualization backup looks more complex than a standard agent-based backup, but the major difference lies in the fourth step above, where the blocks of the virtual disk are transmitted. That step is where the backup magic happens within the virtual environment: the technologies it uses exist to reduce the overall network utilization of a backup.
Change Block Tracking (CBT)
CBT allows a backup tool to transmit only those blocks of the virtual disk that have actually changed since the last backup. The hypervisor knows which blocks changed because it controls every write to the virtual disk, and it exposes this change information as part of snapshot creation. CBT therefore reduces the data transmitted to only the blocks that changed.
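The idea behind CBT can be shown with a toy model in Python. This is not VMware's implementation (vSphere exposes changed areas to backup tools through its `QueryChangedDiskAreas` API); the `TrackedDisk` class, its whole-block writes, and the 4 KB block size are assumptions made for illustration. Because every write passes through the "hypervisor", it can record the index of each block it touches, and the backup reads only those blocks.

```python
BLOCK_SIZE = 4096  # illustrative block size for this toy model


class TrackedDisk:
    """Toy virtual disk; the 'hypervisor' records which blocks each write touches."""

    def __init__(self, num_blocks):
        self.blocks = [bytes(BLOCK_SIZE) for _ in range(num_blocks)]
        # Every block counts as changed at first, so the first backup is a full one.
        self.changed = set(range(num_blocks))

    def write_block(self, index, data):
        """Guest write of one whole block; the hypervisor marks that block as changed."""
        if len(data) != BLOCK_SIZE:
            raise ValueError("this toy model only accepts whole-block writes")
        self.blocks[index] = data
        self.changed.add(index)

    def changed_blocks(self):
        """Return {index: data} for blocks changed since the last call, then reset tracking."""
        delta = {i: self.blocks[i] for i in sorted(self.changed)}
        self.changed.clear()
        return delta
```

After the first (full) backup, two guest writes to blocks 3 and 5 mean the next backup reads and sends just those two blocks, no matter how large the disk is.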
Active Block Management (ABM)
ABM takes CBT one step further: it looks inside the VM at the actual filesystem to determine whether a changed block is in use. Only if the block is in use and is not a zero block does ABM allow it to be backed up. ABM thus further reduces the data transmitted to only those changed blocks that are actually in use.
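ABM can be sketched in Python as a filter applied to the output of change block tracking, here assumed to be a mapping from block index to block contents. The guest filesystem's view is represented by a hypothetical `allocated` set of in-use block indices; a real tool would derive it from the filesystem's own allocation metadata inside the VM.

```python
def active_blocks(changed, allocated):
    """Keep only changed blocks that are in use in the guest filesystem and not all zeros.

    changed   -- {block_index: bytes} as produced by change block tracking
    allocated -- set of block indices the guest filesystem reports as in use
                 (hypothetical input for this sketch)
    """
    return {
        index: data
        for index, data in changed.items()
        # any(data) is False only when every byte is zero
        if index in allocated and any(data)
    }
```

A block that changed because a file was deleted (no longer allocated) or because it was zeroed out is dropped, even though CBT reported it.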
Source Deduplication
Source deduplication removes duplicate blocks from the data being transmitted before they leave the source. It can do this for a single VM or for multiple VMs at the same time. The idea behind deduplicating across multiple VMs is that many VMs have the same data written to disk (for example, identical operating system files), so removing those duplicate blocks further reduces the data transmitted.
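Here is a minimal Python sketch of source deduplication across several VMs. It assumes fixed-size blocks identified by their SHA-256 digest (real products may use other chunking and hashing schemes): each unique block is sent once, and a manifest records which digest belongs at each block position of each VM so the disks can be rebuilt on restore.

```python
import hashlib


def deduplicate(vm_blocks, already_stored=None):
    """Source-side deduplication across one or more VMs.

    vm_blocks      -- {vm_name: {block_index: bytes}}
    already_stored -- optional set of SHA-256 hex digests the backup target already holds
    Returns (payload, manifest): payload maps digest -> bytes for blocks that must be
    sent; manifest maps (vm_name, block_index) -> digest for every block.
    """
    seen = set(already_stored or ())
    payload = {}
    manifest = {}
    for vm, blocks in vm_blocks.items():
        for index, data in blocks.items():
            digest = hashlib.sha256(data).hexdigest()
            manifest[(vm, index)] = digest
            if digest not in seen:
                # First time this content appears: it is the only copy that gets sent.
                seen.add(digest)
                payload[digest] = data
    return payload, manifest
```

Passing the digests the backup target already holds in `already_stored` extends the same idea across backup runs: a block sent in an earlier backup is not sent again.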
In essence, the fourth step is all about reducing the amount of data you transmit during a backup, so that the backup window can be as small as possible. When you have thousands of VMs, backing them all up within a single backup window may now be possible, where in the past the sheer volume of data made it impossible. However, nearly everyone has already invested in some form of enterprise-class backup solution.
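The effect on the backup window is simple arithmetic: transfer time is the amount of data sent divided by the sustained throughput. A small Python helper makes the comparison explicit; the `fraction_sent` parameter is an assumed single number summarizing how much data is left after CBT, ABM, and source deduplication.

```python
SECONDS_PER_HOUR = 3600


def backup_window_hours(used_bytes, throughput_bytes_per_s, fraction_sent=1.0):
    """Hours needed to transmit `fraction_sent` of `used_bytes` at a sustained throughput.

    fraction_sent is the share of used data left after CBT, ABM, and source
    deduplication (1.0 means every used block is sent, as in a full backup).
    """
    if throughput_bytes_per_s <= 0:
        raise ValueError("throughput must be positive")
    if not 0.0 <= fraction_sent <= 1.0:
        raise ValueError("fraction_sent must be between 0 and 1")
    return used_bytes * fraction_sent / throughput_bytes_per_s / SECONDS_PER_HOUR
```

With purely hypothetical figures of 1,000 VMs averaging 40 GB of used disk space (40 TB in total) and a sustained 100 MB/s to the backup server, `backup_window_hours(40e12, 100e6)` is about 111 hours, while `backup_window_hours(40e12, 100e6, 0.05)` (only 5% of the used data actually sent) is about 5.6 hours.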
Until Pancetera became available, there were only four methods to hook an existing backup tool into virtualization:
- Use per-VM backup agents
- Use VMware Consolidated Backup to proxy the virtual disk to the backup software
- Use a Virtualization Specific Backup tool to write the backup to a share, from which the data can then be backed up using the existing backup software: a disk-to-disk-to-tape approach.
- Use a version of a legacy tool that has virtualization-specific integrations.
Into this mix of options comes the Pancetera solution. It provides all the benefits of CBT, ABM, and source deduplication, but presents the data to the backup engines as a share (either NFS or CIFS), thereby better integrating legacy backup tools into the environment. So how is this different from the options above?
- Using per-VM backup agents is slow and expensive, but may be necessary depending on the application and what it supports. Oracle is a case in point.
- Using VMware Consolidated Backup works only with VMware ESX or ESXi, and because it simply proxies the disk, none of the data reduction techniques are available.
- Using Virtualization Specific Backup tools requires a location to store the backups, which are generally kept in the Virtualization Specific Backup software’s own format. So to restore a file with your legacy tools, you must first use them to restore the Virtualization Specific Backup tool’s backup, and then restore the file from that backup.
- Using a legacy tool with virtualization-specific integrations could be an expensive option if you already have a different legacy tool. Symantec NetBackup 7 is an example of this type of tool.
Pancetera takes option 3, removes the intermediary data storage, and lets the legacy tools see the contents of the virtual disk while still allowing for CBT, ABM, and source deduplication. In essence, it lets you go from disk to tape via the Pancetera appliance with no intermediary storage requirements.
Nearly all Virtualization Specific Backup tools support CBT, ABM, and source deduplication, but Pancetera is the only one with a presentation method (an NFS or CIFS share) that legacy tools can use directly, which lets you keep using your existing backup tool licenses. Pancetera also costs less than buying new licenses for the IBM Tivoli backup tool suite.
Available tools for virtualization backup and their options, as discussed in this article (we know there are many more options available):
| Tool | CBT | ABM | Source Deduplication | Legacy Support | File Level Restore | Backup Testing |
|---|---|---|---|---|---|---|
|Symantec NetBackup 7|