At VMworld 2012, VMware presented a Tech Preview of one of their latest ideas, which they termed “Distributed Storage”. So what exactly is this new technology? It is basically locally attached storage.
How is Distributed Storage new?
We know that VMware has allowed the use of local storage since inception; however, access to that storage was limited to the local host, so it was only used for guests that were tied to a single node, e.g., vShield appliances. To delve a little deeper, you could use the analogy of a VSA (Virtual Storage Appliance), but unlike the VSA, which is a fully functioning virtual machine, this technology is actually part of the hypervisor.
So, as already alluded to, what we are seeing here is VMware adding a storage layer into the hypervisor and doing for locally attached storage what it did for virtual switches: creating a distributed layer of local storage that can be accessed from any host in a cluster as an attached datastore. This is true even if the host itself does not have any locally attached storage.
Where is the magic sauce?
Yes, VMware can do something similar now with the VMware Storage Appliance, but that is an appliance: it takes up resources that would be better utilized by production VMs than by an infrastructure VM.
Adding it to the hypervisor level allows the creation of a single aggregated datastore from the local storage that will span all hosts in the cluster, including those hosts that do not even have local storage. It can include SSD and traditional HDDs, and it will support VMware’s new policy-driven storage technology too, so this datastore is effectively tiered, as well.
OK, what about resilience?
That, too, is handled: each VM is replicated to another host, and the hypervisor notes the location of the replica to enable failover on a host outage.
How will the storage vendors view this?
OK, this seems like a nice-to-have, but what about EMC, NetApp, and the others? Won't they have their noses put out of joint by this?
Not really; the maximum size that this datastore could possibly be is 64TB, that being the maximum size of a VMFS5 partition. Remembering that it is distributed, shared, and subject to a maximum cluster size of 32 hosts, this means we are talking about a maximum of 2TB of local storage per host. Local storage capacity per host could be increased by lowering the cluster size, but this would not increase the total capacity of 64TB and would reduce the number of hosts, resulting in greater guest density and less agility.
Another thing to remember is that even though the datastore can be a maximum of 64TB, each guest is replicated to another host's storage for resilience, so the usable maximum is 32TB; a further limitation is that only one datastore can be created per cluster.
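The capacity arithmetic above can be sketched in a few lines. This is a back-of-the-envelope illustration using the preview limits quoted in this article (64TB VMFS5 cap, 32-host cluster cap, one replica per guest); the constants are taken from the text, not from any official specification.

```python
# Tech-preview limits as quoted above (all sizes in TB)
DATASTORE_MAX_TB = 64  # maximum VMFS5 datastore size
MAX_HOSTS = 32         # maximum cluster size
COPIES = 2             # each guest is stored twice: original + replica

# Local storage each host contributes at full cluster size
per_host_tb = DATASTORE_MAX_TB / MAX_HOSTS

# Usable capacity once replication is accounted for
usable_tb = DATASTORE_MAX_TB / COPIES

print(per_host_tb)  # 2.0  TB of local storage per host
print(usable_tb)    # 32.0 TB usable out of the 64TB datastore
```

Shrinking the cluster raises `per_host_tb` but leaves `usable_tb` pinned at 32TB, which is the trade-off described above.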
This will not sound the death knell of the SAN or NAS. The technology’s current limitation of a single datastore per cluster and the prevalence of big data applications moving to the cloud or even traditional virtualized environments will ensure that.
So what is the use case for this technology?
I personally see this being used for very high performance guests, as the storage is local to the host, or perhaps for VDI and cloud vDCs, where each datastore could act as an effective customer boundary. Other uses could include Hadoop nodes, database nodes, etc., where the OS and application stacks run off the distributed storage and the data is stored on a traditional SAN or NAS datastore.
Personally, I think this is a very exciting technology, and I can already see many use cases for it.