Are you using or considering implementation of a storage hypervisor?

By Greg Schulz, Server and StorageIO (@storageio)

Depending upon your, or somebody else's, definition of a storage hypervisor, you may or may not be using one, or may not realize that you are.

If your view of a storage hypervisor is a storage IO optimization technology that addresses performance and other issues with virtual machines (VMs) and their hypervisors, such as Virsto or ScaleIO among others, you might be calling those storage hypervisors as opposed to middleware, management tools, drivers, plug-ins, shims, accelerators, or optimizers.

On the other hand, if you are using a storage solution that supports multi-tenant capabilities, such as those from EMC, HP 3PAR, IBM or NetApp (MultiStore), you may or may not be using a storage hypervisor. If your view of storage hypervisors and storage virtualization is based on hardware that can support third-party storage arrays or JBOD shelves, then solutions from EMC (VMAX and ATMOS), HDS, IBM and NetApp, among others, are storage hypervisors, given their virtualization and abstraction capabilities.

On the other hand, if you are using tin-wrapped software on an appliance-based technology that abstracts underlying storage, you might be using a storage hypervisor. Tin-wrapped software refers to software that is architecturally independent of the underlying hardware and that is made available as an appliance on a physical server with storage. One benefit is that for organizations with different purchasing or acquisition requirements for software vs. hardware, software delivered as an appliance may count as hardware. Another benefit of tin-wrapped software is that pre-configuration, setup, testing, and systems integration are done by somebody else.

Some tin-wrapped software or appliances can architecturally support different vendors' underlying storage technology; however, for sales and marketing purposes they may be limited to a single vendor's solution, while others are more open. Storage options can include vendor-specific or third-party storage systems or JBOD attached via iSCSI, SAS, SCSI_FCP (e.g. Fibre Channel or FC), FCoE, InfiniBand SRP (SCSI RDMA Protocol), or NFS and CIFS NAS. There are many examples of tin-wrapped software, more commonly known as appliances, including Actifio, EMC DLm, VPLEX, ATMOS and RecoverPoint, FalconStor, InMage, and NetApp StorageGRID among others. Let us not forget about physical server or virtual machine based volume managers that can also virtualize or abstract underlying storage resources.

What if you are using a virtual storage appliance (VSA) like the VMware vSphere Storage Appliance, HP (LeftHand), or NetApp Edge (ONTAP) among many others? Alternatively, how about storage systems that also host virtual machines, such as Pivot3, SimpliVity or Nutanix among others? Then you too may be using a storage hypervisor.

How about software-defined storage (SDS), the up-and-coming buzzword bandwagon playing off software-defined networking (SDN)? Would those solutions or services be a storage hypervisor?

Granted, storage hypervisor is a newer, trendier, cooler term for what some might see as old, tired and boring (e.g. storage virtualization or virtual storage). On the other hand, if storage hypervisor is not new enough, then feel free to jump on the software-defined storage (SDS) bandwagon. In other words, what some are calling storage hypervisors have been called storage virtualization (Figure 1), virtual storage, and management tools, among other names, for several years if not decades now.

Storage Hypervisor: Many Faces

Storage virtualization, virtual storage, and storage hypervisors for that matter, like server hypervisors and server virtualization, can be implemented in different ways and in various locations. They provide abstraction of physical resources along with emulation as basic tenets, which can then be used to enable aggregation (consolidation or pooling), agility and flexibility, BC and DR, replication, snapshots, and other data services.

Underlying storage, depending on the specific vendor implementation, might be heterogeneous (any vendor's products), including JBOD or RAID based systems, or homogeneous (a specific vendor's products). Physical storage can be internal dedicated direct attached (DAS), external dedicated or shared DAS, along with shared iSCSI, SAS, FC, FCoE or InfiniBand SRP for block access, or NAS file or object access, using SSD, HDD or in some cases PCIe flash cards, tape, and cloud services such as Amazon S3 or Glacier, HP, or Rackspace among others.

Functionality can also vary. Some solutions are very thin or light, relying on and leveraging the underlying storage system's functionality (e.g. EMC VPLEX), while others are rich and deep, essentially able to eliminate or provide the same capabilities as a traditional storage system (e.g. Actifio, FalconStor, IBM SVC and DataCore among others).

In addition to being implemented in different locations, storage virtualization or storage hypervisors can be in-band, out-of-band, or fast-path control-path. In-band architectures are, as their name implies, those where the solution (e.g. software, appliance, or storage system) sits between the application servers and the underlying storage, in the data stream. Most solutions are in-band, with some fast-path control-path. Sitting in the data path adds some overhead in exchange for extra functionality. Some vendors might claim that there is no overhead, added latency, or performance impact; I tend to believe those who claim minimal or no perceived impact.

Out-of-band, as its name implies, relies on an additional appliance and/or drivers and agents installed on application servers. Another variation is fast-path control-path, which strikes a balance by minimizing impact to the data stream, getting involved only when needed and getting out of the way otherwise. In-band architectures are more common because they are relatively easier to implement, whereas fast-path control-path has been slower to evolve given dependencies on switches and other technology. Fast-path control-path solutions tend to focus on adding value to and leveraging underlying storage system capabilities (e.g. EMC VPLEX), vs. in-band systems that can be used to replace storage systems.
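
To make the distinction more concrete, here is a minimal, hypothetical Python sketch contrasting an in-band layer, which forwards every I/O itself, with a fast-path control-path approach that hands back a mapping and then steps out of the data path. The class names, round-robin mapping, and backend devices are illustrative assumptions, not any vendor's implementation.

```python
# Illustrative sketch: in-band vs. fast-path control-path virtualization.
# All names are hypothetical; real products differ considerably.

class BackendDevice:
    """Stands in for a physical LUN or disk shelf."""
    def __init__(self, name):
        self.name = name
        self.blocks = {}

    def read(self, lba):
        return self.blocks.get(lba, b"\x00" * 512)

    def write(self, lba, data):
        self.blocks[lba] = data


class InBandVirtualizer:
    """Sits in the data path: every I/O passes through it (adds a hop)."""
    def __init__(self, backends):
        self.backends = backends

    def _map(self, virtual_lba):
        # Trivial round-robin mapping of virtual LBAs onto backends.
        return self.backends[virtual_lba % len(self.backends)], virtual_lba

    def read(self, virtual_lba):
        device, physical_lba = self._map(virtual_lba)
        return device.read(physical_lba)          # appliance forwards the I/O

    def write(self, virtual_lba, data):
        device, physical_lba = self._map(virtual_lba)
        device.write(physical_lba, data)


class ControlPathVirtualizer:
    """Fast-path control-path: resolves the mapping, then gets out of the way."""
    def __init__(self, backends):
        self.backends = backends

    def resolve(self, virtual_lba):
        # The host caches this answer and then talks to the backend directly.
        return self.backends[virtual_lba % len(self.backends)], virtual_lba


if __name__ == "__main__":
    shelves = [BackendDevice("array-a"), BackendDevice("array-b")]

    inband = InBandVirtualizer(shelves)
    inband.write(7, b"hello")                     # data flows through the appliance

    control = ControlPathVirtualizer(shelves)
    device, lba = control.resolve(7)              # only metadata flows through
    device.write(lba, b"hello")                   # host does the I/O directly
```

The point is simply where the data flows: through the virtualization layer in the in-band case, directly between host and backend once the mapping is known in the control-path case.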

Keep in mind that, feature, functionality, vendor-specific capabilities, and preferences aside, the fundamental abilities of virtualization are abstraction and emulation. From those two basic capabilities come the ability to combine and add more features and functionality, and to deploy in different places for various reasons (consolidation, masking hardware complexities, flexibility, interoperability, improved hardware use). In other words, the basic and fundamental capabilities often discussed as features or benefits of virtualization (server, storage, networking, IO and desktop) tie back to abstraction and emulation.

Both server and storage virtualization are moving into life beyond consolidation, where the focus expands from consolidation and pooling to enabling flexibility for different benefits. In the case of storage virtualization a decade ago, a primary value proposition of its proponents was LUN or volume pooling, or using commodity hardware. While there have been customer success stories, many have found the bigger benefit to be the increased flexibility of using an abstraction (e.g. virtualization or storage hypervisor) layer across homogeneous (same vendor, make, and model) or heterogeneous (different vendors, makes, and models) storage systems.

This gets back to what was mentioned earlier: if your view of a storage hypervisor or storage virtualization solution is software based, then you will probably gauge success based on those solutions. On the other hand, if your view of storage virtualization and storage hypervisors is more pragmatic, including homogeneous solutions using either hardware or software, then you will see a different set of success stories.

Here is my point: there is a lot of virtual storage marketing hype around what is or is not a storage hypervisor, similar to what has occurred in the past around what is or is not storage virtualization. Consequently, depending on your preferences, sphere of influence, or who you listen to, believe, sell for, or promote, what is or is not a storage hypervisor will vary, just as what is or is not storage virtualization does.

Now if you are a vendor, VAR, pundit, surrogate or anybody else who is not feeling comfortable right about now, relax for a moment. Take a deep breath and count to some number (you decide). Then continue reading before you dispatch your truth squads to set me straight for raining on your software-defined virtual storage hypervisor parade ;).

There are many forms of storage virtualization, including aggregation or pooling, emulation, and abstraction of different tiers of physical storage, providing transparency of physical resources. Storage virtualization can be found in different locations (Figure 1): in server-based software, or in the network or fabric, using appliances, routers, or blades, with software in switches or switching directors. In other words, storage virtualization (excuse me, storage hypervisor) functionality can run as software on application servers (physical or virtual machines) or operating systems, in network-based appliances, switches, or routers, as well as in storage systems.

Various storage virtualization services are implemented in different locations to support various tasks. Storage virtualization functionality includes pooling or aggregation for both block and file-based storage, virtual tape libraries for coexistence and interoperability with existing IT hardware and software resources, global or virtual file systems, transparent migration of data for technology upgrades and maintenance, and support for high availability, business continuance, and disaster recovery.

Storage virtualization functionalities include:

  • Pooling or aggregation of storage capacity
  • Transparency or abstraction of underlying technologies
  • Agility or flexibility for load balancing and storage tiering
  • Automated data movement or migration for upgrades or consolidation
  • Heterogeneous snapshots and replication on a local or wide area basis
  • Thin and dynamic provisioning across storage tiers
  • And many others

Aggregation and pooling for consolidation of LUNs, file systems, and volumes, along with their associated management, are intended to increase capacity use and investment protection, including supporting heterogeneous data management across different tiers, categories, and price bands of storage from various vendors. Given the focus on consolidation of storage and other IT resources along with continued technology maturity, more aggregation and pooling solutions can be expected to be deployed as storage virtualization matures.

While aggregation and pooling are growing in popularity in terms of deployment, most current storage virtualization solutions are forms of abstraction. Abstraction and technology transparency include device emulation, interoperability, coexistence, backward compatibility, transitioning to new technology with transparent data movement, and migration and supporting HA, BC, and DR. Some other types of virtualization in the form of abstraction and transparency include heterogeneous data replication or mirroring (local and remote), snapshots, backup, data archiving, security, compliance, and application awareness.

This is not to say that there are not business cases for pooling or aggregating storage, rather that there are other areas where storage virtualization techniques and solutions can be applied. This is not that different from server virtualization expanding beyond a focus just on consolidation. The next wave (we are in it now) for server, storage, and other forms of virtualization is life beyond consolidation, where the focus expands to enablement and agility in addition to aggregation.

The best type of storage virtualization and the best place to have the functionality will depend on your preferences. The best solution and approach are the ones that enable flexibility, agility, and resiliency to coexist with or complement your environment and adapt to your needs. Your answer might be one that combines multiple approaches, as long as the solution works for you and not the other way around.

How about volume managers and global namespaces or file systems? Again, IMHO, yes; after all, a common form of storage virtualization is the volume manager, which abstracts physical storage from applications and file systems. This is about where some become as relaxed as a long-tailed cat next to a rocking chair (Google it if you do not know) when mentioning virtualization, let alone storage hypervisors, alongside volume managers, global namespaces, or global file systems. In addition to providing abstraction of different types, categories, and vendors' storage technologies, volume managers can also be used to support aggregation, performance optimization, and most other functions commonly found in storage or virtual storage appliances or systems. For example, volume managers can aggregate multiple types of storage into a single large logical volume group that is subdivided into smaller logical volumes for file systems.

In addition to aggregating physical storage, volume managers can do RAID mirroring or striping for availability and performance. Volume managers also provide a layer of abstraction that allows different types of physical storage to be added and removed for maintenance and upgrades without impacting applications or file systems. Common management functions supported by volume managers include storage allocation, provisioning, and data protection operations such as snapshots and replication, all of which vary by specific vendor implementation. File systems, including clustered and distributed file systems, can be built on top of or in conjunction with volume managers to support scaling of performance, availability, and capacity.
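
As a rough illustration of the volume group and logical volume model described above, consider this small Python sketch. The names, sizes, and simplistic capacity accounting are assumptions for illustration; real volume managers (LVM, Veritas, and others) manage extents, metadata, striping, and mirroring in far more detail.

```python
# Toy model of a volume manager: physical volumes pooled into a volume group,
# which is then carved into logical volumes presented to file systems.

class PhysicalVolume:
    def __init__(self, name, size_gb):
        self.name = name
        self.size_gb = size_gb


class VolumeGroup:
    def __init__(self, name, physical_volumes):
        self.name = name
        self.physical_volumes = physical_volumes
        self.allocated_gb = 0
        self.logical_volumes = {}

    @property
    def total_gb(self):
        # Aggregation: capacity from dissimilar devices pooled into one number.
        return sum(pv.size_gb for pv in self.physical_volumes)

    @property
    def free_gb(self):
        return self.total_gb - self.allocated_gb

    def create_logical_volume(self, name, size_gb):
        # Abstraction: the file system sees only the logical volume,
        # not which physical device(s) back it.
        if size_gb > self.free_gb:
            raise ValueError(f"not enough free space in {self.name}")
        self.allocated_gb += size_gb
        self.logical_volumes[name] = size_gb
        return name


if __name__ == "__main__":
    vg = VolumeGroup("vg_data", [
        PhysicalVolume("internal_ssd", 400),
        PhysicalVolume("fc_array_lun", 2000),
        PhysicalVolume("iscsi_lun", 1000),
    ])
    vg.create_logical_volume("lv_home", 500)
    vg.create_logical_volume("lv_db", 1200)
    print(f"{vg.name}: {vg.total_gb} GB total, {vg.free_gb} GB free")
```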

What about OpenStack Swift or other cloud and object storage software: can they qualify as storage hypervisors? My guess is that if you are looking at things from a server virtualization perspective, or a VMware, Microsoft Hyper-V, KVM or Xen viewpoint, the cloud and object solutions might not qualify for storage hypervisor hype status. However, keep in mind that most of these solutions support various vendors' underlying hardware (software, servers, storage) while providing an abstraction layer, adding capabilities or complementing those enabled by the underlying technology. Thus, IMHO, given what some have stretched to be called or included as storage hypervisors, virtual storage or storage virtualization, sure, the cloud and object solutions can easily qualify. Examples include OpenStack Swift, Basho Riak CS, EMC ATMOS (which is now available as a VSA as well as supporting external third-party storage), Ceph, and Cleversafe among many others, based on their ability to abstract, emulate, and enable added services.

How about gateways, including disk and tape libraries, that can support and virtualize different types of local and remote cloud storage services while emulating different devices? Sure. If they provide some emulation, abstraction, and additional capabilities, whether running on a dedicated server (as an appliance or tin-wrapped software), in a virtual machine, or as a VSA, why not? For example, the Amazon AWS Storage Gateway, or those from Zadara, EMC (Cloud Tiering Appliance), Gladinet, Avere, Microsoft StorSimple, Nasuni, or TwinStrata among others, not to mention VTLs and data protection appliances from Actifio, EMC, IBM, HP, FalconStor, Fujitsu and Quantum among others.

What this all means

Keep in mind that there are many different types of storage devices, systems, and solutions for addressing various needs, with varying performance, availability, capacity, energy, and economic characteristics. Likewise, there are different tools, techniques, and approaches for abstracting storage resources for various purposes, from emulation to agility to consolidation to interoperability, among others.

Rest assured there is plenty of hype around storage hypervisors, including what is or is not one, along with storage virtualization and virtual storage. However, there is also plenty of reality in and around storage hypervisors, virtual storage, and storage virtualization, all of which can be used for different tasks and implemented in various ways with diverse features, functionality, and architectures.

Storage virtualization (and storage hypervisor) considerations include:

  • What are the various application requirements and needs?
  • Will it be used for consolidation or facilitating IT resource management?
  • What other technologies are in place or planned?
  • What are the scaling (performance, capacity, availability) needs?
  • Will the point of vendor lock-in be shifting or costs increasing?
  • What are some alternative and applicable approaches?
  • How will a solution scale with stability (performance, availability, capacity)?
  • What are the management tools, along with plug-ins and feeds for other tools?
  • What are the hardware and software dependencies, service and support?
  • Who is called and will answer the phone when something breaks?
  • What APIs and interfaces are supported (VAAI, VASA, VADP, ODX, CDMI)?
  • Are any drivers or other software required on servers accessing the storage?
  • What does the vendor's certified and tested interoperability support matrix look like?

The above is not an exhaustive list, rather things to think about and consider, which will vary based on your needs, preferences, and wants. Take a step back and understand what you need (requirements) vs. what you want (nice-to-haves or preferences) to meet different scenarios. Some of the tools and technologies mentioned, among their peers, have the flexibility to be used in different ways; however, just because something can be stretched or configured to do a task does not mean it is the best fit for a given situation. In addition, when it comes to storage hypervisors, virtual storage, and storage virtualization, look for solutions that work for you, vs. you having to work for them. This means they should remove or mask complexity instead of adding more layers to manage and take care of.

Storage Hypervisor Wrap-up

In the end, there is virtual storage, storage virtualization, storage hypervisors, hyped storage, along with the growing trend to software define everything as part of playing buzzword bingo.

I wonder if 2013 will be the year of the software-defined virtual storage hypervisor with cloud (public, private and hybrid) multi-tenant capabilities. In the meantime, welcome to the wide world of storage hypervisors, virtual storage, storage virtualization and related topics, themes, technologies and services for cloud, virtual and data storage networking environments.

Ok, nuff said (for now).

Cloud Security: On Moats

After a recent snowstorm, and due to pending work on our generator, I had to dig out paths to the generator, the propane tank, etc. We normally dig out a few paths for moving wood around our yard, access to oil, the driveway, etc. But by the time we finished, we had dug a moat around our entire house. This got me thinking about cloud security and the ongoing desire to put moats between us and the attackers. But what is "us" in the cloud? Can we prevent the attacks? What are the current moat-style technologies in play today?

Nivio – Has Someone got Desktop-As-A-Service for Small Business Right?

Nivio have announced a DaaS solution aimed at the SME space. Offering access to Microsoft Windows on any device, rentable applications, and data storage in the cloud, it sounds as if Nivio's service could be just the ticket for tablet-wielding, dead-PC-shunning organisations with a workforce who have their own devices and need to combine team collaboration with access to Windows-based applications.

The thing is, this road has been trodden before: it is a rocky one. OnLive attempted to offer a solution and failed. Even Desktone had a strategy that attempted to directly appeal to this segment but found the return on effort too miserly.

Yet, Nivio have created a service offering that delivers Windows applications to Windows, Mac, iOS and Android devices. A web service provides common file storage for user and group files that can be synchronised to devices for offline editing, or automatically made available within the public-cloud-hosted Windows desktop service, a desktop service with an on-demand, rentable application interface. User management is in your own hands. While Nivio are targeting the 20-50 user organisation space, which suggests small business, they are getting a number of calls from project teams in larger organisations.

What are Nivio doing that is different? Will this model be successful? What, if anything, can be learned by other DaaS providers, and what in turn could be learned by Nivio?


Is Amazon Ruining Public Cloud Computing?

Here is an interesting question: how can the undisputed leader in a category, one that is experiencing rapid growth, also be guilty of some combination of neglect and arrogance that may damage the reputation, and therefore the future success, of the category in its entirety? First, the details. On Monday (Christmas Eve), starting at around 3:30 PM US Eastern Time, applications using the Elastic Load Balancing (ELB) service at Amazon's US East data center in Virginia experienced outages. Those applications included Netflix, Scope, and the PaaS cloud Heroku.

Amazon’s Position in the Public Cloud Computing Market

The Wall Street Journal quoted research from Baird Equity Research estimating that AWS contributed $1.5B in revenue to Amazon this year, about triple what it contributed in 2010, and Baird further estimated that AWS revenue will double to $3B in two years. Although comparable numbers for other public cloud computing vendors are hard to come by, these numbers arguably make AWS both the revenue share and unit share leader of the public cloud computing market. Netflix is quoted in the same WSJ article as saying that it relies upon AWS for 95% of its needs for computation and cloud storage. It has been separately reported that Netflix runs over 5,000 concurrent Amazon instances in various Amazon data centers. Other high-profile online web properties like Foursquare, Pinterest, and Scope also apparently rely either heavily or exclusively upon AWS.

So we have a very interesting situation. We have a vendor, Amazon, whose service is so flexible and affordable that putting tactical workloads that do not need constant availability and constant excellent response time on that service is nearly a no-brainer. And we have companies whose very revenue and existence depends upon continuous availability and excellent user experience relying almost exclusively upon this service.

Amazon’s SLA

These issues need to be looked at in light of Amazon’s SLA. Amazon’s SLA was last updated in October of 2008 (which in and of itself indicates a problem), and states “AWS will use commercially reasonable efforts to make Amazon EC2 available with an Annual Uptime Percentage (defined below) of at least 99.95% during the Service Year“. Let’s analyze this SLA in light of the Christmas Eve outage:

  • Amazon states that it will use "commercially reasonable efforts" to meet this SLA. That gives Amazon an escape for any outage: Amazon can simply say that it used commercially reasonable efforts and the outage happened anyway, so tough luck. It is not known whether or not Amazon has ever invoked this excuse to avoid giving service credits, but the escape clause exists.
  • Amazon states that it will provide 99.95% uptime for a calendar year. That allows for (1 - .9995) * 365 * 24, or 4.38 hours, of downtime in a year (see the sketch after this list). The Christmas Eve outage apparently lasted a day and a half (36 hours), so we have to assume Netflix and other customers got some service credits. But obviously, the value of those credits pales in comparison to the damage in terms of revenue and reputation that occurred to Netflix and other online properties.
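
The downtime arithmetic above is easy to check. Here is a minimal Python sketch of the calculation, using the figures quoted above; the helper name is mine.

```python
# Allowed annual downtime for a given uptime SLA, versus the outage discussed above.

HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years

def allowed_downtime_hours(uptime_pct):
    """Hours per year a provider can be down while still meeting the SLA."""
    return (1 - uptime_pct) * HOURS_PER_YEAR

sla_allowance = allowed_downtime_hours(0.9995)   # Amazon's 99.95% annual SLA
outage_hours = 36                                # the Christmas Eve outage, as reported

print(f"99.95% SLA allows {sla_allowance:.2f} hours of downtime per year")
print(f"Reported outage exceeded the allowance by {outage_hours - sla_allowance:.2f} hours")
```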

However, the fact that your service can be down for 4.38 hours a year on Amazon while Amazon stays within its SLA is not the real problem. The real problem is that Amazon has no SLA for performance. Amazon can be up, but if resource contention of any kind in the Amazon infrastructure is at fault for the poor response time of an application running in the Amazon cloud, Amazon entirely washes its hands of any responsibility on that front.

Customer Reaction to Amazon Outages

The same WSJ article that reported on the outage also reported that Amazon customers like Scope, whose CEO was quoted as saying "I am looking into what options I have", are clearly looking to insulate themselves from the impact of Amazon outages on their businesses. This is where the potential damage to Amazon in particular, and public cloud computing in general, starts to get real. At the other end of the spectrum from running in the Amazon cloud lies the option of standing up your own data center and taking control of your operational reliability and performance into your own hands. Many enterprises already pursue a strategy of "develop and test on Amazon, then deploy internally". In support of this approach, HotLink offers a management solution that allows for the seamless management of instances across VMware, Hyper-V and Amazon, and the seamless migration of instances between the three environments.

There is one other customer reaction to these outages that is even more dangerous to public cloud computing: the assumption that it is the customer's responsibility to code around the unreliability in the Amazon infrastructure. In the Netflix blog "Chaos Monkey Released Into The Wild", Netflix chronicles how it tries to make its code resilient to failure, and how it has written a "Chaos Monkey" whose job is to randomly take down individual Netflix services to ensure that the entire Netflix service is not vulnerable to any single point of failure. The same blog speculates that what Netflix really needs is a "Zone Monkey" that takes down an entire Netflix instance in an Amazon Zone and makes sure that an entire Zone failure is a recoverable event (which it was not on Christmas Eve).
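
As a rough sketch of the idea (not Netflix's actual tool, which is a separate open-source project), a chaos-monkey-style test amounts to a loop like the following Python snippet; the instance names and the terminate and health-check functions here are hypothetical stand-ins.

```python
import random

# Conceptual sketch of a chaos-monkey-style resilience test: periodically pick a
# running instance at random and kill it, then verify the overall service survives.
# The instance names and the terminate/health-check functions are hypothetical.

RUNNING_INSTANCES = ["api-1", "api-2", "recommender-1", "encoder-1", "encoder-2"]

def terminate(instance_id):
    """Stand-in for a cloud API call that stops the instance."""
    print(f"terminating {instance_id}")
    RUNNING_INSTANCES.remove(instance_id)

def service_is_healthy():
    """Stand-in for an end-to-end health check of the user-facing service."""
    return len(RUNNING_INSTANCES) > 0

def unleash_monkey(rounds=3):
    for _ in range(rounds):
        victim = random.choice(RUNNING_INSTANCES)
        terminate(victim)
        assert service_is_healthy(), "single-instance failure took the service down"

if __name__ == "__main__":
    unleash_monkey()
```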

Public Cloud Computing Reliability is Not the Customer’s Problem

This is where Amazon's apparent approach to reliability and performance endangers the whole notion of public cloud computing. Imagine if your electricity company said that it was up to you to buy a generator to cover your needs for electricity if the power went out. Imagine if your water utility said that it was up to you to keep a water tank in your back yard in case the water supply went out. The entire idea that the vendor of a service does not stand behind the availability and quality of that service (as evidenced in Amazon's worthless SLA), and that it is somehow the customer's responsibility to code or design around the vagaries of the public cloud infrastructure, is wrong and dangerous to the future of public cloud computing.

It is wrong and dangerous to the future of public cloud computing because it is going to create the perception in the minds of enterprise customers (who are somewhat skeptical of running important applications in public clouds anyway) that public clouds are not to be trusted with important workloads. Since Amazon is the high-profile leader of the public cloud market, Amazon's failure to step up with a quality SLA is going to damage not just Amazon but the entire notion of public cloud computing. The fact that a vendor like Virtustream offers a response-time-based SLA for SAP running in its cloud is just not going to matter if Amazon ruins the reputation of the entire public cloud computing concept.

Update – Amazon Explanation and Apology

On its blog, Amazon has issued an explanation and apology for the December 24, 2012 ELB service event. The upshot is that a developer deleted state data from production servers thinking that he was only deleting it from non-production servers. Amazon has admitted that this occurred because of a flaw in its change management procedures (it did not require change management approval prior to the incident and now does), and has apologized for the mistake. This leaves Amazon struggling with the tradeoff between agility and change management just like many enterprises do, and it also does not resolve the issue of the lack of a truly useful and meaningful SLA.


The Christmas Eve Amazon outage that resulted in Netflix being unavailable for 36 hours resulted from an unacceptable attitude on the part of Amazon towards reliability and performance. Unless Amazon steps up to the plate with a meaningful SLA, Amazon risks damaging its own growth and the entire concept of public cloud computing.