Is Amazon Ruining Public Cloud Computing?

Here is an interesting question: how can the undisputed leader in a category, one that is experiencing rapid growth, also be guilty of some combination of neglect and arrogance that may damage the reputation, and therefore the future success, of the category in its entirety? First, the details. On Monday (Christmas Eve), starting at around 3:30 PM US Eastern Time, applications using the Elastic Load Balancing (ELB) service at Amazon’s US East data center in Virginia experienced outages. Those applications included Netflix, Scope, and the PaaS cloud Heroku.

Amazon’s Position in the Public Cloud Computing Market

The Wall Street Journal quoted research from Baird Equity Research estimating that AWS contributed $1.5B in revenue to Amazon this year, about triple what it contributed in 2010, and Baird further estimated that AWS revenue will double to $3B in two years. Although comparable numbers for other public cloud computing vendors are hard to come by, these numbers arguably make AWS both the revenue share and the unit share leader of the public cloud computing market. Netflix is quoted in the same WSJ article as saying that it relies upon AWS for 95% of its computation and cloud storage needs. It has been separately reported that Netflix runs over 5,000 concurrent Amazon instances in various Amazon data centers. Other high-profile online web properties like Foursquare, Pinterest, and Scope also apparently rely either heavily or exclusively upon AWS.

So we have a very interesting situation. We have a vendor, Amazon, whose service is so flexible and affordable that putting tactical workloads that do not need constant availability and constant excellent response time on that service is nearly a no-brainer. And we have companies whose very revenue and existence depends upon continuous availability and excellent user experience relying almost exclusively upon this service.

Amazon’s SLA

These issues need to be looked at in light of Amazon’s SLA. Amazon’s SLA was last updated in October of 2008 (which in and of itself indicates a problem), and states “AWS will use commercially reasonable efforts to make Amazon EC2 available with an Annual Uptime Percentage (defined below) of at least 99.95% during the Service Year”. Let’s analyze this SLA in light of the Christmas Eve outage:

  • Amazon states that it will use “commercially reasonable efforts” to meet this SLA. That gives Amazon an escape for any outage: Amazon can simply say that it used commercially reasonable efforts and the outage happened anyway, so tough luck. It is not known whether Amazon has ever invoked this excuse to avoid giving service credits, but the escape clause exists.
  • Amazon states that it will provide 99.95% uptime over a service year. That allows for (1 − 0.9995) × 365 × 24, or 4.38 hours, of downtime in a year. The Christmas Eve outage apparently lasted a day and a half (36 hours), so we have to assume Netflix and other customers got some service credits. But obviously, the value of those credits pales in comparison to the damage in terms of revenue and reputation done to Netflix and the other online properties.
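
The arithmetic above can be sketched in a few lines. This is a back-of-the-envelope calculation only, not Amazon's official service-credit calculator; the function name is illustrative.

```python
# Back-of-the-envelope SLA arithmetic (illustrative, not Amazon's calculator).
HOURS_PER_YEAR = 365 * 24  # 8,760 hours

def allowed_downtime_hours(uptime_pct: float) -> float:
    """Maximum downtime (hours/year) permitted by an annual uptime SLA."""
    return (1 - uptime_pct) * HOURS_PER_YEAR

allowed = allowed_downtime_hours(0.9995)  # 4.38 hours for a 99.95% SLA
outage = 36.0                             # the Christmas Eve outage, in hours
print(f"SLA allows {allowed:.2f} h/year; the outage exceeded that by {outage - allowed:.2f} h")
```

A 36-hour outage blows through the annual allowance by more than 31 hours, which is why service credits were presumably due.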

However, the fact that your service can be down for 4.38 hours a year with Amazon staying within its SLA is not the real problem. The real problem is that Amazon has no SLA for performance. Amazon can be up, but if resource contention of any kind in the Amazon infrastructure causes poor response time for an application running in the Amazon cloud, Amazon entirely washes its hands of any responsibility on that front.

Customer Reaction to Amazon Outages

The same WSJ article that reported on the outages also reported that Amazon customers like Scope, whose CEO was quoted as saying “I am looking into what options I have,” are clearly looking to insulate themselves from the impact of Amazon outages upon their businesses. This is where the potential damage to Amazon in particular, and public cloud computing in general, starts to get real. At the other end of the spectrum from running in the Amazon cloud lies the option of standing up your own data center and taking your operational reliability and performance into your own hands. Many enterprises already pursue a strategy of “develop and test on Amazon, then deploy internally.” In support of this approach, HotLink offers a management solution that allows for the seamless management of instances across VMware, Hyper-V, and Amazon, and the seamless migration of instances between the three environments.

There is one other customer reaction to these outages which is even more dangerous to public cloud computing: the assumption that it is the customer’s responsibility to code around the unreliability in the Amazon infrastructure. In the Netflix blog “Chaos Monkey Released Into The Wild”, Netflix chronicles how it tries to make its code resilient to failure, and how it has written a “Chaos Monkey” whose job is to randomly take down individual Netflix services to ensure that the entire Netflix service is not vulnerable to any single point of failure. The same blog speculates that what Netflix really needs is a “Zone Monkey” that takes down an entire Netflix instance in an Amazon Zone, making sure that an entire Zone failure is a recoverable event (which it was not on Christmas Eve).
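
The Chaos Monkey idea above can be illustrated with a toy, in-memory sketch: randomly kill one service instance and check that the overall service still responds. The `Cluster` class and instance names here are hypothetical stand-ins, not Netflix's actual tooling, which operates against real AWS instances.

```python
import random

class Cluster:
    """A toy pool of service instances (illustrative only)."""

    def __init__(self, instances):
        self.instances = set(instances)

    def kill_random_instance(self, rng=random):
        # Pick a victim at random and terminate it, Chaos Monkey style.
        victim = rng.choice(sorted(self.instances))
        self.instances.discard(victim)
        return victim

    def is_serving(self):
        # The service survives as long as at least one instance remains.
        return len(self.instances) > 0

cluster = Cluster({"web-1", "web-2", "web-3"})
killed = cluster.kill_random_instance()
print(f"killed {killed}; still serving: {cluster.is_serving()}")
```

Running such a routine continuously in production, as Netflix does, forces every service to tolerate the loss of any single instance.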

Public Cloud Computing Reliability is Not the Customer’s Problem

This is where Amazon’s apparent approach to reliability and performance endangers the whole notion of public cloud computing. Imagine if your electric company said that it was up to you to buy a generator to cover your electricity needs if the power went out. Imagine if your water utility said that it was up to you to keep a water tank in your back yard in case the water supply went out. The entire idea that the vendor of the service does not stand behind the availability and quality of that service (as evidenced in Amazon’s worthless SLA), and that it is somehow the customer’s responsibility to code and/or design around the vagaries of the public cloud infrastructure, is wrong and dangerous to the future of public cloud computing.

It is wrong and dangerous to the future of public cloud computing because it is going to create the perception in the minds of enterprise customers (who are somewhat skeptical of running important applications in public clouds anyway) that public clouds are not to be trusted with important workloads. Since Amazon is the high-profile market leader that it is in the public cloud market, Amazon’s failure to step up with a quality SLA is going to damage not just Amazon, but the entire notion of public cloud computing. The fact that a vendor like Virtustream offers a response-time-based SLA for SAP running in its cloud is just not going to matter if Amazon ruins the reputation of the entire public cloud computing concept.

Update – Amazon Explanation and Apology

On its blog, Amazon has issued an explanation and apology for the December 24, 2012 ELB service event. The upshot is that a developer deleted state data from production servers, thinking that he was deleting it only from non-production servers. Amazon has admitted that this occurred because of a flaw in its change management procedures (change management approval was not required prior to the incident and now is), and has apologized for the mistake. This leaves Amazon struggling with the tradeoff between agility and change management, just as many enterprises do, and it still does not resolve the lack of a truly useful and meaningful SLA.


The Christmas Eve Amazon outage that left Netflix unavailable for 36 hours resulted from an unacceptable attitude on Amazon’s part towards reliability and performance. Unless Amazon steps up to the plate with a meaningful SLA, Amazon risks damaging both its own growth and the entire concept of public cloud computing.

Greenbytes Addresses VDI IO Without Changing Your Storage

Participate in any virtual desktop design session and you will find that the discussion almost always moves immediately to how many IOPS per virtual desktop session should be expected. More often than not, the leader of these conversations will answer “it depends.” That answer does not give most end users a warm and fuzzy feeling, because it usually comes with a pretty heavy storage price tag. Unfortunately, there are many factors that affect overall performance. Within the virtual desktop session, the number and type of applications you have running, the layers of security configuration and policy that are applied, and how you handle user personalization all have an impact on IOPS. Many of these challenges can be addressed by applying good standard virtual desktop practices, which are often different from the way physical desktops are traditionally architected.

Change: Moving to the Cloud

The Virtualization Practice will be moving from our internal virtual environment and cloud configuration to an externally hosted cloud configuration, at least temporarily. What we have found is that not all clouds are alike (we all knew that) and that some of our processes were not cloud friendly. What does that mean for moving to the cloud? How do we manage the change to these processes as we move to the cloud?

Big Data Operations Management

Virtualization and cloud computing are not just innovations that require the support of new environments in existing operations management solutions. Virtualized and cloud-based environments are so different from their predecessors that an entirely new management stack will have to be built in order to manage them effectively. This new stack will be so different that it will replace, rather than augment, the legacy/incumbent management stacks from legacy vendors. This ushers in the era of Big Data Operations Management.

News: Red Hat Acquires ManageIQ – Another Cloud Management Acquisition

On December 20, 2012, Red Hat announced that it has entered into a definitive agreement to acquire ManageIQ. This move has broad ramifications for the virtualization platform business and for the management software business.

Recent Cloud Management Acquisitions

The cloud management space certainly has been a hotbed of acquisition activity recently. Recent deals include:

  • VMware’s acquisition of DynamicOps and the subsequent rebranding of DynamicOps as vCloud Automation Center, and the inclusion of vCloud Automation Center in the Enterprise Edition of the vCloud Suite.
  • Cisco’s acquisition of Cloupia
  • Dell’s acquisition of Gale Technologies
  • And now Red Hat’s acquisition of ManageIQ

Clearly it has suddenly become important for a lot of large companies to own a viable cloud management software player and to have a viable cloud management offering. So what is so strategic about cloud management? The answer is that the definition and value of cloud management have undergone a subtle but profound transformation in the last 24 months. Two years ago, cloud was all about self-service. Private cloud was all about letting an IT department stand up its own competitor to Amazon EC2 so that it could stop leaking transient workloads out to an outsourced provider.

Now cloud is all about automation: automation of the entire lifecycle of deploying and updating every application that runs in the data center. Furthermore, cloud is not just about automating that deployment and update cycle in the internal data center; it is about brokering that deployment across internal and external data centers as appropriate. Cloud management has therefore become the crucial layer of software that allows an IT department to become that broker of services to its business constituencies.
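The brokering role described above can be reduced to a minimal, hypothetical placement policy: run a workload internally when capacity allows, otherwise burst it to an external cloud. The function and parameter names are illustrative; real products such as vCloud Automation Center and ManageIQ implement far richer policy engines.

```python
# A minimal, hypothetical sketch of a cloud broker's placement decision.
def place_workload(required_cores: int, internal_free_cores: int) -> str:
    """Return where a workload should run under a simple capacity policy."""
    if required_cores <= internal_free_cores:
        return "internal"   # internal data center has room
    return "external"       # burst to an external cloud provider

print(place_workload(8, 16))   # fits internally
print(place_workload(32, 16))  # must burst externally
```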

Ramifications for the Virtualization Platform Vendors

When VMware acquired DynamicOps, renamed it vCloud Automation Center, and then combined vSphere, vCloud Automation Center, vCenter Operations, and vFabric Application Director into the vCloud Suite, VMware ran a Microsoft Office play. The gist of that play is that if all you had was a word processor (WordPerfect) or a spreadsheet (Lotus), all you had was a feature, while the vendor of the suite had a solution. This play created an inevitable set of questions at Microsoft and Red Hat. Those questions started with: are we serious about being a virtualization platform vendor? If so, are we serious about competing head-to-head with the vCloud Suite from VMware? If so, where are our components that match up with the components in the VMware vCloud Suite?

Red Hat has now answered one of these questions: its answer to vCloud Automation Center is ManageIQ. Red Hat has at least two more important questions to answer: what its answer to vCenter Operations Manager is, and what its answer to vFabric Application Director is. Finally, of course, Red Hat will also have to answer what its Software Defined Data Center strategy is.

A similar set of questions must now be directed at Microsoft, which has neither a Software Defined Data Center strategy nor anything remotely approaching a vCloud Suite. It is highly ironic that at the exact time that Microsoft has arguably achieved parity at the core hypervisor level, VMware has shifted the debate to the SDDC and the vCloud Suite. One detects the fine hand of Paul Maritz’s strategic planning here, and hopes that it will not be missed as he joins the Pivotal Initiative.

Ramifications for the Legacy Management Software Vendors

One of the interesting things about the cloud management business is who these vendors competed with when selling their solutions to customers. All of the startups in the space (DynamicOps prior to being acquired by VMware, Embotics, Virtustream, Cloupia, ServiceMesh, FluidOps, and ManageIQ) regularly competed with VMware vCloud Director. Since VMware completed the acquisition of DynamicOps, the most frequent competitor for everyone is now VMware vCloud Automation Center. Noticeably missing from most of these competitive situations were the big four: IBM, BMC, HP, and CA. BMC was sometimes present, but was most often quickly ruled out due to the complexity and high professional services footprint of its solution.

For the big four not to be present in a market dominated by startups is not a horrible problem. For the big four not to be present in a market where VMware, Dell, Cisco, and Red Hat now all have compelling solutions blows another hole in the side of an already sinking battleship.

Ramifications for the Remaining Cloud Management Vendors

While there is clearly now no shortage of large vendors from whom one can buy a first-class cloud management solution, the game is far from over for the startups. In particular, there are four companies with extremely compelling solutions, each in its own right:

  • Embotics has focused heavily on, and specialized in, getting a customer up and running with a private cloud in less than one hour. This is extremely appealing to the SMB and SME markets, where a services-heavy footprint does not work in terms of either time to value or cost of success.
  • Virtustream is still the only cloud management vendor that can successfully virtualize SAP and provide the customer with a response-time-based SLA on such a business-critical application.
  • ServiceMesh has kept a relatively low profile while building an impressive list of enterprise class customers.
  • FluidOps has pioneered functionality in the form of its Landscapes, which allow complex multi-tier application systems (like SAP) to be encapsulated, deployed, and managed as a single entity.

The Red Hat Announcement

RALEIGH, N.C. – December 20, 2012 – Red Hat, Inc. (NYSE: RHT), the world’s leading provider of open source solutions, today announced that it has entered into a definitive agreement to acquire ManageIQ, a leading provider of enterprise cloud management and automation solutions that enable organizations to deploy, manage and optimize private clouds, virtualized infrastructures and virtual desktops. With the addition of ManageIQ technologies to its portfolio, Red Hat will expand the reach of its hybrid cloud management solutions for enterprises.

Red Hat has agreed to acquire ManageIQ, a privately-held company, for approximately $104.0 million in cash. The closing of the transaction is subject to customary closing conditions, including approval by the stockholders of ManageIQ.

As an existing member of the Red Hat Enterprise Virtualization Certified Partner program, ManageIQ has worked closely with Red Hat to provide customers with unified monitoring, management and automation solutions that are quick-to-deploy and easy-to-use, which reduce the cost and complexity of enterprise clouds. ManageIQ’s Hybrid Cloud Operations Management technologies complement Red Hat’s existing cloud and virtualization management tools – Red Hat CloudForms and Red Hat Enterprise Virtualization – by providing integrated lifecycle management of activities such as server and storage provisioning, workload optimization, policy-based compliance, chargeback, virtual machine lifecycle management, discovery and control, and analytics across heterogeneous private clouds and virtualized datacenters. With the addition of ManageIQ, Red Hat’s open hybrid cloud management solutions will include:

  • Red Hat CloudForms: a hybrid cloud Infrastructure-as-a-Service (IaaS) solution that enables the management, brokering, and aggregation of capacity across various virtualization and cloud providers as well as the management of applications across hybrid clouds.
  • Red Hat Enterprise Virtualization: a comprehensive virtualization management solution that is an ideal virtualization substrate for organizations to build cloud environments in terms of performance, security and value.
  • ManageIQ’s Hybrid Cloud Operations Management Tools: a cloud operations management solution that provides enterprises operational management tools including monitoring, chargeback, governance, and orchestration across virtual and cloud infrastructure such as Red Hat Enterprise Virtualization, Amazon Web Services, Microsoft and VMware.


By acquiring ManageIQ, Red Hat has thrown its hat into the ring as a vendor of a software suite comparable to the VMware vCloud Suite. This has broad ramifications for Microsoft and for the legacy vendors of management software.