The Christmas Eve Amazon outage that left Netflix unavailable for 36 hours resulted from an unacceptable attitude on Amazon’s part toward reliability and performance. Unless Amazon steps up to the plate with a meaningful SLA, it risks damaging both its own growth and the entire concept of public cloud computing.
Amazon’s recent outages remind us that deploying in the cloud doesn’t automatically guarantee high availability. Where you deploy and how you deploy both turn out to matter a great deal.
Public cloud SLAs are worthless. They need to be replaced by metrics that measure how responsive the layers the cloud provider owns are to the customer’s software running on top of them. Developing these metrics will require significant changes to existing APM approaches, which must be able to separate time spent in the application from time spent in the application framework or OS.
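To make the idea of separating time concrete, here is a minimal sketch, not a real APM implementation. It assumes a coarse proxy: CPU time in user space stands in for the application (and its framework), CPU time in the kernel stands in for the OS, and any remaining wall-clock time is treated as waiting on something the process does not control. The names `TimeBreakdown` and `measure` are hypothetical, and a real tool would need far finer attribution than this.

```python
import os
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class TimeBreakdown:
    wall: float     # elapsed wall-clock seconds for the operation
    user: float     # CPU seconds spent in user-space code (app plus framework)
    system: float   # CPU seconds spent in the kernel on the process's behalf
    waiting: float  # wall time not covered by CPU time (blocked or descheduled)


def measure(operation: Callable[[], object]) -> TimeBreakdown:
    """Run operation once and split its elapsed time into rough buckets."""
    cpu_before = os.times()
    wall_before = time.perf_counter()
    operation()
    wall = time.perf_counter() - wall_before
    cpu_after = os.times()

    user = cpu_after.user - cpu_before.user
    system = cpu_after.system - cpu_before.system
    # With multiple threads, CPU time can exceed wall time, so clamp at zero.
    waiting = max(0.0, wall - user - system)
    return TimeBreakdown(wall=wall, user=user, system=system, waiting=waiting)


if __name__ == "__main__":
    # A sleep stands in for a call that blocks on the underlying platform.
    print(measure(lambda: time.sleep(0.2)))
```

Even this rough split shows the limitation the paragraph points at: user-space time lumps the application together with its framework, which is exactly the boundary current approaches would need to draw more precisely.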