The recent Amazon Web Services Simple Storage Service (S3) outage has taught us quite a bit about fragile cloud architectures. While many cloud providers will make hay of it over the next few weeks, the lesson applies across the industry: current cloud architectures, including modern hybrid cloud architectures, are fragile. We need to learn from this outage to design better systems: ones that are not fragile and that can recover from an outage. Calling the cloud fragile is not naysaying; it is a chance to do better. So what can we do better?
Articles Tagged with SLA
A Service Level Agreement (SLA) is an excellent mechanism for managing expectations, but it’s important to manage your own expectations of what an SLA can realistically accomplish. I know that those three words, “Service”, “Level” and “Agreement”, are often an attention turn-off: SLAs are to infrastructure bods what documentation is to developers. Yet when considering a move to cloud and utility services, many people feel that the SLAs on offer aren’t reliable, if they exist at all. So the SLA becomes the blocker: ‘If I move services out of my data centre, how will I guarantee availability and performance?’
Are SLAs for Cloud services really worthless and if they are, will the wider adoption of cloud services be impacted because of this?
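One way to ground the question is to translate an availability commitment into the downtime it still permits. The sketch below assumes an SLA expressed as an uptime percentage over a 30-day period; the percentages and the function name `allowed_downtime_minutes` are illustrative assumptions, not figures from any particular provider’s SLA.

```python
# Convert an SLA uptime percentage into the downtime it still permits.
# Assumption: availability is measured over a fixed period (30 days by default);
# the example percentages are illustrative, not taken from a real SLA.

def allowed_downtime_minutes(uptime_percent: float, period_days: float = 30) -> float:
    """Return the minutes of downtime an uptime commitment tolerates per period."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    period_minutes = period_days * 24 * 60
    return period_minutes * (100 - uptime_percent) / 100


if __name__ == "__main__":
    for pct in (99.0, 99.9, 99.95, 99.99):
        print(f"{pct}% over 30 days -> {allowed_downtime_minutes(pct):.1f} minutes")
```

Even a commitment that sounds strong allows a measurable amount of outage time each period, which is why an SLA alone cannot guarantee availability.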
As the dust settles on the Amazon cloud outage (or the mist lifts, or whatever cloud-related metaphorical cliché you prefer), I’d like to draw a number of conclusions about scalability, performance, reliability and openness.
For those of you who haven’t followed the minutiae of the story, it appears that Amazon’s service failed because a network event caused Elastic Block Store (EBS) to start re-mirroring itself, which in turn saturated the network and triggered further mirroring events in a cascade that made EBS unavailable.
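The feedback loop described above can be illustrated with a toy model. This is a hypothetical sketch, not how EBS works internally: it assumes each degraded volume consumes a fixed amount of re-mirroring bandwidth, that traffic beyond the network’s capacity causes proportionally more volumes to lose their mirror, and that no volume recovers during the simulation. The function `simulate_cascade` and all its parameters are invented for illustration.

```python
# Toy model of a re-mirroring cascade (hypothetical; not EBS internals).
# Assumptions: every degraded volume re-mirrors at a fixed bandwidth cost,
# any traffic above network capacity knocks more volumes off their mirrors,
# and volumes never recover within the simulated rounds.

def simulate_cascade(volumes: int, initially_lost: int, capacity: int,
                     cost_per_mirror: int, rounds: int = 10) -> list[int]:
    """Return the number of degraded volumes after each round."""
    degraded = initially_lost
    history = [degraded]
    for _ in range(rounds):
        traffic = degraded * cost_per_mirror          # re-mirroring load this round
        overload = max(0, traffic - capacity)         # load the network cannot carry
        newly_lost = min(volumes - degraded, overload // cost_per_mirror)
        degraded += newly_lost
        history.append(degraded)
        if newly_lost == 0:                           # the system has stabilised
            break
    return history


if __name__ == "__main__":
    # Below capacity the disturbance stays contained...
    print(simulate_cascade(volumes=1000, initially_lost=50, capacity=100, cost_per_mirror=1))
    # ...just above it, the loop feeds on itself until every volume is degraded.
    print(simulate_cascade(volumes=1000, initially_lost=150, capacity=100, cost_per_mirror=1))
```

The point of the model is the threshold behaviour: the same mechanism that absorbs a small disturbance amplifies a slightly larger one into a full outage.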