TechOpsGuys.com Diggin' technology every day

July 9, 2012

Amazon outages from a Datacenter Perspective

Filed under: Datacenter. Posted by Nate @ 2:56 pm

I just came across this blog post (“Cloud Infrastructure Might be Boring, but Data Center Infrastructure Is Hard”), in which the author spends a good deal of time ripping into Amazon from a data center operations perspective:

But on the facilities front, it’s hard to see how the month of June was anything short of a disaster for Amazon on the data center operations side.

The post also covers past outages, and the author concludes that Amazon lacks discipline in operating its facilities, as a chain of outages over the past few years illustrates:

[..]since all of them can be traced back to a lack of discipline in the operation of the data centers in question.

[..]I wish they would just ditch the US East-1 data center that keeps giving them problems.  Of course the vast, vast majority of AWS instances are located there, so that may involve acquiring more floor space.

This reminds me of when Internap had its massive outage and then followed up by offering basically free migration to its new data center for any customer that wanted it. So many customers opted for it that they ran out of space pretty quickly (though I’m sure they have provisioned tons more space since then, as the new facility had the physical capacity to handle everyone plus a lot more once fully equipped).

This goes back to my earlier post where I ripped into them from a customer perspective over the whole built-to-fail model. For Amazon it doesn’t matter if a data center goes offline: they have the capacity to take the hit elsewhere, and global DNS will move the load over in a matter of seconds. Most of their customers don’t do that, mainly because it’s really expensive and complex (did you happen to notice there’s really no help for customers who want to replicate data or configuration between EC2 Regions?). As I tried to point out before, at anything other than massive scale it’s far more cost effective (and orders of magnitude simpler), for the vast majority of applications and workloads out there, to build the redundancy into the infrastructure (along with, of course, the operational ability to run the facilities properly) to handle those sorts of events.

I’d argue with the author on one point, though: cloud infrastructure is hard. (Updated: the author said it was boring rather than easy; for whatever reason my brain read it as “if one is hard, the other must not be” 🙂) Utility infrastructure is easy, but true cloud infrastructure is hard, the main difference being the self-service aspect of things. A lot of different software offerings try to provide some sort of self-service, but for the most part they still seem pretty limited or immature (and in some cases really costly). It’s interesting to see the discussions about OpenStack, for example. It’s not a product I’d encourage anyone to use in-house just yet unless you have developer resources that can help keep it running.
