Dec 7, 2011

Impending rolling outages in EC2

TechOps Guy: Nate

I don’t write much about EC2, despite how absolutely terrible it is. I will be writing about it in more depth soon (months away most likely; it’s complicated). Nothing is more frustrating than working with stuff in EC2.

I have told some folks recently that my private rants about EC2 and its associated services make me feel like those folks in 2005-7 who were screaming about the implosion of the housing market, yet for the most part nobody was listening because that’s not what they wanted to hear.

Same goes for EC2.

Anyways, I wanted to mention this, which talks about impending rolling outages across the Amazon infrastructure (within the next week or two).

Oh wait these are not outages, these are “scheduled maintenance events”.

That you can’t opt out of. You can postpone them a bit, but you can’t avoid them entirely, short of getting the hell outta there (which is a project I am working on – finally! I’m going to Atlanta next week, more than 4 months later than I was originally expecting).

Yeah, good design there. Better design? Take a look at what the folks over at a UK provider called UltraSpeed do. It’s clear they are passionate about their work, and things like a 15-minute SLA for restoring a failed server show they take pride in it (look ma! No hard disks in the servers! Automated off-site backups to another country!). Or Terremark – fire in the data center? No problem.

I have little doubt this is in response to a critical security flaw that can only be addressed by rebooting the tens or hundreds of thousands of VMs across their infrastructure in a short time, before it gets exploited, assuming it’s not being exploited already.

I fully expect that, perhaps by the end of this month, some security group out there will disclose the vulnerability that Amazon is frantically trying to address now.


Author: Nate

Comments (3) Trackbacks (0)
  1. You forgot to mention that some of those reboots (instance reboots vs. system reboots) will cause your instances’ public DNS names and IPs to change (a stop/start reboot). Time to go update any external references to your instances (like monitoring, because their CloudWatch monitors can’t monitor anything in the OS). Hope you don’t have hundreds or thousands of instances.

  2. I was not aware of the system reboots! Though from what I see, I think the IPs will not change for system reboots.

  3. The IPs changed on our instances that needed system (stop/start) reboots.
