TechOpsGuys.com Diggin' technology every day

December 18, 2012

Top 10 outages of the year

Filed under: Datacenter - Nate @ 11:02 am

It’s that time of the year again: top N lists are popping up everywhere, and I found this list from Data Center Knowledge interesting.

Of note, two big cloud companies were on the list with multiple outages – Amazon having at least three outages and Azure right behind it at two. Outages have been a blight on both services for years.

I don’t know about you, but short of a brief time at a poor hosting facility in Seattle (I joined a company in Spring of 2006 that was hosted there and we were moved out by Fall of 2006 – we did go through one power outage while I was there, if I recall right), the number of infrastructure-related outages I’ve been through over the past decade has been fairly minimal compared to the number experienced by these cloud companies. For me, application-related outages (and the total downtime minutes incurred by said applications) outnumber infrastructure-related ones by at least 1,000:1, I’d say.

Amazon has caused far more downtime for companies that I have worked for (either before or since I was there) than any infrastructure-related outages at companies I was at that hosted their own stuff. It’s safe to say an order of magnitude more outages. Of course, not all of these are called outages by Amazon; they leave themselves enough wiggle room in their SLAs to drive an aircraft carrier through. My favorite one was probably the forced reboot of their entire infrastructure.

Unlike infrastructure related outages at individual companies, obviously these large service provider outages have much larger consequences for very large numbers of customers.

Speaking of cloud, I heard that HP brought their own cloud platform out of beta recently. I am not a fan of this cloud either. Basically they tried to clone what Amazon is doing in their cloud, which infrastructure-wise is a totally 1990s way of doing things (with APIs on top to make it feel nice). Wake me up when these clouds get the ability to pool CPU/memory/storage and to dynamically configure systems without fixed configurations.
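To make the fixed-configuration complaint concrete, here is a rough sketch of the difference. The instance names and sizes are made up for illustration and are not any provider's real catalog, and `ResourcePool` is a hypothetical stand-in for what a pooled platform might expose, not anybody's actual API.

```python
from dataclasses import dataclass

# Hypothetical fixed catalog: name -> (vCPUs, GB RAM). Illustrative values only.
FIXED_SIZES = {"small": (1, 2), "medium": (2, 4), "large": (4, 8)}


def smallest_fixed_fit(cpus, ram_gb):
    """Return the smallest fixed size that covers the request, or None."""
    for name, (c, r) in sorted(FIXED_SIZES.items(), key=lambda kv: kv[1]):
        if c >= cpus and r >= ram_gb:
            return name
    return None


@dataclass
class ResourcePool:
    """Hypothetical shared pool that hands out exactly what is asked for."""
    cpus: int
    ram_gb: int

    def allocate(self, cpus, ram_gb):
        if cpus > self.cpus or ram_gb > self.ram_gb:
            raise ValueError("pool exhausted")
        self.cpus -= cpus
        self.ram_gb -= ram_gb
        return {"cpus": cpus, "ram_gb": ram_gb}


# A 1 vCPU / 6 GB workload is forced up to "large" in the fixed model,
# leaving 3 vCPUs and 2 GB paid for but idle.
fixed_choice = smallest_fixed_fit(1, 6)

# The pooled model carves out exactly 1 vCPU / 6 GB and keeps the rest shared.
pool = ResourcePool(cpus=16, ram_gb=64)
vm = pool.allocate(1, 6)
```

With the fixed catalog, any workload that is lopsided (lots of memory, little CPU, or the reverse) ends up rounded up to the next size; with a pool, the leftover capacity stays available to other systems.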

If the world happens to continue on after December 22nd @ 3:11AM Pacific time, and I don’t happen to see you before Christmas – have a good holiday from all of us monkeys at Techopsguys.

New Cloud provider ProfitBricks

Filed under: Datacenter - Nate @ 9:02 am

(originally I had this on the post above this but I thought it better to split it out since it morphed into something that suited a dedicated post)

Also on the topic of cloud, I came across this other post on Data Center Knowledge’s site a few days ago talking about a new cloud provider called ProfitBricks.

I dug into their web site a bit and they really seem to have some interesting technology. They are based out of Europe, but also have a U.S. data center somewhere too. They claim more than 1,000 customers, and well over 100 engineers working on the software.

While ProfitBricks does not offer pooling of resources, they do have several key architectural advantages that other cloud offerings I’ve come across lack:

They really did a good job, at least on paper. I haven’t used this service, though I did play around with their data center designer.

ProfitBricks Data Center designer

Their load balancing offering appears to be quite weak (weaker than Amazon’s own offering), but you can deploy a software load balancer like Riverbed Stingray (formerly Zeus). I emailed them about this and they are looking into Stingray; perhaps they can get a partnership going and have it be an offering with their service. Amazon has recently improved their load balancing partnerships and you can now run at least Citrix Netscaler as well as A10 Networks’ SoftAX in EC2, in addition to Riverbed Stingray. Amazon’s own Elastic Load Balancer is worse than useless in my experience. I’d rather rely on external DNS-based load balancing from the likes of Dynect than use ELB. Even with Stingray it can take several seconds (up to about 30) for the system to fail over with Elastic IPs, versus the normally sub-second failover when you’re operating your own infrastructure.

Anyway, back to ProfitBricks. I was playing around with their designer tool and I was not sure how best to connect servers that would be running load balancers (assuming they don’t provide the ability to do IP takeover). I thought maybe have one LB in each zone and advertise both data center IP addresses (this is a best practice in any case, at least for larger providers). Though in the design pictured above I simplified it a bit to a single internet access point, using one of ProfitBricks’ round robin load balancers to distribute layer 4 traffic to the servers behind it (running Stingray). Some real testing and further discussion would of course have to happen before I’d run production stuff on it (and I have no need for IaaS cloud right now anyway).
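For what it’s worth, the front-end piece of that layout is about as simple as load balancing gets. Below is a minimal sketch of round robin selection across two back ends with a basic up/down flag. It says nothing about how ProfitBricks actually implements theirs; the class name, the `stingray-a`/`stingray-b` node names, and the health marking are all my own assumptions for illustration.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Toy round robin picker; hypothetical, not ProfitBricks' implementation."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._ring = cycle(self.backends)

    def mark_down(self, backend):
        # Stop handing out a node that failed a health check.
        self.healthy.discard(backend)

    def mark_up(self, backend):
        # Only known back ends can be returned to rotation.
        if backend in self.backends:
            self.healthy.add(backend)

    def next_backend(self):
        # Walk the ring at most once, skipping nodes marked down.
        for _ in range(len(self.backends)):
            candidate = next(self._ring)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy backends")


# Two hypothetical servers running Stingray behind the round robin front end.
lb = RoundRobinBalancer(["stingray-a", "stingray-b"])
first_four = [lb.next_backend() for _ in range(4)]

# After one node is marked down, every connection lands on the survivor.
lb.mark_down("stingray-a")
after_failure = [lb.next_backend() for _ in range(2)]
```

The catch, and the reason I would want real testing, is everything this sketch leaves out: how quickly a dead node gets noticed and pulled from rotation is what decides whether failover takes under a second or half a minute.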

So they have all this, and still their pricing is very competitive. They also claim a very high level of support, which is good to see.

I’ll certainly keep them in mind in the event I need IaaS in the future. They seem to know the failings of first generation cloud companies and are doing good things to address them. Now if they could only address the lack of resource pooling I’d be really happy!
