TechOpsGuys.com Diggin' technology every day

October 6, 2010

Amazon EC2: Not your father’s enterprise cloud

Filed under: Datacenter — Nate @ 9:00 am

OK, so obviously I am old enough that my father did not have clouds back in his day, well, not the infrastructure clouds that are offered today; I was just trying to think of a somewhat zingy topic. And I understand enterprise can have many meanings depending on the situation, it could mean a bank that needs high uptime for example. In this case I use the term enterprise to signify the need for 24×7 operation.

Here I am, once again working on stuff related to “the cloud”, and it seems like everything “cloud” revolves around EC2.

Even after all the work I have done recently, and over the past year or two, with regard to cloud proposals, I don’t know why it didn’t hit me until the past week or so, but it did (sorry if I’m late to the party).

There are a lot of problems with running traditional infrastructure in the Amazon cloud, as I’m sure many have experienced firsthand. The realization that occurred to me wasn’t that, of course.

The realization was that there isn’t a problem with the Amazon cloud itself, but there is a problem with how it is:

  • Marketed
  • Targeted

Which leads to people using the cloud for things it was never intended to be used for. With regard to Amazon, one need look no further than their SLA on EC2 to immediately rule it out for any sort of “traditional” application, which includes:

  • Web servers
  • Database servers
  • Any sort of multi tier application
  • Anything that is latency sensitive
  • Anything that is sensitive to security
  • Really, anything that needs to be available 24×7

Did you know that if they lose power to a rack, or even a row of racks, that is not considered an outage? It’s not as if they provide you with the knowledge of where your infrastructure is in their facilities; they’d rather you just pay them more and put things in different zones and regions.

Their SLA says in part that they can in fact lose an entire data center (what they call an “availability zone”) and that’s not considered an outage. Amazon describes availability zones this way:

Additionally, they are physically separate, such that even extremely uncommon disasters such as fires, tornados or flooding would only affect a single Availability Zone.

And while I can’t find it on their site at the moment, I swear not too long ago their SLA included a provision that said even if they lost TWO data centers it’s still not an outage unless you can’t spin up new systems in a THIRD. Think of how many hundreds to thousands of servers are knocked offline when an Amazon data center becomes unavailable. I think they may have removed the two-availability-zone clause because not all of their regions have more than two zones (last I checked only us-east did, but maybe more have them now).

I was talking to someone who worked at Amazon not too long ago and had in fact visited the us-east facilities, who said all of the availability zones were in the same office park, really quite close to each other. They may have had different power generators and such, but quite likely if a tornado or flooding hit, more than one zone would be impacted; likely the entire region would go out (that is Amazon’s code word for saying all availability zones are down). While I haven’t experienced it first hand, I know of several incidents that impacted more than one availability zone, indicating that more is shared between them than customers are led to believe.

Then there is the extremely variable performance & availability of the services as a whole. On more than one occasion I have seen Amazon reboot the underlying hardware without any notification (note they can’t migrate the workloads off the machine! anything on the machine at the time is killed!). I also love how unapologetic they are when it comes to things like data loss. Basically they say you didn’t replicate the data enough times, so it’s your fault. Now I can certainly understand that bad things happen from time to time, that is expected; what is not expected, though, is how they handle it. I keep thinking back to this article I read on The Register a couple of years ago, a good read.

Once you’re past that, there’s the matter of reliability. In my experience with it, EC2 is fairly reliable, but you really need to be on your shit with data replication, because when it fails, it fails hard. My pager once went off in the middle of the night, bringing me out of an awesome dream about motorcycles, machine guns, and general ass-kickery, to tell me that one of the production machines stopped responding to ping. Seven or so hours later, I got an e-mail from Amazon that said something to the effect of:

There was a bad hardware failure. Hope you backed up your shit.

Look at it this way: at least you don’t have a tapeworm.

-The Amazon EC2 Team

I’m sure I have quoted it before in some posting somewhere, but it’s such an awesome and accurate description.

So go beyond the SLAs, go beyond the performance and availability issues.

Their infrastructure is “built to fail”, which is a good concept at very large scale; I’m sure every big web-type company does something similar. The concept really falls apart at small scale though.

Everyone wants to get to the point where they have application level high availability and abstract the underlying hardware from both a performance and reliability standpoint. I know that, you know that. But what a lot of the less technical people don’t understand is that this is HARD TO DO. It takes significant investments in time & money to pull off. And at large scale these investments do pay back big. But at small scale they can really hurt you. You spend more time building your applications and tools to handle unreliable infrastructure when you could be spending time adding the features that will actually make your customers happy.

There is a balance there, as with anything. My point is that with the Amazon cloud those concepts are forced upon you if you want to use their service in a more “traditional” hosting model. And the overhead associated with that is ENORMOUS.

So back to my point: the problem isn’t with Amazon itself, it’s who the service is targeted at and the expectations around it. They provide a fine service, if you use it for what it was intended. EC2 stands for “Elastic Compute Cloud”; the first things that come to my mind when I hear that kind of term are HPC-type applications, data processing, back-end type stuff that isn’t latency sensitive and is built to tolerate infrastructure failure.

But even then, that concept falls apart if you have a need for 24×7 operations. The cost model of even Amazon, the low-cost “leader” in cloud computing, doesn’t hold water vs doing it yourself.

Case in point: earlier in the year, at another company, I was directed to go on another pointless expedition comparing the Amazon cloud to doing it in house for a data-intensive 24×7 application. That’s not even taking into account the latency introduced by S3, the operational overhead of EC2, or the performance and availability problems. Assuming everything worked PERFECTLY, or at least as well as physical hardware, the ROI for keeping the project in house was less than 7 months (I re-checked the numbers and revised the ROI from the original 10 months to 7 months; I was in a hurry writing this morning before work). And this was for good quality hardware with 3 years of NBD on-site support, not scraping the bottom of the barrel. To give you an idea of the savings: after those 7 months, it could more than pay for my yearly salary and benefits, plus the other expenses a company has for an employee, for each and every month after that.
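For anyone who wants to run this kind of comparison themselves, here is a minimal sketch of the break-even math. The dollar figures in it are made-up placeholders, not the numbers from the project above; plug in your own hardware quote, colo/operations costs and cloud bill.

# Minimal break-even sketch for a "cloud vs. buy it yourself" comparison.
# All dollar figures below are hypothetical placeholders, NOT the numbers
# from the project described above.

def breakeven_month(hw_upfront, inhouse_monthly, cloud_monthly, horizon=36):
    """Return the first month where cumulative in-house cost drops below
    cumulative cloud cost, or None if it never happens within the horizon."""
    for month in range(1, horizon + 1):
        inhouse = hw_upfront + inhouse_monthly * month
        cloud = cloud_monthly * month
        if inhouse < cloud:
            return month
    return None

if __name__ == "__main__":
    # Hypothetical: $60k of hardware with 3 years of NBD support, $2k/mo
    # for colo/power/support, vs. a $12k/mo cloud bill for similar capacity.
    print(breakeven_month(60000, 2000, 12000))  # -> 7 (months)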

OK, so we’re past that point now. On to a couple of really cool slides I came up with for a pending presentation, which I really think illustrate the Amazon cloud quite well; another one of those “picture is worth fifty words” type of things. The key point here is capacity utilization.

What has using virtualization over the past half decade (give or take) taught us? What have the massive increases in server and storage capacity taught us? Well, they taught me that applications no longer have the ability to exploit the capacity of the underlying hardware. There are very rare exceptions to this, but in general, over at least the past 15 years of my experience, applications have really never been able to exploit the underlying capacity of the hardware. How many systems do you see averaging under 5% CPU? Under 3%? Under 2%? How many systems do you see with disk drives that are 75% empty? 80%?

What else has virtualization given us? It’s given us the opportunities to logically isolate workloads into different virtual machines, which can ease operational overhead associated with managing such workloads, both from a configuration standpoint as well as a capacity planning standpoint.

That’s my point. Virtualization has given us the ability to consolidate these workloads onto fewer resources. I know this is a point everyone understands, and I’m not trying to make people look stupid, but my point here with regard to Amazon is that their model doesn’t take us forward — it takes us backward. Here are those two slides that illustrate this:

(Slides 1 and 2: capacity utilization illustrations; click images for full size.)

Not all cloud providers are created equal, of course. The Terremark Enterprise Cloud (not vCloud Express, mind you), for example, is resource-pool based. I have no personal experience with their enterprise cloud (I am a vCloud Express user for my personal stuff: 2x 1-vCPU servers, including the server powering this blog!), though I did interact with them pretty heavily earlier in the year on a big proposal I was working on at the time. I’m not trying to tell you that Terremark is more or less cost effective, just that they don’t reverse several years of innovation and progress in the infrastructure area.

I’m sure Terremark is not the only provider that can provide resources based on resource pools instead of hard per-VM allocations. I just keep bringing them up because I’m more familiar with their stuff due to several engagements with them at my last company (none of which ever resulted in that company becoming a customer). I originally became interested in Terremark because I was referred to them by 3PAR, and I’m sure by now you know I’m a fan of 3PAR; Terremark is a very heavy 3PAR user. And they are a big VMware user, and you know I like VMware by now, right?

If Amazon were more, what is the right word, honest? Up front? Better at setting expectations? I think their customers would be better off; mainly they would have fewer of them, because those customers would realize what that cloud is made for, rather than trying to fit a square peg in a round hole. If you whack it hard enough you can usually get it in, but, well, you know what I mean.

As this blog entry now exceeds 1,900 words I feel I should close it off. If you read this far, hopefully I made some sense to you. I’d love to share more of my presentation as I feel it’s quite good, but I don’t want to give all of my secrets away 🙂

Thanks for reading.

September 22, 2010

The Cloud: Grenade fishing in a barrel

Filed under: Datacenter — Nate @ 10:03 pm

I can’t help but laugh. I mean I’ve been involved in several initiatives surrounding the cloud. So many people out there think the cloud is efficient and cost effective. Whoever came up with the whole concept deserves to have their own island (or country) by now.

Because, really, competing against the cloud is like grenade fishing in a barrel. Shooting fish in a barrel isn’t easy enough, really it’s not!

Chuck from EMC earlier in the year talked to the folks at Pfizer about their use of the Amazon cloud, and the real story behind it. Interesting read; it really shows the value you can get from the cloud if you use it right.

R+D’s use of HPC resources is unimaginably bursty and diverse, where on any given day one of 1000 different applications will be run. Periodically enormous projects (of very short duration!) come up very quickly, driven by new science or insights, which sometimes are required to make key financial or  strategic decisions with vast amounts of money at stake for the business.

As a result, there’s no real ability to forecast or plan in any sort of traditional IT sense.  The HPC team has to be able to respond in a matter of days to huge requests for on-demand resources — far outside the normal peaks and valleys you’d find in most traditional IT settings.

But those use cases at the moment really are few and far between, contrasted with the use cases for having your own cloud (of sorts), where there is a lot more use. It would not surprise me if over time Pfizer continues to expand its internal HPC infrastructure as it gets more of a grasp of what its average utilization rate is, and hosts more and more internally vs going to Amazon. It’s just that in the early days of this they don’t have enough data to predict how much they need. They may never get completely out of the cloud; I’m just saying that the high watermark (for lack of a better term) can be monitored so that there is less significant “bursting” to the cloud.

Now if Pfizer is never able to really get a grip on forecasting their HPC requirements, well then they might just keep using the cloud, but I suspect at the end of the day they will get better at forecasting. They obviously have the talent internally to do this very tricky balance of cloud and internal HPC. The cloud people would have you believe it’s a simple thing to do; it’s really not, especially for off-the-shelf applications. If you had seen the numbers I have seen, you’d shake your head too. Sort of the response I had when I did come across a really good use case for the cloud earlier this year.

I could see paying a lot more for premium cloud services if I got more, but I don’t get more; in fact I get less, a LOT less, than doing it myself. Now for my own personal “server” that is in the Terremark cloud, I can live with it; it’s not a big deal, my needs are tiny (though now that I think about it, they couldn’t even give me a 2nd NAT address for a 2nd VM for SMTP purposes; I had to create a 2nd account and put my 2nd VM in it to get my 2nd NAT address. Costs for me are the same regardless, but it is a bit more complicated than it should be, and opening a 2nd account in their system caused all sorts of problems with their back end, which seemed to get confused by having two accounts with the same name; I had to engage support on more than one occasion to get all the issues fixed). But for real work stuff, no way.

Still, so many sheep out there buy the hype – hook, line and sinker.

Which can make jobs for people like me harder. I’ve heard the story time and time again from several different people in my position: PHBs are so sold on the cloud concept that they can’t comprehend why it’s so much more expensive than doing it yourself, so they want you to justify it six ways from Sunday (if that’s the right phrase). They know there’s something wrong with your math but they don’t know what it is, so they want you to try to prove yourself wrong when you’re not. At the end of the day it works out, though; it just takes some time to break that glass ceiling (again, it sounds like the right term but it might not be).

Then there’s the argument the cloud people make. I was involved in one deal earlier in the year, the usual situation, and the cloud providers said “well, do you really have the staff to manage all of this?” I said “IT IS A RACK AND A HALF OF EQUIPMENT, HOW MANY PEOPLE DO I NEED, REALLY?” They were just as oblivious to that as the PHBs were to the cloud costs.

While I’m thinking of Wikipedia: has anyone else experienced massive slowdowns with their DNS infrastructure? It takes FOREVER to resolve their domains for me, while all other domains resolve really fast. I run my own DNS, so maybe there is something wrong with it, I’m not sure; I haven’t investigated.

May 3, 2010

Terremark vCloud Express: First month

Filed under: Datacenter,Virtualization — Nate @ 3:02 pm

Not much to report; I got my first bill for my first “real” month of usage (minus DNS; I haven’t gotten around to transferring DNS yet, but I do have the ports opened).

$122.20 for the month which included:

  • 1 VM with 1VPU/1.5GB/40GB – $74.88
  • 1 External IP address – $0.00 (which is confusing I thought they charged per IP)
  • TCP/UDP ports – $47.15
  • 1GB of data transferred – $0.17

Kind of funny that the one thing that is charged as I use it (the rest being charged as I provision it) is the thing I pay less than a quarter for. Obviously I slightly overestimated my bandwidth usage. And I’m sure they round up to the nearest GB, as I don’t believe I even transferred 1GB during the month of April.

I suppose the one positive thing, from a bandwidth and cost standpoint, is that if I ever wanted to route all of my internet traffic from my cable modem at home through my VM (over VPN) for paranoia or security purposes, I could. I believe Comcast caps bandwidth at ~250GB/mo or something, which would be about $42/mo if I maxed it out (but believe me, my home bandwidth usage is trivial as well).

Hopefully this coming weekend I can get around to assigning a second external IP, mapping it to my same DNS and moving some of my domains over to this cloud instead of keeping them hosted on my co-located server. Just been really busy recently.

April 9, 2010

Found a use for the cloud

Filed under: News,Virtualization — Nate @ 1:42 pm

Another interesting article on Datacenter Knowledge mentioned the U.S. Government’s use of the Terremark cloud. I recall reading about it briefly when it first launched, but seeing the numbers again made me do another double take.

“One of the most troubling aspects about the data centers is that in a lot of these cases, we’re finding that server utilization is actually around seven percent,” Federal Chief Information Officer Vivek Kundra said.

[..]

Yes, you read that correctly. A government agency was going to spend $600,000 to set up a blog.

[..]

The GSA previously paid $2.35 million in annual costs for USA.gov, including $2 million for hardware refreshes and software re-licensing and $350,000 in personnel costs, compared to the $650,000 annual cost to host the site with Terremark.

For $650k/yr I bet the site runs on only a few servers (a dozen or fewer) and has less than a TB of total disk space.

April 3, 2010

Terremark vCloud Express: Day 1

Filed under: Virtualization — Nate @ 12:19 pm

You may have read another one of my blog entries, “Why I hate the cloud“; I also mentioned how I’ve been hosting my own email/etc. for more than a decade in “Lesser of two evils“.

So what’s this about? I still hate the cloud for any sort of large scale deployment, but for micro deployments it can almost make sense. Let me explain my situation:

About 9 years ago the ISP I used to help operate more or less closed shop, and I relocated what was left of the customers to my home DSL line (1mbps/1mbps, 8 static IPs) on a dedicated little server. My ISP got bought out, then got bought out again and started jacking up the rates (from $20/mo to ~$100/mo + ~$90/mo for Qwest professional DSL). Hosting at my apartment was convenient but at the same time was a sort of ball and chain, as it made it very difficult to move. Coordinating the telco move and the ISP move with minimal downtime, well, let’s just say with DSL that’s about impossible. I managed to mitigate one move in 2001 by temporarily locating my servers on my “normal” company’s network for a few weeks while things got moved.

A few years ago I was also hit with what was a 27-hour power outage (despite being located in a downtown metropolitan area; everyone got hit by that storm). Shortly after that I decided a co-location was the best fit for me longer term. So phase one was to virtualize the pair of systems in VMware. I grabbed an older server I had lying around and did that, and ran it for a year; it worked great (though the server was really loud).

Then I got another email saying my ISP was bought out yet again; this time the company was going to force me to change my IP addresses, which, when you’re hosting your own DNS, can be problematic. So that was the last straw. I found a nice local company to host my server at a reasonable price. The facility wasn’t world class by any stretch, but the world class facilities in the area had little interest in someone wanting to host a single 1U box that averages less than 128kbps of traffic at any given time. But it would do for now.

I run my services on a circa-2004 dual Xeon system, with 6GB memory and ~160GB of disk on a 3Ware 8006-2 RAID controller (RAID 1). I absolutely didn’t want to go to one of those cheap crap hosting providers where they have massive downtime and no SLAs. I also had absolutely no faith in the earlier generation of “UML” VMs (yes, I know Xen and UML aren’t the same, but I trust them the same amount – e.g. none). My data and privacy are fairly important to me and I am willing to pay extra to try to maintain them.

So early last year my RAID card told me one of my disks was about to fail and to replace it, so I did, rebuilt the array, and off I went again. A few months later the RAID card again told me another disk was about to fail (there are only two disks in this system), so I replaced that disk, rebuilt, and off I went. Then a few months later, the RAID card again said a disk is not behaving right and I should replace it. Three failing disks in less than a year, though really it’s been two replacements; I’ve ignored the most recent failing drive for several months now. Media scans return no errors, however RAID integrity checks always fail, causing a RAID rebuild (this happens once a week). Support says the disk is suffering from timeouts. There is no backplane on the system (and thus no hot swap, making disk replacements difficult). Basically I’m getting tired of maintaining hardware.

I looked at the cost of a good quality server with hot swap, remote management, etc., something that can run ESX, and the cost is $3-5k. I could go $2-3k and stick with VMware Server on top of Debian. A local server manufacturer has their headquarters literally less than a mile from my co-location, so it is tempting to stick with doing it on my own, and if my needs were greater I would for sure; cloud does not make sense in most cases in my opinion, but in this case it can.

If I try to price out a cloud option that would match that $3-5k server, purely from a CPU/memory perspective the cloud option would be significantly more. But I looked closer and I really don’t need that much capacity for my stuff. My current VMware host runs at ~5-8% cpu usage on average on six year old hardware. I have 6GB of ram but I’m only using 2-3GB at best. Storage is the biggest headache for me right now hosting my own stuff.

So I looked to Terremark, who seem to have a decent operation going; for the most part they know what they are doing (they still make some questionable decisions, though I think most of those are not made by the technical teams). I looked to Terremark for a few reasons:

  • Enterprise storage either from 3PAR or EMC (storage is most important for me right now given my current situation)
  • Redundant networking
  • Tier IV facilities (my current facility lacks true redundant power and they did have a power outage last year)
  • Persistent, fiber attached storage; no local storage, no cheap iSCSI, no NFS, no crap RAID controllers, no need to worry about using APIs and other special tools to access storage – it is as if it were local
  • Fairly nice user interface that allows me to self provision VMs, IPs etc

Other things they offer that I don’t care about (for this situation; in others they could come in real handy):

  • Built in load balancing via Citrix Netscalers
  • Built in firewalls via Cisco ASAs

So for me, a meager configuration of 1 vCPU, 1.5GB of memory, and 40GB of disk space with a single external static IP is a reasonable cost (pricing is available here):

  • CPU/Memory: $65/mo [+$1,091/mo if I opted for 8-cores and 16GB/ram]
  • Disk space: $10/mo [+$30/mo if I wanted 160GB of disk space]
  • 1 IP address: $7.20/mo
  • 100GB data transfer: $17/mo (bandwidth is cheap at these levels so just picked a round number)
  • Total: $99/mo

Which comes to about the same as what I’m paying in co-location fees now. If that were all the costs I’d sign up in a second, but unfortunately their model has a significant premium on “IP Services”, when ideally what I’d like is just a flat layer 3 connection to the internet. The charge is $7.20/mo for each TCP and UDP port you need opened to your system, so for me:

  • HTTP – $7.20/mo
  • HTTPS – $7.20/mo
  • SMTP – $7.20/mo
  • DNS/TCP – $7.20/mo
  • DNS/UDP – $7.20/mo
  • VPN/UDP – $7.20/mo
  • SSH – $7.20/mo
  • Total: $50/mo

And I’m being conservative here, I could be opening up:

  • POP3
  • POP3 – SSL
  • IMAP4
  • IMAP4 – SSL
  • Identd
  • Total: another $36/mo

But I’m not, for now. Then you can double all of that for my 2nd system, so assuming I do go forward with deploying the second system, my total cost (including those extra ports) is roughly $353/mo (not counting a second 100GB/mo of bandwidth). Extrapolate that out three years:

  • First year: $4,236 ($353/mo)
  • First two years: $8,472
  • First three years: $12,708

Compared to doing it on my own:

  • First year: ~$6,200 (with new $5,000 server)
  • First two years: ~$7,400
  • First three years: ~$8,600

And if you really want to see how this cost structure doesn’t scale, let’s take a more apples to apples comparison of CPU/memory of what I’d have in my own server and put it in the cloud:

  • First year – $15,328 [ 8 cores, 16GB ram 160GB disk ]
  • First two years – $30,657
  • First three years – $45,886

As you can see the model falls apart really fast.

So clearly it doesn’t make a lot of sense to do all of that at once, so if I collapse it to only the essential services on the cloud side:

  • First year: $3,242 ($270/mo)
  • First two years: $6,484
  • First three years: $9,727

I could live with that over three years, especially if the system is reliable and maintains my data integrity. But if they added just one feature for lil ol’ me, that feature would be a “forwarding VIP” on their load balancers that would basically just forward everything from one external IP to one internal IP. I know their load balancers can do it; it’s just a matter of exposing the functionality. This would dramatically impact the costs:

  • First year: $2,517 ($210/mo)
  • First two years: $5,035
  • First three years: $7,552
  • First four years: $10,070

You can see how the model doesn’t scale. I am talking about 2 vCPUs’ worth of power and 3GB of memory, compared to, say, at least an 8-12 core physical server and 16GB or more of memory if I did it myself. But again, I have no use for that extra capacity if I did it myself, so it’d just sit idle, like it does today.
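If you want to play with this pricing model yourself, here is a rough sketch that reproduces the numbers above from the per-item vCloud Express prices quoted in this post (2010 pricing). It is illustrative only, and certainly not Terremark’s actual billing logic.

# Rough sketch of the vCloud Express cost model, using the per-item prices
# quoted above (2010 pricing). Illustrative only.

PRICES = {
    "cpu_mem_1vcpu_1_5gb": 65.00,  # $/mo for 1 vCPU + 1.5GB RAM
    "disk_40gb": 10.00,            # $/mo for 40GB on the array
    "external_ip": 7.20,           # $/mo per public IP
    "ip_service_port": 7.20,       # $/mo per opened TCP or UDP port
    "bandwidth_100gb": 17.00,      # $/mo for ~100GB of transfer
}

def vm_monthly(ports, include_bandwidth=True):
    """Monthly cost of one small VM with the given number of opened ports."""
    total = (PRICES["cpu_mem_1vcpu_1_5gb"] + PRICES["disk_40gb"]
             + PRICES["external_ip"] + ports * PRICES["ip_service_port"])
    if include_bandwidth:
        total += PRICES["bandwidth_100gb"]
    return total

# Two VMs with 12 ports each, bandwidth counted once -- roughly the $353/mo
# figure above:
two_vms = vm_monthly(ports=12) + vm_monthly(ports=12, include_bandwidth=False)
print(round(two_vms))       # ~354 per month
print(round(two_vms * 12))  # ~4,250 for the first year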

CPU usage is higher than I mentioned above, I believe because of a bug in VMware Server 2.0 that causes CPU to “leak” somehow, which results in a steady, linear increase in CPU usage over time. I reported it to the forums but didn’t get a reply, and I don’t care enough to try to engage VMware support; they didn’t help me much with ESX and a support contract, so they would do even less for VMware Server and no support contract.

I signed up for Terremark’s vCloud Express program a couple of months ago, installed a fresh Debian 5.0 VM, and synchronized my data over to it from one of my existing co-located VMs.

So today I have officially transferred all of my services (except DNS) from one of my two co-located VMs to Terremark, and will run it for a while to see how the costs are, how it performs, reliability, etc. My co-location contract is up for renewal in September so I have plenty of time to determine whether or not I want to make the jump. I’m hoping I can make it work, as it will be nice to not have to worry about hardware anymore. An excerpt from that link:

[..] My pager once went off in the middle of the night, bringing me out of an awesome dream about motorcycles, machine guns, and general ass-kickery, to tell me that one of the production machines stopped responding to ping. Seven or so hours later, I got an e-mail from Amazon that said something to the effect of:

There was a bad hardware failure. Hope you backed up your shit.

Look at it this way: at least you don’t have a tapeworm.

-The Amazon EC2 Team

I’ll also think long and hard, and probably consolidate both of my co-located VMs into a single VM at Terremark if I do go that route, which will save me a lot. I really prefer two VMs, but I don’t think I should be charged double for two, especially when two are going to use roughly the same amount of resources as one. They talk all about “pay for what you use”, when that is not correct; the only portion of their service that is pay-for-what-you-use is bandwidth. Everything else is “pay as you provision”. So if you provision 100GB and a 4-CPU VM but you never turn it on, well, you’re still going to pay for it.

The model needs significant work; hopefully it will improve in the future, as all of these cloud companies are still trying to figure this stuff out. I know some people at Terremark and will pass this along to them to see what they think. Terremark is not alone in this model; I’m not picking on them for any reason other than that I use their services. I think in some situations it can make sense, but the use cases are pretty limited at this point. You probably know that I wouldn’t sign up and commit to such a service unless I thought it could provide some good value!

Part of the issue may very well be limitations in the hypervisor itself with regard to reporting actual usage. As VMware and others improve the instrumentation of their systems, that could improve the cost model for customers significantly, perhaps doing things like charging for CPU based on a 95th-percentile model, the way we measure bandwidth. They could also do things like cost capping, where if your resource usage is higher for an extended period the provider can automatically throttle your system(s) to keep your bill lower (at your request of course).
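Just to illustrate what I mean by a 95th-percentile model, here is a quick sketch; it’s the same burstable-billing math commonly used for bandwidth, applied to CPU samples, and purely an idea on my part rather than anything a provider actually offers.

# Sketch of 95th-percentile ("burstable") billing applied to CPU usage.
# Assumes one usage sample (e.g. percent of a core) every 5 minutes over
# the billing period.

def billable_usage(samples):
    """Discard the top 5% of samples and bill on the highest remaining one."""
    ordered = sorted(samples)
    cutoff = max(int(len(ordered) * 0.95) - 1, 0)
    return ordered[cutoff]

# A VM that idles around 5% but bursts to 80% for a few hours a month gets
# billed much closer to its idle rate than to its peak:
month = [5] * 8500 + [80] * 140  # ~8,640 five-minute samples in 30 days
print(billable_usage(month))     # -> 5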

Another idea would be more accurate physical-to-virtual mapping, where I can provision, say, 1 physical CPU and X amount of memory and then provision unlimited VMs inside that one CPU core and memory. Maybe I just need 1:1, or maybe my resource usage is low enough that I can get 5:1 or 10:1; after all, one of the biggest benefits of virtualization is being able to better isolate workloads. Terremark already does this to some degree on their enterprise products, but this model isn’t available for vCloud Express, at least not yet.
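Here is a toy illustration of the kind of mapping I mean; the numbers are made up, and real consolidation planning obviously has to account for peaks, not just averages.

# Toy illustration of packing VMs into one provisioned physical slice at
# different CPU overcommit ratios. Numbers are made up.

def vms_per_slice(slice_ghz, slice_gb, vm_avg_ghz, vm_gb, overcommit=1.0):
    """How many identical VMs fit in a fixed CPU/memory slice at a given
    CPU overcommit ratio (memory is not overcommitted here)."""
    by_cpu = int((slice_ghz * overcommit) // vm_avg_ghz)
    by_mem = int(slice_gb // vm_gb)
    return min(by_cpu, by_mem)

# One 2.9GHz core and 8GB of RAM; VMs averaging 0.3GHz with 0.5GB each:
print(vms_per_slice(2.9, 8, 0.3, 0.5))                # 1:1 -> 9  (CPU-bound)
print(vms_per_slice(2.9, 8, 0.3, 0.5, overcommit=5))  # 5:1 -> 16 (memory-bound)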

You know what surprised me most, next to the charges for IP services? How cheap enterprise storage is for these cloud companies. I mean, $10/mo for 40GB of space on a high end storage array? I can go out and buy a pretty nice server to host VMs at a facility of my choosing, but if I want a nice storage array to back it up I’m looking at easily tens of thousands of dollars. I just would have expected storage to be a bigger piece of the pie when it came to overall costs, when in my case it can be as low as 3-5% of the total cost over a 3 year period.

And despite Terremark listing Intel as a partner, my VM happens to be running on – you guessed it – AMD:

yehat:/var/log# cat /proc/cpuinfo
processor    : 0
vendor_id    : AuthenticAMD
cpu family    : 16
model        : 4
model name    : Quad-Core AMD Opteron(tm) Processor 8389
stepping    : 2
cpu MHz        : 2913.037

AMD gets no respect I tell ya, no respect! 🙂

I really want this to work out.

March 16, 2010

IBM partners with Red Hat for KVM cloud

Filed under: News,Virtualization — Nate @ 6:32 pm

One question: Why?

IBM has bombarded the IT world for years with claims about how they can consolidate hundreds to thousands of Linux VMs onto a single mainframe.

IBM has recently announced a partnership with Red Hat to use KVM in a cloud offering. At first I thought, well, maybe they are doing it to offer Microsoft applications as well, but that doesn’t appear to be the case:

Programmers who use the IBM Cloud for test and dev will be given RHEV to play with Red Hat Enterprise Linux or Novell SUSE Linux Enterprise Server images with a Java layer as they code their apps and run them through regression and other tests.

Let’s see, Linux and Java, why not use the mainframes to do this? Why KVM? As far as the end users are concerned it really shouldn’t matter, after all it’s java and linux.

Seems like a slap in the face to their mainframe division (I never bought into the mainframe/Linux/VM marketing myself; I suppose they don’t either). I do remember briefly having access to an S/390 running a SuSE VM about 10 years ago; it was... interesting.

March 9, 2010

The Atomic Unit of Compute

Filed under: Virtualization — Nate @ 5:16 pm

I found this pretty fascinating; as someone who has been talking to several providers, it certainly raises some pretty good points.

[..]Another of the challenges you’ll face along the way of Cloud is that of how to measure exactly what it is you are offering. But having a look at what the industry is doing won’t give you much help… as with so many things in IT, there is no standard. Amazon have their EC2 unit, and state that it is roughly the equivalent of 1.0-1.2GHz of a 2007 Opteron or Xeon CPU. With Azure, Microsoft haven’t gone down the same path – their indicative pricing/sizing shows a base compute unit of 1.6GHz with no indication as to what is underneath. Rackspace flip the whole thing on it’s head by deciding that memory is the primary resource constraint, therefore they’ll just charge for that and presumably give you as much CPU as you want (but with no indication as to the characteristics of the underlying CPU). Which way should you go? IMHO, none of the above.[..]

We need to have a standard unit of compute, that applies to virtual _and_ physical, new hardware and old, irrespective of AMD or Intel (or even SPARC or Power). And of course, it’s not all just about GHz because all GHz are most definitely not equal and yes it _does_ matter to applications. And lets not forget the power needed to deliver those GHz.

In talking with Terremark it seems their model is built around VMware resource pools, where they allocate you a set amount of GHz for your account. They have a mixture of Intel dual-socket systems and AMD quad-socket systems, and if you run a lot of multi-vCPU VMs you have a higher likelihood of ending up in the AMD pool vs the Intel one. I have been testing their vCloud Express product for my own personal needs (1 vCPU, 1.5GB RAM, 50GB HD), and noticed that my VM is on one of the AMD quad-socket systems.
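As a thought experiment on what a “standard unit of compute” might look like, here is a toy normalization. The per-CPU scaling factors below are completely invented; deriving real ones from a benchmark you trust is exactly the hard part the quote above is getting at.

# Toy "standard unit of compute" normalization. The efficiency factors are
# invented for illustration only.

BASELINE_GHZ = 1.1  # roughly one EC2 Compute Unit (a 2007-era 1.0-1.2GHz core)

PER_GHZ_FACTOR = {  # hypothetical performance-per-GHz vs. that baseline
    "opteron_2007": 1.0,
    "xeon_5500_2009": 1.6,
    "opteron_8389_2009": 1.4,
}

def standard_units(ghz, cpu_kind):
    """Express a CPU allocation as multiples of the 2007-era baseline core."""
    return ghz * PER_GHZ_FACTOR[cpu_kind] / BASELINE_GHZ

# The 2.9GHz Opteron 8389 core my vCloud Express VM happens to be on:
print(round(standard_units(2.9, "opteron_8389_2009"), 1))  # ~3.7 "units"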

February 9, 2010

Why I hate the cloud

Filed under: Virtualization — Nate @ 4:26 pm

Ugh, I hate all this talk about the cloud; for the most part, what I can see is that it’s a scam to sell mostly overpriced, high-margin services to organizations who don’t know any better. I’m sure there are plenty of organizations out there that have IT staff who aren’t as smart as my cat, but there are plenty that have people who are smarter too.

The whole cloud concept is sold pretty well, I have to admit. It frustrates me so much I don’t know how to properly express it. The marketing behind the cloud is such that it gives some people the impression that they can get nearly unlimited resources at their disposal, with good SLAs and good performance, and pay pennies on the dollar.

It’s a fantasy. That reality doesn’t exist. Now sure, the cost models of some incompetent organizations out there might be bad enough that clouds make a lot of sense. But again, there are quite a few that already have a cost effective way of operating. I suppose I am not the target customer, as every cloud provider I have talked to or seen a cost analysis for has come in at a MINIMUM of 2.5-3x more expensive than doing it in house, going as high as 10x. Even the cheap crap that Amazon offers is a waste of money.

From my perspective, a public cloud (by which I mean an external cloud service provider, vs hosting a “cloud” in house by way of virtual machines, grid computing and the like) has a few use cases:

  1. Outsourced infrastructure for very small environments. I’m talking single digit servers here, low utilization etc.
  2. Outsourced “managed” cloud services, which would replace managed hosting(in the form of dedicated physical hardware) primarily to gain the abstraction layer from the hardware to handle things like fault tolerance and DR better. Again really only cost effective for small environments.
  3. Peak capacity processing – sounds good on paper, but you really need a scale-out application to be able to handle it, very few applications can handle such a situation gracefully. That is being able to nearly transparently shift compute resources to a remote cloud on demand for short periods of time to handle peak capacity. But I can’t emphasize enough the fact that the application really has to be built from the ground up to be able to handle such a situation. A lot of the newer “Web 2.0” type shops are building(or have built) such applications, but of course the VAST majority of applications most organizations will use were never designed with this concept in mind. There are frequently significant concerns surrounding privacy and security.

I’m sure you can extract other use cases, but in my opinion those other use cases assume a (nearly?) completely incompetent IT/Operations staff and/or management layers that prevent the organization from operating efficiently. I believe this is common in many larger organizations unfortunately, which is one reason I steer clear of them when looking for employment.

It just drives me nuts when I encounter someone who either claims the cloud is going to save them all the money in the world, or someone who is convinced that it will (but they haven’t yet found the provider that can do it).

Outside of the above use cases, I would bet money that any reasonably efficient IT shop (usually a team of 10 or fewer people) can do this cloud thing far cheaper than any service provider would offer the service to them. And if a service provider did happen to offer at- or below-cost pricing I would call BS on them: either they are overselling oversubscribed systems that they won’t be able to sustain, or they are buying customers so that they can build a customer base. Even Amazon, which people often say is the low cost leader for cloud, is FAR more expensive than doing it in house in every scenario I have seen.

Almost equally infuriating to me are those who believe all virtualization solutions are created equal, and that “oh, we can go use the free stuff” (i.e. “free” Xen) rather than pay for vSphere. I am the first to admit that vSphere Enterprise Plus is not worth the $$ for virtually all customers out there; there is a TON of value available in the lower end versions of VMware. Much like with Oracle, sadly it seems when many people think of VMware they immediately gravitate towards the ultra high end and say “oh no, it’s too expensive!”. I’ve been running ESX for a few years now and have gotten by just fine without DRS, without host profiles, without distributed switches, without vMotion, without storage vMotion, the list goes on! Not saying they aren’t nice features, but if you are cost conscious you often need to ask yourself whether, nice as they are to have, you really need them. I’d wager the answer is frequently no.

