TechOpsGuys.com Diggin' technology every day

April 9, 2010

Found a use for the cloud

Filed under: News,Virtualization — Tags: — Nate @ 1:42 pm

Another interesting article on Datacenter Knowledge mentioned the U.S. Government’s use of the Terremark cloud. I recall reading about it briefly when it first launched, but seeing the numbers again made me do another double take.

“One of the most troubling aspects about the data centers is that in a lot of these cases, we’re finding that server utilization is actually around seven percent,” Federal Chief Information Officer Vivek Kundra said

[..]

Yes, you read that correctly. A government agency was going to spend $600,000 to set up a blog.

[..]

The GSA previously paid $2.35 million in annual costs for USA.gov, including $2 million for hardware refreshes and software re-licensing and $350,000 in personnel costs, compared to the $650,000 annual cost to host the site with Terremark.

For $650k/yr I bet the site runs on only a few servers (a dozen or less) and has less than a TB of total disk space.

April 3, 2010

Terremark vCloud Express: Day 1

Filed under: Virtualization — Tags: , , — Nate @ 12:19 pm

You may have read another one of my blog entries, "Why I hate the cloud"; I also mentioned how I've been hosting my own email/etc for more than a decade in "Lesser of two evils".

So what’s this about? I still hate the cloud for any sort of large scale deployment, but for micro deployments it can almost make sense. Let me explain my situation:

About 9 years ago the ISP I used to help operate more or less closed shop, and I relocated what was left of the customers to my home DSL line (1mbps/1mbps, 8 static IPs) on a dedicated little server. My ISP got bought out, then got bought out again, and started jacking up the rates (from $20/mo to ~$100/mo, plus ~$90/mo for Qwest professional DSL). Hosting at my apartment was convenient but at the same time a sort of ball and chain, as it made it very difficult to move. Coordinating the telco move and the ISP move with minimal downtime, well, let's just say with DSL that's about impossible. I managed to mitigate one move in 2001 by temporarily locating my servers at my "normal" company's network for a few weeks while things got moved.

A few years ago I was also hit with a 27 hour power outage (despite being located in a downtown metropolitan area, everyone got hit by that storm). Shortly after that I decided a co-location was the best fit for me longer term. So phase one was to virtualize the pair of systems in VMware. I grabbed an older server I had laying around and did that, and ran it for a year; it worked great (though the server was really loud).

Then I got another email saying my ISP was bought out yet again, and this time the company was going to force me to change my IP addresses, which when you're hosting your own DNS can be problematic. That was the last straw. I found a nice local company to host my server at a reasonable price. The facility wasn't world class by any stretch, but the world class facilities in the area had little interest in someone wanting to host a single 1U box that averages less than 128kbps of traffic at any given time. It would do for now.

I run my services on a circa 2004 dual Xeon system, with 6GB memory and ~160GB of disk on a 3Ware 8006-2 RAID controller (RAID 1). I absolutely didn't want to go to one of those cheap crap hosting providers where they have massive downtime and no SLAs. I also had absolutely no faith in the earlier generation of "UML" VMs (yes, I know Xen and UML aren't the same, but I trust them the same amount – e.g. none). My data and privacy are fairly important to me and I am willing to pay extra to try to maintain them.

So early last year my RAID card told me one of my disks was about to fail and to replace it, so I did, rebuilt the array, and off I went again. A few months later the RAID card again told me another disk was about to fail (there are only two disks in this system), so I replaced that disk, rebuilt, and off I went. Then a few months later, the RAID card again said a disk was not behaving right and I should replace it. Three disk replacements in less than a year. Though really it's been two; I've ignored the most recent failing drive for several months now. Media scans return no errors, however RAID integrity checks always fail, causing a RAID rebuild (this happens once a week). Support says the disk is suffering from timeouts. There is no backplane on the system (and thus no hot swap, making disk replacements difficult). Basically I'm getting tired of maintaining hardware.

I looked at the cost of a good quality server with hot swap, remote management, etc., something that can run ESX, and the cost is $3-5k. I could go $2-3k and stick with VMware Server on top of Debian; a local server manufacturer has their headquarters literally less than a mile from my co-location, so it is tempting to keep doing it on my own, and if my needs were greater I would for sure. The cloud does not make sense in most cases in my opinion, but in this case it can.

If I try to price out a cloud option that would match that $3-5k server, purely from a CPU/memory perspective the cloud option would be significantly more. But I looked closer and I really don’t need that much capacity for my stuff. My current VMware host runs at ~5-8% cpu usage on average on six year old hardware. I have 6GB of ram but I’m only using 2-3GB at best. Storage is the biggest headache for me right now hosting my own stuff.

So I looked to Terremark, who seem to have a decent operation going; for the most part they know what they are doing (they still make questionable decisions, though I think most of those are not made by the technical teams). I looked to Terremark for a few reasons:

  • Enterprise storage either from 3PAR or EMC (storage is most important for me right now given my current situation)
  • Redundant networking
  • Tier IV facilities (my current facility lacks true redundant power and they did have a power outage last year)
  • Persistent, fiber attached storage; no local storage, no cheap iSCSI, no NFS, no crap RAID controllers, and no need to worry about using APIs or other special tools to access storage – it is as if it were local
  • Fairly nice user interface that allows me to self provision VMs, IPs etc

Other things they offer that I don’t care about(for this situation, others they could come in real handy):

  • Built in load balancing via Citrix Netscalers
  • Built in firewalls via Cisco ASAs

So for me, a meager configuration of 1 vCPU, 1.5GB of memory, and 40GB of disk space with a single external static IP is a reasonable cost(pricing is available here):

  • CPU/Memory: $65/mo [+$1,091/mo if I opted for 8-cores and 16GB/ram]
  • Disk space: $10/mo [+$30/mo if I wanted 160GB of disk space]
  • 1 IP address: $7.20/mo
  • 100GB data transfer: $17/mo (bandwidth is cheap at these levels so just picked a round number)
  • Total: $99/mo

Which comes to about the same as what I'm paying in co-location fees now. If that were all the costs I'd sign up in a second, but unfortunately their model has a significant premium on "IP Services", when ideally what I'd like is just a flat layer 3 connection to the internet. The charge is $7.20/mo for each TCP and UDP port you need opened to your system, so for me:

  • HTTP – $7.20/mo
  • HTTPS – $7.20/mo
  • SMTP – $7.20/mo
  • DNS/TCP – $7.20/mo
  • DNS/UDP – $7.20/mo
  • VPN/UDP – $7.20/mo
  • SSH – $7.20/mo
  • Total: $50/mo

And I’m being conservative here, I could be opening up:

  • POP3
  • POP3 – SSL
  • IMAP4
  • IMAP4 – SSL
  • Identd
  • Total: another $36/mo

But I'm not, at least not for now. Then you can double all of that for my 2nd system, so assuming I do go forward with deploying the second system, my total cost (including those extra ports) is roughly $353/mo (I left out counting a second 100GB/mo of bandwidth). Extrapolate that out three years:

  • First year: $4,236 ($353/mo)
  • First two years: $8,472
  • First three years: $12,708

Compared to doing it on my own:

  • First year: ~$6,200 (with new $5,000 server)
  • First two years: ~$7,400
  • First three years: ~$8,600
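
A rough sketch of the math behind those two sets of numbers is below. The ~$100/mo co-location fee is my own inference from the fact that the $99/mo cloud configuration above comes to about what I pay in co-location fees today; everything else comes straight from the figures in this post.

# Cumulative cost sketch: cloud (two VMs with the extra ports, ~$353/mo, no
# hardware) versus self-hosted (a new ~$5,000 server up front plus an assumed
# ~$100/mo in co-location fees).

def cumulative_cost(upfront, monthly, years):
    """Total spend after a given number of years."""
    return upfront + monthly * 12 * years

for years in (1, 2, 3):
    cloud = cumulative_cost(upfront=0, monthly=353, years=years)
    self_hosted = cumulative_cost(upfront=5000, monthly=100, years=years)
    print(f"Year {years}: cloud ${cloud:,} vs self-hosted ${self_hosted:,}")

# Year 1: cloud $4,236 vs self-hosted $6,200
# Year 2: cloud $8,472 vs self-hosted $7,400
# Year 3: cloud $12,708 vs self-hosted $8,600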

And if you really want to see how this cost structure doesn’t scale, let’s take a more apples to apples comparison of CPU/memory of what I’d have in my own server and put it in the cloud:

  • First year – $15,328 [ 8 cores, 16GB ram 160GB disk ]
  • First two years – $30,657
  • First three years – $45,886

As you can see the model falls apart really fast.

Clearly it doesn't make a lot of sense to do all of that at once, so if I collapse it to only the essential services on the cloud side:

  • First year: $3,420 ($270/mo)
  • First two years: $6,484
  • First three years: $9,727

I could live with that over three years, especially if the system is reliable and maintains my data integrity. But if they added just one feature for lil ol me, that feature would be a "Forwarding VIP" on their load balancers that would basically just forward everything from one external IP to an internal IP. I know their load balancers can do it; it's just a matter of exposing the functionality. This would dramatically impact the costs:

  • First year: $2,517 ($210/mo)
  • First two years: $5,035
  • First three years: $7,552
  • First four years: $10,070

You can see how the model doesn't scale. I am talking about 2 vCPUs worth of power and 3GB of memory, compared to at least an 8-12 core physical server and 16GB or more of memory if I did it myself. But again, I have no use for that extra capacity if I did it myself, so it would just sit idle, like it does today.

CPU usage is higher than I mentioned above, I believe because of a bug in VMware Server 2.0 that causes CPU to "leak" somehow, resulting in a steady, linear increase in CPU usage over time. I reported it to the forums but didn't get a reply, and I don't care enough to try to engage VMware support; they didn't help me much with ESX and a support contract, so they would do even less for VMware Server and no support contract.

I signed up for Terremark’s vCloud Express program a couple of months ago, installed a fresh Debian 5.0 VM, and synchronized my data over to it from one of my existing co-located VMs.

So today I have officially transferred all of my services (except DNS) from one of my two co-located VMs to Terremark, and will run it for a while to see how the costs work out, how it performs, reliability, etc. My co-location contract is up for renewal in September so I have plenty of time to determine whether or not I want to make the jump. I'm hoping I can make it work, as it will be nice to not have to worry about hardware anymore. An excerpt from that link:

[..] My pager once went off in the middle of the night, bringing me out of an awesome dream about motorcycles, machine guns, and general ass-kickery, to tell me that one of the production machines stopped responding to ping. Seven or so hours later, I got an e-mail from Amazon that said something to the effect of:

There was a bad hardware failure. Hope you backed up your shit.

Look at it this way: at least you don’t have a tapeworm.

-The Amazon EC2 Team

I'll also think long and hard, and probably consolidate both of my co-located VMs into a single VM at Terremark if I do go that route, which will save me a lot. I really prefer two VMs, but I don't think I should be charged double for two, especially when two are going to use roughly the same amount of resources as one. They talk all about "pay for what you use", when that is not correct; the only portion of their service that is pay for what you use is bandwidth. Everything else is "pay as you provision". So if you provision 100GB and a 4 CPU VM but you never turn it on, well, you're still going to pay for it.

The model needs significant work, and hopefully it will improve in the future; all of these cloud companies are still trying to figure this stuff out. I know some people at Terremark and will pass this along to them to see what they think. Terremark is not alone in this model; I'm not picking on them for any reason other than that I use their services. I think in some situations it can make sense, but the use cases are pretty limited at this point. You probably know that I wouldn't sign up and commit to such a service unless I thought it could provide some good value!

Part of the issue may very well be limitations in the hypervisor itself with regards to reporting actual usage. As VMware and others improve the instrumentation of their systems, that could improve the cost model for customers significantly – perhaps doing things like charging based on CPU usage with a 95th percentile model, the way we measure bandwidth. And being able to do things like cost capping, where if your resource usage is higher for an extended period the provider can automatically throttle your system(s) to keep your bill lower (at your request of course).
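
To illustrate what I mean by a 95th percentile model, here is a minimal sketch of how that style of metering commonly works for bandwidth, applied to CPU instead. The sampling interval, the sample data and the exact percentile convention are all illustrative assumptions on my part, not anything Terremark or VMware actually does today.

# Hypothetical 95th percentile CPU metering: sample utilization at fixed
# intervals over the billing period, throw away the top 5% of samples, and
# bill for the highest remaining sample. Conventions vary between providers;
# this is just one simple interpretation.

def percentile_95(samples):
    ordered = sorted(samples)
    cutoff = max(int(len(ordered) * 0.95) - 1, 0)  # index after dropping the top 5%
    return ordered[cutoff]

# Pretend five-minute CPU utilization samples (percent) for a mostly idle VM
# with one short burst; the burst falls into the discarded top 5%.
cpu_samples = [5, 6, 5, 7, 8, 95, 6, 5, 7, 6, 5, 6, 6, 7, 5, 6, 8, 7, 6, 5]
print(f"Billable CPU level: {percentile_95(cpu_samples)}%")  # -> 8%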

Another idea would be more accurate physical to virtual mapping, where I could provision, say, 1 physical CPU and X amount of memory and then provision unlimited VMs inside that one CPU core and memory. Maybe I just need 1:1, or maybe my resource usage is low enough that I can get 5:1 or 10:1; after all, one of the biggest benefits of virtualization is being able to better isolate workloads. Terremark already does this to some degree on their enterprise products, but this model isn't available for vCloud Express, at least not yet.

You know what surprised me most, next to the charges for IP services, was how cheap enterprise storage is for these cloud companies. I mean, $10/mo for 40GB of space on a high end storage array? I can go out and buy a pretty nice server to host VMs at a facility of my choosing, but if I want a nice storage array to back it I'm looking at easily tens of thousands of dollars. I just would have expected storage to be a bigger piece of the pie when it came to overall costs, when in my case it can be as low as 3-5% of the total cost over a 3 year period.

And despite Terremark listing Intel as a partner, my VM happens to be running on -you guessed it – AMD:

yehat:/var/log# cat /proc/cpuinfo
processor    : 0
vendor_id    : AuthenticAMD
cpu family    : 16
model        : 4
model name    : Quad-Core AMD Opteron(tm) Processor 8389
stepping    : 2
cpu MHz        : 2913.037

AMD gets no respect I tell ya, no respect! 🙂

I really want this to work out.

April 1, 2010

New IBM blades based on Intel 7500 announced

Filed under: News,Virtualization — Tags: , , , , , — Nate @ 7:46 pm

The Register had the scoop a while back, but apparently today they were officially announced. IBM did some trickery with the new 7500 series Intel Xeons to accomplish two things:

  • Expand the amount of memory available to the system
  • Be able to “connect” two dual socket blades to form a single quad socket system

Pretty creative, though the end result wasn’t quite as impressive as it sounded up front. Their standard blade chassis is 9U and has 14 slots on it.

  • Each blade is dual socket, maximum 16 cores, and 16 DIMMs
  • Each memory extender offers 24 additional DIMMs

So for the chassis as a whole you're talking about 7 dual socket systems with 40 DIMMs each, or 3 quad socket systems with 80 DIMMs each plus 1 dual socket with 40.

Compare that to an Opteron 6100 system, where you could get 8 quad socket systems with 48 DIMMs each in a single enclosure (granted, such a system has not been announced yet but I am confident it will be); the totals are tallied in the sketch after this list:

  • Intel 7500-based system: 112 CPU cores (1.8GHz), 280 DIMM slots – 9U
  • Opteron 6100-based system: 384 CPU cores (2.2GHz), 384 DIMM slots – 10U
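
A quick sanity check of those per-chassis totals, using only the per-blade figures above:

# Per-chassis totals for the two configurations being compared.

# IBM BladeCenter (9U, 14 slots): 7 dual-socket HX5 blades, each paired with
# a memory extender occupying the adjacent slot.
ibm_blades = 7
ibm_cores = ibm_blades * 16            # 16 cores max per dual-socket blade
ibm_dimms = ibm_blades * (16 + 24)     # 16 DIMMs on the blade + 24 on the extender

# Hypothetical Opteron 6100 enclosure (10U): 8 quad-socket blades, 48 DIMMs each.
amd_blades = 8
amd_cores = amd_blades * 4 * 12        # four 12-core Opteron 6100s per blade
amd_dimms = amd_blades * 48

print(f"Intel 7500 chassis:   {ibm_cores} cores, {ibm_dimms} DIMM slots")   # 112 / 280
print(f"Opteron 6100 chassis: {amd_cores} cores, {amd_dimms} DIMM slots")   # 384 / 384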

And the price of the IBM system is even less impressive –

In a base configuration with a single four-core 1.86 GHz E7520 processor and 8 GB of memory, the BladeCenter HX5 blade costs $4,629. With two of the six-core 2 GHz E7540 processors and 64 GB of memory, the HX5 costs $15,095.

They don’t seem to show pricing for the 8 core 7500-based blade, and say there is no pricing or ETA on the arrival of the memory extenders.

They do say this which is interesting (not surprising) –

The HX5 blade cannot support the top-end eight-core Xeon 7500 parts, which have a 130 watt thermal design point, but it has been certified to support the eight-core L7555, which runs at 1.86 GHz, has 24 MB of L3 cache, and is rated at 95 watts.

I only hope AMD has enough manufacturing capacity to keep up with demand; the Opteron 6100s will wipe the floor with the Intel chips on price/performance (for the first time in a while).

March 29, 2010

Opteron 6100s are here

Filed under: News,Virtualization — Tags: , , — Nate @ 8:28 am

UPDATED I've been waiting for this for quite some time, and the 12-core AMD Opteron 6100s have finally arrived. AMD did the right thing this time by not waiting to develop a "true" 12-core chip and instead bolting a pair of CPUs together into a single package. You may recall AMD lambasted Intel when it released its first four core CPUs a few years ago (composed of a pair of two-core chips bolted together), a strategy that paid off well for Intel; AMD's market share was hurt badly as a result, a painful lesson which they learned from.

I'd of course rather have a "true" 12-core processor, but I'm very happy to make do with these Opteron 6100s in the meantime; I don't want to have to wait another 2-3 years to get 12 cores in a socket.

Some highlights of the processor:

  • Clock speeds ranging from 1.7GHz (65W) to 2.2GHz (80W), with a turbo boost 2.3GHz model coming in at 105W
  • Prices ranging from $744 to $1,396 in 1,000-unit quantities
  • Twelve-core and eight-core models; L2 – 512K/core, L3 – 12MB of shared cache
  • Quad-Channel LV & U/RDDR3, ECC, support for on-line spare memory
  • Supports up to 3 DIMMs/channel, up to 12 DIMMS per CPU
  • Quad 16-bit HyperTransport™ 3 technology (HT3) links, up to 6.4 GT/s per link (more than triple HT1 performance)
  • AMD SR56x0 chipset with I/O Virtualization and PCIe® 2.0
  • Socket compatibility with the planned AMD Opteron™ 6200 Series processor (16 cores?)
  • New advanced idle states allowing the processor to idle with less power usage than the previous six core systems (AMD seems to have long had the lead in idle power conservation).


The new I/O virtualization looks quite nice as well – AMD-V 2.0, from their site:

Hardware features that enhance virtualization:

  • Unmatched Memory Bandwidth and Scalability – Direct Connect Architecture 2.0 supports a larger number of cores and memory channels so you can configure robust virtual machines, allowing your virtual servers to run as close as possible to physical servers.
  • Greater I/O virtualization efficiencies – I/O virtualization to help increase I/O efficiency by supporting direct device assignment, while improving address translation to help improve the levels of hypervisor intervention.
  • Improved virtual machine integrity and security – With better isolation of virtual machines through I/O virtualization, helps increase the integrity and security of each VM instance.
  • Efficient Power Management – AMD-P technology is a suite of power management features that are designed to drive lower power consumption without compromising performance. For more information on AMD-P, click here
  • Hardware-assisted Virtualization – AMD-V technology to enhance and accelerate software-based virtualization so you can run more virtual machines, support more users and transactions per virtual machine with less overhead. This includes Rapid Virtualization Indexing (RVI) to help accelerate the performance of many virtualized applications by enabling hardware-based VM memory management. AMD-V technology is supported by leading providers of hypervisor and virtualization software, including Citrix, Microsoft, Red Hat, and VMware.
  • Extended Migration – a hardware feature that helps virtualization software enable live migration of virtual machines between all available AMD Opteron™ processor generations. For a closer look at Extended Migration, follow this link.

With AMD returning to the chipset design business I’m happy with that as well, I was never comfortable with Nvidia as a server chipset maker.

The Register has a pair of great articles on the launch as well, though in the main one I was kind of annoyed I had to scroll so much to get past the Xeon news, which I don't think they needed to recap in such detail in an article about the Opterons, but oh well.

I thought this was an interesting note on the recent Intel announcement of integrated silicon for encryption –

While Intel was talking up the fact that it had embedded cryptographic instructions in the new Xeon 5600s to implement the Advanced Encryption Standard (AES) algorithm for encrypting and decrypting data, Opterons have had this feature since the quad-core “Barcelona” Opterons came out in late 2007, er, early 2008.

And as for performance –

Generally speaking, bin for bin, the twelve-core Magny-Cours chips provide about 88 per cent more integer performance and 119 per cent more floating point performance than the six-core “Istanbul” Opteron 2400 and 8400 chips they replace..

AMD seems geared towards reducing costs and prices as well with –

The Opteron 6100s will compete with the high-end of the Xeon 5600s in the 2P space and also take the fight on up to the 4P space. But, AMD’s chipsets and the chips themselves are really all the same. It is really a game of packaging some components in the stack up in different ways to target different markets.

Sounds like a great way to keep costs down by limiting the amount of development required to support the various configurations.

AMD themselves also blogged on the topic with some interesting tidbits of information –

You’re probably wondering why we wouldn’t put our highest speed processor up in this comparison. It’s because we realize that while performance is important, it is not the most important factor in server decisions.  In most cases, we believe price and power consumption play a far larger role.

[..]

Power consumption – Note that to get to the performance levels that our competitor has, they had to utilize a 130W processor that is not targeted at the mainstream server market, but is more likely to be used in workstations. Intel isn’t forthcoming on their power numbers so we don’t really have a good measurement of their maximum power, but their 130W TDP part is being beaten in performance by our 80W ACP part.  It feels like the power efficiency is clearly in our court.  The fact that we have doubled cores and stayed in the same power/thermal range compared to our previous generation is a testament to our power efficiency.

Price – This is an area that I don’t understand.  Coming out of one of the worst economic times in recent history, why Intel pushed up the top Xeon X series price from $1386 to $1663 is beyond me.  Customers are looking for more, not less for their IT dollar.  In the comparison above, while they still can’t match our performance, they really fall short in pricing.  At $1663 versus our $1165, their customers are paying 42% more money for the luxury of purchasing a slower processor. This makes no sense.  Shouldn’t we all be offering customers more for their money, not less?

In addition to our aggressive 2P pricing, we have also stripped away the “4P tax.” No longer do customers have to pay a premium to buy a processor capable of scaling up to 4 CPUs in a single platform.  As of today, the 4P tax is effectively $0. Well, of course, that depends on you making the right processor choice, as I am fairly sure that our competitor will still want to charge you a premium for that feature.  I recommend you don’t pay it.

As a matter of fact, a customer will probably find that a 4P server, with 32 total cores (4 x 8-core) based on our new pricing, will not only perform better than our competitor’s highest end 2P system, but it will also do it for a lower price. Suddenly, it is 4P for the masses!

For the most part I am mainly interested in their 12-core chips, but I also see significant value in the 8 core chips; being able to replace a pair of 4 core chips with a single socket 8 core system is very appealing in certain situations. There is a decent premium on motherboards that need to support more than one socket. Being able to get 8 (and maybe even 12) cores on a single socket system is just outstanding.

I also found this interesting –

Each one is capable of 105.6 Gigaflops (12 cores x 4 32-bit FPU instructions x 2.2GHz).  And that score is for the 2.2GHz model, which isn’t even the fastest one!

I still have a poster up on one of my walls from the 1995-1996 era on the world's first teraflop machine, which was –

The one-teraflops demonstration was achieved using 7,264 Pentium Pro processors in 57 cabinets.

With the same number of these new Opterons you could get 3/4ths of the way to a Petaflop.
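
Checking that back-of-the-envelope claim against the figures quoted above:

# 7,264 chips (the Pentium Pro count from the 1996 teraflop machine) at the
# per-chip rating quoted above for the 2.2GHz twelve-core Opteron.
cores_per_chip = 12
flops_per_cycle = 4                  # 32-bit FPU instructions per clock, per the quote
clock_ghz = 2.2
gflops_per_chip = cores_per_chip * flops_per_cycle * clock_ghz   # 105.6 GFLOPS

chips = 7264
total_tflops = chips * gflops_per_chip / 1000
print(f"{total_tflops:,.0f} TFLOPS, or {total_tflops / 1000:.2f} PFLOPS")
# -> roughly 767 TFLOPS, about three quarters of a petaflop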

SGI is raising the bar as well –

This means as many as 2,208 cores in a single rack of our Rackable™ rackmount servers. And in the SGI ICE Cube modular data center, our containerized data center environment, you can now scale within a single container to 41,760 cores! Of course, density is only part of the picture. There’s as much to be excited about when it comes to power efficiency and the memory performance of SGI servers using AMD Opteron 6100 Series processor technology

Other systems announced today include:

  • HP DL165G7
  • HP SL165z G7
  • HP DL385 G7
  • Cray XT6 supercomputer
  • There is mention of a Dell R815, though it doesn't seem to be officially announced yet. The R815 specs seem kind of underwhelming in the memory department, with it only supporting 32 DIMMs (the HP systems above support the full 12 DIMMs/socket). It is only 2U however. Sun has had 2U quad socket Opteron systems with 32 DIMMs for a couple of years now in the form of the X4440; strange that Dell did not step up and max out the system with 48 DIMMs.

I can't put into words how happy and proud I am of AMD for this new product launch. Not only is it an impressive technological achievement, but the fact that they managed to pull it off on schedule is just amazing.

Congratulations AMD!!!

March 16, 2010

IBM partners with Red Hat for KVM cloud

Filed under: News,Virtualization — Tags: , , , — Nate @ 6:32 pm

One question: Why?

IBM has bombarded the IT world for years now with claims about how they can consolidate hundreds to thousands of Linux VMs onto a single mainframe.

IBM has recently announced a partnership with Red Hat to use KVM in a cloud offering. At first I thought, well, maybe they are doing it to offer Microsoft applications as well, but that doesn't appear to be the case:

Programmers who use the IBM Cloud for test and dev will be given RHEV to play with Red Hat Enterprise Linux or Novell SUSE Linux Enterprise Server images with a Java layer as they code their apps and run them through regression and other tests.

Let's see: Linux and Java. Why not use the mainframes to do this? Why KVM? As far as the end users are concerned it really shouldn't matter; after all, it's Java and Linux.

Seems like a slap in the face to their mainframe division (I never bought into the mainframe/Linux/VM marketing myself; I suppose they don't either). I do remember briefly having access to an S390 running a SuSE VM about 10 years ago. It was.. interesting.

March 10, 2010

Save 50% off vSphere essentials for the next 90 days

Filed under: Virtualization — Tags: , — Nate @ 3:00 pm

Came across this today, which mentions you can save about 50% when licensing vSphere Essentials for the next ~90 days. As you may know, Essentials is a really cheap way to get your vSphere hosts managed by vCenter. For your average dual socket 16-blade system, as an example, it is 91% cheaper (savings of ~$26,000) than going with vSphere Standard edition. Note that the vCenter included with Essentials needs to be thrown away if you're managing more than three hosts with it; you'll still need to buy vCenter Standard (regardless of what version of vSphere you buy).

March 9, 2010

The Atomic Unit of Compute

Filed under: Virtualization — Tags: — Nate @ 5:16 pm

I found this pretty fascinating; as someone who has been talking to several providers, it certainly raises some pretty good points.

[..]Another of the challenges you’ll face along the way of Cloud is that of how to measure exactly what it is you are offering. But having a look at what the industry is doing won’t give you much help… as with so many things in IT, there is no standard. Amazon have their EC2 unit, and state that it is roughly the equivalent of 1.0-1.2GHz of a 2007 Opteron or Xeon CPU. With Azure, Microsoft haven’t gone down the same path – their indicative pricing/sizing shows a base compute unit of 1.6GHz with no indication as to what is underneath. Rackspace flip the whole thing on it’s head by deciding that memory is the primary resource constraint, therefore they’ll just charge for that and presumably give you as much CPU as you want (but with no indication as to the characteristics of the underlying CPU). Which way should you go? IMHO, none of the above.[..]

We need to have a standard unit of compute, that applies to virtual _and_ physical, new hardware and old, irrespective of AMD or Intel (or even SPARC or Power). And of course, it’s not all just about GHz because all GHz are most definitely not equal and yes it _does_ matter to applications. And lets not forget the power needed to deliver those GHz.

In talking with Terremark it seems their model is built around VMware resource pools, where they allocate you a set amount of GHz for your account. They have a mixture of Intel dual socket systems and AMD quad socket systems, and if you run a lot of multi vCPU VMs you have a higher likelihood of ending up in the AMD pool vs the Intel one. I have been testing their vCloud Express product for my own personal needs (1 vCPU, 1.5GB RAM, 50GB HD), and noticed that my VM is on one of the AMD quad socket systems.

March 1, 2010

The future of networking in hypervisors – not so bright

Filed under: Networking,Virtualization — Nate @ 10:15 pm

UPDATED Some networking companies see that they are losing control of the data center networks when it comes to blades and virtualization. One has reacted by making their own blades, others have come up with strategies and collaborated on standards to try to take back the network by moving the traffic back into the switching gear. Yet another has licensed their OS to have another company make blade switches on their behalf.

Where at least part of the industry wants to go is to move the local switching out of the hypervisor and back into the Ethernet switches. Now this makes sense for the industry, because they are losing their grip on the network when it comes to virtualization. But this is going backwards in my opinion. Several years ago we had big chassis switches with centralized switch fabrics where (I believe, kind of going out on a limb here) if port 1 on blade 1 wanted to talk to port 2, then the traffic had to go back to the centralized fabric before port 2 would see it. That's a lot of distance to travel. Fast forward a few years and now almost every vendor is advertising local switching, which eliminates that trip and makes things faster and more scalable.

Another similar evolution in switching design was moving from backplane systems to midplane systems. I only learned about some of the specifics recently; prior to that I really had no idea what the difference was between a backplane and a midplane. But apparently the idea behind a midplane is to drive significantly higher throughput on the system by putting the switching fabric closer to the line cards. An inch here, an inch there could mean hundreds of gigabits of lost throughput, or increased complexity/line noise, in order to achieve those high throughput numbers. Again, the idea is moving the fabric closer to what needs it in order to increase performance. You can see examples of midplane systems in blades with the HP c7000 chassis, or in switches with the Extreme Black Diamond 20808 (page 7). Both of them have things that plug into both the front and the back. I thought that was mainly due to space constraints on the front, but it turns out it seems more about minimizing the distance between the fabric on the back and the thing using the fabric on the front. Also note that the fabric modules on the rear are horizontal while the blades on the front are vertical; I think this allows the modules to further reduce the physical distance between the fabric and the device at the other end by directly covering more slots – less distance to travel on the midplane.

If you move the switching out of the hypervisor, then when VM #1 wants to talk to VM #2, the traffic has to go outside of the server, make a U-turn, and come right back into it. That's stupid. Really stupid. It's the industry grasping at straws trying to maintain control when they should be innovating, and it goes against the two evolutions in switching design I outlined above.

What I've been wanting to see myself is to integrate the switch into the server: have an X GbE chip that has the switching fabric built into it. Most modern network operating systems are pretty modular and portable (a lot of them seem to be based on Linux or BSD). I say integrate it onto the blade for best performance, and maybe use the distributed switch framework (or come up with some other, more platform independent way to improve management). The situation will only get worse in coming years; with VM servers potentially having hundreds of cores and TBs of memory at their disposal, you're practically at the point now where you can fit an entire rack of traditional servers onto one hypervisor.

I know that, for example, Extreme uses Broadcom in most all of their systems, and Broadcom is what most server manufacturers use for their network adapters – even HP's Flex10 seems to be based on Broadcom. How hard can it be for Broadcom to make such a chip(set) so that companies like Extreme (or whomever else might use Broadcom in their switches) could program it with their own stuff to make it a mini switch?

From the Broadcom press release above (2008):

To date, Broadcom is the only silicon vendor with all of the networking components (controller, switch and physical layer devices) necessary to build a complete end-to-end 10GbE data center. This complete portfolio of 10GbE network infrastructure solutions enables OEM partners to enhance their next generation servers and data centers.

Maybe what I want makes too much sense and that’s why it’s not happening, or maybe I’m just crazy.

UPDATE – I just wanted to clarify my position here: what I'm looking for is essentially to offload the layer 2 switching functionality from the hypervisor to a chip on the server itself, whether that's a special 10GbE adapter that has switching fabric or a dedicated add-on card which only has the switching fabric. I'm not interested in offloading layer 3 stuff; that can be handled upstream. I am also interested in integrating things like ACLs, sFlow, QoS, rate limiting and perhaps port mirroring.

ProCurve Not my favorite

Filed under: Networking,Virtualization — Nate @ 10:06 pm

I gotta find something new to talk about, after this..

I was thinking this evening about my UCS/HP network shootout post from over the weekend, and thought maybe I came across too strong in favor of HP's networking gear.

As all three of you know, HP is not my favorite networking vendor. Not even my second favorite, or even my third.

But they do have some cool technology with this Virtualconnect stuff. I only wish blade interfaces were more standardized.

February 28, 2010

VMware dream machine

Filed under: Networking,Storage,Virtualization — Tags: , , , , , , — Nate @ 12:47 am

(Originally titled fourty eight all round, I like VMware dream machine more)

UPDATED I was thinking more about the upcoming 12-core Opterons and the next generation of HP c-Class blades, and thought of a pretty cool configuration to have; hopefully it becomes available.

Imagine a full height blade that is quad socket, 48 cores (91-115GHz), 48 DIMMs (192GB with 4GB sticks), 4x10Gbps Ethernet links and 2x4Gbps fiber channel links (a total of 48Gbps of full duplex bandwidth). The new Opterons support 12 DIMMs per socket, allowing the 48 DIMM slots.

Why 4x10Gbps links? Well, I was thinking why not. With full height blades you can only fit 8 blades in a c7000 chassis, so if you put a pair of 2x10Gbps switches in, that gives you 16 ports. It's not much more $$ to double up on 10Gbps ports, especially if you're talking about spending upwards of say $20k on the blade (guesstimate) and another $9-15k on vSphere software per blade. And 4x10Gbps links gives you up to 16 virtual NICs per blade using VirtualConnect, each of them adjustable in 100Mbps increments.

Also given the fact that it is a full height blade, you have access to two slots worth of I/O, which translates into 320Gbps of full duplex fabric available to a single blade.

That kind of blade ought to handle just about anything you can throw at it. It's practically a supercomputer in and of itself. Right now HP holds the top spot for VMmark scores, with an 8 socket, 6 core system (48 total cores) outpacing even a 16 socket, 4 core system (64 total cores).

The 48 CPU cores will give the hypervisor an amazing number of combinations for scheduling vCPUs. Here’s a slide from a presentation I was at last year which illustrates the concept behind the hypervisor scheduling single and multi vCPU VMs:

There is a PDF out there from VMware that talks about the math formulas behind it all; it has some interesting commentary on CPU scheduling with hypervisors:

[..]Extending this principle, ESX Server installations with a greater number of physical CPUs offer a greater chance of servicing competing workloads optimally. The chance that the scheduler can find room for a particular workload without much reshuffling of virtual machines will always be better when the scheduler has more CPUs across which it can search for idle time.

This is even cooler, though honestly I can't pretend to understand the math myself! –

Scheduling a two-VCPU machine on a two-way physical ESX Server hosts provides only one possible allocation for scheduling the virtual machine. The number of possible scheduling opportunities for a two-VCPU machine on a four-way or eight-way physical ESX Server host is described by combinatorial mathematics using the formula N! / (R!(N-R)!) where N=the number of physical CPUs on the ESX Server host and R=the number of VCPUs on the machine being scheduled.1 A two-VCPU virtual machine running on a four-way ESX Server host provides (4! / (2! (4-2)!) which is (4*3*2 / (2*2)) or 6 scheduling possibilities. For those unfamiliar with combinatory mathematics, X! is calculated as X(X-1)(X-2)(X-3)…. (X- (X-1)). For example 5! = 5*4*3*2*1.

Using these calculations, a two-VCPU virtual machine on an eight-way ESX Server host has (8! / (2! (8-2)!) which is (40320 / (2*720)) or 28 scheduling possibilities. This is more than four times the possibilities a four-way ESX Server host can provide. Four-vCPU machines demonstrate this principle even more forcefully. A four-vCPU machine scheduled on a four-way physical ESX Server host provides only one possibility to the scheduler whereas a four-VCPU virtual machine on an eight-CPU ESX Server host will yield (8! / (4!(8-4)!) or 70 scheduling possibilities, but running a four-vCPU machine on a sixteen-way ESX Server host will yield (16! / (4!(16-4)!) which is (20922789888000 / ( 24*479001600) or 1820 scheduling possibilities. That means that the scheduler has 1820 unique ways in which it can place the four-vCPU workload on the ESX Server host. Doubling the physical CPU count from eight to sixteen results in 26 times the scheduling flexibility for the four-way virtual machines. Running a four-way virtual machine on a Host with four times the number of physical processors (16-way ESX Server host) provides over six times more flexibility than we saw with running a two-way VM on a Host with four times the number of physical processors (8-way ESX Server host).

Anyone want to try to extrapolate that and extend it to a 48-core system? 🙂
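
Here is a quick way to extend the formula quoted above to a 48-core host (or any other size):

from math import comb

# The VMware formula above: C(N, R) = N! / (R! * (N - R)!)
# where N = physical CPU cores on the host and R = vCPUs in the VM.
for host_cores in (4, 8, 16, 48):
    for vcpus in (2, 4):
        placements = comb(host_cores, vcpus)
        print(f"{vcpus}-vCPU VM on a {host_cores}-core host: {placements:,} placements")

# A 48-core host gives C(48, 2) = 1,128 placements for a 2-vCPU VM
# and C(48, 4) = 194,580 for a 4-vCPU VM.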

It seems like only yesterday that I was building DL380G5 ESX 3.5 systems with 8 CPU cores, 32GB of RAM and 8x1Gbps links, thinking how powerful they were. This would be six of those in a single blade. And it only seems like a couple of weeks ago that I was building VMware GSX systems on dual socket, single core boxes with 16GB of RAM..

So, HP, do me a favor and make a G7 blade that can do this; that would make my day! I know fitting all of those components on a single full height blade won't be easy. Looking at the existing BL685c blade, it looks like they could do it: remove the internal disks (who needs them, boot from SAN or something) and put in an extra 16 DIMMs for a total of 48.

I thought about using 8Gbps fiber channel but then it wouldn’t be 48 all round 🙂

UPDATE Again I was thinking about this and wanted to compare the costs vs existing technology. I'm estimating roughly a $32,000 price tag for this kind of blade with vSphere Advanced licensing (note you cannot use Enterprise licensing on a 12-core CPU; hardware pricing is extrapolated from the existing HP BL685c G6 quad socket, 6 core blade system with 128GB of RAM). The approximate price of an 8-way, 48-core HP DL785 with 192GB, 4x10GbE and 2x4Gb fiber with vSphere licensing comes to roughly $70,000 (because VMware charges on a per socket basis, the licensing costs go up fast). Not only that, but you can only fit 6 of these DL785 servers in a 42U rack, while you can fit 32 of these blades in the same rack with room to spare. So less than half the cost, and 5 times the density (for the same configuration). The DL785 has an edge in memory slot capacity, which isn't surprising given its massive size; it can fit 64 DIMMs vs 48 on my VMware dream machine blade.

Compared to a trio of HP BL495c blades each with 12 cores, and 64GB of memory, approximate pricing for that plus advanced vSphere is $31,000 for a total of 36 cores and 192GB of memory. So for $1,000 more you can add an extra 12 cores, cut your server count by 66%, probably cut your power usage by some amount and improve consolidation ratios.
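
Putting the update's estimates on a per-core and per-rack basis (all of the figures here are the rough estimates above, not quotes):

# Rough per-core cost and per-rack density from the estimates in the update:
# dream machine blade ~$32,000 (hardware + vSphere Advanced), 48 cores, 32 per 42U rack;
# DL785 ~$70,000 similarly configured, 48 cores, 6 per 42U rack.
configs = {
    "48-core blade": {"cost": 32_000, "cores": 48, "per_rack": 32},
    "DL785 8-way":   {"cost": 70_000, "cores": 48, "per_rack": 6},
}

for name, c in configs.items():
    per_core = c["cost"] / c["cores"]
    rack_cores = c["cores"] * c["per_rack"]
    print(f"{name}: ~${per_core:,.0f}/core, {rack_cores:,} cores per rack")

# 48-core blade: ~$667/core, 1,536 cores per rack
# DL785 8-way:   ~$1,458/core, 288 cores per rack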

So to summarize, two big reasons for this type of solution are:

  • More efficient consolidation on a per-host basis by having less “stranded” resources
  • More efficient consolidation on a per-cluster basis, because you can get more capacity within the 32-node limit of a VMware cluster (assuming you want to build a cluster that big), again addressing the "stranded capacity" issue. Imagine what a resource pool could do with 3.3THz of compute capacity and 9.2TB of memory? All with line rate 40Gbps networking throughout? All within a single cabinet?

Pretty amazing stuff to me anyways.

[For reference – Enterprise Plus licensing would add an extra $1250/socket plus more in support fees. VMware support costs not included in above pricing.]

END UPDATE
