TechOpsGuys.com Diggin' technology every day

10Apr/13

HP Project Moonshot micro servers

TechOps Guy: Nate

HP made a few headlines recently when they officially unveiled their first set of ultra-dense micro servers under the product name Moonshot. Originally speculated to be an ARM platform, it seems HP surprised many by making this first round of products Intel Atom based.

Picture of HP Moonshot chassis with 45 servers

They are calling it the world's first software defined server. Ugh. I can't tell you how sick I feel whenever I hear the term software defined <insert anything here>.

In any case, I think AMD might take issue with that, given their SeaMicro unit, which they acquired a while back. I was talking with SeaMicro as far back as 2009, I believe, and they had their high-density 10U virtualized Intel Atom-based platform (I have never used SeaMicro, though I knew a couple of folks who worked there), complete with integrated switching, load balancing and virtualized storage (the latter two HP is lacking).

Unlike legacy servers, in which a disk is unalterably bound to a CPU, the SeaMicro storage architecture is far more flexible, allowing for much more efficient disk use. Any disk can mount any CPU; in fact, SeaMicro allows disks to be carved into slices called virtual disks. A virtual disk can be as large as a physical disk or it can be a slice of a physical disk. A single physical disk can be partitioned into multiple virtual disks, and each virtual disk can be allocated to a different CPU. Conversely, a single virtual disk can be shared across multiple CPUs in read-only mode, providing a large shared data cache. Sharing of a virtual disk enables users to store or update common data, such as operating systems, application software, and data cache, once for an entire system.
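To make the virtual disk model in that description concrete, here is a minimal Python sketch of the allocation rules as described: slices carved from a physical disk, each writable slice owned by one CPU, read-only slices shareable by many. The class and method names (`PhysicalDisk`, `DiskPool`, `carve`, `attach`) are hypothetical; this is an illustration of the idea, not SeaMicro's software or API.

```python
from dataclasses import dataclass, field


@dataclass
class PhysicalDisk:
    name: str
    size_gb: int
    allocated_gb: int = 0  # space already carved into virtual disks


@dataclass
class VirtualDisk:
    name: str
    disk: PhysicalDisk
    size_gb: int
    read_only: bool = False
    cpus: set = field(default_factory=set)  # CPUs this slice is attached to


class DiskPool:
    """Hypothetical model of carving physical disks into virtual disks."""

    def __init__(self):
        self.vdisks = {}

    def carve(self, name, disk, size_gb, read_only=False):
        # A virtual disk is a slice of one physical disk, up to the whole disk.
        if name in self.vdisks:
            raise ValueError(f"virtual disk {name!r} already exists")
        if disk.allocated_gb + size_gb > disk.size_gb:
            raise ValueError(f"{disk.name} does not have {size_gb}GB free")
        disk.allocated_gb += size_gb
        vdisk = VirtualDisk(name, disk, size_gb, read_only)
        self.vdisks[name] = vdisk
        return vdisk

    def attach(self, name, cpu):
        vdisk = self.vdisks[name]
        # Writable slices belong to a single CPU; read-only slices can be shared.
        if not vdisk.read_only and vdisk.cpus and cpu not in vdisk.cpus:
            raise ValueError(f"{name} is writable and already attached")
        vdisk.cpus.add(cpu)


if __name__ == "__main__":
    pool = DiskPool()
    disk0 = PhysicalDisk("disk0", size_gb=500)
    pool.carve("os-image", disk0, 50, read_only=True)
    for cpu in range(4):
        pool.attach("os-image", cpu)  # one shared OS image for four CPUs
    pool.carve("scratch-cpu0", disk0, 100)
    pool.attach("scratch-cpu0", 0)
    print(disk0.allocated_gb, sorted(pool.vdisks["os-image"].cpus))
```

The shared read-only image is what lets an OS or application be stored once for the whole system, as the quote describes.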

Really, the technology SeaMicro has puts the Moonshot Atom systems to shame. SeaMicro has the advantage that this is their 2nd or 3rd (or perhaps later) generation product; Moonshot is on its first gen.

Picture of SeaMicro chassis with 256 servers

Moonshot provides 45 hot pluggable single socket dual core Atom processors, each with 8GB of memory and a single local disk in a 4.5U package.

SeaMicro provides up to 256 sockets of dual core Atom processors, each with 4GB of memory and virtualized storage. Or you can opt for up to 64 sockets of either quad core Intel Xeon or eight core AMD Opteron, with up to 64GB/system (32GB max for Xeon). All of this in a 10U package.

Let's expand a bit more. Moonshot can get 450 servers (900 cores) and 3.6TB of memory in a 47U rack. SeaMicro can get 1,024 servers (2,048 cores) and 4TB of memory in a 47U rack. If that is not enough memory you could switch to Xeon or Opteron with a similar power profile: at the high end, 2,048 Opteron cores with 16TB of memory (AMD uses a custom Opteron 4300 chip in the SeaMicro system, a chip not available for any other purpose). Or maybe you mix and match. There are also fewer systems to manage: HP has 10 per rack and SeaMicro has 4. I harped on HP's SL-series a while back for similar reasons.
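The per-rack figures above are straightforward multiplication. This Python sketch reproduces them, assuming the whole 47U is available for chassis (no space reserved for top-of-rack switches or PDUs) and one socket per server. Memory is reported in GB, so 3,600GB is the 3.6TB quoted for Moonshot, and 4,096GB and 16,384GB are the 4TB and 16TB quoted for SeaMicro.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chassis:
    name: str
    height_u: float
    sockets: int            # servers per chassis (one socket each)
    cores_per_socket: int
    mem_gb_per_socket: int


def rack_totals(chassis, rack_u=47):
    """Fill a rack with identical chassis and total up servers, cores and memory."""
    count = int(rack_u // chassis.height_u)
    servers = count * chassis.sockets
    return {
        "chassis": count,
        "servers": servers,
        "cores": servers * chassis.cores_per_socket,
        "memory_gb": servers * chassis.mem_gb_per_socket,
    }


MOONSHOT = Chassis("Moonshot (Atom)", 4.5, 45, 2, 8)
SEAMICRO_ATOM = Chassis("SeaMicro (Atom)", 10, 256, 2, 4)
SEAMICRO_OPTERON = Chassis("SeaMicro (Opteron)", 10, 64, 8, 64)

if __name__ == "__main__":
    for c in (MOONSHOT, SEAMICRO_ATOM, SEAMICRO_OPTERON):
        print(c.name, rack_totals(c))
```

The chassis count per rack (10 vs. 4) is also the "fewer systems to manage" number.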

SeaMicro also has dedicated external storage, which I believe extends the virtualization layer within the chassis, but I am not certain.

All in all it appears SeaMicro was years ahead of Moonshot before Moonshot ever hit the market. Maybe HP should have scrapped Moonshot and bought SeaMicro when they had the chance.

At the end of the day I don't see anything to get excited about with Moonshot, unless perhaps it's really cheap (relative to SeaMicro anyway). The micro server concept is somewhat risky in my opinion. If you really have your workload nailed down to something specific and you can fit it into one of these designs, then great. But obviously the flexibility of such micro servers is very limited. SeaMicro of course wins here too, given that an 8-core Opteron with 64GB of memory is quite flexible compared to the tiny Atom with its tiny memory.

I have seen time and time again people get excited about this and talk about how they can get so many more servers per watt versus the higher end chips. Most of the time they forget how few workloads are CPU bound; simply slapping a hypervisor on top of a system with a boatload of memory can get you significantly more servers per watt than a micro server could hope to achieve. HOWEVER, if your workload can effectively exploit the micro servers, drive utilization up, etc., then it can be a really good solution. In my experience those sorts of workloads are the exception rather than the rule, I'll put it that way.

It seems HP is still evaluating whether or not to put ARM processors in Moonshot. In the end I think they will, but they won't have a lot of success; the market is too niche. You really have to go to fairly extreme lengths to have a true need for something specialized like ARM, and the software compatibility complexities are not trivial.

I think HP will not have an easy time competing in this space. The hyperscale folks like Rackspace, Facebook, Google, Microsoft etc. all seem to be doing their own thing and are unlikely to purchase much from HP. At the same time there is of course SeaMicro, amongst other competitors (Dell DCS etc.), making similar systems. I really don't see anything that makes Moonshot stand out, at least not at this point. Maybe I am missing something.

14Nov/11

AMD Launches Opteron 6200s

TechOps Guy: Nate

UPDATED: I have three words:

About damn time.

I've been waiting a long time for these; I was expecting them months ago, and a few weeks ago I had to put in orders for Opteron 6100s because I couldn't wait any longer for the 6200s. Sigh. I'm half hoping I can get HP to exchange my 6100s for 6200s, since the 6100s are still sitting in boxes, though that may be too hopeful given my timeline for deployment. One thing's for sure: if HP can pull it off, that will make my decision on which version of vSphere to go with pretty easy, since vSphere 4 tops out at 12 cores per socket.

AMD has finally launched the 6200, which as everyone knows is the world's first 16-core x86-64 processor. It is socket compatible with the 6100, which launched over a year ago, providing an easy upgrade path.

I'm just running through some of the new stuff now. One nice feature, which I believe I mentioned a while ago, is the TDP Cap, which allows a user to set the maximum power usage of the processor; basically more granular control than the technologies that came before it. Unfortunately I don't believe it can dynamically turn cores on and off based on this value; maybe next time. (Turbo Core, covered next, is a separate technology.)

AMD Turbo Core

I thought this was pretty cool; I was just reading about it in their slide deck. At first I thought it was going to be similar to the Intel or IBM Turbo technology where, if I recall right (don't quote me), the system can more or less shut off all the other cores on the socket and turbo charge a single core to supersonic speeds. AMD Turbo Core instead boosts all cores simultaneously by 300-500MHz if the workload fits within the power envelope of the processor. It can do the same for half of the on-board cores, but with a boost of up to 1GHz instead of 300-500MHz.
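As a rough illustration of the difference, here is a Python sketch of the Turbo Core rule as summarized above. It is a simplification, not AMD's actual algorithm: real behavior depends on measured power and varies by SKU, only the top of the 300-500MHz all-core range is modeled, and treating "more than half the cores busy" as the all-core case is an assumption made for this sketch.

```python
def turbo_core_max_boost_mhz(active_cores, total_cores, within_power_envelope):
    """Maximum extra clock under the simplified Turbo Core model described above.

    Assumptions: boost only applies when the workload fits the power envelope;
    with more than half the cores active the all-core boost (up to 500MHz)
    applies, and with half or fewer active the boost goes up to 1GHz.
    """
    if not 1 <= active_cores <= total_cores:
        raise ValueError("active_cores must be between 1 and total_cores")
    if not within_power_envelope:
        return 0
    if active_cores <= total_cores // 2:
        return 1000
    return 500


def effective_clock_mhz(base_mhz, active_cores, total_cores, within_power_envelope):
    """Base clock plus the modeled maximum boost."""
    return base_mhz + turbo_core_max_boost_mhz(active_cores, total_cores, within_power_envelope)


if __name__ == "__main__":
    base = 2300  # a 2.3GHz part, used purely for illustration
    for active in (16, 8):
        print(active, "active cores ->", effective_clock_mhz(base, active, 16, True), "MHz max")
```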

Memory Enhancements

It also supports higher performance memory, as well as something called LR-DIMMs, which I had never heard of before. Load Reduced DIMMs seem to allow you to add more memory to the system, though even after reading the material on Micron's site I'm not sure of the advantage.

I recall that on the 6100 there was a memory performance hit when you used all 12 memory slots per CPU socket (vs. using only 8 per socket). I can't tell whether this is different on the 6200.

Power and Performance

The highest end, lowest power Opteron 6100 seems to be the 6176 (not to be confused with the 6176 SE). The 6176 by itself is not even mentioned on AMD's site (though it is on HP's site, and my recent servers have it). It is a 2.3GHz 12-core 80W (115W TDP) processor. It seems AMD has switched their power ratings from the ACP they were using before to TDP (what Intel uses). If I recall right, ACP was something like average processor power usage, whereas TDP is peak usage.

The 6276 is the new high end, lower power option: a 16-core 2.3GHz processor with the same power usage. So they managed to squeeze an extra 9.2GHz worth of processing power (four more cores at 2.3GHz) into the same power envelope. That's pretty impressive.
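The 9.2GHz figure is just the four extra cores times 2.3GHz. A small Python sketch of that arithmetic, which also expresses it per TDP watt since both parts are rated at 115W. Aggregate clock ignores per-core performance differences between the two architectures, so treat it as a rough proxy only.

```python
def aggregate_mhz(cores, clock_mhz):
    """Total clock across all cores, a crude proxy for processing capacity."""
    return cores * clock_mhz


OPTERON_6176 = aggregate_mhz(12, 2300)  # 12-core 2.3GHz, 115W TDP
OPTERON_6276 = aggregate_mhz(16, 2300)  # 16-core 2.3GHz, 115W TDP
TDP_WATTS = 115

extra_mhz = OPTERON_6276 - OPTERON_6176

if __name__ == "__main__":
    print(f"extra capacity: {extra_mhz / 1000}GHz")
    print(f"6176: {OPTERON_6176 / TDP_WATTS:.0f} MHz per TDP watt")
    print(f"6276: {OPTERON_6276 / TDP_WATTS:.0f} MHz per TDP watt")
```

Working in integer MHz avoids floating point noise in the subtraction.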

There's not a lot of performance metrics out at this stage, but here's something I found on AMD's site:

Chart: SPECint_rate_base2006, mainstream CPUs

That's a very good price/performance ratio. This graph is for "mainstream CPUs", that is, CPUs with "normal" power usage, not ultra high end CPUs which consume a lot more power. Those are four-socket systems, so the CPUs alone at the high end would run $8,236 from Intel and $3,152 from AMD. Then there is the motherboard and chipset, where Intel will carry a premium over AMD as well, since Intel has different price/scalability bands for its two-socket and four-socket processors and AMD does not. With Intel you can now get two-socket versions of servers with the latest processors, but they still seem to carry a decent premium, since I believe they use the same chipsets as the four-socket boxes; those two-socket versions are made more for memory-capacity-bound workloads than CPU-bound ones.

There are floating point results too, though floating point doesn't really matter for the stuff I do; it's probably more useful for SGI and Cray and their supercomputers.

It's not the 3.5GHz that AMD was talking about, but I trust that is coming at some point. AMD has been having some manufacturing issues recently, which were probably the main cause of the 6200's delays; hopefully they get those worked out in short order.

HP has already updated their stuff to reflect support for the latest processors in their existing platforms.

From HP's site, here are the newest 16 core processors:

  • 6282SE (2.6GHz/16-core/16MB/140W TDP) Processor
  • 6276 (2.3GHz/16-core/16MB/115W TDP) Processor
  • 6274 (2.2GHz/16-core/16MB/115W TDP) Processor
  • 6272 (2.1GHz/16-core/16MB/115W TDP) Processor
  • 6262HE (1.6GHz/16-core/16MB/85W TDP) Processor

A few more stats:

  • L1 CPU Cache slashed from 128kB to 48kB (total 1,536kB to 768kB)
  • L2 CPU Cache increased from 512kB to 1,024kB (total 6,144kB to 16,384kB)
  • L3 CPU Cache increased from 12,288 kB to 16,384 kB (1,024kB per core for both procs)
  • Memory controller clock speed increased from 1.8GHz to 2GHz
  • CMOS process shrunk from 45nm to 32nm
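The per-socket totals in that list are per-core sizes multiplied by core count. This Python sketch recomputes them. The 6200's per-core L2 assumes each two-core Bulldozer module shares a 2MB L2 (1,024kB per core); check that against AMD's documentation before relying on it.

```python
# Per-core cache sizes in kB, plus total L3 per socket.
CHIPS = {
    "6100": {"cores": 12, "l1_kb": 128, "l2_kb": 512, "l3_total_kb": 12288},
    "6200": {"cores": 16, "l1_kb": 48, "l2_kb": 1024, "l3_total_kb": 16384},
}


def cache_totals(chip):
    """Per-socket L1/L2 totals and the L3 share per core."""
    spec = CHIPS[chip]
    return {
        "l1_total_kb": spec["cores"] * spec["l1_kb"],
        "l2_total_kb": spec["cores"] * spec["l2_kb"],
        "l3_per_core_kb": spec["l3_total_kb"] // spec["cores"],
    }


if __name__ == "__main__":
    for chip in CHIPS:
        print(chip, cache_totals(chip))
```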

Interesting how they shifted focus away from the L1 cache to the L2 cache.

Anyone know how many transistors are on this thing? And how many were on the 6100? How about on some of the recent Intel chips?

Now to go figure out how much these things actually cost and what the lead times are.

UPDATE: I at least know pricing now. As the graph above implies, the new 16-core procs are actually cheaper than the 12-core versions! That's just insane; how often does that happen?!?!

Bottom line

With so many things driving virtualization these days, and with such high consolidation ratios, especially for workloads that are not CPU constrained (which is most of them), I like the value the 6000-series AMD chips give, especially the number of raw cores without hyperthreading. The AMD 6000 platform is the first AMD platform I have really, truly liked in a long, long time. I'll admit I was mistaken in my ways for a few years when I was on the Intel bandwagon, though I have been on the 'give me more cores' bandwagon ever since the first Intel quad core processor. Now that AMD has the most cores, on a highly efficient platform, I suppose I gravitate towards them. There are limits to how far you'd go to get cores, of course, and I'm not sure what my limit is. I've mentioned in the past that I wouldn't be interested in something like a 48x200MHz CPU, for example. The Opteron 6000 has a nice balance of per-core performance (it certainly can't match Intel's per-core performance, but it's halfway decent, especially given the price) and many, many cores.

Three blog posts in one morning, busy morning!

24Feb/11

16-core 3.5GHz Opterons coming?

TechOps Guy: Nate

I was just reading an article from our friends at The Register with some new news on the upcoming Opteron 6200 (among other chips). It seems AMD is cranking up both the core counts and clock speeds within the same power envelope; the smaller manufacturing process certainly helps! I think they're going from 45nm to 32nm.

McIntyre said that AMD was targeting clock speeds of 3.5 GHz and higher with the Bulldozer cores within the same power envelope as the current Opteron 4100 and 6100 processors.

Remember that the 6200 is socket compatible with the 6100!

Can you imagine a blade chassis with 512 x 3.5GHz CPU cores and 4TB of memory in only 10U of space, drawing roughly 7,000 watts peak? Seems unreal, but it sounds like it's already on its way.

9Nov/10

Next Gen Opterons — to 20 cores and beyond?

TechOps Guy: Nate

So I came across this a short time ago, but The Register has a lot more useful information here.

From AMD

The server products (“Interlagos” and “Valencia”) will first begin production in Q2 2011, and we expect to launch them in Q3 2011. [This includes the Opteron 6100 socket compatible 16-core Opteron 6200]

[..]

Since Bulldozer is designed to fit into the same power/thermal envelope as our current AMD Opteron™ 6100/4100 series processors we obviously have some new power tricks up our sleeve.  One of these is the new CC6 power state, which powers down an entire module when it is idle. That is just one of the new power innovations that you’ll see with Bulldozer-based processors.

[..]

We have disclosed that we would include AMD Turbo CORE technology in the past, so this should not be a surprise to anyone. But what is news is the uplift – up to 500MHz with all cores fully utilized. Today’s implementations of boost technology can push up the clock speed of a couple of cores when the others are idle, but with our new version of Turbo CORE you’ll see full core boost, meaning an extra 500MHz across all 16 threads for most workloads.

[..]

We are anticipating about a 50% increase in memory throughput with our new “Bulldozer” integrated memory controller.

From The Register

Newell showed off the top-end "Terramar" Opteron, which will have up to 20 of a next-generation Bulldozer cores in a single processor socket, representing a 25 percent boost in cores from the top-end Interlagos parts, and maybe a 35 to 40 per cent boost in performance if the performance curve stays the same as the jump from twelve-core "Magny-Cours" Opteron 6100s to the Interlagos chips.

[..]

That said, AMD is spoiling for a fight about chip design in a way that it hasn't been since the mid-2000s.

[..]

with Intel working on its future "Sandy Bridge" and "Ivy Bridge" Xeon processors for servers, and facing an architecture shift in the two-socket space in 2011 that AMD just suffered through in 2010.

Didn't Intel just go through an architecture shift in the two socket space last year with the Xeon 5500s and their integrated memory controller? And they are shifting architectures again so soon? Granted I haven't really looked into what these new Intel things have to offer.

I suppose my only question is: will VMware come up with yet another licensing level to go beyond 12 cores per socket? It's kind of suspicious that both vSphere Advanced and Enterprise Plus are capped at 12 cores per socket.

7Oct/10

Testing the limits of virtualization

TechOps Guy: Nate

You know I'm a big fan of the AMD Opteron 6100 series processor, and also of the HP c-Class blade system, specifically the BL685c G7, which was released on June 21st. I was and am very excited about it.

It is interesting to think that it really wasn't long ago that blade systems still weren't all that viable for virtualization, primarily because they lacked memory density; so many of them offered a paltry 2 or maybe 4 DIMM sockets. That was my biggest complaint with them for the longest time. About a year or a year and a half ago that really started shifting. We all know that Cisco bought a small startup a few years ago that had a memory extender ASIC, but you know I'm not a Cisco fan, so I won't give them any more real estate in this blog entry; I have better places to spend my mad typing skills.

A little over a year ago HP released their Opteron G6 blades. At the time I was looking at the half height BL485c G6 (guessing at the model here, too lazy to check). It had 16 DIMM sockets, which was just outstanding. The company I was with at the time really liked Dell (you know I hate Dell by now, I'm sure), and when I was poking around their site they had no answer to that (they have since introduced answers); the highest capacity half height blade they had at the time was 8 DIMM sockets.

I had always assumed that, due to the more advanced design of the HP blades, you ended up paying a huge premium, but wow, I was surprised at the real world pricing. More so at the time because, of course, you needed significantly higher density memory modules in the Dell model to compete with the HP model.

Anyway, fast forward to the BL685c G7, powered by the Opteron 6174 processor, a 12-core 2.2GHz 80W processor.

Load a chassis up with eight of those:

  • 384 CPU cores (about 845GHz of compute)
  • 4 TB of memory (512GB/server w/32x16GB each)
  • 6,750 Watts @ 100% load (feel free to use HP dynamic power capping if you need it)
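The chassis totals follow from eight full height blades per chassis, four sockets per blade, and 32 DIMM slots per blade. A Python sketch of the arithmetic, with the BL685c G7 / Opteron 6174 configuration from this post as defaults; the parameter names are illustrative, not HP's. Note that 384 cores at 2.2GHz works out to about 845GHz of aggregate clock.

```python
def chassis_totals(blades=8, sockets_per_blade=4, cores_per_socket=12,
                   clock_mhz=2200, dimm_slots_per_blade=32, dimm_gb=16):
    """Totals for a chassis of identical blades (defaults: BL685c G7 + Opteron 6174)."""
    cores = blades * sockets_per_blade * cores_per_socket
    return {
        "cores": cores,
        "aggregate_ghz": cores * clock_mhz / 1000,
        "memory_gb": blades * dimm_slots_per_blade * dimm_gb,
    }


if __name__ == "__main__":
    print("16GB DIMMs:", chassis_totals())
    print("8GB DIMMs: ", chassis_totals(dimm_gb=8))
```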

I've thought long and hard over the past 6 months about whether to go with 8GB or 16GB DIMMs, and all of my virtualization experience has taught me that in every case I'm memory (capacity) bound, not CPU bound. I mean it wasn't long ago we were building servers with only 32GB of memory on them!!!

There is indeed a massive premium associated with going with 16GB DIMMs, but if your capacity utilization is anywhere near the industry average, it is well worth investing in those DIMMs for this system. Going from 2TB to 4TB of memory with 8GB DIMMs in this configuration forces you into a 2nd chassis, with the associated rack/power/cooling and hypervisor licensing. You can easily halve your costs by just making the jump to 16GB DIMMs and keeping it all in one chassis (or at least 8 blades; maybe you want to split them between two chassis, but I'm not going to get into that level of detail here).

Low power versions aren't available for the 16GB DIMMs, so power usage jumps by 1.2kW per enclosure for 512GB/server vs. 256GB/server. A small price to pay, really.
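The trade-off described above can be sketched in Python: how many chassis, how many sockets of hypervisor licensing, and how much peak power it takes to reach 4TB with each DIMM size. The 8GB-DIMM power figure is derived, not measured: it assumes an 8GB configuration draws the 6,750W full-load figure minus the 1.2kW difference. It also assumes hypervisor licensing is per socket, as with vSphere 4, and leaves DIMM pricing out since it isn't given here.

```python
import math

BLADES_PER_CHASSIS = 8
SOCKETS_PER_BLADE = 4
DIMM_SLOTS_PER_BLADE = 32
PEAK_WATTS_16GB = 6750      # chassis at 100% load with 16GB DIMMs (from the post)
EXTRA_WATTS_16GB = 1200     # 16GB DIMMs add ~1.2kW per enclosure (from the post)


def plan(target_gb, dimm_gb):
    """Chassis, hypervisor sockets and peak power needed to reach target_gb."""
    if dimm_gb not in (8, 16):
        raise ValueError("this sketch only models 8GB and 16GB DIMMs")
    per_chassis_gb = BLADES_PER_CHASSIS * DIMM_SLOTS_PER_BLADE * dimm_gb
    chassis = math.ceil(target_gb / per_chassis_gb)
    # Assumption: an 8GB-DIMM chassis draws the 16GB figure minus the 1.2kW delta.
    watts = PEAK_WATTS_16GB if dimm_gb == 16 else PEAK_WATTS_16GB - EXTRA_WATTS_16GB
    return {
        "chassis": chassis,
        "sockets_to_license": chassis * BLADES_PER_CHASSIS * SOCKETS_PER_BLADE,
        "peak_watts": chassis * watts,
    }


if __name__ == "__main__":
    for dimm in (8, 16):
        print(f"{dimm}GB DIMMs:", plan(4096, dimm))
```

Under these assumptions the 16GB option needs half the chassis and half the licensed sockets while drawing less total power than two 8GB-DIMM chassis.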

So, onto the point of my post: testing the limits of virtualization. When you're running 32, 64, 128 or even 256GB of memory on a VM server, that's great; you really don't have much to worry about. But step it up to 512GB of memory and you might just find yourself maxing out the capabilities of the hypervisor. In vSphere 4.1, for example, you are limited to 512 vCPUs per server and 320 powered on virtual machines. So it really depends on your memory requirements. If you're able to achieve massive amounts of memory deduplication (I have not had much luck here myself; Linux doesn't dedupe well, though Windows seems to dedupe a lot), you may find yourself unable to fully use the memory on the system because you run out of the ability to fire up more VMs! I'm not going to cover other hypervisor technologies, they aren't worth my time at this point, but like I mentioned I do have my eye on KVM for future use.

Keep in mind 320 VMs is only about 6.7 VMs per CPU core on a 48-core server. To me that is not a whole lot for workloads I have personally deployed in the past. Now of course everybody is different.
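To see which limit bites first, here is a minimal Python sketch that takes the smaller of the memory-based VM count and the two vSphere 4.1 per-host maximums quoted above. It assumes identical VMs and ignores hypervisor overhead; `dedupe_ratio` is a hypothetical stand-in for whatever page sharing buys you.

```python
# vSphere 4.1 per-host maximums quoted in this post.
MAX_VMS_PER_HOST = 320
MAX_VCPUS_PER_HOST = 512


def max_vms(host_mem_gb, vm_mem_gb, vcpus_per_vm, dedupe_ratio=1.0):
    """Return (VM count, binding limit) for a host of identical VMs.

    dedupe_ratio is a hypothetical knob: 1.5 means page sharing lets the host
    hold 1.5x its physical memory worth of VM memory.
    """
    limits = {
        "memory": int(host_mem_gb * dedupe_ratio // vm_mem_gb),
        "vcpus": MAX_VCPUS_PER_HOST // vcpus_per_vm,
        "vm_count": MAX_VMS_PER_HOST,
    }
    binding = min(limits, key=limits.get)
    return limits[binding], binding


if __name__ == "__main__":
    for vm_gb, vcpus in ((4, 1), (2, 1), (1, 1), (2, 4)):
        count, limit = max_vms(512, vm_gb, vcpus)
        print(f"{vm_gb}GB/{vcpus}vCPU VMs: {count} per host, limited by {limit}")
    print(f"{MAX_VMS_PER_HOST / 48:.1f} VMs per core at the 320 VM cap")
```

With 512GB and small single-vCPU VMs, the 320 VM cap is reached before memory runs out, which is the scenario described above.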

But it got me thinking. The Register has been touting, off and on for the past several months, every time a new Xeon 7500-based system launches: ooh, they can get 1TB of RAM in the box. Or in the case of the big new bad ass HP 8-way system, you can get 2TB of RAM. Setting aside the fact that vSphere doesn't go above 1TB, even at 1TB I bet in most cases you will run out of virtual CPUs before you run out of memory.

It's interesting to see: in the "early" years hypervisor technology was really exploiting the hardware well, and now we see the real possibility of hitting a scalability wall, at least as far as a single system is concerned. I have no doubt that VMware will address these scalability issues; it's only a matter of time.

Are you concerned about running your servers with 512GB of RAM? After all, that is a lot of "eggs" in one basket (as one expert VMware consultant I know and respect put it). At smaller scales, I am really not too concerned. I have been using HP hardware for a long time, and on the enterprise end it really is pretty robust. My biggest concern is memory failure or memory errors. Fortunately HP has had Advanced ECC for a long time now (I think I remember seeing it even in the DL360 G2 back in '03).

HP's Advanced ECC spreads the error correction over four different ECC chips, and it really does provide quite robust memory protection. When I was dealing with cheap crap white box servers, the #1 problem BY FAR was memory; I can't tell you how many memory sticks I had to replace, it was sick. The systems just couldn't handle errors (yes, all the memory was ECC!).

By contrast, I honestly can't think of a time an enterprise HP server failed (e.g. crashed) due to a memory problem. I recall many times seeing the little amber status light come on, logging into the iLO and saying, oh, memory errors on stick #2, so I go replace it. But no crash! There was a firmware bug in the HP DL585 G1s I used to use that would cause them to crash if too many errors were encountered, but that bug was fixed years ago, and it was not a fault in the system design. I'm sure there have been other such bugs here and there; nothing is perfect.

Dell introduced their version of Advanced ECC about a year ago, but it doesn't (or at least didn't; maybe it does now) hold a candle to the HP stuff. The biggest issue with the Dell version was that if you enabled it, it disabled a bunch of your memory sockets! I could not get an answer out of Dell support at the time as to why it did that, so I left it disabled because I needed the memory capacity.

So combine Advanced ECC with ultra dense blades with 48 cores and 512GB of memory apiece and you've got yourself a serious compute resource pool.

Power/cooling issues aside (maybe if you're lucky you can get into SuperNap down in Vegas), you can get up to 1,500 CPU cores and 16TB of memory in a single cabinet. That's just nuts! It's WAY beyond what you can expect to support in a single VMware cluster: since you're limited to 3,000 powered on VMs per cluster, the density would be only 2 VMs/core and 5GB/VM!

And if you manage to get a 47U rack, you can fit one of those c3000 chassis on top of the four c7000s and get another 2TB of memory and 192 cores. We're talking power kicking up past 27kW in a single rack! Like I said, you need SuperNap or the like!

Think about that for a minute: 1,500 CPU cores and 16TB of memory in a single rack. Multiply that by, say, 10 racks: 15,000 CPU cores and 160TB of memory. How many tens of thousands of physical servers could be consolidated into that? A conservative number may be 7 VMs/core; you're talking 105,000 physical servers consolidated into ten racks. Excluding storage, of course. Think about that! Insane! That's consolidating multiple data centers into a high density closet! That's taking tens to hundreds of megawatts of power off the grid and consolidating it into a measly 270kW or so.
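The rack-level numbers can be checked the same way. This Python sketch uses the chassis figures from earlier in the post (four c7000s per rack, 384 cores, 4TB and 6.75kW peak per chassis) and exact core counts, 1,536 per rack rather than the rounded 1,500, which is why it lands a bit above 105,000 VMs at 7 VMs/core. It ignores the optional c3000, storage and networking gear.

```python
CHASSIS_PER_RACK = 4          # four 10U c7000s
CORES_PER_CHASSIS = 384       # 8 blades x 48 cores
MEMORY_TB_PER_CHASSIS = 4     # 8 blades x 512GB
PEAK_KW_PER_CHASSIS = 6.75
MAX_VMS_PER_CLUSTER = 3000    # vSphere cluster limit quoted above


def racks(n, vms_per_core=7):
    """Totals for n racks of fully loaded chassis."""
    cores = n * CHASSIS_PER_RACK * CORES_PER_CHASSIS
    return {
        "cores": cores,
        "memory_tb": n * CHASSIS_PER_RACK * MEMORY_TB_PER_CHASSIS,
        "peak_kw": n * CHASSIS_PER_RACK * PEAK_KW_PER_CHASSIS,
        "vms": cores * vms_per_core,
    }


one = racks(1)
# One rack run as a single cluster: the VM cap spread over its cores and memory.
vms_per_core = MAX_VMS_PER_CLUSTER / one["cores"]
gb_per_vm = one["memory_tb"] * 1024 / MAX_VMS_PER_CLUSTER

if __name__ == "__main__":
    print("one rack:", one)
    print("ten racks:", racks(10))
    print(f"single cluster: {vms_per_core:.2f} VMs/core, {gb_per_vm:.2f}GB/VM")
```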

I built out, what was to me some pretty beefy server infrastructure back in 2005, around a $7 million project. Part of it included roughly 300 servers in roughly 28 racks. There was 336kW of power provisioned for those servers.

Think about that for a minute. And re-read the previous paragraph.

I have thought for quite a while that, because of this trend, there won't be as many traditional network guys or server guys around going forward. When you can consolidate that much crap into that small a space, it's just astonishing.

One reason I really do like the Opteron 6100 is the CPU cores, just raw cores. And they are pretty fast cores too. The more cores you have, the more things the hypervisor can do at the same time, and there is no possibility of contention like there is with hyperthreading. CPU processing capacity has gotten to a point, I believe, where raw CPU performance matters much less than getting more cores on the boxes. More cores means more consolidation. After all, industry utilization rates for CPUs are typically sub 30%. In my experience it's typically sub 10%, and a lot of times sub 5%. My own server sits at less than 1% CPU usage.

Now, raw speed is still important in some applications, of course. I'm not one to promote the use of a 100-core CPU with each core running at 100MHz (10GHz total); there is a balance that has to be achieved, and I really do believe the Opteron 6100 has achieved it. I look forward to the 6200 (socket compatible, 16 cores). Ask anyone who has known me this decade: I have not been AMD's strongest supporter for a very long time. But I see the light now.

7Sep/10

Only HP has it

TechOps Guy: Nate

I commented in response to an article on The Register recently, but figured since I'm here writing stuff I might as well bring this up too.

Unless you've been living under a rock and/or not reading this site you probably know that AMD launched their Opteron 6100 series CPUs earlier this year. One of the highlights of the design is the ability to support 12 DIMMs of memory per socket, up from the previous eight per socket.

Of all the servers that have launched, HP seems to have the clear lead in AMD technology. For starters, as far as I am aware they are the only ones currently offering Opteron 6100-based blades.

Secondly, I have looked around at the offerings of Dell, IBM, HP, and even Supermicro and Tyan, and as far as I can tell only HP is offering Opteron systems with full 12 DIMMs/socket support. The only reason I can think of is that the other companies have a hard time making a board that can accommodate that many DIMMs; after all, it is a lot of memory chips. I'm sure if Sun were still independent they would have a new cutting edge design for the 6100. After all, they were the first (as far as I know) to launch a quad socket, 2U AMD system with 32 memory slots, nearly three years ago.

The new Barcelona four-socket server comes with dual TCP offloading enabled gigabit NIC cards, redundant power supplies, and 32 DIMM slots for up to 256 GBs of memory capacity  [..] Half the memory and CPU are stacked on top of the other half and this is a rather unusual but innovative design.

Anyway, if you're interested in the Opteron 6100, it seems HP is the best bet in town, whether it's:

Kind of fuzzy shot of the HP DL165 G7, anyone got a clearer picture?

HP DL385 G7

HP BL685c G7 - I can understand why they couldn't fit 48 DIMMs on this blade (Note: two of the CPUs are under the hard disks)!

HP BL465c G7 - again, really no space for 24 DIMMs! (damnit)

Tyan Quad Socket Opteron 6100 motherboard, tight on space, guess the form factor doesn't cut it.

Twelve cores not enough? Well you'll be able to drop Opteron 6200 16-core CPUs into these systems in the not too distant future.

15Aug/10

Lowest power dual socket server ever

TechOps Guy: Nate

This was posted a couple of weeks ago but I was on vacation at the time and didn't notice it until a few days ago.

It talks about the latest 4000-series low power chips from AMD running in a dual socket system from ZT Systems.

The numbers are pretty startling. At peak load they measure the power draw at only 126 watts for the system as a whole:

  • Dual processor 6-core Opteron 4164 EE (1.8GHz per core)
  • 16GB memory (4x4GB DDR3-1333)
  • 128GB SSD

From the blog:

[..] There are four major enhancements to the AMD Opteron™ 4000 Series platform which significantly lower server power consumption:

  1. The AMD Opteron™ 4100 EE Series of processors are the lowest power AMD Opteron processors ever. These processors are rated at 32W ACP, which is 20% lower than AMD’s previous generation 2400 EE Series processors.
  2. AMD Opteron™ 4100 Series processors support 1.35V DDR3 memory, enabling lower server power consumption at load.
  3. The AMD Opteron™ 4000 Series platform uses low-power chipsets. The SR5650 has a maximum TDP of only 13 watts.
  4. AMD Opteron™ 4100 Series processors include new AMD-P power management features, including C1E. C1E is a feature that helps reduce the power consumption of the AMD Opteron™ 4100 Series processor’s integrated memory controller and HyperTransport™ technology links.

[..]
The two lowest power Intel Xeon processor-based servers consume 28% more and 34% more power than the ZT Systems 1253Ra Datacenter Server[..]

Pretty amazing that you can get a dual processor, 12-core (total) system running at less power than some CPUs out there consume by themselves.

I'm sure it would run at even lower power with rack level DC power and cooling.

29Mar/10

Opteron 6100s are here

TechOps Guy: Nate

UPDATED: I've been waiting for this for quite some time; finally the 12-core AMD Opteron 6100s have arrived. AMD did the right thing this time by not waiting to develop a "true" 12-core chip and instead bolting a pair of CPUs together into a single package. You may recall AMD lambasted Intel when it released its first quad core CPUs a few years ago (composed of a pair of two-core chips bolted together), a strategy that paid off well for Intel while AMD's market share was hurt badly, a painful lesson AMD learned from.

For me I'd of course rather have a "true" 12-core processor, but I'm very happy to make do with these Opteron 6100s in the meantime, I don't want to have to wait another 2-3 years to get 12 cores in a socket.

Some highlights of the processor:

  • Clock speeds ranging from 1.7GHz (65W) to 2.2GHz (80W), with a turbo boost 2.3GHz model coming in at 105W
  • Prices ranging from $744 to $1,396 in 1,000-unit quantities
  • Twelve-core and Eight–core, L2 – 512K/core, L3 - 12MB of shared L3 Cache
  • Quad-Channel LV & U/RDDR3, ECC, support for on-line spare memory
  • Supports up to 3 DIMMs/channel, up to 12 DIMMs per CPU
  • Quad 16-bit HyperTransport™ 3 technology (HT3) links, up to 6.4 GT/s per link (more than triple HT1 performance)
  • AMD SR56x0 chipset with I/O Virtualization and PCIe® 2.0
  • Socket compatibility with planned AMD Opteron™ 6200 Series processor.(16 cores?)
  • New advanced idle states allowing the processor to idle with less power usage than the previous six core systems (AMD seems to have long had the lead in idle power conservation).
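Two of those bullets reduce to simple arithmetic: DIMM slots per socket (four channels times three DIMMs) and peak HyperTransport link bandwidth (transfers per second times link width). A Python sketch; the HT1 comparison point of 1.6 GT/s is an assumption here rather than something from AMD's list, and these are theoretical peaks, not measured throughput.

```python
CHANNELS_PER_SOCKET = 4       # quad-channel DDR3
DIMMS_PER_CHANNEL = 3


def dimm_slots(sockets):
    """Maximum DIMM slots for a given socket count."""
    return sockets * CHANNELS_PER_SOCKET * DIMMS_PER_CHANNEL


def ht_link_gb_per_s(gigatransfers_per_s, width_bits=16):
    """Peak HyperTransport link bandwidth (GB/s) per direction and both directions."""
    per_direction = gigatransfers_per_s * width_bits / 8
    return per_direction, 2 * per_direction


if __name__ == "__main__":
    print("DIMM slots, 2 sockets:", dimm_slots(2), "4 sockets:", dimm_slots(4))
    ht3 = ht_link_gb_per_s(6.4)
    ht1 = ht_link_gb_per_s(1.6)   # assumed HT1 peak of 1.6 GT/s
    print("HT3 x16:", ht3, "GB/s; vs HT1:", ht3[0] / ht1[0], "x")
```

The four-socket figure of 48 DIMM slots is the full complement a quad-socket Opteron 6100 board would need.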


The new I/O virtualization looks quite nice as well - AMD-V 2.0, from their site:

Hardware features that enhance virtualization:

  • Unmatched Memory Bandwidth and Scalability – Direct Connect Architecture 2.0 supports a larger number of cores and memory channels so you can configure robust virtual machines, allowing your virtual servers to run as close as possible to physical servers.
  • Greater I/O virtualization efficiencies –I/O virtualization to help increase I/O efficiency by supporting direct device assignment, while improving address translation to help improve the levels of hypervisor intervention.
  • Improved virtual machine integrity and security –With better isolation of virtual machines through I/O virtualization, helps increase the integrity and security of each VM instance.
  • Efficient Power Management – AMD-P technology is a suite of power management features that are designed to drive lower power consumption without compromising performance.
  • Hardware-assisted Virtualization – AMD-V technology to enhance and accelerate software-based virtualization so you can run more virtual machines, support more users and transactions per virtual machine with less overhead. This includes Rapid Virtualization Indexing (RVI) to help accelerate the performance of many virtualized applications by enabling hardware-based VM memory management. AMD-V technology is supported by leading providers of hypervisor and virtualization software, including Citrix, Microsoft, Red Hat, and VMware.
  • Extended Migration – a hardware feature that helps virtualization software enable live migration of virtual machines between all available AMD Opteron™ processor generations.

I'm also happy to see AMD return to the chipset design business; I was never comfortable with Nvidia as a server chipset maker.

The Register has a pair of great articles on the launch as well, though in the main one I was a bit annoyed at how far I had to scroll to get past the Xeon news. I don't think they needed to go out of their way to recap it in such detail in an article about the Opterons, but oh well.

I thought this was an interesting note on the recent Intel announcement of integrated silicon for encryption -

While Intel was talking up the fact that it had embedded cryptographic instructions in the new Xeon 5600s to implement the Advanced Encryption Standard (AES) algorithm for encrypting and decrypting data, Opterons have had this feature since the quad-core "Barcelona" Opterons came out in late 2007, er, early 2008.

And as for performance -

Generally speaking, bin for bin, the twelve-core Magny-Cours chips provide about 88 per cent more integer performance and 119 per cent more floating point performance than the six-core "Istanbul" Opteron 2400 and 8400 chips they replace.

AMD also seems geared toward reducing costs and prices -

The Opteron 6100s will compete with the high-end of the Xeon 5600s in the 2P space and also take the fight on up to the 4P space. But, AMD's chipsets and the chips themselves are really all the same. It is really a game of packaging some components in the stack up in different ways to target different markets.

Sounds like a great way to keep costs down by limiting the amount of development required to support the various configurations.

AMD themselves also blogged on the topic with some interesting tidbits of information -

You’re probably wondering why we wouldn’t put our highest speed processor up in this comparison. It’s because we realize that while performance is important, it is not the most important factor in server decisions.  In most cases, we believe price and power consumption play a far larger role.

[..]

Power consumption – Note that to get to the performance levels that our competitor has, they had to utilize a 130W processor that is not targeted at the mainstream server market, but is more likely to be used in workstations. Intel isn’t forthcoming on their power numbers so we don’t really have a good measurement of their maximum power, but their 130W TDP part is being beaten in performance by our 80W ACP part.  It feels like the power efficiency is clearly in our court.  The fact that we have doubled cores and stayed in the same power/thermal range compared to our previous generation is a testament to our power efficiency.

Price – This is an area that I don’t understand.  Coming out of one of the worst economic times in recent history, why Intel pushed up the top Xeon X series price from $1386 to $1663 is beyond me.  Customers are looking for more, not less for their IT dollar.  In the comparison above, while they still can’t match our performance, they really fall short in pricing.  At $1663 versus our $1165, their customers are paying 42% more money for the luxury of purchasing a slower processor. This makes no sense.  Shouldn’t we all be offering customers more for their money, not less?

In addition to our aggressive 2P pricing, we have also stripped away the “4P tax.” No longer do customers have to pay a premium to buy a processor capable of scaling up to 4 CPUs in a single platform.  As of today, the 4P tax is effectively $0. Well, of course, that depends on you making the right processor choice, as I am fairly sure that our competitor will still want to charge you a premium for that feature.  I recommend you don’t pay it.

As a matter of fact, a customer will probably find that a 4P server, with 32 total cores (4 x 8-core) based on our new pricing, will not only perform better than our competitor’s highest end 2P system, but it will also do it for a lower price. Suddenly, it is 4P for the masses!
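The percentages in AMD's pricing argument are easy to reproduce from the list prices it quotes. Here's a small Python sketch; premium_pct is just an illustrative helper name, and the prices are the ones from AMD's post:

```python
def premium_pct(price: float, baseline: float) -> float:
    """Percent by which `price` exceeds `baseline`."""
    return (price - baseline) / baseline * 100


# Prices as quoted in AMD's blog post
XEON_X_TOP_OLD = 1386     # previous top Xeon X series price
XEON_X_TOP_NEW = 1663     # new top Xeon X series price
OPTERON_COMPARED = 1165   # the Opteron 6100 part in AMD's comparison

print(f"Xeon premium over Opteron: {premium_pct(XEON_X_TOP_NEW, OPTERON_COMPARED):.1f}%")
print(f"Top Xeon X price increase: {premium_pct(XEON_X_TOP_NEW, XEON_X_TOP_OLD):.1f}%")
```

The first figure works out to about 42.7%, which AMD rounded down to 42%; the jump from $1386 to $1663 is roughly a 20% increase.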

While I am mainly interested in the 12-core chips, I also see significant value in the 8-core chips: replacing a pair of 4-core chips with a single-socket 8-core system is very appealing in certain situations. There is a decent premium on motherboards that need to support more than one socket, so being able to get 8 (and maybe even 12) cores in a single-socket system is just outstanding.

I also found this interesting -

Each one is capable of 105.6 Gigaflops (12 cores x 4 32-bit FPU instructions x 2.2GHz).  And that score is for the 2.2GHz model, which isn’t even the fastest one!
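AMD's peak figure is just cores x floating point operations per clock per core x clock speed. A minimal Python sketch of that formula; applying the same 4 operations per clock to the 1.7GHz and 2.3GHz models from the spec list is my assumption, since AMD only quoted the 2.2GHz part:

```python
def peak_gflops(cores: int, flops_per_cycle_per_core: int, clock_ghz: float) -> float:
    """Theoretical peak = cores x FP operations per clock per core x clock in GHz."""
    return cores * flops_per_cycle_per_core * clock_ghz


# Clock speeds taken from the spec list; 4 FP ops/clock assumed for every model
for ghz in (1.7, 2.2, 2.3):
    print(f"{ghz}GHz 12-core: {peak_gflops(12, 4, ghz):.1f} GFLOPS")
```

These are theoretical peaks; real benchmark results will land below them.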

I still have a poster up on one of my walls from the 1995-1996 era about the world's first teraflop machine -

The one-teraflops demonstration was achieved using 7,264 Pentium Pro processors in 57 cabinets.

With the same number of these new Opterons you could get about three-quarters of the way to a petaflop.
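The three-quarters figure is simple multiplication, sketched below in Python. Keep in mind it compares AMD's theoretical per-chip peak against a measured demonstration, so it is an upper bound rather than an apples-to-apples comparison; the constant names are just illustrative:

```python
TERAFLOP_MACHINE_PROCESSORS = 7264   # Pentium Pro count from the quote
OPTERON_6100_PEAK_GFLOPS = 105.6     # AMD's per-chip peak for the 2.2GHz model

total_gflops = TERAFLOP_MACHINE_PROCESSORS * OPTERON_6100_PEAK_GFLOPS
total_pflops = total_gflops / 1_000_000
print(f"{total_gflops / 1000:,.0f} TFLOPS peak, or {total_pflops:.2f} of a petaflop")
```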

SGI is raising the bar as well -

This means as many as 2,208 cores in a single rack of our Rackable™ rackmount servers. And in the SGI ICE Cube modular data center, our containerized data center environment, you can now scale within a single container to 41,760 cores! Of course, density is only part of the picture. There’s as much to be excited about when it comes to power efficiency and the memory performance of SGI servers using AMD Opteron 6100 Series processor technology

Other systems announced today include:

  • HP DL165G7
  • HP SL165z G7
  • HP DL385 G7
  • Cray XT6 supercomputer
  • There is mention of a Dell R815, though it doesn't seem to be officially announced yet. The R815's specs seem underwhelming in the memory department: it supports only 32 DIMMs, while the HP systems above support the full 12 DIMMs per socket. It is only 2U, however. Sun has had 2U quad-socket Opteron systems with 32 DIMMs for a couple of years now in the form of the X4440, so it's strange that Dell did not step up and max out the system with 48 DIMMs.

I can't put into words how happy and proud I am of AMD for this launch. Not only is it an amazing technological achievement, but they managed to pull it off on schedule.

Congratulations AMD!!!

28Feb/10Off

VMware dream machine

TechOps Guy: Nate

(Originally titled "Forty-eight all round"; I like "VMware dream machine" more.)

UPDATED I was thinking more about the upcoming 12-core Opterons and the next generation of HP c-Class blades, and thought of a pretty cool configuration. Hopefully it becomes available.

Imagine a full-height blade that is quad socket, with 48 cores (91-115GHz aggregate), 48 DIMMs (192GB with 4GB sticks), 4x10Gbps Ethernet links and 2x4Gbps Fibre Channel links (a total of 48Gbps of full-duplex bandwidth). The new Opterons support 12 DIMMs per socket, which allows for the 48 DIMM slots.
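The headline numbers for this blade add up as follows. In this Python sketch the per-core clock range of roughly 1.9 to 2.4GHz is my assumption, back-calculated from the 91-115GHz aggregate; the other constants come from the description above:

```python
SOCKETS = 4
CORES_PER_SOCKET = 12
DIMM_SLOTS = 48                  # 12 DIMMs per socket x 4 sockets
DIMM_GB = 4
ETH_LINKS, ETH_GBPS = 4, 10      # 4x10Gbps Ethernet
FC_LINKS, FC_GBPS = 2, 4         # 2x4Gbps Fibre Channel
CLOCK_RANGE_GHZ = (1.9, 2.4)     # assumption: implied by the 91-115GHz figure

cores = SOCKETS * CORES_PER_SOCKET
aggregate_ghz = tuple(round(cores * c, 1) for c in CLOCK_RANGE_GHZ)
memory_gb = DIMM_SLOTS * DIMM_GB
io_gbps = ETH_LINKS * ETH_GBPS + FC_LINKS * FC_GBPS

print(f"{cores} cores, {aggregate_ghz[0]}-{aggregate_ghz[1]}GHz aggregate")
print(f"{memory_gb}GB of memory, {io_gbps}Gbps of network and storage I/O")
```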

Why 4x10Gbps links? Well, I was thinking, why not? With full-height blades you can only fit 8 blades in a c7000 chassis. If you put in a pair of 2x10Gbps switches, that gives you 16 ports. It's not much more money to double up on 10Gbps ports, especially if you're talking about spending upwards of, say, $20k on the blade (guesstimate) and another $9-15k on vSphere software per blade. And 4x10Gbps links give you up to 16 virtual NICs per blade using Virtual Connect, each adjustable in 100Mbps increments.
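Here's a rough Python sketch of the Virtual Connect bandwidth carving described above. It assumes each 10Gbps port can be split into up to 4 virtual NICs whose allocations are multiples of 100Mbps and can't add up to more than the port's 10Gbps; the Port class, validate() function and the example allocations are all hypothetical, not HP's actual configuration interface:

```python
from __future__ import annotations

from dataclasses import dataclass

PORT_CAPACITY_MBPS = 10_000   # one 10Gbps physical port
STEP_MBPS = 100               # allocations adjustable in 100Mbps increments
MAX_VNICS_PER_PORT = 4        # assumption: 4 virtual NICs per 10Gbps port


@dataclass
class Port:
    """Hypothetical model of one physical port and its virtual NIC allocations."""
    name: str
    allocations_mbps: list[int]


def validate(port: Port) -> None:
    """Raise ValueError if the port's virtual NIC layout breaks the assumed rules."""
    if len(port.allocations_mbps) > MAX_VNICS_PER_PORT:
        raise ValueError(f"{port.name}: too many virtual NICs")
    for mbps in port.allocations_mbps:
        if mbps <= 0 or mbps % STEP_MBPS:
            raise ValueError(f"{port.name}: {mbps}Mbps is not a positive multiple of {STEP_MBPS}")
    if sum(port.allocations_mbps) > PORT_CAPACITY_MBPS:
        raise ValueError(f"{port.name}: allocations exceed {PORT_CAPACITY_MBPS}Mbps")


# Hypothetical layout: 4 physical 10Gbps ports, each carved into 4 virtual NICs
ports = [Port(f"port{i}", [500, 2_000, 3_500, 4_000]) for i in range(1, 5)]
for p in ports:
    validate(p)
total_vnics = sum(len(p.allocations_mbps) for p in ports)
print(f"{total_vnics} virtual NICs across {len(ports)} ports")
```

Validating per port, rather than across the whole blade, reflects the idea that each physical link's bandwidth is shared only among its own virtual NICs.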

Also, since it is a full-height blade, you have access to two slots' worth of I/O, which translates into 320Gbps of full-duplex fabric available to a single blade.

That kind of blade ought to handle just about anything you can throw at it. It's practically a supercomputer in and of itself. Right now HP holds the top spot for VMmark scores, with an 8-socket 6-core system (48 total cores) outpacing even a 16-socket 4-core system (64 total cores).

The 48 CPU cores would give the hypervisor an amazing number of combinations for scheduling vCPUs. A slide from a presentation I attended last year illustrated the concept behind how the hypervisor schedules single and multi-vCPU VMs.

There is a PDF out there from VMware that covers the math behind it all, and it has some interesting commentary on CPU scheduling with hypervisors:

[..]Extending this principle, ESX Server installations with a greater number of physical CPUs offer a greater chance of servicing competing workloads optimally. The chance that the scheduler can find room for a particular workload without much reshuffling of virtual machines will always be better when the scheduler has more CPUs across which it can search for idle time.

This is even cooler, though honestly I can't pretend to understand the math myself! -

Scheduling a two-VCPU machine on a two-way physical ESX Server host provides only one possible allocation for scheduling the virtual machine. The number of possible scheduling opportunities for a two-VCPU machine on a four-way or eight-way physical ESX Server host is described by combinatorial mathematics using the formula N! / (R!(N-R)!), where N = the number of physical CPUs on the ESX Server host and R = the number of VCPUs on the machine being scheduled.[1] A two-VCPU virtual machine running on a four-way ESX Server host provides 4! / (2!(4-2)!), which is (4*3*2) / (2*2), or 6 scheduling possibilities. For those unfamiliar with combinatorial mathematics, X! is calculated as X(X-1)(X-2)(X-3)…(X-(X-1)). For example, 5! = 5*4*3*2*1.

Using these calculations, a two-VCPU virtual machine on an eight-way ESX Server host has 8! / (2!(8-2)!), which is 40320 / (2*720), or 28 scheduling possibilities. This is more than four times the possibilities a four-way ESX Server host can provide. Four-vCPU machines demonstrate this principle even more forcefully. A four-vCPU machine scheduled on a four-way physical ESX Server host provides only one possibility to the scheduler, whereas a four-VCPU virtual machine on an eight-CPU ESX Server host will yield 8! / (4!(8-4)!), or 70 scheduling possibilities, but running a four-vCPU machine on a sixteen-way ESX Server host will yield 16! / (4!(16-4)!), which is 20922789888000 / (24*479001600), or 1820 scheduling possibilities. That means that the scheduler has 1820 unique ways in which it can place the four-vCPU workload on the ESX Server host. Doubling the physical CPU count from eight to sixteen results in 26 times the scheduling flexibility for the four-way virtual machines. Running a four-way virtual machine on a Host with four times the number of physical processors (16-way ESX Server host) provides over six times more flexibility than we saw with running a two-way VM on a Host with four times the number of physical processors (8-way ESX Server host).
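The counting in that passage is easy to check by hand, or with a few lines of Python. This sketch implements X! exactly as the quote describes it and plugs in the host/VM sizes the PDF uses; it only reproduces the paper's counting, not how the ESX scheduler actually picks CPUs:

```python
def factorial(x: int) -> int:
    """X! = X(X-1)(X-2)...1, as described in the quote."""
    result = 1
    for k in range(2, x + 1):
        result *= k
    return result


def placements(n_pcpus: int, r_vcpus: int) -> int:
    """N! / (R!(N-R)!): ways to pick R physical CPUs out of N for one VM."""
    return factorial(n_pcpus) // (factorial(r_vcpus) * factorial(n_pcpus - r_vcpus))


for n, r in [(2, 2), (4, 2), (8, 2), (4, 4), (8, 4), (16, 4)]:
    print(f"{r}-vCPU VM on a {n}-way host: {placements(n, r)} possibilities")
```

The same helper gives the ratios the quote discusses: 28/6 is about 4.7, 1820/70 is 26, and 1820/28 is 65.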

Anyone want to try to extrapolate that and extend it to a 48-core system? :)
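Taking up the challenge: using the same N! / (R!(N-R)!) counting, Python's math.comb (Python 3.8+) extends the table to a 48-core host, where a 2-vCPU VM has 1,128 possible placements and a 4-vCPU VM has 194,580. As with the paper, this is pure combinatorics and says nothing about NUMA or cache topology:

```python
from math import comb

# C(N, R): ways to place an R-vCPU VM on a host with N physical cores
for n in (4, 8, 16, 24, 48):
    print(f"{n:>2} cores: 2-vCPU -> {comb(n, 2):>6}, 4-vCPU -> {comb(n, 4):>7}")
```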

It seems like only yesterday that I was building DL380 G5 ESX 3.5 systems with 8 CPU cores, 32GB of RAM and 8x1Gbps links, thinking about how powerful they were. This blade would be six of those in one. And it only seems like a couple of weeks ago that I was building VMware GSX systems on dual-socket single-core servers with 16GB of RAM.

So, HP, do me a favor and make a G7 blade that can do this; that would make my day! I know fitting all of those components on a single full-height blade won't be easy. Looking at the existing BL685c blade, it looks like they could do it: remove the internal disks (who needs 'em? boot from SAN or something) and add an extra 16 DIMMs for a total of 48.

I thought about using 8Gbps Fibre Channel, but then it wouldn't be 48 all round :)

UPDATE I was thinking about this again and wanted to compare the costs against existing technology. I'm estimating roughly a $32,000 price tag for this kind of blade plus vSphere Advanced licensing (note that you cannot use Enterprise licensing on a 12-core CPU; hardware pricing is extrapolated from the existing HP BL685c G6 quad-socket 6-core blade with 128GB of RAM). The approximate price of an 8-way 48-core HP DL785 with 192GB, 4x10GbE and 2x4Gb Fibre Channel, with vSphere licensing, comes to roughly $70,000 (because VMware charges on a per-socket basis, the licensing costs go up fast). Not only that, but you can only fit 6 of these DL785 servers in a 42U rack, while you can fit 32 of these blades in the same rack with room to spare. So that's less than half the cost and over 5 times the density for the same configuration. The DL785 has an edge in memory slot capacity, which isn't surprising given its massive size: it can fit 64 DIMMs vs 48 on my VMware dream machine blade.
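The rack math behind "less than half the cost and 5 times the density" looks like this as a Python sketch. The enclosure heights (7U for the DL785, 10U for a c7000) are my assumptions for illustration; the prices and the 8 full-height blades per chassis come from the post:

```python
RACK_U = 42
DL785_U = 7                 # assumption: DL785 is a 7U server
C7000_U = 10                # assumption: c7000 enclosure is 10U
FULL_HEIGHT_PER_C7000 = 8   # full-height blades per c7000 chassis

BLADE_COST = 32_000         # estimated dream blade + vSphere Advanced
DL785_COST = 70_000         # estimated DL785 + vSphere

dl785_per_rack = RACK_U // DL785_U
blades_per_rack = (RACK_U // C7000_U) * FULL_HEIGHT_PER_C7000

print(f"DL785s per rack: {dl785_per_rack}")
print(f"Blades per rack: {blades_per_rack}")
print(f"Density ratio: {blades_per_rack / dl785_per_rack:.2f}x")
print(f"Cost ratio (blade / DL785): {BLADE_COST / DL785_COST:.2f}")
```

Four 10U enclosures use 40U of the rack, which is where the "room to spare" comes from.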

Compare that to a trio of HP BL495c blades, each with 12 cores and 64GB of memory: approximate pricing for those plus vSphere Advanced is $31,000, for a total of 36 cores and 192GB of memory. So for $1,000 more you get an extra 12 cores, cut your server count by two-thirds, probably cut your power usage by some amount, and improve consolidation ratios.
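Put in per-core terms, using the estimated prices above (both include vSphere Advanced), the single big blade comes out noticeably cheaper. A small Python sketch; the dictionary layout is just for illustration:

```python
def cost_per_core(price: int, cores: int) -> float:
    """Estimated price divided by physical core count."""
    return price / cores


dream_blade = {"price": 32_000, "cores": 48, "servers": 1}
bl495c_trio = {"price": 31_000, "cores": 36, "servers": 3}

for name, cfg in (("dream blade", dream_blade), ("3x BL495c", bl495c_trio)):
    print(f"{name}: ${cost_per_core(cfg['price'], cfg['cores']):,.0f} per core")

server_reduction = 1 - dream_blade["servers"] / bl495c_trio["servers"]
print(f"Server count reduction: {server_reduction:.0%}")
```

That's roughly $667 per core versus $861 per core, with the same 192GB of memory either way.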

So to summarize, two big reasons for this type of solution are:

  • More efficient consolidation on a per-host basis, with fewer "stranded" resources
  • More efficient consolidation on a per-cluster basis, because you can get more capacity within the 32-node limit of a VMware cluster (assuming you want to build a cluster that big). Again, this addresses the "stranded capacity" issue. Imagine what a resource pool could do with 3.3THz of compute capacity and 9.2TB of memory, all with line-rate 40Gbps networking throughout, and all within a single cabinet.
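For anyone who wants to play with the cluster-wide totals, here's a Python sketch for a full 32-node cluster of these blades. The 2.2GHz clock is an assumption; at that speed the cores add up to about 3.4THz, in line with the 3.3THz figure. With the 192GB-per-blade configuration described earlier, memory comes to about 6.1TB, so the 9.2TB figure would correspond to roughly 288GB per blade (9.2TB / 32):

```python
NODES = 32               # VMware cluster node limit cited in the post
CORES_PER_BLADE = 48
CLOCK_GHZ = 2.2          # assumption: 2.2GHz 12-core parts
MEM_PER_BLADE_GB = 192   # 48 x 4GB DIMMs, as in the dream blade spec

total_cores = NODES * CORES_PER_BLADE
total_thz = total_cores * CLOCK_GHZ / 1000
total_mem_tb = NODES * MEM_PER_BLADE_GB / 1000

print(f"{total_cores} cores, {total_thz:.2f} THz, {total_mem_tb:.2f} TB of memory")
```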

Pretty amazing stuff to me anyways.

[For reference - Enterprise Plus licensing would add an extra $1250/socket plus more in support fees. VMware support costs are not included in the above pricing.]

END UPDATE

23Feb/10Off

AMD 12-core chips on schedule

TechOps Guy: Nate

I came across this article on Xbitlabs a few days ago and was surprised it didn't seem to get picked up elsewhere. I found it while playing with a stock tracking tool on my PDA (I was looking at news about AMD). I'm not an investor, but I find the markets interesting and entertaining at times.

Anyway, it carried some good news from my perspective: the 12-core Opterons are on track to ship this quarter. (I'd rather call them that than by their code name, because code names quickly become confusing; I used to stay on top of all the CPU specs back in the Socket 7 days.) I was previously under the impression, incorrectly I guess, that they would ship by the end of next quarter, and that it was Intel's 8-core chips that would ship this quarter.

From the article

AMD Opteron “Magny-Cours” processor will be the first chip for the AMD G34 “Maranello” platform designed for Opteron processors 6000-series with up to 16 cores, quad-channel memory interface, 2 or 4 sockets, up to 12 memory modules per socket and some server and enterprise-specific functionality. Magny-Cours microprocessors feature two six-core or quad-core dies on one piece of substrate.

I read another article recently on The Register which mentioned AMD's plans to take the chip to 16 cores in 2011. I've been eagerly waiting for the 12-core chips for some time now, mainly for virtualization: the extra cores give the CPU scheduler more options when scheduling multi-vCPU virtual machines. They also further increase the value of dual-socket systems, allowing 24 real cores in a dual-socket configuration, which to me is just astonishing. Having 24 memory slots on a dual-socket system is also pretty amazing, though I have my doubts that anyone can fit 24 memory modules on a single half-height blade, but who knows. To my knowledge, HP currently has the densest half-height blade as far as memory is concerned, with 18 DIMMs for a Xeon 5500-based system and 16 DIMMs for a 6-core Opteron-based system. IBM recently announced a new, denser blade with 18 slots, but it appears to be full height, so it doesn't really qualify; I think a dual-socket full-height blade is a waste of space. Some Sun blades have good densities as well, though I'm not well versed in their technology.