TechOpsGuys.com Diggin' technology every day

August 23, 2011

Mac Daddy P10000

Filed under: Datacenter,Storage,Virtualization — Nate @ 9:55 pm

It’s finally here, the HP P10000 – aka 3PAR V Class. 3PAR first revealed this to their customers more than a year ago, but the eagle has landed now.

When it comes to the hardware – bigger is better (usually means faster too)

Comparisons of recent 3PAR arrays

Array | Raw Capacity | Fibre Ports | Data Cache | Control Cache | Disks | Interconnect Bandwidth | I/O Bandwidth | SPC-1 IOPS
8-node P10000 (aka V800) | 1,600 TB | 288 ports (192 host) | 512 GB | 256 GB | 1,920 | 112 GB/sec | 96 GB/sec | 600,000 (guess)
8-node T800 | 800 TB | 192 ports (128 host) | 96 GB | 32 GB | 1,280 | 45 GB/sec | 19.2 GB/sec | 225,000
4-node T800 (or 4-node T400) | 400 TB | 96 ports (64 host) | 48 GB | 16 GB | 640 | 9.6 GB/sec | ? | ~112,000 (estimate)
4-node F400 | 384 TB | 32 ports (24 host) | 24 GB | 16 GB | 384 | 9.6 GB/sec | ?? | 93,000

Comparison between the F400, T400, T800 and the new V800. In all cases the numbers reflect a maximum configuration.

3PAR V800 ready to fight

The new system is based on their latest Generation 4 ASIC, and for the first time they are putting two ASICs in each controller. This is also the first system that supports PCI Express, with (if my memory serves) nine PCI Express buses per controller. Front-end throughput is expected to be up in the 15 gigabytes/second range (up from ~6GB/sec on the T800). Just think: they have nearly eight times more interconnect bandwidth than the controllers have capacity to push data to hosts. That's just insane.
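
A quick sanity check on that claim, using the interconnect figure from the table above and the quoted front-end estimate (these are just the numbers cited in this post, not official specs):

```python
# Back-of-the-envelope check: V800 interconnect bandwidth vs. expected front-end throughput.
interconnect_gb_s = 112   # GB/sec interconnect bandwidth (from the table above)
front_end_gb_s = 15       # GB/sec expected front-end throughput (estimate quoted above)

ratio = interconnect_gb_s / front_end_gb_s
print(f"Interconnect is ~{ratio:.1f}x the front-end throughput")  # ~7.5x
```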

IOPS – HP apparently is not in a big rush to post SPC-1 numbers, but given the increased spindle count, cache, doubling up on ASICs, and the new ASIC design itself I would be surprised if the system would get less than say half a million IOPS on SPC-1 (by no means a perfect benchmark but at least it’s a level playing field).

It's nice to see 3PAR finally bulk up on data cache (beefcake!!). Traditionally they haven't needed all that much of it because their architecture blows the competition out of the water without breaking a sweat, but still, RAM is cheap; it's not as if they're using the same type of memory you find in CPU cache, it's industry-standard ECC DIMMs. RAM may be cheap, but I'm sure HP won't charge you industry-standard DIMM pricing when you go to put 512GB in your system!

Now that they have PCI Express 3PAR can natively support 8Gbps fibre channel as well as 10Gbit iSCSI and FCoE which are coming soon.

The drive cages and magazines are more or less unchanged (physically) from the previous generation but apparently new stuff is still coming down the pike there.  The controller’s physical design (how it fits in the cabinet) seems radically different than their previous S or T series.

Another enhancement for this system is that they expanded the number of drive chassis to 48, or 12 per node (up from 8 per node). Though if you go back in time you'll find their earliest S800 actually supported 64 drive chassis for a while; since then they have refrained from daisy chaining drive chassis on their S/T/V class, which is how they achieved that original 64-chassis configuration (or 2,560 disks, back when disks were 9GB in size). The V class obviously has more ports, so they can support more cages. I have no doubt they could go to even more cages by taking ports assigned to hosts and assigning them to disks instead; it's just a matter of testing. Flipping a fibre port from host to disk is pretty trivial on the system.

The raw capacity doesn't quite line up with the massive amount of control cache the system has. In theory at least, if 4GB of control cache per controller is good enough for 200TB raw (per controller pair), then 32GB per controller should be able to net you 1,600 TB raw per controller pair (or 6,400 TB for the whole system). With the limit set at 1,600 TB for the entire system, they are obviously using a lot of control cache for something else.
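
Here is that scaling argument written out as a quick calculation (the 4GB-per-200TB rule of thumb is the one quoted above; treat this as an illustration, not an official sizing formula):

```python
# Scale the old rule of thumb (4GB control cache/controller ~ 200TB raw per controller pair)
# up to the V800's 32GB of control cache per controller.
tb_per_pair_at_4gb = 200        # TB raw per controller pair with 4GB control cache per controller
cache_per_controller_gb = 32    # GB control cache per controller on the V800
controller_pairs = 4            # an 8-node system is four controller pairs

scaling = cache_per_controller_gb / 4
tb_per_pair = tb_per_pair_at_4gb * scaling        # 1,600 TB per controller pair
tb_whole_system = tb_per_pair * controller_pairs  # 6,400 TB in theory for the whole system
print(tb_per_pair, tb_whole_system)               # vs. the actual 1,600 TB system-wide limit
```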

As far as I know the T-class isn't going anywhere anytime soon. This V class is all about even more massive scale, at a significantly higher entry-level price point than the T-class (at least $100,000 more at the baseline from what I can tell), with the beauty of running the same operating system, the same user interfaces, and the same software features across the entire product line. The T-class as-is is still mind-numbingly fast and efficient, even three years after it was released.

No mainframe connectivity on this baby.

Storage Federation

The storage federation stuff is pretty cool in that it is peer based: you don't need any external appliances to move the data around, the arrays talk to each other directly to manage all of that. This is where we get the first real integration between 3PAR and HP, in that the entire line of 3PAR arrays as well as the LeftHand-based P4000 iSCSI systems (even including the Virtual Storage Appliance!) support this new peer federation. It sort of makes me wonder where EVA support is; perhaps it's coming later, or maybe it's a sign HP is quietly deprecating EVA when it comes to this sort of thing (I'm sure the official party line will be that EVA is still a shining star).

The main advantage, I think, of storage federation technology over something like Storage vMotion is that the array has a more holistic view of what's going on in the storage system, rather than just what a particular host sees or what a particular LUN is doing. The federation should also have more information about the location of the various arrays (if they are in another data center, for example) and can make more intelligent choices about moving stuff around. I'd certainly like to see it in action myself. Even though hypervisors have had thin provisioning for a while, by no means does that reduce the need for thin provisioning at the storage level (at least for larger deployments).

I’d imagine like most things on the platform the storage federation is licensed based on the capacity of the array.

If this sort of thing interests you anywhere nearly as much as it interests me you should check out the architecture white paper from HP which has some new stuff from the V class here. You don’t have to register to download it like you did back in the good ‘ol days.

I’d be surprised if I ever decided to work for a company large enough to be able to leverage a V-class, but if anyone from 3PAR is out there reading this (I’m sure there’s more than one) since I am in the Bay area – not far from your HQ – I wouldn’t turn down an invitation to see one of these in person 🙂

Oh HP.. first you kick me in the teeth by killing WebOS devices then before I know what happened you come out with a V-class and want to make things all better, I just don’t know what to feel.

The joys of working with a 3PAR array, it’s been about a year since I laid my hands on one (working at a different company now), I do miss it.

May 11, 2011

2000+ 10GbE ports in a single rack

Filed under: Datacenter,Networking — Nate @ 9:41 pm

The best word I can come up with when I saw this was

oof

What I’m talking about is the announcement of the Black Diamond X-Series from my favorite switching company Extreme Networks. I have been hearing a lot about other switching companies coming out with new next gen 10 GbE and 40GbE switches, more than one using Broadcom chips (which Extreme uses as well), so have been patiently awaiting their announcements.

I don’t have a lot to say so I’ll let the specs do the talking

Extreme Networks Black Diamond X-Series

  • 14.5 U
  • 20 Tbps switching fabric (up ~4x from previous models)
  • 1.2 Tbps fabric per line slot (up ~10x from previous models)
  • 2,304 line rate 10GbE ports per rack (5 watts per port) (768 line rate per chassis)
  • 576 line rate 40GbE ports per rack (192 line rate per chassis)
  • Built in support to switch up to 128,000 virtual machines using their VEPA/ Direct Attach system

This was fascinating to me:

Ultra high scalability is enabled by an industry-leading fabric design with an orthogonal direct mating system between I/O modules and fabric modules, which eliminates the performance bottleneck of pure backplane or midplane designs.

I was expecting their next-gen platform to be a midplane design (like that of the Black Diamond 20808); their previous high-density 10GbE enterprise switch, the Black Diamond 8800, was by contrast a backplane design (originally released about six years ago). The physical resemblance to the Arista Networks chassis switches is remarkable. I would like to see how this direct mating system looks in a diagram of some kind to get a better idea of what this new design is.

Mini RJ21 adapters, 1 plug on the switch, goes to 6x1GbE ports

To put that port density into some perspective, their older system (the Black Diamond 8800), by comparison, has an option to use Mini RJ21 adapters to achieve 768 1GbE ports in a chassis (14U), so an extra inch of space gets you the same number of ports running at 10 times the speed, and at line rate (the 768x1GbE is not quite line rate but still damn fast). It's the only way to fit so many copper ports in such a small space.
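
For fun, here is the rack-level arithmetic using the X-Series figures listed earlier (ports per chassis/rack, chassis height and watts per port are the quoted specs; everything else just follows from them):

```python
# Rack-level arithmetic for the Black Diamond X-Series, using the figures quoted above.
ports_per_chassis = 768    # line-rate 10GbE ports per chassis
ports_per_rack = 2304      # line-rate 10GbE ports per rack
chassis_height_u = 14.5    # rack units per chassis
watts_per_port = 5

chassis_per_rack = ports_per_rack // ports_per_chassis         # 3 chassis
rack_units_used = chassis_per_rack * chassis_height_u          # 43.5U of the rack
switching_power_kw = ports_per_rack * watts_per_port / 1000    # ~11.5 kW of switching per rack
aggregate_tbps = ports_per_rack * 10 / 1000                    # ~23 Tbps of host-facing ports
print(chassis_per_rack, rack_units_used, switching_power_kw, aggregate_tbps)
```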

It seems they have phased out the Black Diamond 10808 (I deployed a pair of these several years ago; it was first released in 2003), the Black Diamond 12804C (first released about 2007), the Black Diamond 12804R (also released around 2007) and the Black Diamond 20808 (this one is kind of surprising given how recent it was, though it didn't have anything approaching this level of performance of course; I think it was released around 2009). They also finally seem to have dropped the really ancient Alpine series (10+ year old technology) as well.

They have also announced a new high-density stackable 10GbE switch, the Summit X670, the successor to the X650, which was already an outstanding product offering several features that until recently nobody else in the market was providing.

Extreme Networks Summit X670

  • 1U
  • 1.28 Tbps switching fabric (roughly double that of the X650)
  • 48 x 10Gbps line rate standard (64 x 10Gbps max)
  • 4 x 40Gbps line rate (or 16 x 10Gbps)
  • Long distance stacking support (up to 40 kilometers)

The X670, purely from a port configuration standpoint, looks similar to some of the other recently announced products from other companies, like Arista and Force10, both of whom are using the Broadcom Trident+ chipset; I assume Extreme is using the same. These days, given that so many manufacturers are using the same type of hardware, you have to differentiate yourself in the software, which is really what drives me to Extreme more than anything else: their Linux-based, easy-to-use Extremeware XOS operating system.

Neither of these products appears to be shipping yet; I'm not sure when they might ship, maybe sometime in Q3 or something.

40GbE has taken longer than I expected to finalize; they were one of the first to demonstrate 40GbE at Interop Las Vegas last year, but the parts have yet to ship (or if they have, the web site is not updated).

For the most part, the number of companies that are able to drive even 10% of the performance of these new lines of networking products is really tiny. But the peace of mind that comes with everything being line rate really is worth something!

x86 or ASIC? I'm sure performance boosts like the ones offered here pretty much guarantee that x86 (or any general purpose CPU for that matter) will not be driving high speed networking for a very long time to come.

Myself I am not yet sold on this emerging trend in the networking industry that is trying to drive everything to be massive layer 2 domains. I still love me some ESRP! I think part of it has to do with selling the public on getting rid of STP. I haven’t used STP in 7+ years so not using any form of STP is nothing new for me!

May 4, 2011

Microsoft Server Designs

Filed under: Datacenter — Nate @ 8:26 am

I was out of town for most of last week so didn’t happen to catch this bit of news that came out.

It seems shortly after Facebook released their server/data center designs Microsoft has done the same.

I have to admit when I first heard of the Facebook design I was interested, but once I saw it I felt let down. I mean, is that the best they could come up with? It seems there are market-based solutions that are vastly superior to what Facebook designed themselves. Facebook did well by releasing in-depth technical information, but the reality is only a tiny number of organizations would ever think about attempting to replicate this kind of setup. So it's more for the press/geek factor than something practical.

I attended a Datacenter Dynamics conference about a year ago, where the most interesting thing I saw was a talk by a Microsoft guy who spoke about their data center designs and focused a lot on their new(ish) "IT PAC". I was really blown away. Not much Microsoft does has blown me away, but consider me blown away by this. It was (and still is) by far the most innovative data center design I have ever seen, myself at least. Assuming it works of course; at the time the guy said there were still some kinks they were working out, and it wasn't on a wide scale deployment at all at that point. I've heard on the grapevine that Microsoft has been deploying them here and there in a couple facilities in the Seattle area. No idea how many though.

Anyways, back to the Microsoft server design. I commented last year on the concept of using rack-level batteries and DC power distribution as another approach to server power requirements, rather than the approach that Google and some others have taken, which involves server-based UPSs and server-based power supplies (and seems much less efficient).

Google Server Design with server-based batteries and power supplies

Add to that rack-based cooling (or in Microsoft's case, container-based cooling), a la the SGI CloudRack C2/X2, and Microsoft's extremely innovative IT PAC containers, and you've got yourself a really bad ass data center. Microsoft seems to borrow heavily from the CloudRack design, enhancing it even further. The biggest update would be the power system, with the rack-level UPS and 480V distribution. I don't know of any commercial co-location data centers that offer 480V to the cabinets, but when you're building your own facilities you can go to the ends of the earth to improve efficiency.

Microsoft's design permits up to 96 dual-socket servers (2 per rack unit), each with 8 memory slots, in a single 57U rack (the super tall rack is due to the height of the container). This compares to the CloudRack C2, which fits 76 dual-socket servers in a 42U rack (38U of it used for servers).

SGI Cloudrack C2 tray with 2 servers, 8 disks (note no power supplies or fans, those are provided at the rack level )

My only question on Microsoft’s design is their mention of “top of rack switches”. I’ve never been a fan of top of rack switches myself. I always have preferred to have switches in the middle of the rack, better for cable management (half of the cables go up, the other half go down). Especially when we are talking about 96 servers in one rack. Maybe it’s just a term they are using to describe what kind of switches, though there is a diagram which shows the switches positioned at the top of the rack.

SGI CloudRack C2 with top of rack switches positioned in the middle of the rack

I am also curious about their power usage: they say they aim for 40-60 watts/server, which seems impossibly low for a dual-socket system, so they have likely done some work to figure out optimal performance based on system load and probably never run the systems anywhere near peak capacity.

Having 96 servers consume only 16kW of power is incredibly impressive though.
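
Quick arithmetic on those two figures (the rack-level power and server count are the numbers quoted above; the 40-60 watt range is Microsoft's stated per-server aim):

```python
# Per-server power math for the Microsoft rack described above.
rack_power_w = 16_000            # ~16kW for 96 servers, as quoted above
servers_per_rack = 96
target_w_per_server = (40, 60)   # Microsoft's stated per-server goal

avg_w_per_server = rack_power_w / servers_per_rack   # ~167W/server if the whole rack draws 16kW
print(round(avg_w_per_server), target_w_per_server)
```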

I have to give mad, mad, absolutely insanely mad props to Microsoft. Something I’ve never done before.

Facebook – 180 servers in 7 racks (6 server racks + 1 UPS rack)

Microsoft – 630 servers in 7 racks

Density is critical to any large scale deployment; there are limits to how dense you can practically go before the costs are too high to justify it. Microsoft has gone about as far as is achievable given current technology to accomplish this.

Here is another link where Microsoft provides a couple of interesting PDFs, the first one I believe is written by the same guy that gave the Microsoft briefing at the conference I was at last year.

(As a side note I have removed Scott from the blog since he doesn’t have time to contribute any more)

March 21, 2011

Please do not extend Data center tax breaks

Filed under: Datacenter,News — Nate @ 9:20 am

This is just disgusting to me. It pissed me off when it passed the first time and it is even more stupid and crazy if it happens to pass again.

Just read on DataCenterKnowledge that Washington state (where I am) has someone(s) proposing a bill that would extend data center tax breaks for another 10+ years.

This, in a time where the state forecast just last week an even larger state budget deficit.

Key lawmakers now turn their full attention to writing budgets for the 2011-2013 cycle. Revenue is expected to be down for that budget by an additional $700 million, Thursday’s forecast said. Now, the deficit is estimated to be about $5.1 billion, but that includes voter-approved mandates that lawmakers don’t plan to fund.

The big issue I have with this data center tax break is that these data centers really don't contribute much. They provide a short-term gain in construction jobs, but operationally they employ hardly anyone, and they consume an enormous amount of energy and water for cooling.

Take a look at this $1 billion Apple data center for example –

Tax breaks could total $300 million for 50-employee server farm in North Carolina

If you're going to give tax breaks, give them to businesses that actually generate jobs. There should be some sort of rule: number of jobs per square foot, or number of jobs per dollar of tax break, or something. Data centers are a waste of tax breaks; let them go somewhere else.

The original tax break to data centers was approved right after the state announced a $1 billion tax increase on the rest of the state.

October 21, 2010

Red Hat wants to end “IT Suckage”

Filed under: Datacenter,Virtualization — Nate @ 8:50 am

Read an interesting article over on The Register with a lot of comments by a Red Hat executive.

And I can't help but disagree with a bunch of the stuff the executive says. But it could be because the executive is looking at and talking with big, bloated, slow-moving organizations that have a lot of incompetent people in their ranks (the "never got fired for buying X" mantra), instead of smaller, more nimble, more leading-edge organizations that are willing, ready and able to take some additional "risk" for a much bigger return, such as running virtualized production systems. That seems like a common concept to many, but I know there's a bunch of people out there that aren't convinced it will work. By the way, I ran my first VMware in production in 2004, and saved my company BIG BUCKS with the customer (that's a long story, and an even longer weekend).

OK so this executive says

After all, processor and storage capacity keep tracking along on their respective Moore’s and Kryder’s Laws, doubling every 18 months, and Gilder’s Law says that networking capacity should double every six months. Those efficiencies should lead to comparable economies. But they’re not.

I was just thinking this morning about the price and capacity of the latest systems (sorry, I keep going back to the BL685c G7 with 48 cores and 512GB of RAM 🙂 ).

I remember back in the 2004/2005 time frame the company I was at paying well over $100,000 for an 8-way Itanium system with 128GB of memory to run Oracle databases. The systems of today, whether it is the aforementioned blade or countless others, can run circles around such hardware at a tiny fraction of the price. It wasn't unreasonable just a few short years ago to pay more than $1M for a system that had 512GB of memory and 24-48 CPUs, and now you can get it for less than $50,000 (in this case using HP web pricing). That big $1M system probably consumed at least 5-10kW of power and a full rack as well, vs now the same capacity can go for ~800W (at 100% load, off the top of my head) and you can get at least 32 of them in a rack (barring power/cooling constraints).

Granted that big $1M system was far more redundant and available than the small blade or rack mount server, but at the time if you wanted so many CPU cores and memory in a single system you really had no choice but to go big, really big. And if I was paying $1M for a system I’d want it to be highly redundant anyways!

With networking, well, 10GbE has gotten to be dirt cheap. Just think back a few years ago: if you wanted a switch with 48 x 10GbE ports you'd be looking at, I'd say, $300k+ and it'd take the better part of a rack. Now you can get such switches in a 1U form factor from some vendors (2U from others), for sub $40k?

With storage, well, spinning rust hasn't evolved all that much over the past decade for performance, unfortunately, but technologies like distributed RAID have managed to extract an enormous amount of untapped capacity out of the spindles that older architectures are simply unable to exploit. More recently, with the introduction of SSDs and the sub-LUN automagic storage tiering technology that is emerging (I think it's still a few years away from being really useful), you can really get a lot more bang out of your system. EMC's FAST Cache looks very cool too, from a conceptual perspective at least; I've never used it and don't know anyone who has, but I do wish 3PAR had it! Assuming I understand the technology right, the key is that the SSDs are used for both read and write caching, versus something like the NetApp PAM card which is only a read cache. Neither FAST Cache nor PAM is enough to make me want to use those platforms for my own stuff.

The exec goes on to say

Simply put, Whitehurst’s answer to his own question is that IT vendors suck, and that the old model of delivering products to customers is fundamentally broken.

I would tend to agree for the most part, but there are those out there that really are awesome. I was lucky enough to find one such vendor, and a few such manufacturers. As one vendor I deal with says, they work with the customer, not with the manufacturer; they work to give the customer what is best for them. So many vendors I have dealt with over the years are really lazy when it comes down to it: they only know a few select solutions from a few big name organizations and give blank stares if you go outside their realm of comfort (random thought: I got the image of Speed Bump, the roadkill possum from a really old TV series called Liquid Television that I watched on MTV for a brief time in the 90s).

By the same token, while most IT vendors suck, most IT managers suck too, for the same reason. Probably because most people suck; that may be what it comes down to at the end of the day. IT, as you well know, is still an emerging industry, still a baby really, evolving very quickly, but it has a ways to go. So like with anything, the people out there that can best leverage IT are few and far between. Most of the rest are clueless, like my first CEO about 10-11 years ago, who was convinced he could replace me with a tech head from Fry's Electronics (despite my 3 managers telling him he could not). About a year after I left the company he did in fact hire such a person; only problem was that individual never showed up for work (maybe he forgot).

Exec goes on to say..

“Functionality should be exploding and costs should be plummeting — and being a CIO, you should be a rock star and out on the golf course by 3 pm,” quipped Whitehurst to his Interop audience.

That is in fact what is happening, provided you're choosing the right solutions and have the right people to manage them. The possibilities are there; most people just don't realize it, or don't have the capacity to evolve into what could be called the next generation of IT. They have been doing the same thing for so long, it's hard to change.

Speaking of being a rock star and out on the golf course by 3pm, I recall two things I've heard in the past year or so:

The first one used the golf course analogy and came from a local VMware consulting shop that has a bunch of smart folks working for them. I thought this was a really funny strategy and can see it working quite well in many cases: the person took the industry average of, say, 2-3 days to provision a new physical system, and said that in the virtual world, don't tell your customers you can provision that new system in ten minutes; tell them it will take 2-3 days, spend the ten minutes doing what you need, and spend the rest of the time on the golf course.

The second one was from a 3PAR user I believe. Who told one of their internal customers/co-workers something along the lines of “You know how I tell you it takes me a day to provision your 10TB of storage? Well I lied, it only takes me about a minute”.

For me, I’m really too honest I think, I tell people how long I think it will really take and at least on big projects am often too optimistic on time lines. Maybe I should take up Scotty’s strategy and take my time lines and multiply them by four to look like a miracle worker when it gets done early. It might help to work with a project manager as well, I haven’t had one for any IT projects in more than five years now. They know how to manage time (if you have a good one, especially one experienced with IT not just a generic PM).

Lastly the exec says

The key to unlocking the value of clouds is open standards for cloud interoperability, says Whitehurst, as well as standardization up and down the stack to simplify how applications are deployed. Red Hat’s research calculates that about two-thirds of a programmer’s time is spent worrying about how the program will be deployed rather than on the actual coding of the program.

Worrying about how the program will be deployed is a good thing, an absolutely good thing. Rewinding again to 2004, I remember a company meeting where one of the heads of the company stood up and said something along the lines of: 2004 was the year of operations, we worked hard to improve how the product operates, and the next phase is going back to feature work for customers. I couldn't believe my ears. That year was the worst for operations, filled with half-implemented software solutions that actually made things worse instead of better; outages increased, stress increased, turnover increased.

The only thing I could do from an operations perspective was buy a crapload of hardware and partition the application to make it easier to manage. We ended up with tons of excess capacity, and the development teams were obviously unable to make the design changes we needed to improve the operations of the application, but we at least had something that was more manageable. The deployment and troubleshooting teams were so happy when the new stuff was put into production: no longer did they have to parse gigabyte-sized log files trying to find which errors belonged to which transactions from which subsystem. Traffic for different subsystems was routed to different physical systems, so you knew if there was an issue with one type of process you went to server farm X to look at it; problem resolution was significantly faster.

I remember having one conversation with a software architect in early 2005 about a particular subsystem that was very poorly implemented (or maybe even designed); it caused us massive headaches in operations, non-stop problems really. His response was, "Well, I invited you to an architecture meeting in January of 2004 to talk about this but you never showed up." I don't remember the invite, but if I saw it I know why I didn't show up: I was buried in production outages 24/7 and had no time to think more than 24 hours ahead, let alone think about a software feature that was months away from deployment. I just didn't have the capacity; I was running on fumes for more than a year.

So yes, if you are a developer please do worry about how it is deployed, never stop worrying. Consult your operations team (assuming they are worth anything), and hopefully you can get a solid solution out the door. If you have a good experienced operations team then it’s very likely they know a lot more about running production than you do and can provide some good insight into what would provide the best performance and uptime from an operations perspective. They may be simple changes, or not.

One such example: I was working at a now defunct company who had a hard-on for Ruby on Rails. They were developing app after app on this shiny new platform. They were seemingly trying to follow Services Oriented Architecture (SOA), something I learned about, ironically, at a Red Hat conference a few years ago (I didn't know there was an acronym for that sort of thing, it seemed so obvious). I had a couple of really simple suggestions for them to take into account for how we would deploy these new apps. Their original intentions called for basically everything running under a single Apache instance (across multiple systems), and, for example, if Service A wanted to talk to Service B then it would talk to that service on the same server. My suggestions, which we went with, involved two simple concepts (sketched after the list below):

  • Each application had its own apache instance, listening on its own port
  • Each application lived behind a load balancer virtual IP with associated health checking, with all application-to-application communication flowing through the load balancer
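
Just to make the idea concrete, here is a tiny sketch of what that looks like in practice. The service names, ports and the /health path are all made up for illustration; the point is simply that every app gets its own port, and every app-to-app call goes through a load balancer VIP instead of straight to a process on the same box.

```python
# Hypothetical sketch: per-service ports behind load balancer VIPs (names/ports are illustrative).
import urllib.request

# Each app runs in its own Apache instance on its own port, fronted by its own VIP.
SERVICES = {
    "catalog":  {"vip": "catalog.lb.internal",  "port": 8001},
    "checkout": {"vip": "checkout.lb.internal", "port": 8002},
    "users":    {"vip": "users.lb.internal",    "port": 8003},
}

def service_url(name: str, path: str = "/") -> str:
    """Build a URL that goes through the service's load balancer VIP, never localhost."""
    svc = SERVICES[name]
    return f"http://{svc['vip']}:{svc['port']}{path}"

def is_healthy(name: str) -> bool:
    """The same kind of health check the load balancer polls to pull bad backends out of the pool."""
    try:
        with urllib.request.urlopen(service_url(name, "/health"), timeout=2) as resp:
            return resp.status == 200
    except OSError:
        return False

# e.g. the checkout app calling the users service goes through the VIP:
#   urllib.request.urlopen(service_url("users", "/lookup?id=42"))
```

The payoff is the one described in the list above: the health checks pull a sick instance of one service out of rotation without dragging the other apps down with it, and each app can be scaled or restarted independently.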

Towards the end we had upwards of I’d say 15 of these apps running on a small collection of servers.

The benefits are pretty obvious, but the developers weren't versed in operations, which is totally fine; they don't need to be (though it can be great when they are, I've worked with a few such people, though they are VERY RARE). That's what operations people do, and you should involve them in your development process.

As for cloud standards, folks are busy building those as we speak and type. VMware seems to be the furthest along from an infrastructure cloud perspective, I believe; I wouldn't expect them to lose their leadership position anytime soon, they have an enormous amount of momentum behind them, and it takes a lot to counter that momentum.

About a year ago I was talking to some former co-workers who told me another funny story: they were launching a new version of software to production, and the software had been crashing their test environments daily for about a month. They had a go/no-go meeting in which everyone involved with the product said NO GO. But management overrode them, and they deployed it anyways. The result? A roughly 14 hour production outage while they tried to roll the software back. I laughed and said, things really haven't changed since I left, have they?

So the solutions are there; the software companies and hardware companies have been evolving their stuff for years. The problem is the concepts can become fairly complex when talking about things like capacity utilization and stranded resources. Getting the right people in place to be able to not only find such solutions but deploy and manage them as well can really go a long way, but those people are rare at this point.

I haven’t been writing too much recently been really busy, Scott looks to be doing a good job so far though.

October 7, 2010

Testing the limits of virtualization

Filed under: Datacenter,Virtualization — Nate @ 11:24 pm

You know I’m a big fan of the AMD Opteron 6100 series processor, also a fan of the HP c class blade system, specifically the BL685c G7 which was released on June 21st. I was and am very excited about it.

It is interesting to think that it really wasn't that long ago that blade systems still weren't all that viable for virtualization, primarily because they lacked memory density; I mean, so many of them offered a paltry 2 or maybe 4 DIMM sockets. That was my biggest complaint with them for the longest time. About a year or a year and a half ago that really started shifting. We all know that Cisco bought some small startup a few years ago that had their memory extender ASIC, but you know I'm not a Cisco fan, so I won't give them any more real estate in this blog entry; I have better places to spend my mad typing skills.

A little over a year ago HP released their Opteron G6 blades; at the time I was looking at the half-height BL485c G6 (guessing here, too lazy to check). It had 16 DIMM sockets, which was just outstanding. I mean, the company I was with at the time really liked Dell (you know I hate Dell by now I'm sure). I was poking around their site at the time and they had no answer to that (they have since introduced answers); the highest capacity half-height blade they had at the time anyways was 8 DIMM sockets.

I had always assumed that due to the more advanced design of the HP blades you ended up paying a huge premium, but wow, I was surprised at the real-world pricing; more so at the time because you of course needed significantly higher density memory modules in the Dell model to compete with the HP model.

Anyways fast forward to the BL685c G7 powered by the Opteron 6174 processor, a 12-core 2.2Ghz 80W processor.

Load a chassis up with eight of those:

  • 384 CPU cores (860Ghz of compute)
  • 4 TB of memory (512GB/server w/32x16GB each)
  • 6,750 Watts @ 100% load (feel free to use HP dynamic power capping if you need it)

I've thought long and hard over the past 6 months on whether to go 8GB or 16GB, and all of my virtualization experience has taught me that in every case I'm memory (capacity) bound, not CPU bound. I mean, it wasn't long ago we were building servers with only 32GB of memory on them!!!

There is indeed a massive premium associated with going with 16GB DIMMs, but if your capacity utilization is anywhere near the industry average then it is well worth investing in those DIMMs for this system. Going from 2TB to 4TB of memory using 8GB chips in this configuration means buying a 2nd chassis and the associated rack/power/cooling + hypervisor licensing. You can easily halve your costs by just taking the jump to 16GB chips and keeping it in one chassis (or at least 8 blades; maybe you want to split them between two chassis, I'm not going to get into that level of detail here).

Low power memory chips aren’t available for the 16GB chips so the power usage jumps by 1.2kW/enclosure for 512GB/server vs 256GB/server. A small price to pay, really.

So on to the point of my post: testing the limits of virtualization. When you're running 32, 64, 128 or even 256GB of memory in a VM server, that's great, you really don't have much to worry about. But step it up to 512GB of memory and you might just find yourself maxing out the capabilities of the hypervisor. In vSphere 4.1, for example, you are limited to only 512 vCPUs per server, or only 320 powered-on virtual machines. So it really depends on your memory requirements. If you're able to achieve massive amounts of memory de-duplication (myself, I have not had much luck here with Linux, it doesn't de-dupe well; Windows seems to dedupe a lot though), you may find yourself unable to fully use the memory on the system, because you run out of the ability to fire up more VMs! I'm not going to cover other hypervisor technologies, they aren't worth my time at this point, but like I mentioned I do have my eye on KVM for future use.

Keep in mind 320 VMs is only 6.6VMs per CPU core on a 48-core server. That to me is not a whole lot for workloads I have personally deployed in the past. Now of course everybody is different.
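
Here is that limit math spelled out (the vSphere 4.1 per-host limits and the server specs are the ones quoted above):

```python
# How the vSphere 4.1 per-host limits interact with a 48-core, 512GB BL685c G7.
cores_per_host = 48
memory_gb_per_host = 512
max_powered_on_vms = 320   # vSphere 4.1 per-host limit quoted above
max_vcpus_per_host = 512   # vSphere 4.1 per-host limit quoted above

vms_per_core = max_powered_on_vms / cores_per_host               # ~6.7 VMs per core
gb_per_vm_at_limit = memory_gb_per_host / max_powered_on_vms     # only ~1.6 GB/VM once you hit the VM cap
vcpus_per_vm_at_limit = max_vcpus_per_host / max_powered_on_vms  # 1.6 vCPUs/VM on average
print(round(vms_per_core, 1), round(gb_per_vm_at_limit, 1), vcpus_per_vm_at_limit)
```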

But it got me thinking. The Register has been touting, off and on for the past several months, every time a new Xeon 7500-based system launches: ooh, they can get 1TB of RAM in the box. Or, in the case of the big new bad ass HP 8-way system, you can get 2TB of RAM. Setting aside the fact that vSphere doesn't go above 1TB, even if you go to 1TB I bet in most cases you will run out of virtual CPUs before you run out of memory.

It was interesting to see the hypervisor technology exploiting hardware really well in the "early" years, and now we see the real possibility of hitting a scalability wall, at least as far as a single system is concerned. I have no doubt that VMware will address these scalability issues; it's only a matter of time.

Are you concerned about running your servers with 512GB of RAM? After all, that is a lot of "eggs" in one basket (as one expert VMware consultant I know & respect put it). For me, at smaller scales, I am really not too concerned. I have been using HP hardware for a long time and on the enterprise end it really is pretty robust. I have the most concerns about memory failure, or memory errors. Fortunately HP has had Advanced ECC for a long time now (I think I remember even seeing it in the DL360 G2 back in '03).

HP's Advanced ECC spreads the error correction over four different ECC chips, and it really does provide quite robust memory protection. When I was dealing with cheap crap white box servers the #1 problem BY FAR was memory; I can't tell you how many memory sticks I had to replace, it was sick. The systems just couldn't handle errors (yes, all the memory was ECC!).

By contrast, honestly I can't even think of a time an enterprise HP server failed (e.g. crashed) due to a memory problem. I recall many times the little amber status light coming on; I'd log into the iLO and see, oh, memory errors on stick #2, so I'd go replace it. But no crash! There was a firmware bug in the HP DL585 G1s I used to use that would cause them to crash if too many errors were encountered, but that was a bug that was fixed years ago, not a fault with the system design. I'm sure there have been other such bugs here and there, nothing is perfect.

Dell introduced their version of Advanced ECC about a year ago, but it doesn't (or at least didn't, maybe it does now) hold a candle to the HP stuff. The biggest issue with the Dell version of Advanced ECC was that if you enabled it, it disabled a bunch of your memory sockets! I could not get an answer out of Dell support at the time as to why it did that. So I left it disabled because I needed the memory capacity.

So combine Advanced ECC with ultra dense blades with 48 cores and 512GB/memory a piece and you got yourself a serious compute resource pool.

Power/cooling issues aside (maybe if you're lucky you can get into SuperNAP down in Vegas), you can get up to 1,500 CPU cores and 16TB of memory in a single cabinet. That's just nuts! WAY beyond what you'd expect to be able to support in a single VMware cluster (given that you're limited to 3,000 powered-on VMs per cluster, the density would be only 2 VMs/core and 5GB/VM!)

And if you manage to get a 47U rack, well you can get one of those c3000 chassis in the rack on top of the four c7000 and get another 2TB of memory and 192 cores. We’re talking power kicking up into the 27kW range in a single rack! Like I said you need SuperNap or the like!

Think about that for a minute: 1,500 CPU cores and 16TB of memory in a single rack. Multiply that by, say, 10 racks: 15,000 CPU cores and 160TB of memory. How many tens of thousands of physical servers could be consolidated into that? A conservative number may be 7 VMs/core; you're talking 105,000 physical servers consolidated into ten racks. Well, excluding storage of course. Think about that! Insane! I mean, that's consolidating multiple data centers into a high density closet! That's taking tens to hundreds of megawatts of power off the grid and consolidating it into a measly 250 kW.
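
The consolidation math above, written out (all inputs are the figures used in this post; the 7 VMs/core ratio is the conservative assumption stated above):

```python
# Consolidation math for the hypothetical ten-rack deployment described above.
cores_per_rack = 1_500
memory_tb_per_rack = 16
racks = 10
vms_per_core = 7       # conservative consolidation ratio assumed above
kw_per_rack = 25       # the post cites ~27kW for the fully stuffed 47U rack, ~250kW for ten racks

total_cores = cores_per_rack * racks               # 15,000 cores
total_memory_tb = memory_tb_per_rack * racks       # 160 TB of memory
consolidated_servers = total_cores * vms_per_core  # 105,000 physical servers replaced
total_power_kw = kw_per_rack * racks               # ~250 kW
print(total_cores, total_memory_tb, consolidated_servers, total_power_kw)
```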

I built out what was, to me, some pretty beefy server infrastructure back in 2005, as part of roughly a $7 million project. Part of it included roughly 300 servers in roughly 28 racks. There was 336kW of power provisioned for those servers.

Think about that for a minute. And re-read the previous paragraph.

Because of this trend, I have thought for quite a while that the traditional network guy or server guy is, well... there just won't be as many of them around going forward. When you can consolidate that much crap into that small of a space, it's just astonishing.

One reason I really do like the Opteron 6100 is the CPU cores, just raw cores. And they are pretty fast cores too. The more cores you have, the more things the hypervisor can do at the same time, and there are no possibilities of contention like there are with hyperthreading. CPU processing capacity has gotten to a point, I believe, where raw CPU performance matters much less than getting more cores in the boxes. More cores means more consolidation. After all, industry utilization rates for CPUs are typically sub 30%. Though in my experience it's typically sub 10%, and a lot of times sub 5%. My own server sits at less than 1% CPU usage.

Now fast raw speed is still important in some applications of course. I'm not one to promote the usage of a 100-core CPU with each core running at 100MHz (10GHz in aggregate); there is a balance that has to be achieved, and I really do believe the Opteron 6100 has achieved that balance. I look forward to the 6200 (the socket-compatible 16-core part). Ask anyone that has known me this decade: I have not been AMD's strongest supporter for a very long period of time. But I see the light now.

October 6, 2010

Amazon EC2: Not your father’s enterprise cloud

Filed under: Datacenter — Nate @ 9:00 am

OK, so obviously I am old enough that my father did not have clouds back in his days, well not the infrastructure clouds that are offered today. I just was trying to think of a somewhat zingy type of topic. And I understand enterprise can have many meanings depending on the situation, it could mean a bank that needs high uptime for example. In this case I use the term enterprise to signify the need for 24×7 operation.

Here I am, once again working on stuff related to "the cloud", and it seems like everything "cloud" revolves around EC2.

Even after all the work I have done recently and over the past year or two with regards to cloud proposals, I don’t know why it didn’t hit me until probably in the past week or so but it did (sorry if I’m late to the party).

There are a lot of problems with running traditional infrastructure in the Amazon cloud, as I'm sure many have experienced first hand. That wasn't the realization that occurred to me, of course.

The realization was that there isn’t a problem with the Amazon cloud itself, but there is a problem with how it is:

  • Marketed
  • Targeted

Which leads to people using the cloud for things it was not intended to ever be used for. In regards to Amazon, one has to look no further than their SLA on EC2 to immediately rule it out for any sort of “traditional” application which includes:

  • Web servers
  • Database servers
  • Any sort of multi tier application
  • Anything that is latency sensitive
  • Anything that is sensitive to security
  • Really, anything that needs to be available 24×7

Did you know that if they lose power to a rack, or even a row of racks, that is not considered an outage? It's not as if they provide you with knowledge of where your infrastructure is in their facilities; they'd rather you just pay them more and put things in different zones and regions.

Their SLA says in part that they can in fact lose an entire data center (an "availability zone") and that's not considered an outage. Here is how Amazon describes an availability zone:

Additionally, they are physically separate, such that even extremely uncommon disasters such as fires, tornados or flooding would only affect a single Availability Zone.

And while I can't find it on their site at the moment, I swear not too long ago their SLA included a provision that said even if they lost TWO data centers it's still not an outage unless you can't spin up new systems in a THIRD. Think of how many hundreds to thousands of servers are knocked offline when an Amazon data center becomes unavailable. I think they may have removed the two-availability-zone clause because not all of their regions have more than two zones (last I checked only us-east did, but maybe more have them now).

I was talking to someone who worked at Amazon not too long ago and had in fact visited the us-east facilities, and they said all of the availability zones were in the same office park, really quite close to each other. They may have had different power generators and such, but quite likely if a tornado or flooding hit, more than one zone would be impacted; likely the entire region would go out (that is Amazon's code word for saying all availability zones are down). While I haven't experienced it first hand, I know of several incidents that impacted more than one availability zone, indicating that there is more shared between them than customers are led to believe.

Then there is the extremely variable performance & availability of the services as a whole. On more than one occasion I have seen Amazon reboot the underlying hardware without any notification (note they can't migrate the workloads off the machine! anything on the machine at the time is killed!). I also love how unapologetic they are when it comes to things like data loss; basically they say you didn't replicate the data enough times, so it's your fault. Now I can certainly understand that bad things happen from time to time, that is expected; what is not expected is how they handle it. I keep thinking back to this article I read on The Register a couple years ago, a good read.

Once you’re past that, there’s the matter of reliability. In my experience with it, EC2 is fairly reliable, but you really need to be on your shit with data replication, because when it fails, it fails hard. My pager once went off in the middle of the night, bringing me out of an awesome dream about motorcycles, machine guns, and general ass-kickery, to tell me that one of the production machines stopped responding to ping. Seven or so hours later, I got an e-mail from Amazon that said something to the effect of:

There was a bad hardware failure. Hope you backed up your shit.

Look at it this way: at least you don’t have a tapeworm.

-The Amazon EC2 Team

I’m sure I have quoted it before in some posting somewhere, but it’s such an awesome and accurate description.

So go beyond the SLAs, go beyond the performance and availability issues.

Their infrastructure is “built to fail” which is a good concept at very large scale, I’m sure every big web-type company does something similar. The concept really falls apart at small scale though.

Everyone wants to get to the point where they have application level high availability and abstract the underlying hardware from both a performance and reliability standpoint. I know that, you know that. But what a lot of the less technical people don’t understand is that this is HARD TO DO. It takes significant investments in time & money to pull off. And at large scale these investments do pay back big. But at small scale they can really hurt you. You spend more time building your applications and tools to handle unreliable infrastructure when you could be spending time adding the features that will actually make your customers happy.

There is a balance there, as with anything. My point is that with the Amazon cloud those concepts are really forced upon you, if you want to use their service as a more “traditional” hosting model. And the overhead associated with that is ENORMOUS.

So back to my point: the problem isn't with Amazon itself, it's with whom it is targeted at and the expectations around it. They provide a fine service, if you use it for what it was intended for. EC2 stands for "elastic compute"; the first thing that comes to my mind when I hear that kind of term is HPC-type applications, data processing, back-end type stuff that isn't latency sensitive and is built to ride out infrastructure failure.

But even then, that concept falls apart if you have a need for 24×7 operations. The cost model even of Amazon, the low cost “leader” in cloud computing doesn’t hold water vs doing it yourself.

Case in point: earlier in the year at another company I was directed to go on another pointless expedition comparing the Amazon cloud to doing it in house for a data-intensive 24×7 application. That's not even taking into account the latency introduced by S3, the operational overhead with EC2, or the performance and availability problems. Assuming everything worked PERFECTLY, or at least as well as physical hardware, the ROI for keeping the project in house was less than 7 months (I re-checked the numbers and revised the ROI from the original 10 months to 7 months; I was in a hurry writing this morning before work). And this was for good quality hardware with 3 years of NBD on-site support, not scraping the bottom of the barrel. To give you an idea of the savings: after those 7 months, each and every month it saved more than my yearly salary, benefits, and the other expenses a company has for an employee.
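
For anyone who wants to run the same comparison, the payback calculation itself is trivial. The dollar figures below are invented purely for illustration (the real numbers from that project aren't disclosed here); only the method matters:

```python
# Hypothetical payback-period sketch for "buy and host it yourself" vs. renting from the cloud.
# All dollar figures are made up; plug in your own quotes.
hardware_capex = 200_000        # servers/storage/switches with 3yr NBD support (one-time)
inhouse_monthly_opex = 5_000    # colo space, power, bandwidth (recurring)
cloud_monthly_cost = 35_000     # equivalent instances + storage + transfer (recurring)

monthly_savings = cloud_monthly_cost - inhouse_monthly_opex
payback_months = hardware_capex / monthly_savings
print(f"payback in ~{payback_months:.1f} months, then ~${monthly_savings:,}/month saved after that")
```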

OK, so we're past that point now. On to a couple of really cool slides I came up with for a pending presentation, which I really think illustrate the Amazon cloud quite well; another one of those "a picture is worth fifty words" type of things. The key point here is capacity utilization.

What has using virtualization over the past half decade (give or take) taught us? What have the massive increases in server and storage capacity taught us? Well, they taught me that applications no longer have the ability to exploit the capacity of the underlying hardware. There are very rare exceptions to this, but in general, over at least the past 15 years of my experience, applications really have never had the ability to exploit the underlying capacity of the hardware. How many systems do you see averaging under 5% CPU? Under 3%? Under 2%? How many systems do you see with disk drives that are 75% empty? 80%?

What else has virtualization given us? It’s given us the opportunities to logically isolate workloads into different virtual machines, which can ease operational overhead associated with managing such workloads, both from a configuration standpoint as well as a capacity planning standpoint.

That's my point. Virtualization has given us the ability to consolidate these workloads onto fewer resources. I know this is a point everyone understands, and I'm not trying to make people look stupid, but my point here with regards to Amazon is that their model doesn't take us forward, it takes us backward. Here are those two slides that illustrate this:

[slide 1]

And the next slide

[slide 2]

Not all cloud providers are created equal of course. The Terremark Enterprise Cloud (not vCloud Express, mind you), for example, is resource-pool based. I have no personal experience with their enterprise cloud (I am a vCloud Express user for my personal stuff: 2 x 1 vCPU servers, including the server powering this blog!), though I did interact with them pretty heavily earlier in the year on a big proposal I was working on at the time. I'm not trying to tell you that Terremark is more or less cost effective, just that they don't reverse several years of innovation and progress in the infrastructure area.

I'm sure Terremark is not the only provider that can provide resources based on resource pools instead of hard per-VM allocations. I just keep bringing them up because I'm more familiar with their stuff due to several engagements with them at my last company (none of which ever resulted in that company becoming a customer). I originally became interested in Terremark because I was referred to them by 3PAR, and I'm sure by now you know I'm a fan of 3PAR; Terremark is a very heavy 3PAR user. And they are a big VMware user, and you know I like VMware by now, right?

If Amazon would be more, what is the right word, honest? Up front? Better at setting expectations? I think their customers would be better off; mainly they would have fewer of them, because such customers would realize what that cloud is made for, rather than trying to fit a square peg in a round hole. If you whack it hard enough you can usually get it in, but well, you know what I mean.

As this blog  entry exceeds 1,900 words now I feel I should close it off. If you read this far, hopefully I made some sense to you. I’d love to share more of my presentation as I feel it’s quite good but I don’t want to give all of my secrets away 🙂

Thanks for reading.

October 5, 2010

HP Launches new denser SL series

Filed under: Datacenter,News — Nate @ 11:49 am

[domain name transfer still in progress but at least for now I managed to update the name servers to point to mine so the blog is being directed to the right server now]

Getting closer! Not quite there yet though.

Earlier in the year I was looking at the HP SL6000 series of systems for a project that needed high efficiency and density.

The biggest drawback to the system, in my opinion, was that it wasn't dense enough; it was no denser than 1U servers for the configuration I was looking at (needing 4×3.5″ drives per system). It was more power efficient though, and hardware serviceability was better.

The limitation was in the chassis, and HP acknowledged this at the time, saying they were working on a new and improved version but it wasn't available yet. Well, it looks like they have launched it today, in the form of the SL6500. It seems to deliver on the statements HP gave to me earlier in the year. I don't see much info on the chassis itself on their site, but it looks significantly more dense, with the key here being that the chassis is a lot deeper than the original 2U.

But they still have a ways to go; as far as I know the SGI CloudRack C2 is the density leader in this space, at least from material that is publicly available. Who knows what the likes of IBM/Dell/HP come up with behind the scenes for special customers.

I did what was, to me, a pretty neat comparison earlier this year of the power efficiency of the CloudRack against the 3PAR T-class storage enclosures. (Granted, the density technology behind the 3PAR is 8 years old at this point; they haven't felt the need to go more dense, though HP may encourage them to, since they waste up to 10U of space in each of their racks. Then again, weight and power can become issues in many facilities going even as dense as 3PAR can go.)

Anyways, on to the comparison. This is one place where the picture tells the story. Pretty crazy, huh? Yeah, I know the products are aimed at very different markets, I just thought it was a pretty crazy comparison.

You can think of the CloudRack as one giant chassis. The rack is the chassis (literally). So while HP has gone from a 2U chassis to a 4U chassis, SGI is waiting for them with a 38U chassis. Another nice advantage of the CloudRack is that you can get true N+1 power (3 diverse power sources); most systems can only support two power sources, while the CloudRack can go much, MUCH higher. And with the power supplies built into the chassis, the servers benefit from that extra fault tolerance and high efficiency (no fans or power supplies in the servers! Same as the HP SL series).

September 22, 2010

The Cloud: Grenade fishing in a barrel

Filed under: Datacenter — Nate @ 10:03 pm

I can’t help but laugh. I mean I’ve been involved in several initiatives surrounding the cloud. So many people out there think the cloud is efficient and cost effective. Whoever came up with the whole concept deserves to have their own island (or country) by now.

Because, really, competing against the cloud is like grenade fishing in a barrel. Shooting fish in a barrel isn’t easy enough, really it’s not!

Chuck from EMC talked earlier in the year to the folks at Pfizer about their use of the Amazon cloud, and the real story behind it. It's an interesting read; it really shows the value you can get from the cloud if you use it right.

R+D’s use of HPC resources is unimaginably bursty and diverse, where on any given day one of 1000 different applications will be run. Periodically enormous projects (of very short duration!) come up very quickly, driven by new science or insights, which sometimes are required to make key financial or  strategic decisions with vast amounts of money at stake for the business.

As a result, there’s no real ability to forecast or plan in any sort of traditional IT sense.  The HPC team has to be able to respond in a matter of days to huge requests for on-demand resources — far outside the normal peaks and valleys you’d find in most traditional IT settings.

But those use cases at the moment really are few and far between, contrasted with use cases for having your own cloud (of sorts), where there is lots more use. It would not surprise me if over time Pfizer continues to expand its internal HPC stuff as it gets more of a grasp of what the average utilization rate is, and hosts more and more stuff internally vs going to Amazon. It's just that in the early days of this they don't have enough data to predict how much they need. They may never get completely out of the cloud; I'm just saying that the high watermark (for lack of a better term) can be monitored so that there is less significant "bursting" to the cloud.

Now if Pfizer is never really able to get a grip on forecasting their HPC requirements, well then they might just keep using the cloud, but I suspect at the end of the day they will get better at forecasting. They obviously have the talent internally to do this very tricky balance of cloud and internal HPC. The cloud people would have you believe it's a simple thing to do; it's really not, especially for off-the-shelf applications. If you had seen the numbers I have seen, you'd shake your head too. Sort of the response I had when I did come across a really good use case for the cloud earlier this year.

I could see paying a lot more for premium cloud services if I got more, but I don't get more; in fact I get less, a LOT less, than doing it myself. Now for my own personal "server" that is in the Terremark cloud I can live with it; it's not a big deal, my needs are tiny. (Though now that I think about it, they couldn't even give me a 2nd NAT address for a 2nd VM for SMTP purposes; I had to create a 2nd account to put my 2nd VM in to get my 2nd NAT address. Costs for me are the same regardless, but it is a bit more complicated than it should be, and opening a 2nd account in their system caused all sorts of problems with their back end, which seemed to get confused by having two accounts with the same name; I had to engage support on more than one occasion to get all the issues fixed.) But for real work stuff, no way.

Still so many sheep out there still buy the hype – hook, line and sinker.

Which can make jobs for people like me harder. I've heard the story time and time again from several different people in my position: PHBs are so sold on the cloud concept that they can't comprehend why it's so much more expensive than doing it yourself, so they want you to justify it six ways from Sunday (if that's the right phrase). They know there's something wrong with your math, but they don't know what it is, so they want you to try to prove yourself wrong when you're not. At the end of the day it works out though, it just takes some time to break that glass ceiling (again, it sounds like the right term but it might not be).

Then there's the argument the cloud people make. I was involved in one deal earlier in the year, the usual situation, and the cloud providers said "well do you really have the staff to manage all of this?" I said "IT IS A RACK AND A HALF OF EQUIPMENT, HOW MANY PEOPLE DO I NEED, REALLY?" They were just as oblivious to that as the PHBs were to the cloud costs.

While I'm thinking of it: has anyone else experienced massive slowdowns with Wikipedia's DNS infrastructure? It takes FOREVER to resolve their domains for me. All other domains resolve really fast. I run my own DNS, so maybe there is something wrong with it, I'm not sure; I haven't investigated.

September 7, 2010

Only HP has it

Filed under: Datacenter,Random Thought,Virtualization — Nate @ 11:32 pm

I commented in response to an article on The Register recently, but figured since I'm here writing stuff I might as well bring this up too.

Unless you’ve been living under a rock and/or not reading this site you probably know that AMD launched their Opteron 6100 series CPUs earlier this year. One of the highlights of the design is the ability to support 12 DIMMs of memory per socket, up from the previous eight per socket.

Though of all of the servers that have launched, HP seems to have the clear lead in AMD technology; for starters, as far as I am aware they are the only ones currently offering Opteron 6100-based blades.

Secondly, I have looked around at the offerings of Dell, IBM, HP, and even Supermicro and Tyan, but as far as I can tell only HP is offering Opteron systems with the full 12 DIMMs/socket support. The only reason I can think of, I guess, is that the other companies have a hard time making a board that can accommodate that many DIMMs; after all, it is a lot of memory chips. I'm sure if Sun was still independent they would have a new cutting edge design for the 6100. After all, they were the first to launch (as far as I know) a quad-socket, 2U AMD system with 32 memory slots, nearly three years ago.

The new Barcelona four-socket server comes with dual TCP offloading enabled gigabit NIC cards, redundant power supplies, and 32 DIMM slots for up to 256 GBs of memory capacity  [..] Half the memory and CPU are stacked on top of the other half and this is a rather unusual but innovative design.

Anyways, if you're interested in the Opteron 6100, it seems HP is the best bet in town, whether it's one of these:

Kind of fuzzy shot of the HP DL165 G7, anyone got a clearer picture?

HP DL385 G7

HP BL685c G7 – I can understand why they couldn't fit 48 DIMMs on this blade (note: two of the CPUs are under the hard disks)!

HP BL465c G7 – again, really no space for 24 DIMMs! (damnit)

Tyan Quad Socket Opteron 6100 motherboard, tight on space, guess the form factor doesn’t cut it.

Twelve cores not enough? Well you’ll be able to drop Opteron 6200 16-core CPUs into these systems in the not too distant future.
