TechOpsGuys.com Diggin' technology every day

February 27, 2010

Cisco UCS Networking falls short

Filed under: Networking, Virtualization — Nate @ 4:33 am

UPDATED Yesterday when I woke up I had an email from Tolly in my inbox describing a new report comparing the networking performance of the Cisco UCS vs the HP c Class blade systems. Both readers of the blog know I haven't been a fan of Cisco for a long time (about 10 years, since I first started learning about the alternatives), and I'm a big fan of the HP c Class (never used it yet, but planning on it). So as you can imagine I couldn't resist digging into what it said, considering the amount of hype Cisco has managed to generate for their new systems (the sheer number of blog posts about it makes me feel sick at times).

I learned a couple of things from the report that I did not know about UCS before (I oftentimes just write their solutions off since they have a track record of underperformance, overpricing and needless complexity).

The first was that the switching fabric is external to the enclosure, so if two blades want to talk to each other that traffic must leave the chassis in order to do so, an interesting concept which can have significant performance and cost implications.

The second is that the current UCS design is 50% oversubscribed, which is what this report targets as a significant weakness of the UCS vs the HP c Class.

The midplane design of the c7000 chassis is something HP is pretty proud of (for good reason): it is capable of 160Gbps full duplex to every slot, totaling more than 5 Terabits of fabric. When I talked to them last year they couldn't help but take shots at IBM's blade system, commenting on how it is oversubscribed and how you have to be careful in how you configure that system because of the oversubscription.
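For the curious, here is roughly where that 5+ Terabit figure comes from; a quick back-of-the-envelope sketch in Python, assuming 16 half-height slots at the quoted 160Gbps-per-slot rate and counting both directions of the full duplex links (which is how the aggregate number appears to be counted):

```python
# Back-of-the-envelope c7000 midplane math (assumptions noted above).
SLOTS = 16            # half-height slots in a c7000
GBPS_PER_SLOT = 160   # HP's quoted per-slot figure, each direction

one_way = SLOTS * GBPS_PER_SLOT   # 2,560 Gbps in one direction
both_ways = one_way * 2           # 5,120 Gbps counting full duplex
print(f"{both_ways / 1000:.2f} Terabits of midplane fabric")  # ~5.12 Tb
```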

This c7000 fabric is far faster than most high end chassis Ethernet switches, and should allow fairly transparent migration to 40Gbps Ethernet when the standard arrives, for those that need it. In fact HP already has 40Gbps InfiniBand modules available for c Class.

The test involved six blades from each solution. When testing throughput with four blades, both solutions performed similarly (UCS was 0.76Gbit faster). Add two more blades and start jacking up the bandwidth requirements: HP c Class scales linearly as the traffic goes up, while UCS seems to scale linearly in the opposite direction. The end result is that with 60Gbit of traffic being requested (6 blades @ 10Gbps), HP c Class managed to push 53.65Gbps, and Cisco UCS managed to cough up a mere 27.37Gbps. On UCS, pushing six blades at max performance actually resulted in significantly less throughput than four blades at max performance, illustrating serious weaknesses in the QoS on the system (again, big surprise!).

The report mentions putting Cisco UCS in a special QoS mode for the test because without this mode performance was even worse. There is only 80Gbps of fabric available for use on the UCS (4x10Gbps full duplex). You can get a second fabric module for UCS, but it cannot be used for active traffic, only as a backup.

UPDATE – A kind fellow over at Cisco took notice of our little blog here (thanks!!) and wanted to correct what they say is a bad test on the part of Tolly; apparently Tolly didn't realize that the fabrics could be used in active-active (maybe that complexity thing rearing its head, I don't know). But in the end I believe the test results are still valid, just at an incorrect scale. Each blade requires 20Gbps of full duplex fabric in order to be non-blocking throughout. The Cisco UCS chassis provides 80Gbps of full duplex fabric, allowing 4 blades to be non-blocking. HP, by contrast, allows up to three dual-port Flex-10 adapters per half-height server, which requires 120Gbps of full duplex fabric to support at line rate. Given that each slot supports 160Gbps of fabric you could get another adapter in there, but I suspect there isn't enough real estate on the blade to connect it! I'm sure 120Gbps of Ethernet on a single half-height blade is way overkill, but if it doesn't radically increase the cost of the system, as a techie myself I do like the fact that the capacity is there to grow into.
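To put the update's arithmetic in one place, here is a small sketch of the non-blocking math, using the per-blade and per-chassis figures quoted above (the helper function is just for illustration):

```python
def max_nonblocking_blades(chassis_fabric_gbps: int, per_blade_gbps: int) -> int:
    """How many blades can run at line rate before the fabric is oversubscribed."""
    return chassis_fabric_gbps // per_blade_gbps

# Cisco UCS: 80Gbps of full duplex fabric (both fabric extenders active),
# with each blade needing 20Gbps full duplex to be non-blocking throughout.
print(max_nonblocking_blades(80, 20))   # 4 of the 8 blades

# HP c7000: 160Gbps of fabric per half-height slot, so even a blade loaded
# with three dual-port Flex-10 adapters (120Gbps) stays under the slot limit.
print(120 <= 160)                       # True
```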

Things get a little more complicated when you start talking about non-blocking fabric between the blades versus to the rest of the network, since HP designs their switches to support 16 blades and Cisco designs their fabric modules to support 8. You can see from the picture of the Flex-10 switch that there are 8 uplink ports on it, not 16, but it's pretty obvious that is due to space constraints, because the switch is half width. END UPDATE

The point I am trying to make here isn't so much that HP's architecture is superior to Cisco's, or that HP is faster than Cisco. It's that HP is not oversubscribed and Cisco is. In a world where we have had non-blocking switch fabrics for nearly 15 years, it is disgraceful that a vendor would ship a solution where six servers cannot talk to each other without being blocked. I have operated 48-port gigabit switches with 256 gigabits of switching fabric, which is more than enough for 48 systems to talk to each other in a non-blocking way. There are 10Gbps switches with 500-800 gigabits of switching fabric allowing 32-48 systems to talk to each other in a non-blocking way. These aren't exactly expensive solutions either. That's not even considering the higher end backplane and midplane based systems that run into multiple terabits of switching fabric, connecting hundreds of systems at line rate.
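As a concrete example of that math, here is a rough sketch; it assumes switching fabric capacity is quoted the usual way, counting both directions of every port:

```python
def is_nonblocking(ports: int, port_speed_gbps: int, fabric_gbps: int) -> bool:
    """True if the fabric can carry every port at line rate in both directions at once."""
    return fabric_gbps >= ports * port_speed_gbps * 2

print(is_nonblocking(48, 1, 256))    # True - the 48-port gigabit switch example
print(is_nonblocking(32, 10, 800))   # True - a 32-port 10GbE switch with 800Gbps of fabric
```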

I would expect such a poor design from a second tier vendor, not from a vendor that has a history of making blade switches for several manufacturers over several years.

So say you take the worst case: what if you want completely non-blocking fabric to each and every system? For me, I am looking at HP c Class and 10Gbps Virtual Connect mainly for intra-chassis communication within the vSphere environment. In this situation, with a cheap configuration on HP, you are oversubscribed 2:1 when talking outside of the chassis. For most situations this is probably fine, but say that isn't good enough for you. Well, you can fix it by installing two more 10Gbps switches in the chassis (each switch has 8x10GbE uplinks). That will give you 32x10Gbps uplink ports, enough for 16 blades each having 2x10Gbps connections. All line rate, non-blocking throughout the system. That is 320 Gigabits vs 80 Gigabits available on Cisco UCS.

HP doesn't stop there: with four 10Gbps switches you've only used up half of the available I/O slots on the c7000 enclosure. Can we say 640 Gigabits of total non-blocking Ethernet throughput vs 80 gigabits on UCS (single chassis for both)? I mean, for those fans of running vSphere over NFS, you could install vSphere on a USB stick or SD card and dedicate the rest of the I/O slots to networking if you really need that much throughput.
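The oversubscription arithmetic behind those Virtual Connect numbers works out like this; a sketch using the port counts from the post (real designs obviously depend on which interconnect bays and mezzanine cards you actually populate):

```python
def oversubscription(blade_ports: int, blade_gbps: int, uplinks: int, uplink_gbps: int) -> float:
    """Ratio of blade-facing bandwidth to uplink bandwidth leaving the chassis."""
    return (blade_ports * blade_gbps) / (uplinks * uplink_gbps)

# Cheap config: 16 blades x 2x10GbE down, one pair of Flex-10 switches = 16x10GbE up.
print(oversubscription(16 * 2, 10, 2 * 8, 10))   # 2.0 -> 2:1 leaving the chassis

# Add two more Flex-10 switches: 32x10GbE of uplinks for 32 blade ports.
print(oversubscription(16 * 2, 10, 4 * 8, 10))   # 1.0 -> line rate in aggregate

# Populate all 8 interconnect bays: 64x10GbE = 640 Gigabits of uplink capacity.
print(8 * 8 * 10)                                # 640
```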

Of course this costs more than being oversubscribed; the point is that the customer can make this decision based on their own requirements, rather than having the limitation designed into the system.

Now think about this limitation in a larger scale environment. Think about the vBlock again, from that new EMC/Cisco/VMware alliance. Set aside the fact that it's horribly overpriced (I think mostly due to EMC's side); this system is designed to be used by large scale service providers. That means unpredictable loads from unrelated customers running on a shared environment. Toss in vMotion and DRS and you could be asking for trouble when it comes to this oversubscription stuff. DRS (as far as I know) bases its decisions entirely on CPU and memory usage; at some point I think it will take storage I/O into account as well, but I haven't heard of it taking network congestion into account, though in theory it's possible. It's much better to just have a non-blocking fabric to begin with: it will increase your utilization and efficiency, and allow you to sleep better at night.

Makes me wonder: how does Data Center Ethernet (or whatever it's called this week) hold up under the congestion conditions that the UCS suffers from? Lots of "smart" people spent a lot of time making Ethernet lossless, only to design the hardware so that it will incur significant loss in transit. In my experience systems don't behave in a predictable manner when storage is highly constrained.

I find it kind of ironic that a blade solution from the world's largest networking company would be so crippled when it comes to the networking of the system. Again, not a big surprise to me, but there are a lot of Cisco kids out there who drink the koolaid without thinking twice, and of course I couldn't resist ragging on Cisco again.

I won't bother to mention the recent 10Gbps Cisco Nexus test results that show how easily you can cripple its performance as well (while other manufacturers perform properly at non-blocking line rates); maybe I'll save that for another blog entry.

Just think: there is more throughput available to a single slot in an HP c7000 chassis than there is to the entire chassis on a UCS. Even if you give Cisco the benefit of the second fabric module (which, per the update above, can in fact run active-active), the HP c7000 enclosure has 32 times the throughput capacity of the Cisco UCS. That kind of performance gap even makes Cisco's switches look bad by comparison.

25 Comments

  1. This statement is completely false:
    “You can get a second fabric module for UCS but it cannot be used for active traffic, only as a backup.”

    The Tolly Group test engineers do not understand how UCS works. Anybody who understands UCS knows that both fabric extenders are active. Therefore, the entire report was based on a false premise, resulting in an improperly engineered test.

    Cheers,
    Brad
    (Cisco)

    Comment by Brad Hedlund — March 1, 2010 @ 6:29 pm

  2. Thanks for the reply! Interesting tidbit of information! I guess it would have been nice for Cisco to join in the test to clarify things when the tests were run; the report indicates Cisco was invited but declined.

    Comment by Nate — March 1, 2010 @ 6:45 pm

  3. Nate, there are definite technical discrepancies in your post; Brad points out one of the largest. That being said, the biggest problem is that you made a technical assessment coming from an admittedly biased standpoint. HP has great blades, as do IBM and others. That said, Cisco has brought some excellent concepts and innovation into the server market. I wouldn't claim that any vendor or technology is correct for every environment, but each should be objectively assessed rather than written off based on bias. I've written a post on this subject in response to your article (http://bit.ly/bNwlev); there are plenty of great existing posts discussing the technology.

    I’d love to see a post comparing and contrasting the two server architectures after you’ve had a chance to understand both technologies.

    Comment by Joe Onisick — April 30, 2010 @ 7:50 pm

  4. Thanks for the reply and thanks for reading!! I'm sure the formatting of this comment will get mangled; I wish this site had better formatting. I'll try to see if there is a plugin that can fix it, but by no means am I an expert at blogging software. If it weren't for my former co-workers I would never have been writing this shit!

    My post really comes down to this basic math: 5.1 Terabits vs 80 Gigabits of fabric.

    The post is targeted at a specific aspect of UCS. I did not mean to cover the entire system. I have other posts that mention things including 1) why I think FCoE is a waste of time and money, 2) why I think the memory extender ASIC is not worthwhile, 3) why the network industry's attempt to move switching out of the servers and back to the network is misguided, and 4) why the scale and management story of at least the Vblock design (which of course includes UCS) is a joke.

    My bias against Cisco has built over time as I continually see them release technically inferior products (whether it's UCS or the CRS-3, which I have another post on), and I see so many sheep gravitate towards them just because they are Cisco; they don't know any better. And of course there's Cisco's business practices, which are riddled with anti-competitive tactics, and their fragmented product lines built from buying all these smaller companies that don't integrate cleanly. Then last but certainly not least is John Chambers himself. When UCS launched he made a comment on a conference call that, when I read it, was permanently etched into my soul. Fortunately for people like me it's easy to rip the solution apart from a technical and cost standpoint, but it seems the people who have this information are few and far between. Vendors typically love me or hate me: love me if their product is good, hate me if it's not, because I will make them aware of it either way.

    By no means am I an HP fanboy. I think they make the best servers, and they happen to have a good blade system. I would prefer not to use ProCurve switches, but I can live with them to get Virtual Connect at the edge (I would not use them at the core). I wouldn't touch their storage, or probably most of their other stuff. The only places where I'd go out of my way to push HP gear would be blades and database servers (which could be on blades, but let's assume not for this example).

    Comment by Nate — May 1, 2010 @ 7:49 am

  5. Nate,

    You’re still coming from a biased standpoint and comparing apples to oranges: 5.1 Terabits vs. 80 Gigabits of Fabric is comparing total chassis midplane throughput of HP against Cisco’s current IOM uplink bandwidth. For a fair comparison you need midplane vs. midplane. With additional IOM options Cisco can increase throughput to the blade. With current hardware Cisco can provide:

    4 full width blades in a 6U enclosure with:
    4x 6 Core processors
    32 x 8GB memory DIMMs
    40 Gbps total usable bandwidth to the blade on 4 redundant connections (2:1 oversubscribed at the IOM)

    The other error in the comparison is the chassis themselves:

    Cisco chassis:
    6RU supporting 8 half width blades or 4 Full-Width blades

    HP Chassis:
    10RU supporting 16 half height blades or 8 full height

    Again you are comparing apples to oranges.

    The last and most important thing is the method of comparison. If you compare 1 blade or 1 chassis you’ve missed the mark on the real values UCS can bring. Comparing 17 blades (half-height/half-width) will show the real advantages of TCO, ROI and management.

    Again none of this says that UCS is the end-all be-all of blades (that's up to the end-user), but in order to evaluate them both technically you have to drop preconceived opinions and look at the hardware/management objectively.

    Overall you’re doing a great job of keeping vendors honest but I think you’re selling an architecture short before giving it a fair shake.

    Comment by Joe Onisick — May 1, 2010 @ 3:05 pm

  6. You said it right there: 40Gbps of bandwidth available to the blade, vs 160Gbps (or 320Gbps full duplex) of bandwidth available to a single half-height blade in a c Class enclosure.

    As I said in the blog posting, there's more bandwidth available to a single half-height slot on a c7000 enclosure than there is to the entire UCS chassis. That was really the point of the post: Cisco made a design decision with their UCS system, and from my perspective it was a really stupid one for performance and scalability reasons.

    Comment by Nate — May 1, 2010 @ 5:07 pm

  7. Nate, any chance you can tell me which HP model has the ability to use 160Gbps worth of I/O cards? That would require 2x10Gbps on board and 7 more dual port 10GE mezzanine cards. I can’t seem to find that model (half or full width.)

    Additionally, has HP reinvented the PCIe specification for use in their blades? If not, how are they using that much bandwidth across the bus?

    By the way 10G Ethernet only supports full duplex so doubling the available bandwidth in a 10GE discussion is best left to marketeers not engineers.

    Comment by Joe Onisick — May 2, 2010 @ 7:55 am

    Look no further than the 40Gbps InfiniBand modules that HP has today, which I linked to in the blog posting. One of those alone will drive 160Gbps of full duplex bandwidth. Then you have the option of 2 more expansion cards plus on-board, so say another 6x10Gbps links, another 120Gbps of full duplex bandwidth. That's 280 out of 320.

    40Gbps Ethernet is supposed to be ratified in June; I have a more recent post talking about another product offering 40Gbps for $1,000 per port.

    Not saying someone needs this much fabric available to them, but it comes at no cost premium to the product, which is the point. If the system were an order of magnitude more expensive then I would not want it, but the fabric is there for about the same cost as other blade system products. Whether or not you use it is up to the user.

    Which again is the point: 32 times more fabric available in a c7000 chassis. Yes, it is a bigger chassis and supports more blades, but at the same time that reduces the number of things to manage. In an ideal world, in a service provider model, I think you could have a chassis that consumes an entire rack (hell, make the rack the chassis), à la the SGI CloudRack for scale-out deployments (which provides power and cooling at the rack level but not network fabric). Cisco has switches and routers that consume an entire rack, so they clearly have the ability to make such a beast, but they chose a radically different approach, to the detriment of performance and cost: it's orders of magnitude more expensive to connect 320Gbps worth of fabric via cables than over a midplane connection, it's more complex, and it likely draws significantly more power. I'd love to be able to link, say, 64 quad socket 12-core (per socket; 16-core per socket next year) blade servers to each other at 40Gbps line rates cheaply; a rack-based chassis would have the ability to do this without ever having to leave the rack. Then have a few 100Gbps uplinks to your core network.
    My own plans call for using a c7000 chassis with quad socket 12-core CPUs when those blades come out in a few months. Being that HP designs their Ethernet switches for 16 blades, and a quad socket full height blade consumes 2 slots, that gives the blade access to 4x10Gbps connections. Adding another dual-port 10GbE card to a blade is really not much cost (no changes needed on the switch side), so I plan to do that and have 8 quad socket boards, each with 4x10GbE and 2x4Gbps fibre (see the VMware dream machine post). That also gives me more virtual NICs in the event I need them (4 FlexNICs per port). The Fibre Channel switches in the blade chassis will link directly to the storage array, no intermediate layer, and the Ethernet switches in the blades will link directly to the core switches. And no, I don't plan to use their FC Virtual Connect modules, because they can't link directly to a storage array.

    I don't think I need to continue this conversation further, but I do appreciate the feedback! I guess my original blog posting wasn't clear enough in communicating what I wanted to communicate, though I'm not sure how else I could have worded it. I think you understand my point, and I think your point is "who would ever need that much?", which I suppose is fine too; my point is "if you're not having to pay extra for it, why not keep your options open". So whatever, it's a preference thing I suppose.

    Comment by Nate — May 2, 2010 @ 9:17 am

  9. Good stuff Nate, thanks for the discussion!

    Comment by Joe Onisick — May 2, 2010 @ 10:17 am

  10. Nate,
    You need to have a better understanding of InfiniBand and PCI-E limitations.
    First, every QDR (40Gb/s) IB adapter available today is PCI-E 2.0 x8, meaning that even with 2 ports you're only pushing 40Gb/s, not 80. Add in 8/10 encoding overhead and PCI-E bus overhead, and now it's 30Gb/s per blade going out the back, or 60Gb/s full duplex. I should also mention that the Flex10 (Nexgen) based chipset HP uses is nowhere near capable of line rate. And I'd be curious to know where these mythical "2 more expansion cards" reside? Certainly not on the standard half-height blades?

    For the record, the IBM BladeCenter has the exact same capabilities. Midplanes are passive, and throughput numbers are simply "marketecture". What matters is connectivity to each blade in front and chassis switch capability out back, and IBM and HP are on equal ground here.

    I'll agree on the lack of uplinks for Cisco UCS… but then, they're not targeting the HPC marketplace, and typical biz app consolidation doesn't usually have massive throughput requirements.

    Comment by Dan — May 6, 2010 @ 9:15 am
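For what it's worth, a rough check of the PCIe and QDR InfiniBand numbers Dan cites above; this is just a sketch, assuming PCIe 2.0 signaling at 5 GT/s per lane and 8b/10b encoding on both the PCIe link and the QDR link, with real-world protocol overhead trimming things a bit further toward the ~30Gb/s figure he mentions:

```python
# Raw signaling rates before protocol overhead (assumptions noted above).
pcie2_x8 = 8 * 5 * (8 / 10)   # 8 lanes x 5 GT/s x 8b/10b  = 32 Gb/s per direction
qdr_ib   = 4 * 10 * (8 / 10)  # 4 lanes x 10 Gb/s x 8b/10b = 32 Gb/s per port

# A dual-port QDR adapter could nominally source 64 Gb/s, but it sits behind a
# single PCIe 2.0 x8 connection, so the bus caps the blade at roughly 32 Gb/s
# per direction before any PCIe packet overhead is even counted.
print(min(2 * qdr_ib, pcie2_x8))  # 32.0
```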

  11. A half height blade can take 3 expansion boards and has on-board stuff as well (often 2x10GbE).

    Myself, I'm more interested in full height quad socket with Opteron 6100s. So my ideal config would be 4x10GbE and 2x4Gbps FC. Perhaps eventually uplinking the chassis using N x 40GbE.

    The point of the post was to highlight the performance (and, as a result, cost) drawbacks of "outsourcing" your switching outside of the chassis vs having it internal to the chassis.

    I'm done commenting on this thread for sure now! Thanks for watching.

    Comment by Nate — May 6, 2010 @ 10:05 am

  12. @Nate would love to see your cabling at the back once your HP c-class is installed.

    Comment by Marthin — May 24, 2010 @ 2:08 am

  13. Man, you call yourself an Enterprise Architect, you are just a hardware jockey.

    Comment by Andre — November 17, 2010 @ 7:16 pm

  14. Thanks for the comment! I only started calling myself that after several other people started calling me that; I wasn't even aware of the title at the time. Looking at this as an example – http://en.wikipedia.org/wiki/Enterprise_architect – that is what I do, whether it is networking, servers, operating systems, applications, storage, automation, architecture, efficiency (e.g. power usage/capacity utilization) or scalability. Add to that vendor management, cost analysis, ROI and other sorts of things. So many to list… if I stayed focused on one particular area I'd get bored, so I stick to places where I can do many things.

    Comment by Nate — November 17, 2010 @ 7:54 pm

  16. Thank you Nate. I appreciate your post.

    Comment by Gerald — March 13, 2011 @ 2:56 pm

  17. I think the UCS is stinky like poop in a diaper. And it costs too much. Poop in a diaper should be free. Not 2x as expensive as non poopy diapers. Too much poop. Making me gag and choke. Eyes are watering now. Just vomited.

    Comment by Cisco_poopy — May 6, 2011 @ 10:59 am

  18. sigh….!
    A Ferrari is much better than my humble Ford. A Ferrari can do 200MPH. My Ford can do 140MPH. I have never driven faster than 100MPH. So who cares which is better!

    Comment by perriko — June 22, 2011 @ 10:53 pm

  19. I came across your blog while researching UCS vs HP for my company. We are looking into a blade infrastructure/xenserver environment for our Web farm.

    This blog dates back to 2010. I am wondering with any advances that cisco has done in the last two years – do you still recommend HP over UCS? Thanks!

    Comment by Jenn — September 30, 2012 @ 6:00 pm

  20. It's really hard to answer that without more information about your requirements. I would recommend HP over Cisco blades, at least because of a much broader selection of choices, though if you are at a company that has a really large Cisco networking deployment, the Cisco stuff may be a better fit networking-wise. Cisco's big splash a few years ago revolved around a special memory ASIC, acquired from a company a couple of years earlier, that allowed a much larger than normal memory footprint for some servers vs other offerings. Though I believe they stopped using that ASIC about a year ago, since the native Intel offering provides high memory densities now.

    If you have more specific questions I can try to answer them..

    Comment by Nate — September 30, 2012 @ 7:49 pm

  21. Thanks Nate for the valuable information.

    I would highly appreciate it if you could help me decide which product to select for my company, based on a set of evaluation criteria, so I can support my decision with solid proof, as I can see my management intending to go with UCS.

    Thanking you in advance.

    Comment by Onassi — August 6, 2013 @ 9:43 pm

  22. Interesting timing. I was at a conference last week and one of the topics covered HP blade system vs Cisco UCS. The slide deck is here:
    http://www.techopsguys.com/?attachment_id=4199

    What most interested me is that it was presented by someone who spent the last two years selling UCS and training partners how to sell it. He re-joined HP a few months ago.

    Of course it helps to have the audio that went along with it; I can try to summarize the main points.

    • Cisco UCS does not offer out of band management (this is bad)
    • Cisco's fault domain is up to 160 servers, HP's is 16 (this can be bad depending on how the customer deploys UCS)
    • HP's update software manages both drivers and firmware; Cisco manages firmware only. This can be terrible, as drivers often need to be matched with firmware. The presenter cited a ~20 hour customer outage with UCS because they used the Cisco tools to update the firmware of the system and did not update the drivers. They got on conference calls with all suppliers (Cisco, VMware, storage, etc.) and it took them ~20 hours before they determined the cause
    • Cisco does not support sequencing of updates – sometimes things have to be installed in a particular order for it to work correctly
    • Cisco has issued 38 critical updates in the past couple of years – HP only 9
    • HP is very proactive about failure detection and reporting. They introduced pre-failure alerting almost 20 years ago. Their systems have tons of sensors; in some systems (I assume perhaps the DL980 or something) there are as many as 1,600 internal sensors.
    • Cisco treats the server as a dumb device: no pre-failure stuff, no fancy sensors. The system fails, you get an outage and then you replace the system. With HP you get the alerts before the failure occurs and can react to avoid the outage — I had this happen to me recently, in fact. I went through and doubled the memory in all of our systems to 384GB. After upgrading the 2nd system I got an alert in vCenter saying there was a memory error. I went to the iLO on the HP server and it said one of the DIMMs had exceeded the error threshold — but, at least in part due to HP's Advanced ECC (something Cisco lacks — IBM has similar tech with their ChipKill), the system did not have any service impact. I was able to evacuate the system and had the memory chip replaced the next day (it would have been sooner but HP didn't have that part in the local depot at the time; we have 4 hour on site support)
    • HP's networking config involves dramatically fewer devices and is less complicated – in fact I'd argue that even HP's depiction of a network is more complicated than it needs to be: the entire distribution layer can be eliminated these days in most cases (for all but the most massive of installations) if you have the right equipment. I am obviously partial to Extreme Networks for this sort of thing, in which case their Black Diamond X-series, for example, goes up to 768x10GbE ports (line rate) in a 14U chassis with up to 20Tbps of N+1 switching fabric.
    • The cost modeling Cisco gives comparing their stuff to HP is misleading; the slide shows that the costs Cisco quotes are for an entry level configuration, not a configuration that is "built to scale", in which case the costs far outstrip HP
    • HP touts their end-to-end product lines, whether it is servers, networking, storage, software, or multi-CPU support (Intel, AMD, Itanium, and soon ARM).
    • Along those same lines HP touts their flexibility in configurations, whether it is networking options for the blades or storage options; they far outstrip anything Cisco offers. HP will even sell you integration stuff that will tie into Cisco networking gear, if that's what you want.

    There was a larger more specific presentation on UCS vs HP but that was NDA only so I did not attend it.

    I don't have personal experience with UCS, so for all I know all of the above is totally wrong. But I do give some credit to the presenter; I have no reason to believe he is lying about his background of recently spending years selling UCS.

    If you're interested in learning about HP Advanced ECC, they have a document on it here:
    ftp://ftp.hp.com/pub/c-products/servers/options/c00256943.pdf

    It's really neat stuff – myself, I won't deploy any large memory configuration without something similar. I have not been able to find technical info on IBM Chipkill (the last time I looked was about two years ago), other than a brief PDF that mentions how IBM developed it for a Mars spacecraft or something, and that reliability is up by massive amounts over regular ECC. I have been using Advanced ECC in HP servers (it is on by default on anything that supports it, typically the 300-series and up) for more than 10 years now. There was a brief time (a few years) when I used whitebox systems, and my NUMBER ONE problem was bad memory. Obviously those systems did not have Advanced ECC, so system freezes and crashes were not too uncommon, sadly enough. It was frustrating. Learned the hard way!!

    Here is an image I created a few years ago that depicts the massive increase in memory capacity over the years; it's pretty scary (for me) to think of people still relying on regular ol' ECC…

    http://www.techopsguys.com/?attachment_id=4200

    Update – here is a PDF from IBM on Chipkill (from 1997!)
    http://ece.umd.edu/courses/enee759h.S2003/references/ibm_chipkill.pdf

    and the original PDF I found that doesn’t have a lot of info:
    http://www.ibm.com/hu/termekismertetok/xseries/dn/chipkill.pdf
    “The reliability rate of standard ECC memory was measured at 91% versus 99.94% reliability of Chipkill.” (the test was run with *128MB* memory chips to give an idea how old that pdf is).

    I can't buy Cisco on principle alone. Their operations are too shady: too much back-room dealing, too many unethical things as part of the sales process.

    Hope this helps.

    Comment by Nate — August 6, 2013 @ 10:15 pm

  23. Hi,

    Very interesting topic, which I have a few comments on as I have worked with both.

    First let us start with the InfiniBand:

    InfiniBand cannot be on the LOM and you only have two expansion slots on the half height, so each blade will have 2 cards x 2 ports x 40Gb = 160Gb theoretically.

    The InfiniBand interconnects ("InfiniBand for HP BladeSystem c-Class") take up two adjacent slots and have 16 ports in and 16 out. Since the 16 servers each have two 40Gb InfiniBand ports, we have a total of 320Gb for all blades, but an oversubscription of 2:1: you can only use one port per server, the other cannot be utilized, as the interconnects only have 16 downlink ports vs 32 in the servers :).

    Also, the cost of these interconnects is just insane!

    “That will give you 32x10Gbps uplink ports enough for 16 blades each having 2x10Gbps connections. All line rate, non blocking throughout the system. That is 320 Gigabits vs 80 Gigabits available on Cisco UCS.”

    That is not true. The interconnect you will connect to will be either FlexFabric or Flex-10.

    With FlexFabric the usual configuration is 4 x 10Gb Ethernet uplinks and the remaining 4 ports are FC, which gives you 32/4 = 8:1 oversubscription.

    If you have the interconnects active-active, which is already hard enough to do, you will have 4:1.

    With Flex-10 there are 8 uplink ports per interconnect, so in active-passive you will have 32/8 = 4:1.

    In active-active you will have 32/16 = 2:1.

    So in reality, out of the chassis you have 160Gb, just like UCS.

    Also, if you stack the enclosures, you lose a port for stacking (actually two per interconnect).

    Regards

    Comment by kkk — August 9, 2013 @ 11:30 pm
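For reference, the ratios kkk walks through above all reduce to the same simple division; a sketch assuming 16 half-height blades with two 10Gb ports each (32 blade-facing ports) and the usable uplink counts kkk lists:

```python
def ratio(blade_ports: int, usable_uplinks: int) -> str:
    """Blade-facing ports versus usable uplink ports, as an N:1 oversubscription."""
    return f"{blade_ports / usable_uplinks:g}:1"

blade_ports = 16 * 2   # 16 half-height blades, 2x10Gb ports each

print(ratio(blade_ports, 4))    # FlexFabric, 4 Ethernet uplinks, active-passive -> 8:1
print(ratio(blade_ports, 8))    # FlexFabric active-active, or Flex-10 active-passive -> 4:1
print(ratio(blade_ports, 16))   # Flex-10 active-active, 8 uplinks per module -> 2:1
```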

  24. A few more points regarding the above as well: if you choose Flex-10 active-active you will also have to include two VC FC modules, which means more $$.

    Regarding UCS, it is not true that it is just 80Gb; the 80Gb is per FEX and you have two FEXes in a chassis, so 160Gb.

    As for the mezz cards on the blades, the current Cisco UCS Virtual Interface Card 1280 offers two 40Gb ports, for a total of 80Gb per card x 2 cards; that is 160Gb per server :).

    The 2248PQ FEX actually offers 4 x 40Gb QSFP+, which is 160Gb, so if you have two in a chassis you actually have 8 ports each capable of 40Gb outside the chassis (HP has 16 ports), so in a way they are similar.

    Also, in terms of management and ease of use, personally I find UCS Manager much easier to use vs VCEM (I have used both).

    The UCS VIC is a plus as it can offload the CPU by bypassing the hypervisor and therefore free up CPU cycles.

    N7k OTV and VDC, among many others, are also a big plus.

    IMO, it is not just a matter of oversubscription and bandwidth. Both solutions will do the job, but it is the architecture that truly differentiates these systems.

    I think you are selling the UCS short, but then we could spend all week talking about Nexus OTV or c7000 VC or whatever and we wouldn't get anywhere, as in the end I suppose it is the buyer's choice.

    Sincerely

    Comment by kkk — August 10, 2013 @ 12:27 am

  25. Thanks Nate

    Very much appreciated

    Comment by Onassi — September 15, 2013 @ 6:38 am
