TechOpsGuys.com Diggin' technology every day

8Aug/13Off

Nth Symposium 2013: HP Bladesystem vs Cisco UCS

TechOps Guy: Nate

Travel to HP Storage Tech Day/Nth Generation Symposium was paid for by HP; however, no monetary compensation is expected nor received for the content that is written in this blog.

I can feel the flames I might get for this post but I'm going to write about it anyway because I found it interesting. I have written about Cisco UCS in the past(very limited topics), have never been impressed with it, and really at the end of the day I can't buy Cisco on principle alone - doesn't matter if it was $1, I can't do it (in part because I know that $1 cost would come by screwing over many other customers to make that price possible for me).

Cisco has gained a lot of ground in the blade market since they came out with this system a few years ago and I think they are in 3rd place, maybe getting close to 2nd (last I saw 2nd was a very distant position behind HP).

So one of the keynotes (I guess you can call it that? it was on the main stage) was someone from HP who says they recently re-joined HP earlier in the year(or perhaps last year) after spending a couple of years at Cisco both selling and training their partners on how to sell UCS to customers. So obviously that was interesting to me, hearing this person's perspective on the platform. There was a separate break-out session on this topic that went into more detail but it was NDA-only so I didn't attend.

I suppose what was most striking is HP going out of their way to compare themselves against UCS, that says a lot right there. They never mentioned Dell or IBM stuff, just Cisco. So Cisco obviously has gotten some good traction (as sick as that makes me feel).

Out of band management

HP claims that Cisco has no out of band management on UCS, there are primary and backup data paths but if those are down then you are SOL. HP obviously has (optionally) redundant out of band management on their blade system.

I love out of band management myself, especially full lights out. My own HP VMware servers have dedicated in-band(1GbE) as well as the typical iLO out of band management interfaces. This is on top of the 4x10GbE and 2x4Gbps FC for storage. Lots of connectivity. When I was having issues with our Qlogic 10GbE NICs last year this came in handy.

Fault domains

This can be a minor issue - mainly an implementation one. Cisco apparently allows UCS to have a fault domain of up to 160 servers, vs HP is 16(one chassis). So you can, of course, lower your fault domain on UCS if you think about this aspect of things -- how many customers realize this and actually do something about it? I don't know.

HP Smart Update Manager

I found this segment quite interesting. HP touts their end to end updates mechanism which includes:

  • Patch sequencing
  • Driver + Firmware management
  • Unified service pack (1 per quarter)

HP claims Cisco has none of these, they cannot sequence patches, their management system does not manage drivers (it does manage firmware), and the service packs are not unified.

At this point the HP person pointed out a situation a customer faced recently where they used the UCS firmware update system to update the firmware on their platform. They then rebooted their ESX systems(I guess for the firmware to take effect), and the systems could no longer see the storage. It took the customer on the line with Cisco, VMware, and the storage company 20 hours until they figured out the problem was the drivers were out of sync with the firmware which was the reason for the downtime.

I recall a few years ago another ~20 hour outage on a Cisco UCS platform at a sizable company in Seattle for similar reasons, I don't know why in both cases it took so long to resolve, in the Seattle case there was a firmware bug (known bug) that was causing link flapping and as a result massive outage because I believe storage was not very forgiving to that. Fortunately Cisco had a patch but it took em ~20 hours of hard downtime to figure out the problem.

I'm sure there are similar stories for the HP end of things too... I have heard of some nasty issues with flex fabric and virtual connect.  There is one feature I like about flexfabric and virtual connect, that is the chassis-based MAC/WWN assignments. Everything else they can keep. I don't care about converged ethernet, I don't care about reducing my cable count(having a few extra fibre cables for storage per chassis really is nothing)...

Myself the only outages I have had that have lasted that long have been because of application stack failures, I think the longest infrastructure related outage I've been involved with in the past 15 years was roughly six, maybe eight hours.  I have had outages where it took longer than 20 hours to recover fully from - but the bulk of that time the system was running we just had recovery steps to perform. Never had a 20 hour outage where 15 hours into the thing nobody has any idea what is the problem or how to fix it.

Longest outage ever though was probably ~48-72 hours - and that was entirely application stack failure. That was the time we got all the senior software developers and architects in a room and asked them How do we fix this? and they gave us blank stares and said We don't know, it's not supposed to do this.  Not a good situation to be in!

Anyway, back on topic.

HP says since December 2011 they have released 9 critical updates, and Cisco have released 38 critical updates.

The case for intelligent compute

I learned quite a bit from this segment as well. Back in 2003 the company I was at was using HP and Compaq gear, it ran well though obviously was pretty expensive. Everything was DL360s, some DL380s, some DL580s. When it came time to do a big data center refresh we wanted to use SATA disks to cut some costs, so we ended up going with a white box company instead of HP (this was before HP had the DL100 series). I learned a lot from that experience, and was very happy to return to HP as a customer at my next company(though I certainly realize given the right workload HP's premium may not be worth it - but for highly consolidated virtualized stuff I really don't want to use anything else). The biggest issue I had with white box stuff was bad ram. It seemed to be everywhere. Not long after we started deployment I started using the Cerberus Test Suite to burn in our systems which caught a lot of it. Cerberus is awesome if you haven't tried it. I even used it on our HP gear mainly to drive CPU and memory to 100% usage to burn them in (no issues found).

HP Advanced ECC Outcomes

HP Advanced ECC Outcomes

HP has a technology called Advanced ECC, which they've had since I believe 1996, and is standard on at least all 300-series servers and up. 10 years ago when our servers rarely had more than 2GB of memory in them(I don't think we went 64-bit until at least 2005), Advanced ECC wasn't a huge deal, 2GB of memory is not much. Today, with my servers having 384GB ..I really refuse to run any high memory configuration without something like that. IBM has ChipKill, which is similar. Dell has nothing in this space. Not sure about Cisco(betting they don't, more on that in a moment).

HP's advanced ECC

HP Advanced ECC

HP talked about their massive numbers of sensors with some systems(I imagine the big ones!) having up to 1,600 sensors in them. (Here is a neat video on Sea of Sensors from one of the engineers who built them - one thing I learned is the C7000 chassis has 104 different fan speeds for maximum efficiency) HP introduced pre failure alerting in 1995, and has had pre failure warranties for a long time (perhaps back to 1995 as well). They obviously have complete hypervisor integration (one thing I wasn't sure of myself until recently, while upgrading our servers one of the new sticks went bad and an alert popped up in vCenter and I was able to evacuate the host and get the stick replaced without any impact -- this failure wasn't caught by burn-in, just regular processing, I didn't have enough spare capacity to take out too many systems to dedicate to burn-in at that point).

What does Cisco have? According to HP not much. Cisco doesn't treat the server with much respect apparently, they treat it as something that can fail and you just get it replaced or repaired at that point.

UCS: Post failure response

UCS: Post failure response

That model reminds me of what I call built to fail which is the model that public clouds like Amazon and stuff run on. It's pretty bad. Though at least in Cisco's case the storage is shared and the application can be restarted on another system easily enough, public cloud you have to build a new system and configure it from scratch.

The point here is obviously, HP works hard to prevent the outage in the first place, Cisco doesn't seem to care.

Simplicity Matters

I'll just put the full slide here there's not a whole lot to cover. HP's point here is the Cisco way is more complicated and seems angled to drive more revenue for the network. HP is less network oriented, and they show you can directly connect the blade chassis to a 3PAR storage system(s). I think HP's diagram is even a bit too complicated for all but the largest setups you could easily eliminate the distribution layer.

BladeSystem vs UCS: Simplicity matters

BladeSystem vs UCS: Simplicity matters

The cost of the 17th server

I found this interesting as well, Cisco goes around telling folks that their systems are cheaper, but they don't do an apples to apples comparison, they use a Smart Play Bundle, not a system that is built to scale.

HP put a couple of charts up showing the difference in cost between the two solutions.

BladeSystem vs UCS TCO: UCS Smart Play bundle

BladeSystem vs UCS TCO: UCS Smart Play bundle

BladeSystem vs UCS: UCS Built to scale

BladeSystem vs UCS TCO: UCS Built to scale

Portfolio Matters

Lastly HP went into some depth on comparing the different product portfolios and showed how Cisco was lacking in pretty much every area whether it was server coverage, storage coverage, blade networking options, software suites and the integration between them.

They talked about how Cisco has one way to connect networking to UCS, HP has many whether it is converged ethernet(similar to Cisco), or regular ethernet, native Fibre channel, Infiniband, and even SAS to external disk enclosures. The list goes on and on for the other topics but I'm sure you get the point. HP offers more options so you can build a more optimal configuration for your application.

BladeSystem vs UCS: Portfolio matters

BladeSystem vs UCS: Portfolio matters

Then they went into analyst stuff and I took a nap.

In reviewing the slide deck they do mention Dell once.. in the slide, not by the speaker -

HP vs Dell in drivers/firmware management

HP vs Dell in drivers/firmware management

By attending this I didn't learn anything that would affect my purchasing in the future, as I mentioned I won't buy Cisco for any reason already. But it was still interesting to hear about.

2Oct/12Off

Cisco drops price on Nexus vSwitch to free

TechOps Guy: Nate

I saw news yesterday that Cisco dropped the price of their vSwitch to $free, they still have a premium version which has a few more features.

I'm really not all that interested in what Cisco does, but what got me thinking again is the lack of participation by other vendors in making a similar vSwitch, of integrating their stack down to the hypervisor itself.

Back in 2009, Arista Networks launched their own vSwitch (though now that I read more on it, it wasn't a "real" vSwitch),  but you wouldn't know that by looking at their site today, I tried a bunch of different search terms I thought they still had it, but it seems the product is dead and buried. I have not heard myself of any other manufacturers making a software vSwitch of any kind (for VMware at least). I suppose customer demand is not there.

I asked Extreme back then if they would come out with a software vSwitch, and at the time at least they said there was no plans, instead they were focusing on direct attach, a strategy at least for VMware, appears to be dead for the moment, as the manufacturer of the NICs used to make it happen is no longer making NICs(as of about 1-2 years ago). I don't know why they have the white paper on their site still, I guess to show the concept, since you can't build it today.

Direct attach - at least taken to it's logical conclusion is a method to force all inter-VM switching out of the host and into the physical switches layer. I was told that this is possible with Extreme(and possibly others too) with KVM today (I don't know the details), just not with VMware.

They do have a switch that runs in VMware, though it's not a vSwitch, more of a demo/type thing where you can play with commands. Their switching software has run on Intel CPUs since the initial release in 2003 (and they still have switches today that use Intel CPUs), so I imagine the work involved is not herculean to make a vSwitch happen if they wanted to.

I have seen other manufacturers (Brocade at least if I remember right) that were also looking forward to direct attach as the approach to take instead of a vSwitch. I can never remember the official networking name for the direct attach technology...

With VMware's $1.2B purchase of Nicira it seems they believe the future is not direct attach.

Myself I like the concept of switching within the host, though I have wanted to have an actual switching fabric (in hardware) to make it happen. Some day..

Off topic - but it seems the global economic cycle has now passed the peak and now for sure headed down hill? One of my friends said yesterday the economy is "complete garbage", I see tech company after company missing or warning, layoffs abound, whether it's massive layoffs at HP, or smaller layoffs at Juniper that was announced this morning. Meanwhile the stock market is hitting new highs quite often.

I still maintain we are in a great depression. Lots of economists try to dispute that, though if you take away the social safety nets that we did not have in the '20s and '30s during the last depression I am quite certain you'd see massive numbers of people lined up at soup kitchens and the like. I think the economists try to dispute it more because they fear a self fulfilling prophecy rather than their willingness to have a serious talk on the subject. Whether or not we can get out of the depression, I don't know. We need a catalyst - last time it was WWII, at least the last two major economic expansions were bubbles, it's been a long time since we've had a more normal economy. If we don't get a catalyst then I see stagnation for another few years, perhaps a decade while we drift downwards towards a more serious collapse (something that would make 2008 look trivial by comparison).

10Apr/12Off

What’s wrong with this picture?

TechOps Guy: Nate

I was reading this article from our friends at The Register which has this picture for an entry level FlexPod from NetApp/Cisco.

It just seems wrong. I mean the networking stuff. Given NetApp's strong push for IP-based storage, one would think an entry level solution would simply have 2x48 port 10gig stackable switches, or whatever Cisco's equivalent is(maybe this is it).

This solution is supposed to provide scalability for up to 1,000 users - what those 1,000 users are actually doing I have no idea, does it mean VDI? Database? web site users? File serving users? ?????????????

It's also unclear in the article if this itself scales to that level or it just provides the minimum building blocks to scale to 1,000 users (I assume the latter) - and if so what does 1,000 user configuration look like? (or put another way how many users does the below picture support)

I'll be the first to admit I'm ignorant as to the details and the reason why Cisco needs 3 different devices with these things but whatever the reason seems major overkill for an entry level solution assuming the usage of IP-based storage.

The new entry level flex pod

The choice of a NetApp FAS2000 array seems curious to me - at least given the fact that it does not appear to support that Flex Cache stuff which they like to tout so much.

Tagged as: , 1 Comment
18Oct/11Off

Cisco’s new 10GbE push – a little HP and Dell too

TechOps Guy: Nate

Just got done reading this from our friends at The Register.

More than anything else this caught my eye:

On the surface it looks pretty impressive, I mean it would be interesting to see exactly how Cisco configured the competing products as in which 60 Juniper devices or 70 HP devices did they use and how were they connected?

One thing that would of been interesting to call out in such a configuration, is the number of logical devices needed for management. For example I know Brocade's VDX product is some fancy way of connecting lots of devices sort of like more traditional stacking just at a larger scale for ease of management. I'm not sure whether or not the VDX technology extends to their chassis product as Cisco's configuration above seems to imply using chassis switches. I believe Juniper's Qfabric is similar. I'm not sure if HP or Arista have such technology(I don't believe they do). I don't think Cisco does - but they don't claim to need it either with this big switch. So a big part of the question is managing so many devices, or just managing one. Cost of the hardware/software is one thing..

HP recently announced a revamp of their own 10GbE products, at least the 1U variety. I've been working off and on with HP people recently and there was a brief push to use HP networking equipment but they gave up pretty quick. They mentioned they were going to have "their version" of the 48-port 10-gig switch soon, but it turns out it's still a ways away - early next year is when it's supposed to ship, even if I wanted it  (which I don't) - it's too late for this project.

I dug into their fact sheet, which was really light on information to see what, if anything stood out with these products. I did not see anything that stood out in a positive manor, I did see this which I thought was kind of amusing -

Industry-leading HP Intelligent Resilient Framework (IRF) technology radically simplifies the architecture of server access networks and enables massive scalability—this provides up to 300% higher scalability as compared to other ToR products in the market.

Correct me if I'm wrong - but that looks like what other vendors would call Stacking, or Virtual Chassis. An age-old technology, but the key point here was the up to 300% higher scalability. Another way of putting it is at least 50% less scalable - when your comparing it to the Extreme Networks Summit X670V(which is shipping I just ordered some).

The Summit X670 series is available in two models: Summit X670V and Summit X670. Summit X670V provides high density for 10 Gigabit Ethernet switching in a small 1RU form factor. The switch supports up to 64 ports in one system and 448 ports in a stacked system using high-speed SummitStack-V160*, which provides 160 Gbps throughput and distributed forwarding. The Summit X670 model provides up to 48 ports in one system and up to 352 ports in a stacked system using SummitStack-V longer distance (up to 40 km with 10GBASE-ER SFP+) stacking technology.

In short, it's twice as scalable as the HP IRF feature, because it goes up to 8 devices (56x10GbE each), and HP's goes up to 4 devices (48x10GbE each -- or perhaps they can do 56 too with breakout cables since both switches have the same number of physical 10GbE and 40GbE ports).

The list price on the HP switches is WAY high too, The Register calls it out at $38,000 for a 24-port switch. The X670 from Extreme has a list price of about $25,000 for 48-ports(I see it on-line for as low as about $17k). There was no disclosure of HP's pricing for their 48-port switch.

Extreme has another 48-port switch which is cheaper (almost half the cost if I recall right - I see it on-line going for as low as $11,300) but it's for very specialized applications where latency is really important. If I recall right they removed the PHY (?) from the switch which dramatically reduces functionality and introduces things like very short cable length limits but also slashes the latency (and cost). You wouldn't want to use those for your VMware setup(well if you were really cost constrained these are probably better than some other alternatives especially if your considering this or 1GbE), but you may want them if your doing HPC or something with shared memory or high frequency stock trading (ugh!).

The X670 also has (or will have? I'll find out soon) a motion sensor on the front of the switch which I thought was curious, but seems like a neat security feature, being able to tell if someone is standing in front of your switch screwing with it. It also apparently has the ability(or will have the ability) to turn off all of the LEDs on the switch when someone gets near it, and turn them back on when they go away.

(ok back on topic, Cisco!)

I looked at the Cisco slide above, and thought to myself, really, can they be that far ahead? I certainly do not go out on a routine basis and see how many devices and connectivity between them that I need to achieve  X number of line rate ports, I'll keep it simple, if you need a large number of line rate ports just use a chassis product(you may need a few of them). It is interesting to see though, assuming it's anywhere close to being accurate.

When I asked myself the question "Can they be that far ahead?" I wasn't thinking of Cisco, I think I'm up to 7 readers now -- you know me better than that! :)

I was thinking of the Extreme Networks Black Diamond X-Series which was announced (note not yet shipping...) a few months ago.

  • Cisco claims to do 768 x 10GbE ports in 25U (Extreme will do it in 14.5U)
  • Cisco claims to do 10W per 10GbE/port (Extreme will do it in 5W/port)
  • Cisco claims to do it with 1 device .. Well that's hard to beat but Extreme can meet them, it's hard to do it with less than one device.
  • Cisco's new top end taps out at very respectable 550Gbit per slot (Extreme will do 1.2Tb)
  • Cisco claims to do it with a list price of $1200/port. I don't know what Extreme's pricing will be but typically Cisco is on the very high end for costs.

Though I don't know how Cisco gets to 768 ports, Extreme does it via 40GbE ports and breakout cables (as far as I know), so in reality the X-series is a 40GbE switch (and I think 40GbE only - to start with unless you use the break out cables to get to 10GbE).  It was a little over a year ago that Extreme was planning on shipping 40GbE at a cost of $1,000/port. Certainly the X-series is a different class of product than what they were talking about a while ago, but prices have also come down since.

X-Series is shipping "real soon now'.  I'm sure if you ask them they'll tell you more specifics.

It is interesting to me, and kind of sad how far Force10 has fallen in the 10GbE area, I mean they seemed to basically build themselves on the back of 10GbE(or at least tried to), but I look at their current products on the very high end, and short from the impressive little 40GbE switch they have, they seem to top out at 140 line rate 10GbE in 21U. Dell will probably do well with them, I'm sure it'll be a welcome upgrade to those customers using Procurve, uh I mean Powerconnect? That's what Dell call(ed) their switches right?

As much as it pains me I do have to give Dell some props for doing all of these acquisitions recently and beefing up their own technology base, whether it's in storage, or networking they've come a long way (more so in storage, need more time to tell in networking). I have not liked Dell myself for quite some time, a good chunk of it is because they really had no innovation, but part of it goes back to the days before Dell shipped AMD chips and Dell was getting tons of kick backs from Intel for staying an Intel exclusive provider.

In the grand scheme of things such numbers don't mean a whole lot, I mean how many networks in the world can actually push this kind of bandwidth? Outside of the labs I really think any organization would be very hard pressed to need such fabric capacity, but it's there -- and it's not all that expensive.

I just dug up an old price list I had from Extreme - from late November 2005. An 6-port 10GbE module for their Black Diamond 10808 switch (I had two at the time) had a list price of $36,000. For you math buffs out there that comes to $9,000 per line rate port.

That particular product was oversubscribed (hence it not being $6,000/port) as well having a mere 40Gbps of switch fabric capacity per slot, or a total of 320Gbps for the entire switch (it was marketed as a 1.2Tb switch but hardware never came out to push the backplane to those levels - I had to dig into the depths of the documentation to find that little disclosure - naturally I found it after I purchased, didn't matter for us though I'd be surprised if we pushed more than 5Gbps at any one point!). If I recall right the switch was 24U too. My switches were 1GbE only, cost reasons :)

How far we've come..

4Nov/10Off

Chicken and the egg

TechOps Guy: Nate

Random thought time! --  came across an interesting headline on Chuck's Blog - Attack of the Vblock Clones.

Now I'm the first to admit I didn't read the whole thing but the basic gist he is saying if you want a fully tested integrated stack (of course you know I don't like these stacks they restrict you too much, the point of open systems is you can connect many different types of systems together and have them work but anyways), then you should go with their VBlock because it's there now, and tested, deployed etc. Others recently announced initiatives are responses to the VBlock and VCE, Arcadia(sp?) etc.

I've brought up 3cV before, something that 3PAR coined back almost 3 years ago now. Which is, in their words "Validated Blueprint of 3PAR, HP, and VMware Products Can Halve Costs and Floor Space".

And for those that don't know what 3cV is, a brief recap -

The Elements of 3cV
3cV combines the following products from 3PAR, HP, and VMware to deliver the virtual data center:

  • 3PAR InServ Storage Server featuring Virtual Domains and thin technologies—The leading utility storage platform, the 3PAR InServ is a highly virtualized tiered-storage array built for utility computing. Organizations creating virtualized IT infrastructures for workload consolidation use the 3PAR InServ to reduce the cost of allocated storage capacity, storage administration, and the SAN infrastructure.
  • HP BladeSystem c-Class—The No. 1 blade infrastructure on the market for datacenters of all sizes, the HP BladeSystem c-Class minimizes energy and space requirements and increases administrative productivity through advantages in I/O virtualization, power and cooling, and manageability. (1)
  • VMware Infrastructure—Infrastructure virtualization suite for industry-standard servers. VMware Infrastructure delivers the production-proven efficiency, availability, and dynamic management needed to build the responsive data center.

Sounds to me that 3cV beat VBlock to the punch by quite a ways. It would have been interesting to see how Dell would of handled the 3cV solution had they managed to win the bidding war, given they don't have anything that competes effectively with c-Class. But fortunately HP won out so 3cV can be just that much more official.

It's not sold as a pre-packaged product I guess you could say, but I mean how hard is it to say I need this much CPU, this much ram, this much storage HP go get it for me. Really it's not hard. The hard part is all the testing and certification. Even if 3cV never existed you can bet your ass that it would work regardless. It's not that complicated, really. Even if Dell managed to buy 3PAR and kill off the 3cV program because they wouldn't want to directly promote HP's products, you could still buy the 3PAR from Dell and the blades from HP and have it work. But of course you know that.

The only thing missing from 3cV is I'd like a more powerful networking stack, or at least sFlow support. I'll take Flex10 (or Flexfabric) over Cisco any day of the week but I'd still like more.

I don't know why this thought didn't pop into my head until I read that headline, but it gave me something to write about.

But whatever, that's my random thought of the day/week.

9Mar/10Off

Yawn..

TechOps Guy: Nate

I was just watching some of my daily morning dose of CNBC and they had all these headlines about how Cisco was going to make some earth shattering announcement("Change the internet forever"), and then the announcement hit, some new CRS-1 router, that claimed 12x faster performance than the competition. So naturally I was curious. Robert Paisano on the floor of the NYSE was saying how amazing it was that the router could download the library of congress in 1 second(he probably didn't understand the router would have no place to put it).

If I want a high end router that means I'm a service provider and in that case my personal preference would be for Foundry Networks (now Brocade). Juniper makes good stuff too of course though honestly I am not nearly as versed in their technology. Granted I'll probably never work for such a company as those companies are really big and I prefer small companies.

But in any case wanted to illustrate (another) point. According to Cisco's own site, their fastest single chassis system has a mere 4.48 terrabits of switching capacity. This is called the CRS-3, which I don't even see listed as a product on their site, perhaps it's yet to come. The biggest, baddest product they have on their site right now is a 16-slot CRS-1. This according to their own site, has a total switching capacity of a paltry 1.2Tbps, and even worse a per-slot capacity of 40Gbps (hello 2003).

So take a look at the Foundry Networks (the Brocade name makes me shudder, I have never liked them) , their NetIron XMR series. From their documentation the "total switching fabric", ranges from 960 gigabits on the low end to 7.68 terrabits on the high end. Switch forwarding capacity ranges from 400 gigabits to 3.2 terrabits. This comes out to 120 gigabits of full duplex switch fabric per slot (same across all models). While I haven't been able to determine precisely how long XMR has been on the market I have found evidence that it is at least nearly 3 years old.

To put it in another perspective, in a 48U rack with the new CRS-3 you can get 4.48 terrabits of switching fabric(1 chassis is 48U). With Foundry in the same rack you can get one XMR32k and one XMR16k(combined size 47U) for a total of 11.52 terrabits of switching fabric. More than double the fabric in the same space, from a product that is 3 years old. And as you can imagine in the world of IT, 3 years is a fairly significant amount of time.

And while I'm here and talking about Foundry and Brocade take a look at this from Brocade, it's funny it's like something I would write. Compares the Brocade Director switches vs Cisco ("Numbers don't lie"). One of my favorite quotes:

To ensure accuracy, Brocade hired an independent electrician to test both the Brocade 48000 and the Cisco MDS 9513 and found that the 120 port Cisco configuration actually draws 1347 watts, 45% higher than Cisco's claim of 931 watts. In fact, an empty 9513 draws more electrical current (5.6 amps) than a fully-populated 384 port Brocade 48000 (5.2 amps). Below is Brocade's test data. Where are Cisco's verified results?

Another

With 33% more bandwidth per slot (64Gb vs 48Gb), three times as much overall bandwidth (1.5Tb vs 0.5 Tb) and a third the power draw, the Brocade 48000 is a more scalable building block, regardless of the scale, functionality or lifetime of the fabric. Holistically or not, Brocade can match the "advanced functionality" that Cisco claims, all while using far less power and for a much [?? I think whoever wrote it was in a hurry]

That's just too funny.

Tagged as: Comments Off
27Feb/10Off

Cisco UCS Networking falls short

TechOps Guy: Nate

UPDATED Yesterday when I woke up I had an email from Tolly in my inbox, describing a new report comparing the networking performance of the Cisco UCS vs the HP c Class blade systems. Both readers of the blog know I haven't been a fan of Cisco for a long time(about 10 years, since I first started learning about the alternatives), and I'm a big fan of HP c Class (again never used it, but planning on it). So as you could imagine I couldn't resist what it said considering the amount of hype that Cisco has managed to generate for their new systems(the sheer number of blog posts about it make me feel sick at times).

I learned a couple things from the report that I did not know about UCS before (I often times just write their solutions off since they have a track record of under performance, over price and needless complexity).

The first was that the switching fabric is external to the enclosure, so if two blades want to talk to each other that traffic must leave the chassis in order to do so, an interesting concept which can have significant performance and cost implications.

The second is that the current UCS design is 50% oversubscribed, which is what this report targets as a significant weakness of the UCS vs the HP c Class.

The mid plane design of the c7000 chassis is something that HP is pretty proud of(for good reason), capable of 160Gbps full duplex to every slot, totaling more than 5 Terrabits of fabric, they couldn't help but take shots at IBM's blade system and comment on how it is oversubscribed and how you have to be careful in how you configure the system based on that oversubscription when I talked to them last year.

This c7000 fabric is far faster than most high end chassis Ethernet switches, and should allow fairly transparent migration to 40Gbps ethernet when the standard arrives for those that need it. In fact HP already has 40Gbps Infiniband modules available for c Class.

The test involved six blades from each solution, when testing throughput of four blades both solutions performed similarly(UCS was 0.76Gbit faster). Add two more blades and start jacking up the bandwidth requirements. HP c Class scales linerally as the traffic goes up, UCS seems to scale lineraly in the opposite direction. End result is with 60Gbit of traffic being requested(6 blades @ 10Gbps), HP c Class managed to choke out 53.65Gbps, and Cisco UCS managed to cough up a mere 27.37Gbps. On UCS, pushing six blades at max performance actually resulted in less performance than four blades at max performance, significantly less. Illustrating serious weaknesses in the QoS on the system(again big surprise!).

The report mentions putting Cisco UCS in a special QoS mode for the test because without this mode performance was even worse. There is only 80Gbps of fabric available for use on the UCS(4x10Gbps full duplex). You can get a second fabric module for UCS but it cannot be used for active traffic, only as a backup.

UPDATE - A kind fellow over at Cisco took notice of our little blog here(thanks!!) and wanted to correct what they say is a bad test on the part of Tolly, apparently Tolly didn't realize that the fabrics could be used in active-active(maybe that complexity thing rearing it's head I don't know). But in the end I believe the test results are still valid, just at an incorrect scale. Each blade requires 20Gbps of full duplex fabric in order to be non blocking throughout. The Cisco UCS chassis provides for 80Gbps of full duplex fabric, allowing 4 blades to be non blocking. HP by contrast allows up to three dual port Flex10 adapters per half height server which requires 120Gbps of full duplex fabric to support at line rate. Given each slot supports 160Gbps of fabric, you could get another adapter in there but I suspect there isn't enough real estate on the blade to connect the adapter! I'm sure 120Gbps of ethernet on a single half height blade is way overkill, but if it doesn't radically increase the cost of the system, as a techie myself I do like the fact that the capacity is there to grow into.

Things get a little more complicated when you start talking about non blocking internal fabric(between blades) and the rest of the network, since HP designs their switches to support 16 blades, and Cisco designs their fabric modules to support 8. You can see by the picture of the Flex10 switch that there are 8 uplink ports on it, not 16, but it's pretty obvious that is due to space constraints because the switch is half width. END UPDATE

The point I am trying to make here isn't so much the fact that HP's architecture is superior to that of Cisco's. It's not that HP is faster than Cisco. It's the fact that HP is not oversubscribed and Cisco is. In a world where we have had non blocking switch fabrics for nearly 15 years it is disgraceful that a vendor would have a solution where six servers cannot talk to each other without being blocked. I have operated 48-port gigabit swtiches which have 256 gigabits of switching fabric, that is more than enough for 48 systems to talk to each other in a non blocking way. There are 10Gbps switches that have 500-800 gigabits of switching fabric allowing 32-48 systems to talk to each other in a non blocking way. These aren't exactly expensive solutions either. That's not even considering the higher end backplane and midplane based system that run into the multiple terrabits of switching fabrics connecting hundreds of systems at line rates.

I would expect such a poor design to come from a second tier vendor, not a vendor that has a history of making networking gear for blade switches for several manufacturers for several years.

So say take it worst case, what if you want completely non blocking fabric from each and every system? For me I am looking to HP c Class and 10Gbs Virtual Connect mainly for inttra chassis communication within the vSphere environment. In this situation with a cheap configuration on HP, you are oversubscribed 2:1 when talking outside of the chassis. For most situations this is probably fine, but say that wasn't good enough for you. Well you can fix it by installing two more 10Gbps switches on the chassis (each switch has 8x10GbE uplinks). That will give you 32x10Gbps uplink ports enough for 16 blades each having 2x10Gbps connections. All line rate, non blocking throughout the system. That is 320 Gigabits vs 80 Gigabits available on Cisco UCS.

HP doesn't stop there, with 4x10Gbps switches you've only used up half of the available I/O slots on the c7000 enclosure, can we say 640 Gigabits of total non-blocking ethernet throughput vs 80 gigabits on UCS(single chassis for both) ? I mean for those fans of running vSphere over NFS, you could install vSphere on a USB stick or SD card and dedicate the rest of the I/O slots to networking if you really need that much throughput.

Of course this costs more than being oversubscribed, the point is the customer can make this decision based on their own requirements, rather than having the limitation be designed into the system.

Now think about this limitation in a larger scale environment. Think about the vBlock again from that new EMC/Cisco/VMware alliance. Set aside the fact that it's horribly overpriced(I think mostly due to EMC's side). But this system is designed to be used in large scale service providers. That means unpredictable loads from unrelated customers running on a shared environment. Toss in vMotion and DRS, you could be asking for trouble when it comes to this oversubscription stuff, vMotion (as far as I know) relies entirely on CPU and memory usage. At some point I think it will take storage I/O into account as well. I haven't heard of it taking into account network congestion, though in theory it's possible. But it's much better to just have a non blocking fabric to begin with, you will increase your utilization, efficiency, and allow you to sleep better at night.

Makes me wonder how does Data Center Ethernet (whatever it's called this week?) hold up under these congestion conditions that the UCS suffers from? Lots of "smart" people spent a lot of time making Ethernet lossless only to design the hardware so that it will incur significant loss in transit. In my experience systems don't behave in a predictable manor when storage is highly constrained.

I find it kind of ironic that a blade solution from the world's largest networking company would be so crippled when it came to the network of the system. Again, not a big surprise to me, but there are a lot of Cisco kids out there I see that drink their koolaid without thinking twice, and of course I couldn't resist to rag again on Cisco.

I won't bother to mention the recent 10Gbps Cisco Nexus test results that show how easily you can cripple it's performance as well(while other manufacturers perform properly at non-blocking line rates), maybe will save that for another blog entry.

Just think, there is more throughput available to a single slot in a HP c7000 chassis than there is available to the entire chassis on a UCS. If you give Cisco the benefit of the second fabric module, setting aside the fact you can't use it in active-active, the HP c7000 enclosure has 32 times the throughput capacity of the Cisco UCS. That kind of performance gap even makes Cisco's switches look bad by comparison.

24Nov/09Off

Legacy CLI

TechOps Guy: Nate

One of the bigger barriers to adoption of new equipment often revolves around user interface. If people have to adapt to something radically different some of them naturally will resist. In the networking world, switches in particular Extreme Networks has been brave enough to go against the grain, toss out the legacy UI and start from scratch(they did this more than a decade ago). While most other companies out there tried to make their systems look/feel like Cisco for somewhat obvious reasons.

Anyways I've always though highly of them for doing that, don't do what everyone else is doing just because they are doing it that way, do it better(if you can). I think they have accomplished that. Their configuration is almost readable in plain english, the top level commands are somewhat similar to 3PAR in some respects:

  • create
  • delete
  • configure
  • unconfigure
  • enable
  • disable

Want to add a vlan ? create vlan Want to configure that vlan? configure vlan (or config vlan for short, or config <vlan name> for shorter). Want to turn on sFlow? enable sflow. You get the idea. There are of course many other commands but the bulk of your work is spent with these. You can actually login to an Extreme XOS-based switch that is on the internet, instructions are here. It seems to be a terminal server and you connect on the serial port as you can do things like reboot the switch and wipe out the configuration and you don't lose connectivity or anything. If you want a more advanced online lab they have them, but they are not freely accessible.

Anyways back on topic, legacy cli. I first heard rumors of this about five years ago when I was looking at getting(and eventually did) a pair of Black Diamond 10808 switches which at the time was the first and only switch that ran Extremeware XOS.  Something interesting I learned recently which I had no idea was the case was that Extremeware XOS is entirely XML based. I knew the configuration file was XML based, but they take it even further than that, commands issued on the CLI are translated into XML objects and submitted to the system transparently. Which I thought was pretty cool.

About three years ago I asked them about it again and the legacy cli project had been shelved they said due to lack of customer interest. But now it's back, and it's available.

Now really back on topic. The reason for this legacy cli is so that people that are used to using the 30+ year old broken UI that others like Cisco use can use something similar on Extreme if they really want to. At least it should smooth out a migration to the more modern UI and concepts associated with Extremeware XOS(and Extremeware before it), an operating system that was built from the ground up with layer 3 services in mind(and the UI experience shows it). XOS was also built from the ground up(First released to production in December 2003) to support IPv6 as well. I'm not a fan of IPv6 myself but that's another blog entry.

It's not complete yet, right now it's limited to most of the layer 2 functions of the switch, layer 3 stuff is not implimented at this point. I don't know if it will be implimented I suppose it depends on customer feedback. But anyways if you have a hard time adjusting to a more modern world, this is available for use. The user guide is here.

If you are like me and like reading technical docs, I highly reccomend the Extremware XOS Concepts Guide. There's so much cool stuff in there I don't know where to begin, and it's organized so well! They really did an outstanding job on their docs.

3Nov/09Off

The new Cisco/EMC/Vmware alliance – the vBlock

TechOps Guy: Nate

Details were released a short time ago thanks to The Register on the vBlock systems coming from the new alliance of Cisco and EMC, who dragged along Vmware(kicking and screaming I'm sure). The basic gist of it is to be able to order a vBlock and have it be a completely integrated set of infrastructure ready to go, servers and networking from Cisco, storage from EMC, and Hypervisor from VMware.

vBlock0 consists of rack mount servers from Cisco, and unknown EMC storage, price not determined yet

vBlock1 consists 16-32 blade servers from Cisco and EMC CX4-480 storage system. Price ranges from $1M - 2.8M

vBlock2 consists of 32-64 blade servers from Cisco and an EMC V-MAX. Starting price $6M.

Sort of like FCoE, sounds nice in concept but the details fall flat on their face.

First off is the lack of choice. That is Cisco's blades are based entirely on the Xeon 5500s, which are, you guessed it limited to two sockets. And at least at the moment limited to four cores. I haven't seen word yet on compatibility with the upcoming 8-core cpus if they are socket/chip set compatible with existing systems or not(if so, wonderful for them..). Myself I prefer more raw cores, and AMD is the one that has them today(Istanbul with 6 cores, Q1 2010 with 12 cores). But maybe not everyone wants that so it's nice to have choice. In my view HP blades win out here for having the broadest selection of offerings from both Intel and AMD. Combine that with their dense memory capacity(16 or 18 DIMM slots on a half height blade), allows you up to 1TB of memory in a blade chassis in an afforadable confiugration using 4GB DIMMs. Yes Cisco has their memory extender technology but again IMO at least with a dual socket Xeon 5500 that it is linked to the CPU core:memory density is way outta whack. It may make more sense when we have 16, 24, or even 32 cores on a system using this technology. I'm sure there are niche applications that can take advantage of it on a dual socket/quad core configuration, but the current Xeon 5500 is really holding them back with this technology.

Networking, it's all FCoE based, I've already written a blog entry on that, you can read about my thoughts on FCoE here.

Storage, you can see how even with the V-MAX EMC hasn't been able to come up with a storage system that can start on the smaller end of the scale, something that is not insanely unaffordable to 90%+ of the organizations out there. So on the more affordable end they offer you a CX4. If you are an organization that is growing you may find yourself outliving this array pretty quickly. You can add another vBlock, or you can rip and replace it with a V-MAX which will scale much better, but of course the entry level pricing for such a system makes it unsuitable for almost everyone to try to start out with even on the low end.

I am biased towards 3PAR of course as both of the readers of the blog know, so do yourself a favor and check out their F and T series systems, if you really think you want to scale high go for a 2-node T800, the price isn't that huge, the only difference between a T400 and a T800 is the backplane. They use "blocks" to some extent, blocks being controllers(in pairs, up to four pairs), disk chassis(40 disks per chassis, up to 8 per controller pair I think). Certainly you can't go on forever, or can you? If you don't imagine you will scale to really massive levels go for a T400 or even a F400.  In all cases you can start out with only two controllers the additional cost to give you the option of an online upgrade to four controllers is really trivial, and offers nice peace of mind. You can even go from a T400 to a T800 if you wanted, just need to switch out the back plane (downtime involved). The parts are the same! the OS is the same! How much does it cost? Not as much as you would expect. When 3PAR announced their first generation 8-node system 7 years ago, entry level price started at $100k. You also get nice things like their thin built in technology which will allow you to run those eager zeroed VMs for fault tolerance and not consume any disk space or I/O for the zeros. You can also get multi level synchronous/asynchronous replication for a fraction of the cost of others. I could go on all day but you get the idea. There are so many fiber ports on the 3PAR arrays that you don't need a big SAN infrastructure just hook your blade enclosures directly to the array.

And as for networking hook your 10GbE Virtual Connect switches on your c Class enclosures to your existing infrastructure. I am hoping/expecting HP to support 10GbaseT soon, and drop the CX4 passive copper cabling. The Extreme Networks Summit X650 stands alone as the best 1U 10GbE (10GbaseT or SFP+) switch on the market. Whether it is line rate, or full layer 3, or high speed stacking, or lower power consuming 10GbaseT vs fiber optics,  or advanced layer 3 networking protocols to simplify management,  price and ease of use -- nobody else comes close. If you want bigger check out the Black Diamond 8900 series.

Second you can see with their designs that after the first block or two the whole idea of a vBlock sort of falls apart. That is pretty quickly your likely to just be adding more blades(especially if you have a V-MAX), rather than adding more storage and more blades.

Third you get the sense that these aren't really blocks at all. The first tier is composed of rack mount systems, the second tier is blade systems with CX4, the third tier is blade systems with V-MAX. Each tier has something unique which hardly makes it a solution you can build as a "block" as you might expect from something called a vBlock. Given the prices here I am honestly shocked that the first tier is using rack mount systems. Blade chassis do not cost much, I would of expected them to simply use a blade chassis with just one or two blades in it. Really shows that they didn't spend much time thinking about this.

I suppose if you treated these as blocks in their strictest sense and said yes we won't add more than 64 blades to a V-MAX, and add it like that you could get true blocks, but I can imagine the amount of waste doing something like that is astronomical.

I didn't touch on Vmware at all, I think their solution is solid, and they have quite a bit of choices. I'm certain with this vBlock they will pimp the enterprise plus version of software, but I really don't see a big advantage of that version with such a small number of physical systems(a good chunk of the reason to go to that is improved management with things like host profiles and distributed switches). As another blogger recently noted, Vmware has everything to lose out of this alliance, I'm sure they have been fighting hard to maintain their independence and openness, this reeks of the opposite, they will have to stay on their toes for a while when dealing with their other partners like HP, IBM, NetApp, and others..