July 27, 2012

FCoE: stillborn 3 years later

At least the hype has mostly died off, since the market has not really moved over to FCoE since its launch a few years ago. I mentioned this last year, and griped about it in one of my early posts in 2009, around the time Cisco was launching UCS and, along with NetApp, proclaiming that FCoE was going to take over.

Brocade has been saying for some time that FCoE adoption is lacking; a few short months ago Emulex said much the same, and more recently Qlogic chimed in with another me-too story.

FCoE – the emulated version of Fibre Channel running over Ethernet – is not exactly selling like hot cakes and is not likely to do so anytime soon, so all that FCoE-flavoured Ethernet development is not paying off yet.

More and more switches support the Data Center Bridging protocols, but die-hard Fibre Channel users aren’t showing much interest. I imagine that at many larger organizations the problem is more political than anything else. The storage group doesn’t trust the networking group, would rather control its own storage network, and doesn’t want to share anything with the network group. Over recent years I’ve talked to several folks whose storage divisions won’t even consider, for example, an iSCSI-only product, because it means the networking folks have to get involved, and that’s not acceptable.

Myself, I have had a rash of issues with certain Qlogic 10GbE network cards over the past 7 months, which makes me really glad I’m not reliant on Ethernet-based storage (there is some of it, but all of the critical stuff is good ol’ Fibre Channel, again on entirely Qlogic infrastructure). These issues resurrected a bad set of memories of trying to troubleshoot network problems on some Broadcom NICs a few years ago, involving buggy MSI-X behavior. It took about six months to track that problem down; the symptoms were just that bizarre. My current issues with 10GbE NICs aren’t all that critical because of the level of redundancy I have and the fact that storage runs over regular ol’ FC.

I know Qlogic is not alone in having issues with 10GbE. A little company by the name of Clearwire in Seattle had something like a 14-hour outage a year or two ago on their Cisco UCS platform because of bugs in the Cisco gear (I think the bugs were around link flapping or something similar). I know others have had issues too. It sort of surprises me that 10GbE has been around this long and we still seem to have quite a few issues with it, at least on the HBA side.

iSCSI has had its issues over the years too, at least iSCSI on HBAs. Late last year I was talking to one storage company with an iSCSI-only product, and they said iSCSI is ready for prime time; after further discussion, though, they clarified that you really should only use it with offloading NIC X or Y, or software stack Z. iSCSI was a weak point on the 3PAR platform for a long time. They’ve addressed it to some extent on the new V-series, but I wouldn’t be surprised if they still support nothing other than pure software initiators.

TCP is very forgiving of network issues; storage, of course, is not. In the current world of virtualization, with people consolidating onto fewer, larger systems, the added cost of FC really isn’t that much. I wouldn’t slap FC cards into swaths of $3-5k servers, but most servers that run VMs have gobs of memory, which drives their price quite a bit higher than that.

Data Center Bridging does nothing for you when your NIC decides to stop forwarding jumbo frames, when the link starts flapping, when the firmware crashes, or when the ASIC overheats. The time it often takes for software to detect a link problem and fail over to a backup link is, by itself, long enough to cause major storage problems if it happens regularly. All of the networks I’ve worked on in at least the past decade have operated at a tiny fraction of their capacity; the bottlenecks are typically things like firewalls between zones (and whenever possible I prefer to rely on switch ACLs to handle that).
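To put rough numbers on the failover point, here is a small sketch. It assumes a hypothetical software link monitor that polls every so often and declares the link down after a few consecutive failed polls; the function names and every number below are made up for illustration, not taken from any particular NIC driver.

```python
# Hypothetical link monitor: polls every `poll_interval_ms` and declares the
# link down after `misses` consecutive failed polls. Illustrative numbers only.


def worst_case_detection_ms(poll_interval_ms: int, misses: int) -> int:
    """Worst case: the link dies just after a successful poll, so all
    `misses` failed polls must elapse before the link is declared down."""
    return poll_interval_ms * misses


def outage_ms(poll_interval_ms: int, misses: int, switchover_ms: int) -> int:
    """Detection time plus the time to move traffic to the backup link."""
    return worst_case_detection_ms(poll_interval_ms, misses) + switchover_ms


def stalled_ios(iops: int, window_ms: int) -> int:
    """I/Os issued while no path is forwarding (they queue, retry, or time out)."""
    return iops * window_ms // 1000


window = outage_ms(poll_interval_ms=100, misses=3, switchover_ms=200)
print(window)                      # 500
print(stalled_ios(5_000, window))  # 2500
```

Under these assumed numbers, a half-second failover on a host pushing 5,000 IOPS leaves a couple of thousand I/Os hanging, and that is the clean case. A link-state monitor like this only helps when the link actually goes down; a NIC that keeps link up but silently stops forwarding jumbo frames may never trip it.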

February 14, 2011

Lackluster FCoE adoption

Back in 2009 (wow, was it really that long ago?) I wrote, in one of my first posts, about how I wasn’t buying into the FCoE movement. At first glance it sounded really nice, but once you got into the details it fell apart. Well, it seems I’m not alone: not long ago, in an earnings announcement, Brocade said it was seeing lackluster FCoE adoption, lower than expected.

He discussed what Stifel’s report calls “continued lacklustre FCoE adoption.” FCoE is the running of Fibre Channel storage networking block-access protocol over Ethernet instead of using physical Fibre Channel cabling and switchgear. It has been, is being, assumed that this transition to Ethernet would happen, admittedly taking several years, because Ethernet is cheap, steamrolls all networking opposition, and is being upgraded to provide the reliable speed and lossless transmission required by Fibre Channel-using devices.

Maybe it’s just something specific to investors. I was at a Brocade product conference, in 2009 I think, where they talked about FCoE among many other things, and if memory serves they didn’t expect much out of FCoE for several years. So maybe it was management higher up that was setting the wrong expectations; I don’t know.

More recently I saw this article posted on Slashdot, which talks about basically the same thing.

Even today I am not sold on FCoE. I do like Fibre Channel as a protocol, but I don’t see a big advantage at this point to running it over native Ethernet. These days people seem to be consolidating onto fewer, larger systems; I would expect those more serious about consolidation to be using quad socket systems with much, much larger memory configurations (hundreds of gigs). You can feed that quad socket system, with its hundreds of gigs of memory, with a single dual port 8Gbps Fibre Channel HBA. Those who know about storage and random I/O understand better than anyone how much I/O it would really take to max out an 8Gbps Fibre Channel card. You’re not likely ever to manage it with a virtualization workload, or even with most database workloads. And if you do, you’re probably running at a 1:1 ratio of storage arrays to servers.
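A back-of-the-envelope version of that claim is sketched below. It assumes roughly 800 MB/s of usable throughput per 8Gbps FC port in each direction (the commonly quoted figure for 8GFC) and about 200 random IOPS per 15K RPM disk; both are assumptions for illustration, not measurements.

```python
# How much random I/O does it take to saturate a dual port 8Gbps FC HBA?
# Assumptions (not from the post): ~800 MB/s usable per 8GFC port per
# direction, and ~200 random IOPS from one 15K RPM spindle.

PORT_MB_PER_S = 800   # assumed usable throughput of one 8GFC port, one direction
PORTS = 2             # dual port HBA
DISK_IOPS = 200       # assumed random IOPS from one 15K RPM disk


def iops_to_saturate(io_size_kib: int, ports: int = PORTS) -> int:
    """Random IOPS needed to fill `ports` FC ports at a fixed I/O size."""
    bytes_per_s = PORT_MB_PER_S * 1_000_000 * ports
    return bytes_per_s // (io_size_kib * 1024)


def spindles_needed(io_size_kib: int, ports: int = PORTS) -> int:
    """How many 15K RPM disks it would take to generate that many IOPS."""
    return iops_to_saturate(io_size_kib, ports) // DISK_IOPS


for size in (4, 8, 16, 64):
    print(f"{size:>2} KiB random I/O: {iops_to_saturate(size):>7,} IOPS "
          f"(~{spindles_needed(size):,} spindles)")
```

Under these assumptions, even 8 KiB random I/O needs about 195,000 IOPS, on the order of a thousand 15K spindles, to fill both ports, which is why saturating the HBA tends to mean dedicating an array to a single server.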

The cost of the Fibre Channel network is trivial at that point (assuming you have more than one server). I really like the latest HP blades because you get a ton of bandwidth options with them right out of the box. Why stop at running everything over a measly single dual port 10GbE NIC when you can have double the NICs AND throw in a dual port Fibre Channel adapter for not much more cash? Not only does this give more bandwidth, it gives more flexibility and traffic isolation as well (storage, network, etc.). On the blades, at least, it seems you can go even beyond that (more 10GbE ports), though I was reading in one of the spec sheets for the PCIe 10GbE cards that no more than two adapters are supported in ProLiant servers:
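The bandwidth comparison in that paragraph can be tallied as below. This sketch counts only nominal link speeds (10 Gbps per 10GbE port, 8 Gbps per 8Gbps FC port, per direction) and ignores encoding and protocol overhead; the `Adapter` type is just a hypothetical helper for the arithmetic.

```python
# Nominal per-direction port bandwidth for the two blade configurations above.
# Real usable throughput is lower and is not modeled here.

from dataclasses import dataclass


@dataclass
class Adapter:
    name: str
    ports: int
    gbps_per_port: int


def aggregate_gbps(adapters: list[Adapter]) -> int:
    """Sum of nominal per-direction bandwidth across every port."""
    return sum(a.ports * a.gbps_per_port for a in adapters)


# Everything (LAN and storage) over one dual port 10GbE NIC.
converged = [Adapter("dual port 10GbE, LAN + storage", 2, 10)]

# Double the NICs and add a dual port FC adapter dedicated to storage.
split = [
    Adapter("dual port 10GbE", 2, 10),
    Adapter("dual port 10GbE", 2, 10),
    Adapter("dual port 8Gbps FC, storage only", 2, 8),
]

print(aggregate_gbps(converged))  # 20
print(aggregate_gbps(split))      # 56
```

Beyond the raw total, the split layout gives storage its own 16 Gbps that a burst of LAN traffic can’t eat into, which is the isolation point above.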

NOTE: No more than two 10GbE I/O devices are supported in a single ProLiant server.

I suspect that NOTE may be out of date for the more recently introduced ProLiant systems; after all, HP is shipping a quad socket Intel ProLiant blade with three dual port 10GbE devices on it from the get-go. And I can’t help but think the beast of a DL980 has enough PCI buses to handle a handful of 10GbE ports. The 10GbE FlexFabric cards list the BL685c G7 as supported too, meaning you can get at least six ports on that blade as well. So who knows…

Do the math: the added cost of a dedicated Fibre Channel network really is next to nothing. Now, if you happen to go out and choose the most complicated-to-manage Fibre Channel infrastructure along with the most complicated Fibre Channel storage array(s), then all bets are off. But just because there are really complicated things out there doesn’t mean you’re forced to use them, of course.

Another factor is staff, I guess. If you have monkeys running your IT department, maybe Fibre Channel is not a good thing and you should stick to something like NFS. And while you’re at it, you can secure your network by routing all of your VLANs through your firewall, because you know your firewall can keep up with your line-rate gigabit switches, right? Riiight.

I’m not saying FCoE is dead. I think it’ll get here eventually, but I’m not holding my breath; with present technology it’s really more of a step back than a step forward.

