TechOpsGuys.com – Diggin’ technology every day

May 11, 2012

More 10GbaseT Coming..

Filed under: Networking — Nate @ 2:38 pm

I wrote a couple of times about the return of 10GbaseT, a standard that tried to come out a few years ago but for various reasons didn’t quite make it. I just noticed that two new 10GbaseT switching products were officially announced a few days ago at Interop Las Vegas. They are, of course, from Extreme, and they are, of course, not shipping yet (and knowing Extreme’s recent history with product announcements it may be a while before they actually do ship – though they say the 1U switch will ship by end of year).

The new products are:

  • 48 port 10Gbase-T module for the Black Diamond X-series – for up to 384 x 10GbaseT ports in a 14U chassis – note this is of course half the density you can achieve using the 40GbE modules and breakout cables; there are only so many plugs you can put in 14U!
  • Summit X670V-48t (I assume that’s what it’ll be called) – a 48-port 10GbaseT switch with 40GbE uplinks (similar to the Arista 7100 – the only 48-port 10GbaseT switch I’m personally aware of – just with faster uplinks and I’m sure there will be stacking support for those that like to stack)

This article claims a list price of about $25k for the 1U switch, which is a good price – about the same as the existing 24-port X650 10GbaseT product. It’s also in line with the current generation X670V-48x, which is a 48-port SFP+ product, so little to no premium for the copper, which is nice to see! (Note there is a lower cost X670 (non-“V”) that does not have the 40GbE capability, available for about half the cost of the “V” model.)

Much of the hype seems to be around the new Intel 10Gbase-T controller that is coming out with the latest CPUs from them.

With the Intel Ethernet Controller X540, Intel is delivering on its commitment to drive down the costs of 10GbE. We’ve ditched two-chip 10GBASE-T designs of the past in favor of integrating the media access controller (MAC) and physical layer (PHY) controller into a single chip. The result is a dual-port 10GBASE-T controller that’s not only cost-effective, but also energy-efficient and small enough to be included on mainstream server motherboards. Several server OEMs are already lined up to offer Intel Ethernet Controller X540-based LOM connections for their Intel Xeon processor E5-2600 product family-based servers.

Broadcom also recently announced (and is perhaps already shipping?) their own next generation 10GbaseT chips, built for LOM (among other things), which apparently can push power utilization down to under 2W per port using a 10 meter mode – and 10m is plenty long enough for most connections of course! Given that Broadcom also has a quad port version of this chipset, could they be the ones powering the newest boxes from Oracle?

Will Broadcom be able to keep their stronghold on the LOM market (I really can’t remember the last time I came across Intel NICs on motherboards outside of maybe Supermicro or something)?

So the question remains – when will the rest of the network industry jump on board, after having been burned somewhat in the past by the first iterations of 10GbaseT?

April 19, 2012

A Terabit of application switching throughput

Filed under: Networking — Nate @ 6:02 pm

That’s a pretty staggering number to me. I had some friends that worked at a company that is now defunct (acquired by F5) called Crescendo Networks.

One of their claims to fame was the ability to “cluster” their load balancers so that you could add more boxes on the fly and it would just go faster, instead of having to rip and replace, or add more boxes and do funky things with DNS load balancing to try to balance traffic between multiple groups of load balancers.

Crescendo's Scale out design - too bad the company didn't last long enough to see anyone leverage a 24-month expansion

Another company, A10 Networks (who is still around, though I think Brocade and F5 are trying to make them go away), introduced similar technology about a year ago called virtual chassis (details are light on their site). There may be other companies that have similar things too – they all seem to be gunning for the F5 VIPRION, which is a monster system; F5 took a chassis approach and supports up to 4 blades of processing power, then they do load balancing of the blades themselves to distribute the load. I have a long history with F5 products and know them pretty well, going back to their 4.x code base which was housed in (among other things) generic 4U servers with BSDI and generic motherboards.

I believe Zeus does it as well; I have used Zeus but have not gone beyond a 2-node cluster. I forgot – Riverbed bought them and changed the name to Stingray. I think Zeus sounds cooler.

The way Crescendo implemented their cluster was quite basic; it was very similar to how other load balancing companies improved their throughput for streaming media applications – some form of direct response from the server to the client, instead of having the response go back through the load balancer a second time. Here is a page from a long time ago on some reasons why you may not want to do this. I’m not sure how A10 or Zeus do it.
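To make the direct-response idea concrete, here is a minimal back-of-the-envelope sketch (my own illustration, not Crescendo’s actual mechanism – the request/response sizes are made up) of why letting servers reply straight to the client takes so much load off the balancer:

    # Bytes that must traverse the load balancer under full-proxy vs. direct
    # server return (DSR). With DSR only the small requests cross the LB; the
    # large responses go straight from server to client.
    def lb_bytes_kb(requests, req_kb, resp_kb, dsr):
        inbound = requests * req_kb                      # client -> LB -> server
        outbound = 0 if dsr else requests * resp_kb      # server -> LB -> client (proxy only)
        return inbound + outbound

    reqs, req_kb, resp_kb = 1_000_000, 1, 100            # hypothetical streaming-ish workload

    proxy = lb_bytes_kb(reqs, req_kb, resp_kb, dsr=False)
    dsr = lb_bytes_kb(reqs, req_kb, resp_kb, dsr=True)
    print(f"full proxy: {proxy:,} KB through the LB")
    print(f"DSR       : {dsr:,} KB through the LB ({proxy // dsr}x less)")

The obvious trade-off (the sort of thing that linked page presumably covers) is that the balancer never sees the return traffic, so it can’t inspect or rewrite responses.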

I am a Citrix customer now, having heard some good things about them over the years, but never having tried the product before. I’ve found it curious that the likes of Amazon and Google gobble up Netscaler appliances like M&Ms when for everything else they seem to go out of their way to build things themselves. I know Facebook is a big user of the F5 VIPRION system as well.

You’d think (or at least I think) companies like this would leverage some sort of open source product and augment it with their own developer resources if they could – I’m sure they’ve tried – and maybe they are using such products in certain areas. My information about who is using what could be out of date. I’ve used haproxy (briefly) and nginx (more) as load balancers and wasn’t happy with either product. Give me a real load balancer please! Zeus seems to be a pretty nice platform – and open enough that you can run it on regular server hardware, rather than being forced into buying fixed appliances.

Anyways, I had a ticket open with Citrix today about a particular TLS issue regarding SSL re-negotiation, after a co-worker brought to my attention that our system was reported as vulnerable by her browser / plugins. During my research I came across this excellent site which shows a ton of useful info about a particular SSL site.

I asked Citrix how I could resolve the issues the site was reporting and they said the only way to do it was to upgrade to the latest major release of code (10.x). I don’t plan to do that; resolving this particular issue doesn’t seem like a big deal (though it would be nice – not worth the risk of using the latest code so early after its release for this one reason alone). Add to that our site is fronted by Akamai (which actually posted poorer results on the SSL check than our own load balancers). We even had a “security scan” run against our servers for PCI compliance and it didn’t pick up anything related to SSL.
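Incidentally, a fair amount of what a checker site like that reports (negotiated protocol, cipher, certificate details) can be pulled with a few lines of Python’s standard library – a rough sketch below, with a placeholder hostname; note it doesn’t exercise renegotiation itself:

    import socket
    import ssl

    host = "www.example.com"  # placeholder - put the site you want to inspect here

    ctx = ssl.create_default_context()
    with socket.create_connection((host, 443), timeout=10) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            cert = tls.getpeercert()
            print("Protocol :", tls.version())            # e.g. TLSv1.2
            print("Cipher   :", tls.cipher())             # (name, protocol, secret bits)
            print("Expires  :", cert["notAfter"])
            print("Issuer   :", dict(x[0] for x in cert["issuer"]))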

Anyways, back on topic. I was browsing through the release notes for the 10.x code branch and saw that Netscaler now supports clustering as well:

You can now create a cluster of nCore NetScaler appliances and make them work together as a single system image. The traffic is distributed among the cluster nodes to provide high availability, high throughput, and scalability. A NetScaler cluster can include as few as 2 or as many as 32 NetScaler nCore hardware or virtual appliances.

With their top end load balancers tapping out at 50Gbps, that comes to 1.6Tbps with 32 appliances. Of course you won’t reach top throughput depending on your traffic patterns, so knocking off 600Gbps seems reasonable – still 1Tbps of throughput! I really can’t imagine what kind of service could use that sort of throughput at one physical site.
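For the curious, the math behind those numbers (the 50Gbps and 32-node figures come from Citrix’s specs and release notes; the 600Gbps haircut is just my rough allowance for imperfect traffic patterns):

    per_appliance_gbps = 50      # top end NetScaler appliance rating
    max_cluster_nodes = 32       # maximum cluster size per the release notes

    theoretical = per_appliance_gbps * max_cluster_nodes
    realistic = theoretical - 600    # arbitrary discount for real-world traffic patterns
    print(theoretical, "Gbps theoretical (1.6 Tbps)")
    print(realistic, "Gbps realistic (still ~1 Tbps)")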

It seems, at least compared to the Crescendo model, the Citrix model is a lot more like a traditional cluster, probably a lot more like the VIPRION design –

The NetScaler cluster uses Equal Cost Multiple Path (ECMP), Linksets (LS), or Cluster Link Aggregation Group (CLAG) traffic distribution mechanisms to determine the node that receives the traffic (the flow receiver) from the external connecting device. Each of these mechanisms uses a different algorithm to determine the flow receiver.

 

Citrix Netscaler Traffic Flow

The flow reminds me a lot of the 3PAR cluster design actually.

My Thoughts on Netscaler

My experience so far with the Netscalers is mixed. Some things I really like, such as an integrated, mature SSL VPN (note I said mature! Well, at least for Windows – there is nothing for Linux and their Mac client is buggy and incomplete), application-aware MySQL and DNS load balancing, and a true 64-bit, multithreaded, shared-memory design. I also really like their capacity on demand offering. These boxes are always CPU bound, so having the option to buy a technically lower end box with the same exact CPU setup as a higher end box (one that is rated for 2x the throughput) is really nice. It means I can turn on more of those CPU heavy features without having to fork over the cash for a bigger box.

Citrix nCore

For the most part, at least last I checked, F5 was still operating on 32-bit TMOS (on top of 64-bit Linux kernels) leveraging a multi-process design instead of a multi-threaded design. So they were forced to add some hacks to load balance across multiple CPUs in the same physical load balancer in order to get the system to scale more (and there have been limitations over the years as to what could actually be distributed over multiple cores and what features were locked to a single core — as time has gone on they have addressed most of those that I am aware of). One in particular I remember (which may be fixed now, I’m not sure – I would be curious to know if and how they fixed it) was that each CPU core had its own local memory with no knowledge of the other CPUs – which meant that when doing HTTP caching, each CPU had to cache the content individually – massively duplicating the cache and slashing the effectiveness of the memory you had on the box. This was further compounded by the 32-bitness of TMM itself and its limited ability to address larger amounts of memory.
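A quick back-of-the-envelope illustration of why that per-core caching hurts (hypothetical numbers, not F5’s actual memory layout): in the worst case every core ends up caching the same hot objects, so the unique content you can hold shrinks by roughly the core count.

    total_cache_mb = 4096    # hypothetical memory set aside for HTTP caching
    cores = 8                # hypothetical number of CPU cores / processes

    shared_effective = total_cache_mb            # one shared cache: all memory holds unique content
    per_core_effective = total_cache_mb / cores  # worst case: every core caches the same objects

    print(f"shared cache    : ~{shared_effective} MB of unique content")
    print(f"per-core caches : ~{per_core_effective:.0f} MB of unique content (worst case)")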

In any case the F5 design is somewhat arcane; they chose to bolt on software features early on instead of re-building the core. The strategy seems to have paid off from a market share and profits standpoint, it’s just that from a technical standpoint it’s kinda lame 🙂

To be fair, there are some features that are not available in the multi-threaded Citrix Netscaler code that are available in the older legacy code.

Things I don’t like about the Netscaler include their Java GUI, which is slow as crap (they are working on an HTML 5 GUI – maybe that is in v10?) – I mean it can literally take about 4 minutes to load up all of my server groups (the Citrix term for F5 Pools); with F5 I can load them in about half a second. I think the separation of services with regards to content switching on Citrix is, well, pretty weird to say the least. If I want to do content filtering I have to have an internal virtual server and an external virtual server; the external one does the content filtering and forwards to the internal one. With F5 it was all in one (same for Zeus too). The terminology has been difficult to adjust to vs my F5 (and some Zeus) background.

I do miss the Priority Activation feature F5 has; there is no similar feature on Citrix as far as I know (well, I think you can technically do it but the architecture of the Netscaler makes it a lot more complex). This feature allows you to have multiple groups of servers within a single pool at different priorities. By default the load balancer sends to the highest (or lowest? I forget, it’s been almost 2 years) priority group of servers; if that group fails then it goes to the next, and the next. I think you can even specify the minimum number of nodes to have in a group before it fails over entirely to the next group.
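For readers who haven’t used it, here is a minimal sketch of the behavior described above (my own toy logic, not F5’s implementation – the member names and the min_up threshold are made up): members of one pool carry a priority, traffic goes to the highest-priority group with enough healthy members, and otherwise falls through to the next group.

    def pick_active_group(members, min_up=1):
        """members: list of (name, priority, healthy). Return the members to balance across."""
        for priority in sorted({p for _, p, _ in members}, reverse=True):
            group = [m for m in members if m[1] == priority and m[2]]
            if len(group) >= min_up:
                return group
        return []  # nothing healthy anywhere

    pool = [
        ("web1", 10, True), ("web2", 10, False),   # primary group, one member down
        ("web3", 5, True), ("web4", 5, True),      # backup group
    ]
    # Primary group has only 1 healthy member, below the minimum of 2 -> use the backups.
    print(pick_active_group(pool, min_up=2))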

Not being able to re-use objects with the default scripting language just seems brain dead to me, so I am using the legacy scripting language.

So I do still miss F5 for some things, and Zeus for some other things, though Netscaler is pretty neat in its own respects. F5 obviously has a strong presence where I spent the last decade of my life, in and around Seattle, being that it was founded and has its HQ there. I still have a buncha friends over there. Some pretty amazing stories I’ve heard come out of that place; they grew so fast, it’s hard to believe they are still in one piece after all they’ve been through, what a mess!

If you want to futz around with a Netscaler you have the option of downloading their virtual appliance (VPX) for free – I believe it has a default throughput limit of 1Mbit, with upgrades available to as high as 3Gbps. The VPX is limited to two CPU cores last I recall. F5 and A10 have virtual appliances as well.

Crescendo did not have a virtual appliance, which is one of the reasons I wasn’t particularly interested in pursuing their offering back when they were around. The inside story of the collapse of Crescendo is the stuff geek movies are made out of. I won’t talk about it here but it was just amazing to hear what happened.

The load balancing market is pretty interesting when you see the different regions and where various players are stronger vs weaker. Radware for example is apparently strong over on the east coast but has much less presence in the west. Citrix did a terrible job marketing the Netscaler for many years (a point they acknowledged to me), and then there are those folks out there that still use Cisco (?!), which just confuses me. Then there are the smaller guys like A10, Zeus, and Brocade – Foundry Networks (acquired by Brocade, of course) really did themselves a disservice when they let their load balancing technology sit for a good five years between hardware refreshes, and they haven’t been able to recover from that from what I’ve seen/heard. They tried to pitch me their latest iron a couple of years ago after it came out – only for me to find out that it didn’t support SSL at the time – I mean come on. Of course they later fixed that missing feature but it was too late for my projects.

And in case you didn’t know – Extreme used to have a load balancer WAY BACK WHEN. I never used it, and I forget what it was called off the top of my head. Extreme also partnered with F5 in the early days and integrated F5 code into their network chipsets so their switches could do load balancing too (the last switch that had this was released almost a decade ago – nothing since), though the code in the chipsets was very minimal and not useful for anything serious.

April 10, 2012

Oracle first to release 10GbaseT as standard?

Filed under: Networking — Nate @ 2:21 pm

Sun has had some innovative x86-64 designs in the past, particularly on the AMD front. Of course Oracle dumped AMD a while back to focus on Intel, even as their market share continues to collapse (in good part probably because, from what I recall, they screwed over many of their partners by going direct with so many customers, among other things).

In any case they launched a new server lineup today, which otherwise is not really news since who uses Sun/Oracle x86-64 boxes anyways? But I thought the news was interesting since the lineup seems to include 4 x 10GbaseT ports on board as standard.

Rear of Sun Fire X4170 M3 Server

The Sun Fire X4170 M3 and the X4270 M3 systems both appear to have quad port 10GbaseT on the motherboard. I haven’t heard of any other servers yet that have this as standard. Out of curiosity, if you know of others I’d be interested to hear who they are.

The data sheet is kind of confusing: it says the system has 4 onboard 10GbE ports, but then lists “Four 100/1,000/10 Base-T Ethernet ports” in the network section below. Of course it was common to have 10/100/1000 BaseT before, but after seeing the physical rear of the system it seems convincing that they are using 10GbaseT.

Nice goin’ Oracle.

 

March 19, 2012

10GbaseT making a comeback?

Filed under: Networking — Nate @ 12:20 pm

Say it’s true… I’ve been a fan of 10GbaseT for a while now. It hasn’t really caught on in the industry though; off the top of my head I can only think of Arista and Extreme who have embraced the standard from a switching perspective, with everyone else going with SFP+, or XFP, or something else. Both Arista and Extreme obviously have SFP+ products as well, maybe XFP too, though I haven’t looked into why someone would use XFP over SFP+ or vice versa.

From what I know, the biggest thing that has held back adoption of 10GbaseT has been power usage, and I think some industry organizations simply gave up waiting for 10GbaseT to materialize. Cost was somewhat of a factor too: I recall at least with Extreme their 24-port 10GbaseT switch was about $10k more than their SFP+ switch (without any SFP+ adapters or cables), so it was priced similarly to an optical switch that was fairly fully populated with modules, making entry level pricing quite a bit higher if you only needed say 10 ports initially.

But I have read two different things (and heard a third) recently, which I’m sure are related, and which hopefully point to a turning point in 10GbaseT adoption.

The first was a banner on Arista’s website.

The second is this blog post talking about a new 10GbaseT chip from Intel.

Then the third thing I probably can’t talk about, so I won’t 🙂

I would love to have 10GbaseT over the passive copper cabling that most folks use now; that stuff is a pain to work with. While there are at least two switching companies that have 10GbaseT (I recall a Dell blade switch that had 10GbaseT support too), the number of NICs out there that support it is just about as sparse.

Not only that, but I do like to color code my cables, and while CAT6 cables are obviously easy to get in many colors, it’s less common and harder to get those passive 10GbE cables in multiple colors – it seems most everyone just has black.

Also, cable length options are quite a bit more granular with CAT6 than with passive copper. For example from Extreme at least (I know I could go 3rd party if I wanted), their short cables are 1 meter and 3 meters – there’s a massive amount of distance between those two. CAT6 can be easily customized to any length, and pre-made cables (I don’t make my own) can fairly easily be found in 1 foot (or even half a foot) increments.

SFP Passive copper 10GbE cable

I wonder if there are (or will there be) 10GbaseT SFP+ GBICs (so existing switches could support 10GbaseT without wholesale replacement)? I know there are 1GbE copper SFP GBICs.

 

February 16, 2012

30 billion packets per second

Filed under: Networking — Nate @ 11:28 pm

Still really busy getting ready to move out of the cloud. Ran into a networking issue today – noticed my 10GbE NICs are going up and down pretty often. Contacted HP, and apparently these Qlogic 10GbE NICs have overheating issues; the thing that has fixed it for other customers is to set the BIOS to increased cooling, basically ramping up those RPMs (along with a firmware update, which is already applied). Our systems may have more issues given that each server has two of these NICs right on top of each other, so maybe we need to split them apart – waiting to see what support says. There haven’t been any noticeable issues with the link failures since everything is redundant across both NICs (both NIC ports on the same card seem to go down at about the same time).

Anyways, you know I’ve been talking about it a lot here, and it’s finally arrived (well, a few days ago): the Black Diamond X-Series (the self-proclaimed world’s largest cloud switch). Officially announced almost a year ago – their marketing folks certainly like to grab the headlines early. They did the same thing with their first 24-port 10GbE stackable switch.

The numbers on this thing are just staggering, I mean 30 billion packets per second, and a choice of switching fabric modules based on either 2.5Tbps (meaning up to 10Tbps in the switch) or 5.1Tbps (up to 20Tbps in the switch). They are offering 48x10GbE line cards as well as 12 and 24x40GbE line cards (all line rate, duh). You already know it has up to 192x40GbE or 768x10GbE (via breakout cables) in a 14U footprint – half the size of the competition, which was already claiming to be a fraction of the size of their other competition.

5Tbps Switching fabric module (max 4 per switch)

 

Power rated at 5.5 watts per 10GbE port or 22W per 40GbE port.

They are still pushing their Direct Attach approach, support for which still hasn’t exactly caught on in the industry. I didn’t really want Direct Attach but I can see the value in it. They apparently have a solution for KVM ready to go, though their partner in crime that supported VMware abandoned the 10GbE market a while ago (it required special NIC hardware/drivers for VMware). I’m not aware of anyone that has replaced that functionality from another manufacturer.

I found it kind of curious that they rev’d their operating system 3 major versions for this product (from version 12 to 15). They did something similar when they re-wrote their OS about a decade ago (jumping from version 7 to 10).

Anyways, not much else to write about. I do plan to write more on the cloud soon, I’ve just been so busy recently – this is just a quick post / random thought.

So the clock starts – how long until someone else comes out with a midplane-less chassis design?

December 13, 2011

Extreme Grey

Filed under: Networking — Nate @ 10:51 am

I was on Netgear's site earlier this morning planning on filing a support request for one of my home switches, when I managed to resolve the problem myself – at least for the moment; I’m half expecting the problem to return. Over the past 12-13 years or so I’ve never had even one issue with the small (8 ports or less) metal-enclosed Netgear switches, so I have stuck with them – they have worked very well for me (one company I was at bought a single Netgear 48-port gig switch which, by contrast, didn’t work very well).

I remember reading a while ago how Netgear teamed up with Extreme to re-sell their Black Diamond series of switches.

I didn’t think too much of it till I was on Netgear’s site today, so I decided to poke around and see if I could find the product(s) that were being resold or OEM’d – and I found them. Here is one.

The Netgear 8800

When I saw that it just looked so strange! It’s HP-grey in color, not the usual purple I’m used to seeing. Speaking of HP and purple, someone at HP recently speculated to me that the 3PAR arrays will likely stick to being yellow instead of HP-grey because it makes them stand out in the data center.

Tangent comin’ hold onto your butts…

While troubleshooting my home network this morning I think I let some of the smoke out of my HP workstation, which reminds me of this quote I came across on Slashdot years ago:

 There is no such thing as a "safe" capacitor! They are filled with SMOKE and that smoke is DEADLY. ALWAYS let the smoke out of the capacitors before attempting to handle them! This should only be done by PROFESSIONALS. Do NOT try this at home.

 Always assume a CAPACITOR is holding a charge. And: Capacitors don't kill people, it's the circuit of which the person is a part that is dangerous...

I thought the networking issue may have been somehow caused by the HP box, so I rebooted it, and while it was in the midst of rebooting (middle of POST, before the screen came up), I powered it off (by holding down the power button) to reset the network chipset entirely. When I did that I heard a weird clicking sound coming from either the HP box (I think so) or my Cyberpower UPS which was right next to it. Within about 10 seconds I swear a little puff of smoke came out of the HP box (I think); there’s a remote chance it was just dust but I don’t think so. I unplugged the HP box and the clicking stopped. Then I plugged it back in about 30 seconds later, which caused it to turn on automatically, and it booted like a champ – no errors, and the UPS event log reported nothing. So I don’t know what inside the HP box released the smoke, but I guess it was not vital?

Back on topic..

Anyways, I poked around in the user manuals and they did a pretty good job of replacing all references to the original product and making it look like a Netgear product through and through (with a couple of minor exceptions in diagrams).

I remember about 11 years ago now, when I was shopping for a Summit 48 on eBay for my company (this product wasn’t known for quality, though I didn’t know that at the time), I came across some Compaq OEM’d Summit 48s that I think were white in color.

If I were building a bigger network I really would be tempted to opt for this Netgear product, if for nothing else than to see the expression on people’s faces when I tell them I’m using Netgear – not a brand that comes to most people’s minds when it comes to data center networks! Speaking of data centers, it looks like Extreme’s 40GbE offerings are leading the market pretty well – I’m so proud of them! Hopefully they can sustain the execution and gain market share. They’ve had some missteps in the past which knocked them back a few notches (at the time), but they certainly have another opportunity here.

I remember when HP used to OEM/re-sell Foundry Networks chassis switches, though I seem to recall HP not making any modifications to the chassis itself (at least according to the pictures on the website – I don’t think it even had an HP logo on the thing). The product at the time was the MG8, which I was entertaining for a data center build out back in 2004/2005. I wasn’t going to buy from HP, it was just one of those days that I was poking around and came across it on HP’s site.

Oh, and in case you're wondering, my home network used to be powered by Extreme. I had a trusty Summit 48 for many years, which I eventually upgraded to a Summit 48si (which I still have now). I stopped using it many years ago because I just didn’t have enough ports at home to justify the power usage or, more importantly, the noise – 1U data center switches are so noisy for home use! I went so far as to replace all of the fans in the 48si (I believe I used Sunon MagLev fans) with quieter ones, which reduced the noise by at least half, but it was still really loud.

The patented MagLev design is based on magnetic principles and forces that not only propel the fan but also ensure stable rotation over its entire 360 degrees of movement. Utilizing the attraction of the magnetic levitation force, MagLev eliminates the wobbling and shaking problems of traditional motor fans. With this new technology, the MagLev fan propeller is suspended in air during rotation so that the shaft and bearing do not come into direct contact with each other to create friction.

(I dig technology even when it comes to fans!)

The Summit 48 by contrast was 2U and had 80mm fans which spun slower and were quieter. At the moment I have 9 devices wired into my Netgear-powered home switching network (one 8-port switch and one 5-port). I used to have a couple of Foundry load balancers, a Cisco switch, and a couple of other things I think, but I recycled them along with my Summit 48 years ago – I was too lazy to try to re-sell them.

I just saw that picture and was fascinated by it. It also gives me another opportunity to add more color onto this blog.

November 28, 2011

Info-tech report on data center switching solutions

Filed under: Networking — Nate @ 10:28 pm

I came across this report on Extreme’s site which seems to be from somewhat of an “independent 3rd party”, but I’ve not heard of them so I can’t vouch for them.

I’d like to consider myself at least somewhat up to date on what is out there so when things like this come out I do find it interesting to see what they say.

The thing that stands out to me the most: Arista Networks has only 75 employees?!? Wow, they’ve been able to do all of that work with only 75 employees? Really? Good job… that is very surprising to me; I mean most of the companies I have worked at have had more than 75 employees and they’ve accomplished (in my opinion) a fraction of what Arista seems to have, at least from a technology standpoint (revenue-wise is a different story, again assuming the report is accurate).

The thing that made me scratch my head the most: Cisco allows you to run virtual machines on their top of rack switches? Huh? Sounds like EMC wanting you to run VMs on their VMAX controllers? I recall at one point Citrix and Arista teamed up to allow some sort of VM to run Netscaler embedded in the Arista switches, though I never heard of anyone using it and never heard Citrix promoting it over their own stuff. It seemed like an interesting concept, though with no real advantage to doing it I don’t think (the main advantage I can think of is non-blocking access to the switch fabric, which really isn’t a factor with lower end load balancers since they are CPU bound, not network bound).

The report seems to take a hypothetical situation where a fairly large organization is upgrading their global network, and then goes to each of the vendors and asks for a proposal. They leave out what each of the proposed solutions specifically was, which is disappointing.

They said HP was good because it was cheap, which is pretty much what I’ve heard in the field – it seems nobody that is serious runs HP Procurve.

They reported that Juniper and Brocade were the most “affordable” (putting Juniper and affordable together makes no sense), and Arista and Force10 as being the least affordable (which seems backwards too – they are not clear on what they used to judge costs, because I can’t imagine a Force10 solution costing more than a Juniper one).

They placed some value on line cards that offered both copper and fiber at the same time, which again doesn’t make a lot of sense to me since you can get copper modules to plug into SFP/SFP+ slots fairly easily. The ability to “run VMs on your switch” also seemed iffy at best; they say you can run “WAN optimization” VMs on the switches, which for a report titled “Data center networking” really should be a non-issue as far as features go.

The report predicts Brocade will suffer quite a bit since Dell now has Force10, and notes how Brocade doesn’t have products as competitive as they otherwise could have.

They tout Juniper’s ability to have multiple power supplies, switch fabrics, and routing modules as if it were unique to Juniper, which makes no sense to me either. They do call out Juniper for saying their 640-port 10GbE switch is line rate only up to 128 ports.

They believe Force10 will be forced into developing lower end solutions to fill out Dell’s portfolio rather than staying competitive on the high end, time will tell.

Avaya? Why bother? They say you should consider them if you’ve previously used Nortel stuff.

They did include the sample scenario that they sent to the vendors and asked for solutions to. I really would have liked to have seen the proposals that came back.

A four-site organization with 7850 employees located at a Canadian head office facility, and three branch offices located in the US, Europe, and Canada. The IT department consists of 100 FTE, and are located primarily at the Canadian head office, with a small proportion of IT staff and systems located at the branch offices.

The organization is looking at completing a data center refurbish/refresh:

The organization has 1000 servers, 50% of which are virtualized (500 physical). The data center currently contains 40 racks with end-of-row switches. None of the switching/routing layers have any redundancy/high availability built in, leaving many potential single points of failure in the network (looking for 30% H/A).

A requirements gathering process has yielded the need for:

  • A redundant core network, with capacity for 120 x 10Gbps SFP+ ports
  • Redundant top of rack switches, with capacity for 48 x 1Gbps ports in each rack
  • 1 ready spare ToR switch and 1 ready spare 10Gps chassis card
  • 8x5xNBD support & maintenance
  • Nameplate data – power consumption – watts/amps
  • 30% of the servers to be highly available

It is unclear how redundant they expect the network to be – would a single chassis with redundant fabrics and power supplies be enough, or would you want two? They are also not clear as to what capabilities their ToR switches need other than the implied 10Gbps uplinks.

If I were building this network with Extreme gear I would start out with two stacks of two X670Vs each at the core (each stack having 96x10GbE); within each stack the members would be connected by 2x40GbE with passive copper cabling. The two stacks would be linked together with 4x40GbE connections with passive copper cabling, running (of course) ESRP as the fault tolerance protocol of choice between the two. This would provide 192x10GbE ports between the two stacks, with half being active and half being passive.

Another, simpler approach would be to just stack three of the X670V switches together for 168x10GbE active-active ports. Though, as you know, I’m not a big fan of stacking (any more than I am of running off a single chassis) – if I am connecting 1000 servers I want a higher degree of fault tolerance.

Now if you really could not tolerate an active/passive network, if you really needed that extra throughput, then you could use M-LAG to go active-active at layer 2, but I wouldn’t do that myself unless I was really sure I needed that ability. I prefer the reduced complexity of active/passive.

As for the edge switches, they call for redundant 48-port 1GbE switches. Again they are not clear as to their real requirements, but what I would do (what I’ve done in the past) is two stand alone 48-port 1GbE switches, each with 2x10GbE (Summit X460) or 4x10GbE (Summit X480) connections to the core. These edge switches would NOT be stacked; they would be stand alone devices. You could go lower cost with the Summit X450e, or even the Summit X350, though I would not go with the X350 for this sort of data center configuration. Again I assume you're using these switches in an active-passive way for the most part (as in 1 server is using 1 switch at any given time), though if you needed a single server to utilize both switches then you could go the stacking approach at the edge – it all depends on what your own needs are, which is why I would have liked to have seen more detail in the report. Or you could do M-LAG at the edge as well, but ESRP’s ability to eliminate loops is hindered if you link the edge switches together, since there is a path in the network that ESRP cannot deal with directly (see this post with more in depth info on the how, what, and why for ESRP).

I would NOT go with a Black Diamond solution (or any chassis-based approach) unless cost was really not an issue at all. Despite this example organization having 1,000 servers, it’s still a small network they propose building, and the above approach would scale seamlessly to say 4 times that number, non-disruptively, while providing sub-second layer 2 and layer 3 fail over. It is also seamlessly upgradeable to a chassis approach with zero downtime (well, sub-second), should the organization's needs grow beyond 4,000 hosts. The number of organizations in the world that have more than 4,000 hosts is, I think, pretty small in the grand scheme of things. If I had to take a stab at a guess I would say easily less than 10%, maybe less than 5%.

So, all in all, an interesting report – not very consistent in its analysis, and lacking some detail that would have been nice to see, but still interesting to see someone else’s thought patterns.

November 15, 2011

Dell’s distributed core

Filed under: Networking — Nate @ 9:59 am

Dell’s Force10 unit must be feeling the heat from the competition. I came across this report which the industry body Tolly did on behalf of Dell/Force10.

Normally I think Tolly reports are halfway decent although they are usually heavily biased towards the sponsor (not surprisingly). This one though felt light on details. It felt like they rushed this to market.

Basically what Force10 is talking about is a distributed core architecture, with their 32-port 40GbE Z9000 switches as what they call the spine (though sometimes they are used as the leaf), and their 48-port 10GbE S4810 switches as what they call the leaf (though sometimes they are used as the spine).

They present 3 design options:

Force10 Distributed Core Design

I find three things interesting about these options they propose:

  • The minimum node count for spine is 4 nodes
  • They don’t propose an entirely non-blocking fabric until you get to “large”
  • The “large” design is composed entirely of Z9000s, yet they keep the same spine/leaf configuration – what’s keeping them from being entirely spine?

The distributed design is very interesting, though it would be a conceptual hurdle I’d have a hard time getting over if I were in the market for this sort of setup. It’s nothing against Force10 specifically – I just feel safer with a less complex design (I mentioned before I’m not a fan of stacking for this same reason), with fewer things talking to each other in such a tightly integrated fashion.

That aside, another issue I have with the report is that while they do provide the configuration of the switches (that IOS-like interface makes me want to stab my eyes with an ice pick) – and I’m by no means familiar with Force10 configuration – they don’t talk about how the devices are managed. Are the spine switches all stacked together? Are the spine and leaf switches stacked together? Are they using something along the lines of Brocade’s VCS technology? Are the devices managed independently, relying on other protocols like MLAG? The web site mentions using TRILL at layer 2, which would be similar to Brocade.

The other issue I have with the report is the lack of power information; specifically I would be interested (slightly – in the grand scheme of things I really don’t think this matters all that much) in the power per usable port (ports that aren’t being used for uplinks or cross connects). They do rightly point out that power usage can vary depending on the workload, so it would be nice to get power usage based on the same workload. Though conversely it may not matter as much – looking at the specs for the Extreme X670V (48x10GbE + 4x40GbE), there is only 8 watts of difference between 30% traffic load and 100% traffic load on that particular switch, which seems like a trivial amount.

Extreme Networks X670V Power Usage

As far as I know the Force10 S4810 switch uses the same Broadcom chipset as the X670V.

On their web site they have a nifty little calculator where you input your switch fabric capacity and it spits out power/space/unit numbers. The numbers there don’t sound as impressive:

  • 10Tbps fabric = 9.6kW / 12 systems / 24RU
  • 15Tbps fabric = 14.4kW / 18 systems / 36RU
  • 20Tbps fabric = 19.2kW / 24 systems / 48RU

The many-times-aforementioned Black Diamond X-Series comes in at somewhere around 4kW (well, if you want to be really conservative you could say about 6.2kW assuming 8.1W/port, a figure from their test report that was likely high considering the system configuration) in a single system to get up to 20Tbps of fabric (you could perhaps technically say it has 15Tbps of fabric since the last 5Tbps is there for redundancy; 192 x 80Gbps = 15.36Tbps). That's 14.5RU worth of rack space too.

Dell claims non-blocking scalability up to 160Tbps, which is certainly a lot! Though I’m not sure what it would take for me to make the leap into a distributed system such as TRILL. Given TRILL is a layer 2 only protocol (which I complained about a while ago), I wonder how they handle layer 3 traffic – is it distributed in a similar manner? What is the performance at layer 3? Honestly I haven’t read much on TRILL at this point (mainly because it hasn’t really interested me yet), but one thing that is not clear to me (maybe someone can clarify) is whether TRILL is just a traffic management protocol or whether it also includes more transparent system management (e.g. managing multiple devices as one), or does that system management part require more secret sauce from the manufacturer?

My own, biased (of course), thoughts on this architecture, while innovative:

  • Uses a lot of power / consumes a lot of space
  • Lots of devices to manage
  • Lots of connections – complicated physical network
  • Worries over resiliency of TRILL (or any tightly integrated distributed design – getting this stuff right is not easy)
  • On paper at least seems to be very scalable
  • The Z9000 32-port 40GbE switch certainly seems to be a nice product from a pure hardware/throughput/form factor perspective. I just came across Arista’s new 1U 40GbE switch, and I think I’d prefer the Force10 design at twice the size and twice the ports, purely for more line rate ports in the unit.

It would be interesting to read a bit more in depth about this architecture.

I wonder if this is going to be Force10's approach going forward – the distributed design – or if they are going to continue to offer more traditional chassis products for customers who prefer that type of setup. In theory it should be pretty easy to do both.

November 10, 2011

World’s fastest switch

Filed under: Networking — Nate @ 8:42 pm

I came across this yesterday, which is both a video and, more importantly, an in-depth report on the about-to-be-released Black Diamond X-series switch. I have written a few times on this topic, and I don’t have much that is really new, but then I ran across this PDF which has something I have been looking for – a better diagram of how this new next generation fabric is hooked up.

Up until now, most (all?) chassis switches relied on backplanes, or in more modern systems midplanes, to transmit their electrical signals between the modules in the chassis.

Something I learned a couple of years ago (I’m not an electrical engineer) is that there are physical limits as to how fast you can push those electrons over those backplanes and midplanes. There are serious distance limitations, which makes the engineering ever more complicated the faster you push the system. Here I was thinking just crank up the clock speeds and make it go faster, but apparently it doesn’t work quite that way 🙂

For the longest time all of Extreme’s products were backplane based. Then they released a midplane based product, the Black Diamond 20808, a couple of years ago. This product was discontinued earlier this year when the X-series was announced. The 20808 had (in my simple mind) a similar design to what Force10 had been doing for many years – which is basically N+1 switch fabric modules (I believe the 20808 could go to something like 5 fabric modules). All of their previous switches had what they called MSMs, or Management Switch Modules. These were combination switch fabric and management modules, with a maximum of two per system, each providing half of the switch’s fabric capacity. Some other manufacturers like Cisco separated out their switch fabric from their management module. Having separate fabric modules really doesn’t buy you much when you only have two modules in the system. But if your architecture can go to many more (I seem to recall Force10 at one point having something like 8), then of course you can get faster performance. Another key point in the design is having separate slots for your switch fabric modules so they don’t consume space that would otherwise be used by ethernet ports.

Anyways, on the Black Diamond 20808 they did something else they had never done before: they put modules on both the front AND the back of the chassis. On top of that, the modules were criss-crossed – the modules on the front were vertical, the modules on the back were horizontal. This is purely a guess, but I speculate the reason for that is, in part, to cut the distance signals need to travel between the fabric and the switch ports. HP’s c-Class blade enclosure has a similar midplane design with criss-crossed components. Speaking of which, I wonder if the next generation 3PAR will leverage the same “non stop” midplane technology of the c-Class. The 5 Terabits of capacity on the c-Class is almost an order of magnitude more than what is even available on the 3PAR V800. Whether or not the storage system needs that much fabric is another question.

Black Diamond 20808 (rear)

The 20808 product seemed to be geared more towards service providers and not towards high density enterprise or data center computing (if I remember right, the most you could get out of the box was 64x10GbE ports, which you can now get in a 1U X670V).

Black Diamond 20808 (front)

Their (now very old) Black Diamond 8000 series (with the 8900 model which came out a couple of years ago being the latest incarnation) has been the enterprise workhorse for them for many years, with a plethora of different modules and switch fabric options. The Black Diamond 8900 is a backplane based product. I remember when the series first came out too (as the 8800) – it was just a couple of months after I bought my Black Diamond 10808s, in the middle of 2005. Although, if I remember right, the Black Diamond 8800 as it was originally released did not support the virtual router capability that the 10808 supported and that I intended to base my network design on. Nor did it support the Clear Flow security rules engine. Support for these features was added years later.

You can see the impact distance has on the Black Diamond 8900 for example, with the smaller 6-slot chassis getting at least 48Gbps more switching capacity per line card than the 10-slot chassis, simply because it is smaller. Remember this is a backplane designed probably seven years ago, so it doesn’t have as much fabric capacity as a modern midplane based system.

Anyways, back on topic: the Black Diamond X-series. Extreme’s engineers obviously saw the physics (?) limits they were likely going to hit when building a next generation platform and decided to re-think how the system works, resulting, in my opinion, in a pretty revolutionary way of building a switch fabric (at least I’m not aware of anything else like it myself). While much of the rest of the world is working with midplanes for their latest generation of systems, here we have the Direct Orthogonal Data Path Mating System, or DOD PMS (yeah, right).

Black Diamond X-Series fabric

What got me started down this path was that I was on the Data Center Knowledge web site and just happened to see a Juniper Qfabric advertisement. I’ve heard some interesting things about Qfabric since it was announced; it sounds similar to the Brocade VCS technology. I was browsing through some of their data sheets and white papers and it came across as something that’s really complicated. It’s meant to be simple, and it probably is, but the way they explain it, to me at least, makes it sound really complicated. Anyways, I went to look at their big 40GbE switch which is at the core of their Qfabric interconnect technology. It certainly looks like a respectable switch from a performance standpoint – 128 40GbE ports, 10 Terabits of switching fabric, and it weighs in at over 600 pounds (I think Juniper packs their chassis products with lead weights to make them feel more robust).

So, back to the report that they posted. The networking industry doesn’t have anything like the SPC-1 or SpecSFS standardized benchmarks to measure performance, and most people would have a really hard time generating enough traffic to tax these high end switches. There is standard equipment that does it, but it’s very expensive.

So, to a certain extent you have to trust the manufacturer as to the specifications of the product. One way many manufacturers try to prove their claims of performance or latency is to hire “independent” testers to run tests on the products and give reports. This is one of those reports.

Reading it made me smile, seeing how well the X-Series performed, but in the grand scheme of things it didn’t surprise me given the design of the system and the fabric capacity it has built into it.

The BDX8 breaks all of our previous records in core switch testing from performance, latency, power consumption, port density and packaging design. The BDX8 is based upon the latest Broadcom merchant silicon chipset.

For the Fall Lippis/Ixia test, we populated the Extreme Networks BlackDiamond® X8 with 256 10GbE ports and 24 40GbE ports, thirty three percent of its capacity. This was the highest capacity switch tested during the entire series of Lippis/Ixia cloud network test at iSimCity to date.

We tested and measured the BDX8 in both cut through and store and forward modes in an effort to understand the difference these latency measurements offer. Further, latest merchant silicon forward packets in store and forward for smaller packets, while larger packets are forwarded in cut-through making this new generation of switches hybrid cut-through/store and forward devices.

Reading through the latency numbers, they looked impressive, but I really had nothing to compare them with, so I don’t know how good they really are. Surely for any network I’ll ever be on it’d be way more than enough.

The BDX8 forwards packets ten to six times faster than other core switches we’ve tested.

[..]

The Extreme Networks BDX8 did not use HOL blocking which means that as the 10GbE and 40GbE ports on the BDX8 became congested, it did not impact the performance of other ports. There was no back pressure detected. The BDX8 did send flow control frames to the Ixia test gear signaling it to slow down the rate of incoming traffic flow.

Back pressure? What an interesting term for a network device.

The BDX8 delivered the fastest IP Multicast performance measured to date being able to forward IP Multicast packets between 3 and 13 times faster then previous core switch measures of similar 10GbE density.

The Extreme Networks BDX8 performed very well under cloud simulation conditions by delivering 100% aggregated throughput while processing a large combination of east-west and north-south traffic flows. Zero packet loss was observed as its latency stayed under 4.6 μs and 4.8 μs measured in cut through and store and forward modes respectively. This measurement also breaks all previous records as the BDX8 is between 2 and 10 times faster in forwarding cloud based protocols under load.

[..]

While these are the lowest Watts/10GbE port and highest TEER values observed for core switches, the Extreme Networks BDX8’s actual Watts/10GbE port is actually lower; we estimate approximately 5 Watts/10GbE port when fully populated with 768 10GbE or 192 40GbE ports. During the Lippis/Ixia test, the BDX8 was only populated to a third of its port capacity but equipped with power supplies, fans, management and switch fabric modules for full port density population. Therefore, when this full power capacity is divided across a fully populated BDX8, its WattsATIS per 10GbE Port will be lower than the measurement observed [which was 8.1W/port]

They also mention the cost of power, and the % of list price that cost is, so we can do some extrapolation. I suspect the list price of the product is not final, and I am assuming the prices they are naming are based on the configuration they are testing with rather than a fully loaded system (as mentioned above, the switch was configured with enough fabric and power for the entire chassis but only ~50% of the port capacity was installed).

Anyways, they say the price to power it over 3 years is $10,424.05 and say that is less than 1.7% of its list price. Extrapolating that a bit I can guesstimate that the list price of this system as tested with 352 10GbE ports is roughly $612,000, or about $1,741 per 10GbE port.
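Here is the arithmetic behind that guesstimate (the power cost, the 1.7% figure, and the 352-port count are from the report and the paragraph above; the rest is just division, which comes out within rounding of the figures I quoted):

    power_cost_3yr = 10_424.05   # 3-year power cost from the Lippis report
    power_share = 0.017          # "less than 1.7% of its list price"
    ports = 352                  # 256 x 10GbE + 24 x 40GbE (counted as 96 x 10GbE)

    list_price = power_cost_3yr / power_share
    print(f"estimated list price : ${list_price:,.0f}")          # ~$613,000
    print(f"per 10GbE port       : ${list_price / ports:,.0f}")  # ~$1,742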

The Broadcom technology is available to the competition; the real question is how long it will take for the competition to develop something that can compete with this 20 Terabit switching fabric, which seems to be about twice as fast as anything else currently on the market.

HP has been working on some next generation stuff – I read earlier this year about the optical switching technology their labs are working on; it sounds pretty cool.

[..] Charles thinks this is likely to be sometime in the next 3-to-5 years.

So, nothing on the immediate horizon on that front.

October 18, 2011

Cisco’s new 10GbE push – a little HP and Dell too

Filed under: Networking — Nate @ 7:56 pm

Just got done reading this from our friends at The Register.

More than anything else this caught my eye:

On the surface it looks pretty impressive – I mean, it would be interesting to see exactly how Cisco configured the competing products: which 60 Juniper devices or 70 HP devices did they use, and how were they connected?

One thing that would have been interesting to call out in such a configuration is the number of logical devices needed for management. For example, I know Brocade’s VDX product is some fancy way of connecting lots of devices, sort of like more traditional stacking just at a larger scale, for ease of management. I’m not sure whether or not the VDX technology extends to their chassis product, as Cisco’s configuration above seems to imply using chassis switches. I believe Juniper’s Qfabric is similar. I’m not sure if HP or Arista have such technology (I don’t believe they do). I don’t think Cisco does either – but they don’t claim to need it with this big switch. So a big part of the question is managing so many devices versus managing just one. Cost of the hardware/software is one thing…

HP recently announced a revamp of their own 10GbE products, at least the 1U variety. I’ve been working off and on with HP people recently and there was a brief push to use HP networking equipment, but they gave up pretty quickly. They mentioned they were going to have “their version” of the 48-port 10-gig switch soon, but it turns out it’s still a ways away – it isn't supposed to ship until early next year, so even if I wanted it (which I don’t) it’s too late for this project.

I dug into their fact sheet, which was really light on information, to see what, if anything, stood out with these products. I did not see anything that stood out in a positive manner, but I did see this, which I thought was kind of amusing –

Industry-leading HP Intelligent Resilient Framework (IRF) technology radically simplifies the architecture of server access networks and enables massive scalability—this provides up to 300% higher scalability as compared to other ToR products in the market.

Correct me if I’m wrong – but that looks like what other vendors would call stacking, or virtual chassis. An age-old technology, but the key point here was the up to 300% higher scalability. Another way of putting it is at least 50% less scalable – when you're comparing it to the Extreme Networks Summit X670V (which is shipping – I just ordered some).

The Summit X670 series is available in two models: Summit X670V and Summit X670. Summit X670V provides high density for 10 Gigabit Ethernet switching in a small 1RU form factor. The switch supports up to 64 ports in one system and 448 ports in a stacked system using high-speed SummitStack-V160*, which provides 160 Gbps throughput and distributed forwarding. The Summit X670 model provides up to 48 ports in one system and up to 352 ports in a stacked system using SummitStack-V longer distance (up to 40 km with 10GBASE-ER SFP+) stacking technology.

In short, the X670V's stacking is twice as scalable as the HP IRF feature, because it goes up to 8 devices (56x10GbE each), while HP’s goes up to 4 devices (48x10GbE each — or perhaps they can do 56 too with breakout cables, since both switches have the same number of physical 10GbE and 40GbE ports).

The list price on the HP switches is WAY high too; The Register calls it out at $38,000 for a 24-port switch. The X670 from Extreme has a list price of about $25,000 for 48 ports (I see it online for as low as about $17k). There was no disclosure of HP’s pricing for their 48-port switch.

Extreme has another 48-port switch which is cheaper (almost half the cost if I recall right – I see it online going for as low as $11,300) but it’s for very specialized applications where latency is really important. If I recall right they removed the PHY (?) from the switch, which dramatically reduces functionality and introduces things like very short cable length limits, but also slashes the latency (and cost). You wouldn’t want to use those for your VMware setup (well, if you were really cost constrained these are probably better than some other alternatives, especially if you're considering this or 1GbE), but you may want them if you're doing HPC or something with shared memory, or high frequency stock trading (ugh!).

The X670 also has (or will have? I’ll find out soon) a motion sensor on the front of the switch, which I thought was curious but seems like a neat security feature – being able to tell if someone is standing in front of your switch screwing with it. It also apparently has the ability (or will have the ability) to turn off all of the LEDs on the switch when someone gets near it, and turn them back on when they go away.

(ok back on topic, Cisco!)

I looked at the Cisco slide above and thought to myself: really, can they be that far ahead? I certainly do not go out on a routine basis and figure out how many devices, and how much connectivity between them, I need to achieve X number of line rate ports. I’ll keep it simple – if you need a large number of line rate ports just use a chassis product (you may need a few of them). It is interesting to see though, assuming it’s anywhere close to being accurate.

When I asked myself the question “Can they be that far ahead?” I wasn’t thinking of Cisco – I think I’m up to 7 readers now, you know me better than that! 🙂

I was thinking of the Extreme Networks Black Diamond X-Series which was announced (note not yet shipping…) a few months ago.

  • Cisco claims to do 768 x 10GbE ports in 25U (Extreme will do it in 14.5U)
  • Cisco claims to do 10W per 10GbE/port (Extreme will do it in 5W/port)
  • Cisco claims to do it with 1 device… Well, that’s hard to beat, but Extreme can meet them – it’s hard to do it with less than one device.
  • Cisco’s new top end taps out at a very respectable 550Gbit per slot (Extreme will do 1.2Tb)
  • Cisco claims to do it with a list price of $1200/port. I don’t know what Extreme’s pricing will be but typically Cisco is on the very high end for costs.

Though I don’t know how Cisco gets to 768 ports, Extreme does it via 40GbE ports and breakout cables (as far as I know), so in reality the X-series is a 40GbE switch (and I think 40GbE only – to start with, unless you use the breakout cables to get to 10GbE). It was a little over a year ago that Extreme was planning on shipping 40GbE at a cost of $1,000/port. Certainly the X-series is a different class of product than what they were talking about a while ago, but prices have also come down since.

The X-Series is shipping “real soon now”. I’m sure if you ask them they’ll tell you more specifics.

It is interesting to me, and kind of sad, how far Force10 has fallen in the 10GbE area. I mean they seemed to basically build themselves on the back of 10GbE (or at least tried to), but I look at their current products on the very high end, and aside from the impressive little 40GbE switch they have, they seem to top out at 140 line rate 10GbE ports in 21U. Dell will probably do well with them; I’m sure it’ll be a welcome upgrade for those customers using Procurve, uh I mean Powerconnect? That’s what Dell call(ed) their switches, right?

As much as it pains me, I do have to give Dell some props for doing all of these acquisitions recently and beefing up their own technology base; whether it’s in storage or networking, they’ve come a long way (more so in storage – need more time to tell in networking). I have not liked Dell myself for quite some time, a good chunk of it because they really had no innovation, but part of it goes back to the days before Dell shipped AMD chips, when Dell was getting tons of kickbacks from Intel for staying an Intel exclusive provider.

In the grand scheme of things such numbers don’t mean a whole lot – I mean, how many networks in the world can actually push this kind of bandwidth? Outside of the labs I really think any organization would be very hard pressed to need such fabric capacity, but it’s there — and it’s not all that expensive.

I just dug up an old price list I had from Extreme – from late November 2005. A 6-port 10GbE module for their Black Diamond 10808 switch (I had two at the time) had a list price of $36,000. For you math buffs out there, that comes to $9,000 per line rate port.

That particular product was oversubscribed (hence it not being $6,000/port), as well as having a mere 40Gbps of switch fabric capacity per slot, or a total of 320Gbps for the entire switch (it was marketed as a 1.2Tb switch but hardware never came out to push the backplane to those levels – I had to dig into the depths of the documentation to find that little disclosure – naturally I found it after I purchased; it didn’t matter for us though, I’d be surprised if we pushed more than 5Gbps at any one point!). If I recall right the switch was 24U too. My switches were 1GbE only – cost reasons 🙂
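For anyone wondering where the $9,000 (rather than $6,000) figure comes from, here is the math as I read it – the oversubscription means the 40Gbps slot can only feed 4 of the module's 6 ports at line rate (my interpretation of the numbers above):

    module_list_price = 36_000   # 6-port 10GbE module for the Black Diamond 10808
    ports_on_module = 6
    slot_fabric_gbps = 40        # fabric capacity per slot
    port_speed_gbps = 10

    line_rate_ports = min(ports_on_module, slot_fabric_gbps // port_speed_gbps)
    print(f"${module_list_price / ports_on_module:,.0f} per physical port")    # $6,000
    print(f"${module_list_price / line_rate_ports:,.0f} per line-rate port")   # $9,000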

How far we’ve come..

