TechOpsGuys.com Diggin' technology every day

13Nov/12Off

100GbE: Still a very hefty premium

TechOps Guy: Nate

UPDATED

Big Switch Networks decloaked today, and released their new OpenFlow controller, in partnership with many different networking vendors.

Arista Networks, Dell, Juniper Networks, Brocade Communications, and Extreme Networks have all partnered with Big Switch, and their OpenFlow-enabled switches are certified to be control-freaked by Big Network Controller. Switches from IBM and HP have been tested for interoperability, but there are no formal partnerships.

All of this SDN stuff really is sort of confusing to me (it really seems like the whole software defined thing is riding on a big hype cloud). One thing that stands out to me here is that this OpenFlow stuff seems to only cover switching and routing. I don't see any mention of things like firewalls, or more importantly - load balancers.  Maybe those folks will integrate with OpenFlow at some point in some way.

In this article A10 Networks (a load balancing company) is mentioned as a partner, but running a search for either OpenFlow or BigSwitch on the A10 site returns no results.

For me if I'm going to be moving workloads between datacenters, at least those that deal with internet connectivity, I certainly want that inbound connectivity to move to the new datacenter as well, and not incur the costs/latency of forwarding such traffic over a back end connection. The only exception being if there is a fault at the new datacenter which is severe enough to want to route internet traffic from another facility to it. I suppose at the same time the fault would likely have to block the ability of moving the workload to another (non faulty) facility.

F5 Networks had a demo they put out on long distance vMotion almost three years ago. Using their WAN Optimization, their Global Traffic Managers (global DNS), and Local Traffic Managers (load balancers), it was a pretty cool setup. Of course this was ages before VMware had such a solution in house, and I believe this solution (for the niche that it serves) can cover a significantly longer distance than what you get with VMware today.

Anyway that's not the topic of the post. At the same time I noticed Extreme announced their first 100GbE offering (per usual it looks like it won't be available to ship for at least 6 months - they like to announce early for some strange reason). It is for their X-8 platform, which has 1.2Tbps of throughput per line card and up to 20Tbps (15Tbps non-blocking even with a fabric failure) per chassis. I say "up to" because there are multiple fabric modules, and they come in two different speeds (2.5Tbps and 5Tbps).

The card is a combo 4-port 100GbE card. They also announced a newer, larger scale 12-port 40GbE line card. What struck me (still) was the cost difference between the two:

NTE list pricing includes: 40GbE 12 port XL module at US $6,000.00 per port; 100GbE 4 port XL module at US $35,000 per port.

I think I recall hearing/reading last year that 100GbE was going for around $100,000/port, if so this would be a great discount, but still pretty crazy expensive compared to 40GbE obviously!
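To put the two list prices on the same footing, here is a quick back-of-the-envelope sketch of cost per Gbps. It uses only the per-port figures quoted above; the $100,000 entry is just my hazy recollection from last year, not a confirmed price, and the function name is made up for illustration.

```python
# Per-port list prices (USD) from the announcement quoted above.
# Each entry is (port speed in Gbps, price per port in USD).
PRICE_PER_PORT = {
    "40GbE": (40, 6_000),
    "100GbE": (100, 35_000),
    # Unconfirmed figure recalled from memory, included only for comparison.
    "100GbE (recalled, unconfirmed)": (100, 100_000),
}


def cost_per_gbps(speed_gbps, price_usd):
    """Return the list price in USD per Gbps of port bandwidth."""
    return price_usd / speed_gbps


for name, (speed, price) in PRICE_PER_PORT.items():
    print(f"{name}: ${cost_per_gbps(speed, price):,.0f} per Gbps")

# How much more a Gbps costs on the 100GbE card than on the 40GbE card.
premium = cost_per_gbps(100, 35_000) / cost_per_gbps(40, 6_000)
print(f"100GbE premium over 40GbE: {premium:.2f}x per Gbps")
```

So at list price 40GbE works out to $150 per Gbps and 100GbE to $350 per Gbps, a bit more than double for the same bandwidth.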

UPDATE - It seems my comment was lost in the spam, the lack of approval wasn't intentional.

While I'm here let me rag on Extreme a bit - I posted a comment on one of their blog posts (about 3 weeks ago) where they said they moved away from designing their own ASICs with the X-8 platform.

They never approved the comment.

My comment was basically asking them when their last ASIC design was. To my knowledge their last ASIC was the 4GNSS ASIC (they called it a programmable ASIC - I assume that meant more of an FPGA but who knows), which originally showed up in the Black Diamond 10808 back in 2003 (I had a pair of these boxes in 2005). I believe they re-used it, perhaps refining it a bit in the following years, but I don't believe any new ASICs have been designed since (sure, I could be wrong, but they haven't clarified). So I'd say their last ASIC design was more than a decade ago, and only now this blogger comes out and says they don't do ASICs any more. Before that the last one I know of was their Inferno chipset, a much better name, which was present in their older platforms running the original ExtremeWare operating system. The last such switches to be sold were in their Alpine series and the Summit 48si (I still have one of these at home but it doesn't do much today - too loud for home use).

Anyway, shame on you for not approving my reasonable response to your post!

btw I approve all posts here, even those that try to attack me/my posts. If for some reason your post is not immediately available, contact me (see blurb on right) because your post may have been caught by the SPAM filter. I don't go through those caught posts often (there are a lot), maybe 2-3 times a year.

2Oct/12Off

Cisco drops price on Nexus vSwitch to free

TechOps Guy: Nate

I saw news yesterday that Cisco dropped the price of their vSwitch to $free, though they still have a premium version which has a few more features.

I'm really not all that interested in what Cisco does, but what got me thinking again is the lack of participation by other vendors in making a similar vSwitch, of integrating their stack down to the hypervisor itself.

Back in 2009, Arista Networks launched their own vSwitch (though now that I read more on it, it wasn't a "real" vSwitch), but you wouldn't know that by looking at their site today. I thought they still had it and tried a bunch of different search terms, but it seems the product is dead and buried. I have not myself heard of any other manufacturers making a software vSwitch of any kind (for VMware at least). I suppose the customer demand is not there.

I asked Extreme back then if they would come out with a software vSwitch, and at the time at least they said there were no plans. Instead they were focusing on direct attach, a strategy that, at least for VMware, appears to be dead for the moment, as the manufacturer of the NICs used to make it happen is no longer making NICs (as of about 1-2 years ago). I don't know why they still have the white paper on their site, I guess to show the concept, since you can't build it today.

Direct attach - at least taken to its logical conclusion - is a method to force all inter-VM switching out of the host and into the physical switching layer. I was told that this is possible with Extreme (and possibly others too) with KVM today (I don't know the details), just not with VMware.

They do have a switch that runs in VMware, though it's not a vSwitch, more of a demo-type thing where you can play with commands. Their switching software has run on Intel CPUs since the initial release in 2003 (and they still have switches today that use Intel CPUs), so I imagine the work involved is not herculean to make a vSwitch happen if they wanted to.

I have seen other manufacturers (Brocade at least if I remember right) that were also looking forward to direct attach as the approach to take instead of a vSwitch. I can never remember the official networking name for the direct attach technology...

With VMware's $1.2B purchase of Nicira it seems they believe the future is not direct attach.

Myself I like the concept of switching within the host, though I have wanted to have an actual switching fabric (in hardware) to make it happen. Some day..

Off topic - but it seems the global economic cycle has now passed the peak and is now for sure headed down hill? One of my friends said yesterday the economy is "complete garbage". I see tech company after company missing or warning, and layoffs abound, whether it's massive layoffs at HP or the smaller layoffs at Juniper that were announced this morning. Meanwhile the stock market is hitting new highs quite often.

I still maintain we are in a great depression. Lots of economists try to dispute that, though if you take away the social safety nets that we did not have in the '20s and '30s during the last depression I am quite certain you'd see massive numbers of people lined up at soup kitchens and the like. I think the economists try to dispute it more because they fear a self fulfilling prophecy rather than their willingness to have a serious talk on the subject. Whether or not we can get out of the depression, I don't know. We need a catalyst - last time it was WWII, at least the last two major economic expansions were bubbles, it's been a long time since we've had a more normal economy. If we don't get a catalyst then I see stagnation for another few years, perhaps a decade while we drift downwards towards a more serious collapse (something that would make 2008 look trivial by comparison).

1Jun/12Off

London Internet Exchange downed by Loop

TechOps Guy: Nate

This probably doesn't happen very often at these big internet exchanges but found the news sort of interesting.

I had known for a few years that the LINX was a dual vendor environment: one side was Foundry/Brocade, the other was Extreme. They are one of the few places that go out of their way to advertise what they use. I'm sure it gets them a better discount :) It seems the LINX replaced the Foundry/Brocade with Juniper at some point since I last checked (less than a year ago), though their site still mentions usage of EAPS (Extreme's ring protocol) and MRP (Foundry's ring protocol). I assume Juniper has not adopted MRP, though they probably have something similar. Looking at the design of the Juniper LAN vs the Extreme LAN (and the Brocade LAN before Juniper), the Juniper one looks a lot more complicated. I wonder if they are using Juniper's new protocol(s) to manage it? QFabric I think it's called? It seems LINX still has some Brocade in one of their edge networks.

Apparently the Juniper side is what suffered the loop -

"Linx is trying to determine where the loop originated and we are also addressing why the protection on Juniper's LAN didn't work."

I wanted to point out again, since it's been a while since I covered it (and even then it was buried in the post, not part of the title), that Extreme has a protocol that can detect and (in some cases) recover from loops automatically. As far as I know it is unique - let me know if there is another vendor or protocol that is similar (note of course I am not referring to anything like STP). I've only used it in detect mode to date. I was also telling someone about this protocol who was learning the ropes on Extreme gear after coming from a Juniper background, so I thought I would mention it again.

The protocol is the Extreme Loop Recovery Protocol (ELRP). The documentation does a better job at explaining it than I can.

The Extreme Loop Recovery Protocol (ELRP) is used to detect network loops in a Layer 2 network. A switch running ELRP transmits multicast packets with a special MAC destination address out of some or all of the ports belonging to a VLAN. All of the other switches in the network treat this packet as a regular, multicast packet and flood it to all of the ports belonging to the VLAN.

When the packets transmitted by a switch are received back by that switch, this indicates a loop in the Layer 2 network. After a loop is detected through ELRP, different actions can be taken such as blocking certain ports to prevent loop or logging a message to system log. The action taken is largely dependent on the protocol using ELRP to detect loops in the network.
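The idea in that description can be sketched as a toy simulation. To be clear, this is not ELRP's actual packet format or implementation, and it ignores VLANs entirely. It just models the principle: the originating switch floods a probe, every other switch floods it like a normal multicast frame, and if a copy ever arrives back at the originator there is a loop. The `cable` and `detect_loop` names and the topology are made up for illustration.

```python
from collections import deque


def cable(links, sw_a, port_a, sw_b, port_b):
    """Record one cable between two switch ports (both directions)."""
    links[(sw_a, port_a)] = (sw_b, port_b)
    links[(sw_b, port_b)] = (sw_a, port_a)


def detect_loop(links, origin):
    """Flood a probe from `origin`; return the origin ports it came back on.

    links maps (switch, port) -> (neighbor switch, neighbor port).
    An empty result means no copy of the probe returned, i.e. no loop
    was seen from this switch.
    """
    # Collect the cabled ports of every switch.
    ports = {}
    for sw, port in links:
        ports.setdefault(sw, set()).add(port)

    # The probe leaves every cabled port of the originating switch.
    queue = deque((origin, p) for p in sorted(ports.get(origin, ())))
    sent = set(queue)  # (switch, egress port) pairs already used
    returned_on = set()

    while queue:
        sw, out_port = queue.popleft()
        neighbor, in_port = links[(sw, out_port)]
        if neighbor == origin:
            # Our own probe came back: there is a loop.
            returned_on.add(in_port)
            continue
        # Other switches treat the probe as ordinary multicast and
        # flood it out of every port except the one it arrived on.
        for p in sorted(ports[neighbor]):
            if p != in_port and (neighbor, p) not in sent:
                sent.add((neighbor, p))
                queue.append((neighbor, p))
    return sorted(returned_on)


# A simple chain A - B - C has no loop.
chain = {}
cable(chain, "A", 1, "B", 1)
cable(chain, "B", 2, "C", 1)
print(detect_loop(chain, "A"))

# Cabling C back to A closes the ring and the probe returns.
ring = dict(chain)
cable(ring, "C", 2, "A", 2)
print(detect_loop(ring, "A"))
```

In the ring case the probe comes back on both of A's ports, which is the sort of information that lets you go straight to the ports involved rather than unplugging cables until the storm stops.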

The design seems simple enough to me, I'm not sure why others haven't come up with something similar (or if they have let me know!)

It's rare to have a loop in a data center environment but I do remember a couple loops I came across in an office environment many years ago that ELRP helped trace down. I'm not sure what method one would use to trace down a loop without something like ELRP - perhaps just looking at port stats and trying to determine where the bulk of the traffic is and disabling ports or unplugging cables until it stops.

[Tangent]

I remember an outage that one company I was at took to upgrade some of our older 10/100 3COM switches to gigabit Extreme switches. It was a rushed migration. I was working with the network engineer that we had, and the switches were installed in a centralized location with tons of cables, none of which were labeled. So I guess it comes as little surprise that during the migration someone (probably me) happened to plug the same cable back into one of the switches, causing a loop. It took a few minutes to track down, and at one point our boss was saying get ready to roll back. The network engineer and I looked at each other and laughed: there was no roll back, well not one that was going to be smooth. It would have taken another hour of downtime to remove the Extreme switches, re-install the 3COM and re-cable stuff. Fortunately I found the loop. This was about a year or so before I was aware of the existence of ELRP. We discovered the loop mainly after all the switch lights started blinking in sequence, normally a bad thing. Then users reported they lost all connectivity.

One of my friends who is another network engineer told me a story when I was in Atlanta earlier in the year about a customer who was a university or something. They had major network performance problems but could not track them down. These problems had been going on for literally months. My friend went out as a consultant and they brought him into their server/network room and his jaw dropped, they had probably 2 dozen switches and ALL of them were blinking in sequence. He knew what the problem was right away and informed the customer. But the customer was adamant that the lights were supposed to blink that way and the problem was elsewhere(not kidding here). The customer had other issues like running overlapping networks on the same VLAN etc. My friend had a lot of suggestions for the customer but the customer felt insulted by him telling them their network had so many problems so they kicked him out and told the company not to send him back. A couple months later the customer went through some sort of audit process and failed miserably and grudgingly asked (begged) to get my friend back since he was the only one they knew that seemed to know what he was doing. He went back and fixed the network I assume (I forgot that last bit of the story).

[End Tangent]

ELRP can detect a loop immediately and give a very informative system log entry as to the port(s) the loop is occurring on so you can take action. It works best of course if it is running on all ports, so you can pinpoint down to the edge port itself. But if for some reason the edge is not an Extreme switch at least you can get it at a higher layer and can isolate it further that way.

You can either leave it running periodically, so that it sends a probe out every X seconds, or you can run it on demand for a real time assessment. There is also integration with ESRP, which I wrote about a while ago, although I don't use the integrated mode (see the original post as to how that works and why). I normally leave it running, sending requests out at least once every 30 seconds or so.

LINX had another outage (which was the last time I looked at their vendor stats) a couple of years ago (this one affected me since my company had gear hosted in London at the time and our monitors were tripped by this event), though no mention of which LAN the outage occurred on. One user wrote

It wasn't a port upgrade, a member's session was being turned up and due to miscommunication between the member's engineer and the LINX engineer a loop was somehow introduced in to the network which caused a broadcast storm and a switches CPU to max out cue packet loss and dropped BGP sessions.

as the cause of the outage that occurred two years ago. So I guess it was another loop! For all I know LINX is not running ELRP in their environment either.

It's not exactly advertised by Extreme if you talk to them; it's one of those things that's buried in the docs. Same goes for ESRP. Two really useful protocols that Extreme almost never mentions, two things that make them stand out in the industry, and they don't talk about them. I'm told that one reason could be that they are proprietary (vs EAPS, which is not, and Extreme touts EAPS a lot, but EAPS is layer 2 only!), though as I have mentioned in the past ESRP doesn't require any software at the edge to function and can support managed and unmanaged devices. So you don't require an Extreme-only network to run it (just at the core, like most any other protocol). ELRP is even less stringent - it can be run on any Extreme switch, with no interoperability issues. If there were open variants of the protocols that'd be better of course, but again, these seem to be unique in the industry so tout what you got! Customers don't have to use them if they don't want to, and leveraging what you have available can make a network administrator's life vastly simpler in many cases. Good luck integrating Extreme or Cisco or Brocade into Juniper's QFabric, or into Force10's distributed core setup. Interoperability issues abound with most of the systems out there.

11May/12Off

More 10GbaseT Coming..

TechOps Guy: Nate

I wrote a couple of times about the return of 10GbaseT, a standard that tried to come out a few years ago but for various reasons didn't quite make it. I just noticed that two new 10GbaseT switching products were officially announced a few days ago at Interop Las Vegas. They are, of course, from Extreme, and they are, of course, not shipping yet (and knowing Extreme's recent history with product announcements it may be a while before they do actually ship - though they say by end of year for the 1U switch).

The new products are

  • 48 port 10Gbase-T module for the Black Diamond X-series - for up to 384 x 10GbaseT ports in a 14U chassis - note this is of course half the density you can achieve using the 40GbE modules and break out cables, there's only so many plugs you can put in 14U!
  • Summit X670V-48t (I assume that's what it'll be called) - a 48-port 10GbaseT switch with 40GbE uplinks (similar to the Arista 7100 - the only 48-port 10GbaseT switch I'm personally aware of - just with faster uplinks and I'm sure there will be stacking support for those that like to stack)
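The density comparison in the first bullet works out roughly as follows. This is only a sketch: the slot count is inferred from 384 ports divided by 48 ports per module, and using 24-port 40GbE modules with 4-way breakout cables is my assumption about how the doubled figure is reached. The function name is made up.

```python
def chassis_ports(slots, ports_per_module, breakout=1):
    """Total ports in a fully populated chassis.

    breakout is how many lower-speed ports each physical port is
    split into by a breakout cable (1 means no breakout).
    """
    return slots * ports_per_module * breakout


# Inferred, not stated: 384 ports / 48 ports per module = 8 slots.
SLOTS = 384 // 48

native_10gbaset = chassis_ports(SLOTS, 48)
# Assumes 24-port 40GbE modules, each port split into 4 x 10GbE.
via_breakout = chassis_ports(SLOTS, 24, breakout=4)

print(f"10GbaseT modules: {native_10gbaset} x 10GbE")
print(f"40GbE modules with breakout: {via_breakout} x 10GbE")
print(f"Density ratio: {native_10gbaset / via_breakout}")
```

The RJ45 jacks simply take up more faceplate than the 40GbE ports do per 10GbE delivered, hence half the density.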

From this article it's claimed a list price of about $25k for the 1U switch which is a good price - about the same price as the existing 24-port X650 10GbaseT product. Also in line with the current generation X670V-48x which is a 48-port SFP+ product, so little to no premium for the copper which is nice to see! (note there is a lower cost X670 (non "V") that does not have 40GbE ability available for about half the cost of the "V" model)

Much of the hype seems to be around the new Intel 10Gbase-T controller that is coming out with the latest CPUs from them.

With the Intel Ethernet Controller X540, Intel is delivering on its commitment to drive down the costs of 10GbE. We’ve ditched two-chip 10GBASE-T designs of the past in favor of integrating the media access controller (MAC) and physical layer (PHY) controller into a single chip. The result is a dual-port 10GBASE-T controller that’s not only cost-effective, but also energy-efficient and small enough to be included on mainstream server motherboards. Several server OEMs are already lined up to offer Intel Ethernet Controller X540-based LOM connections for their Intel Xeon processor E5-2600 product family-based servers.

Broadcom has also recently announced (and is perhaps shipping too?) their own next generation 10GbaseT chips, built for LOM (among other things), which apparently can push power utilization down to under 2W per port, perhaps by using a 10 meter mode. 10m is plenty long enough for most connections of course! Given that Broadcom also has a quad port version of this chipset, could they be the ones powering the newest boxes from Oracle?

Will Broadcom be able to keep their stronghold on the LOM market (really can't remember the last time I came across Intel NICs on motherboards outside of maybe Supermicro or something)?

So the question remains - when will the rest of the network industry jump on board, after having been burned somewhat in the past by the first iterations of 10GbaseT?

16Feb/12Off

30 billion packets per second

TechOps Guy: Nate

Still really busy getting ready to move out of the cloud. Ran into a networking issue today: I noticed my 10GbE NICs are going up and down pretty often. Contacted HP and apparently these Qlogic 10GbE NICs have overheating issues, and the thing that has fixed it for other customers is to set the BIOS to increased cooling, basically ramp up those RPMs (as well as a firmware update, which is already applied). Our systems may have more issues given that each server has two of these NICs right on top of each other. So maybe we need to split them apart; waiting to see what support says. There haven't been any noticeable issues from the link failures since everything is redundant across both NICs (both NIC ports on the same card seem to go down at about the same time).

Anyways, you know I've been talking about it a lot here, and it's finally arrived (well a few days ago), the Black Diamond X-Series (self proclaimed world's largest cloud switch). Officially announced almost a year ago, their marketing folks certainly liked to grab the headlines early. They did the same thing with their first 24-port 10GbE stackable switch.

The numbers on this thing are just staggering. I mean 30 billion packets per second, and a choice of switching fabric based on either 2.5Tbps modules (meaning up to 10Tbps in the switch) or 5.1Tbps modules (with up to 20Tbps in the switch). They are offering 48x10GbE line cards as well as 12 and 24x40GbE line cards (all line rate, duh). You already know it has up to 192x40GbE or 768x10GbE (via breakout cables) in a 14U footprint - half the size of the competition, which was already claiming to be a fraction of the size of their other competition.
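I don't know how Extreme arrived at the 30 billion figure, but one plausible way to get there is to divide the 20Tbps fabric capacity by the on-the-wire size of a minimum-size Ethernet frame. The sketch below does that, using the standard frame overheads; treating the 20Tbps number as the input is my assumption, not something from the announcement.

```python
# On-the-wire cost of the smallest Ethernet frame, in bytes.
MIN_FRAME = 64        # smallest legal frame, including the FCS
PREAMBLE_SFD = 8      # preamble plus start-of-frame delimiter
INTERFRAME_GAP = 12   # mandatory idle time between frames

BITS_PER_MIN_FRAME = (MIN_FRAME + PREAMBLE_SFD + INTERFRAME_GAP) * 8


def max_pps(bits_per_second):
    """Upper bound on minimum-size frames per second at a given bit rate."""
    return bits_per_second / BITS_PER_MIN_FRAME


print(f"Bits per minimum-size frame on the wire: {BITS_PER_MIN_FRAME}")
print(f"One 10GbE port: {max_pps(10e9) / 1e6:.2f} Mpps")
print(f"20Tbps of fabric: {max_pps(20e12) / 1e9:.2f} billion pps")
```

That comes out to about 14.88 Mpps per 10GbE port and about 29.76 billion pps for 20Tbps, which rounds nicely to the headline number.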

5Tbps Switching fabric module (max 4 per switch)

 

Power rated at 5.5 watts per 10GbE port or 22W per 40GbE port.
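Multiplying those ratings out for a fully loaded chassis gives a rough idea of the port power budget. This is a sketch that assumes the per-port rating scales linearly with port count; the announcement doesn't say whether the figure includes fabric modules, management and fans, so I wouldn't read it as total chassis draw.

```python
WATTS_PER_10GBE_PORT = 5.5
WATTS_PER_40GBE_PORT = 22.0


def port_power_watts(num_ports, watts_per_port):
    """Rated power for a number of ports, assuming linear scaling."""
    return num_ports * watts_per_port


full_10gbe = port_power_watts(768, WATTS_PER_10GBE_PORT)
full_40gbe = port_power_watts(192, WATTS_PER_40GBE_PORT)

print(f"768 x 10GbE: {full_10gbe:.0f} W")
print(f"192 x 40GbE: {full_40gbe:.0f} W")
```

Both come out to 4,224 W, which makes sense since a 40GbE port is rated at exactly four times a 10GbE port.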

They are still pushing their Direct Attach approach, support for which still hasn't exactly caught on in the industry. I didn't really want Direct Attach but I can see the value in it. They apparently have a solution for KVM ready to go, though their partner in crime that supported VMware abandoned the 10GbE market a while ago (it required special NIC hardware/drivers for VMware). I'm not aware of anyone that has replaced that functionality from another manufacturer.

I found it kind of curious they rev'd their Operating system 3 major versions for this product. (from version 12 to 15). They did something similar when they re-wrote their OS about a decade ago (jumped from version 7 to 10).

Anyways not much else to write about, I do plan to write more on the cloud soon just been so busy recently - this is just a quick post / random thought.

So the clock starts - how long until someone else comes out with a midplane-less chassis design ?

28Nov/11Off

Info-tech report on data center switching solutions

TechOps Guy: Nate

I came across this report on Extreme's site which seems to be from somewhat of an "independent 3rd party", but I've not heard of them so I can't vouch for them.

I'd like to consider myself at least somewhat up to date on what is out there so when things like this come out I do find it interesting to see what they say.

The thing that stands out to me the most: Arista Networks has only 75 employees ?!? Wow, they've been able to do all of that work with only 75 employees? Really? Good job.. that is very surprising to me, I mean most of the companies I have worked at have had more than 75 employees and they've accomplished (in my opinion) a fraction of what Arista seems to have, at least from a technology stand point (revenue wise is a different story assuming again the report is accurate).

The thing that made me scratch my head the most: Cisco allows you to run virtual machines on their top of rack switches? Huh? Sounds like EMC and them wanting you to run VMs on their VMAX controllers? I recall at one point Citrix and Arista teamed up to allow some sort of VM to run Netscaler embedded in the Arista switches, though never heard of anyone using it and never heard Citrix promoting it over their own stuff. Seemed like an interesting concept, though no real advantage to doing it I don't think (main advantage I can think of is non blocking access to the switch fabric which really isn't a factor with lower end load balancers since they are CPU bound not network bound).

The report seems to take a hypothetical situation where a fairly large organization is upgrading their global network, and then went to each of the vendors and asked for a proposal. They leave out specifically what each of the solutions was, which is disappointing.

They said HP was good because it was cheap, which is pretty much what I've heard in the field, it seems nobody that is serious runs HP Procurve.

They reported that Juniper and Brocade were the most "affordable" (having Juniper and affordable together makes no sense), and Arista and Force10 being least affordable (which seems backwards too - they are not clear on what they used to judge costs, because I can't imagine a Force10 solution costing more than a Juniper one).

They placed some value on line cards that offered both copper and fiber at the same time, which again doesn't make a lot of sense to me since you can get copper modules to plug into SFP/SFP+ slots fairly easily. The ability to "Run VMs on your switch" also seemed iffy at best, they say you can run "WAN optimization" VMs on the switches, which for a report titled "Data center networking" really should be a non issue as far as features go.

The report predicts Brocade will suffer quite a bit since Dell now has Force10, and says Brocade doesn't have products as competitive as they otherwise could be.

They tout Juniper's ability to have multiple power supplies, switch fabrics, routing modules as if it was unique to Juniper, which makes no sense to me either. They do call out Juniper for saying their 640-port 10GbE switch is line rate only to 128 ports.

They believe Force10 will be forced into developing lower end solutions to fill out Dell's portfolio rather than staying competitive on the high end, time will tell.

Avaya? Why bother? They say you should consider them if you've previously used Nortel stuff.

They did include the sample scenario that they sent to the vendors and asked for solutions for. I really would have liked to have seen the proposals that came back.

A four-site organization with 7850 employees located at a Canadian head office facility, and three branch offices located in the US, Europe, and Canada. The IT department consists of 100 FTE, and are located primarily at the Canadian head office, with a small proportion of IT staff and systems located at the branch offices.

The organization is looking at completing a data center refurbish/refresh:

The organization has 1000 servers, 50% of which are virtualized (500 physical). The data center currently contains 40 racks with end-of-row switches. None of the switching/routing layers have any redundancy/high availability built in, leaving many potential single points of failure in the network (looking for 30% H/A).

A requirements gathering process has yielded the need for:

  • A redundant core network, with capacity for 120 x 10Gbps SFP+ ports
  • Redundant top of rack switches, with capacity for 48 x 1Gbps ports in each rack
  • 1 ready spare ToR switch and 1 ready spare 10Gbps chassis card
  • 8x5xNBD support & maintenance
  • Nameplate data – power consumption - watts/amps
  • 30% of the servers to be highly available

It is unclear how redundant they expect the network to be, would a single chassis with redundant fabrics and power supplies be enough or would you want two? They are also not clear as to what capabilities their ToR switches need other than the implied 10Gbps uplinks.

If I were building this network with Extreme gear I would start out with two pairs of stacked X670Vs at the core(each stack having 96x10GbE), each stack would be connected by 2x40GbE connections with passive copper cabling. The two stacks would be linked together with 4x40GbE connections with passive copper cabling, running (of course) ESRP as the fault tolerance protocol of choice between the two. This would provide 192x10GbE ports between the two stacks, with half being active half being passive.

Another, simpler approach would be to just stack three of the X670V switches together for 168x10GbE active-active ports. Though you know I'm not a big fan of stacking(any more than I am running off a single chassis), if I am connecting 1000 servers I want a higher degree of fault tolerance.

Now if you really could not tolerate an active/passive network, if you really needed that extra throughput, then you can use M-LAG to go active-active at layer 2, but I wouldn't do that myself unless you were really sure you needed that ability. I prefer the reduced complexity with active/passive.

As for the edge switches, they call for redundant 48 port 1GbE switches. Again they are not clear as to their real requirements, but what I would do (what I've done in the past) is two stand alone 48-port 1GbE switches, each with 2x10GbE (Summit X460) or 4x10GbE (Summit X480) connections to the core. These edge switches would NOT be stacked, they would be stand alone devices. You could go lower cost with Summit X450e, or even Summit X350, though I would not go with the X350 for this sort of data center configuration. Again I assume you're using these switches in an active-passive way for the most part (as in 1 server is using 1 switch at any given time), though if you needed a single server to utilize both switches then you could go the stacking approach at the edge. It all depends on what your own needs are - which is why I would have liked to have seen more detail in the report. Or you could do M-LAG at the edge as well, but ESRP's ability to eliminate loops is hindered if you link the edge switches together, since there is a path in the network that ESRP cannot deal with directly (see this post with more in depth info on the how, what, and why for ESRP).

I would NOT go with a Black Diamond solution (or any chassis-based approach) unless cost was really not an issue at all. Despite this example organization having 1,000 servers it's still a small network they propose building, and the above approach would scale seamlessly to say 4 times that number, non disruptively, providing sub second layer 2 and layer 3 fail over. It is also seamlessly upgradeable to a chassis approach with zero downtime (well, sub second), should the organization's needs grow beyond 4,000 hosts. The number of organizations in the world that have more than 4,000 hosts I think is pretty small in the grand scheme of things. If I had to take a stab at a guess I would say easily less than 10%, maybe less than 5%.
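To get a feel for what the two uplink options mean for an edge switch, here is a quick oversubscription calculation. It is a sketch that assumes all 48 server ports are in use and all uplinks carry traffic at once; if half the uplinks point at the standby core in an active/passive design, the effective ratio doubles. The function name is made up.

```python
def oversubscription(access_ports, access_gbps, uplinks, uplink_gbps):
    """Ratio of server-facing bandwidth to uplink bandwidth.

    1.0 or lower means the uplinks can carry everything the access
    ports can send; higher values mean the edge is oversubscribed.
    """
    return (access_ports * access_gbps) / (uplinks * uplink_gbps)


# 48 x 1GbE edge switch with 2 x 10GbE uplinks (the X460 option above).
two_uplinks = oversubscription(48, 1, 2, 10)
# Same edge switch with 4 x 10GbE uplinks (the X480 option above).
four_uplinks = oversubscription(48, 1, 4, 10)

print(f"2 x 10GbE uplinks: {two_uplinks:.1f}:1")
print(f"4 x 10GbE uplinks: {four_uplinks:.1f}:1")
```

That is 2.4:1 with two uplinks and 1.2:1 with four, both pretty comfortable for a rack of 1GbE servers.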

So in all, an interesting report, not very consistent in its analysis, and lacking some detail that would have been nice to see, but still interesting to see someone else's thought patterns.

10Nov/11Off

World’s fastest switch

TechOps Guy: Nate

I came across this yesterday which is both a video, and more importantly an in-depth report on the about-to-be-released Black Diamond X-series switch. I have written a few times on this topic, I don't have much that is really new, but then I ran across this PDF which has something I have been looking for - a better diagram on how this new next generation fabric is hooked up.

Up until now, most (all?) chassis switches relied on backplanes, or more modern systems used mid planes to transmit their electrical signals between the modules in the chassis.

Something I learned a couple of years ago (I'm not an electrical engineer) is that there are physical limits as to how fast you can push those electrons over those back and mid planes. There are serious distance limitations which makes the engineering ever more complicated the faster you push the system. Here I was thinking just crank up the clock speeds and make it go faster, but apparently it doesn't work quite that way :)

For the longest time all of Extreme's products were backplane based. Then they released a mid plane based product, the Black Diamond 20808, a couple of years ago. This product was discontinued earlier this year when the X-series was announced. The 20808 had (in my simple mind) a similar design to what Force10 had been doing for many years, which is basically N+1 switch fabric modules (I believe the 20808 could go to something like 5 fabric modules). All of their previous switches had what they called MSMs, or Management Switch Modules. These were combination switch fabric and management modules, with a maximum of two per system, each providing half of the switch's fabric capacity. Some other manufacturers like Cisco separated out their switch fabric from their management module. Having separate fabric modules really doesn't buy you much when you only have two modules in the system. But if your architecture can go to many more (I seem to recall Force10 at one point having something like 8), then of course you can get faster performance. Another key point in the design is having separate slots for your switch fabric modules so they don't consume space that would otherwise be used by ethernet ports.

Anyways, on the Black Diamond 20808 they did something else they had never done before, they put modules on both the front, AND on the back of the chassis. On top of that the modules were criss-crossed. The modules on the front were vertical, the modules on the back were horizontal. This is purely guessing here but I speculate the reason for that is, in part, to cut the distance needed to travel between the fabric and the switch ports. HP's c-Class Blade enclosure has a similar mid plane design with criss crossed components. Speaking of which, I wonder if the next generation 3PAR will leverage the same "non stop" midplane technology of the c-Class. The 5 Terabits of capacity on the c-Class is almost an order of magnitude more than what is even available on the 3PAR V800. Whether or not the storage system needs that much fabric is another question.

Black Diamond 20808 (rear)

The 20808 product seemed to be geared more towards service providers and not towards high density enterprise or data center computing (if I remember right the most you could get out of the box was 64x10GbE ports, which you can now get in a 1U X670V).

Black Diamond 20808 (front)

Their (now very old) Black Diamond 8000 series (with the 8900 model which came out a couple of years ago being the latest incarnation) has been the enterprise workhorse for them for many years, with a plethora of different modules and switch fabric options. The Black Diamond 8900 is a backplane based product.  I remember when it came out too - it was just a couple months after I bought my Black Diamond 10808s, in the middle of 2005. Although if I remember right the Black Diamond 8800, as it was originally released, did not support the virtual router capability that the 10808 supported that I intended to base my network design on. Nor did it support the Clear Flow security rules engine. Support for these features was added years later.

You can see the impact distance has on the Black Diamond 8900 for example, with the smaller 6-slot chassis getting at least 48Gbps more switching capacity per line card than the 10-slot chassis simply because it is smaller. Remember this is a backplane designed probably seven years ago, so it doesn't have as much fabric capacity as a modern mid plane based system.

Anyways, back on topic, the Black Diamond X-series. Extreme's engineers obviously saw the physics (?) limits they were likely going to hit when building a next generation platform and decided to re-think how the system works, resulting in what is, in my opinion, a pretty revolutionary way of building a switch fabric (at least I'm not aware of anything else like it myself). While much of the rest of the world is working with mid planes for their latest generation of systems, here we have the Direct Orthogonal Data Path Mating System or DOD PMS (yeah, right).

Black Diamond X-Series fabric

What got me started down this path was that I was on the Data Center Knowledge web site, and just happened to see a Juniper Qfabric advertisement. I've heard some interesting things about Qfabric since it was announced, it sounds similar to the Brocade VCS technology. I was browsing through some of their data sheets and white papers and it came across as something that's really complicated. It's meant to be simple, and it probably is, but the way they explain it, to me at least, makes it sound really complicated. Anyways I went to look at their big 40GbE switch which is at the core of their Qfabric interconnect technology. It certainly looks like a respectable switch from a performance stand point: 128 40GbE ports, 10 Terabits of switching fabric, weighs in at over 600 pounds (I think Juniper packs their chassis products with lead weights to make them feel more robust).

So back to the report that they posted. The networking industry doesn't have anything like the SPC-1 or SpecSFS standardized benchmarks to measure performance, and most people would have a really hard time generating enough traffic to tax these high end switches. There is standard equipment that does it, but it's very expensive.

So, to a certain extent you have to trust the manufacturer as to the specifications of the product. A way many manufacturers try to prove their claims of performance or latency is to hire "independent" testers to run tests on the products and give reports. This is one of those reports.

Reading it made me smile, seeing how well the X-Series performed but in the grand scheme of things it didn't surprise me given the design of the system and the fabric capacity it has built into it.

The BDX8 breaks all of our previous records in core switch testing from performance, latency, power consumption, port density and packaging design. The BDX8 is based upon the latest Broadcom merchant silicon chipset.

For the Fall Lippis/Ixia test, we populated the Extreme Networks BlackDiamond® X8 with 256 10GbE ports and 24 40GbE ports, thirty three percent of its capacity. This was the highest capacity switch tested during the entire series of Lippis/Ixia cloud network test at iSimCity to date.

We tested and measured the BDX8 in both cut through and store and forward modes in an effort to understand the difference these latency measurements offer. Further, latest merchant silicon forward packets in store and forward for smaller packets, while larger packets are forwarded in cut-through making this new generation of switches hybrid cut-through/store and forward devices.

Reading through the latency numbers, they looked impressive, but I really had nothing to compare them with, so I don't know how good. Surely for any network I'll ever be on it'd be way more than enough.

The BDX8 forwards packets ten to six times faster than other core switches we’ve tested.

[..]

The Extreme Networks BDX8 did not use HOL blocking which means that as the 10GbE and 40GbE ports on the BDX8 became congested, it did not impact the performance of other ports. There was no back pressure detected. The BDX8 did send flow control frames to the Ixia test gear signaling it to slow down the rate of incoming traffic flow.

Back pressure? What an interesting term for a network device.

The BDX8 delivered the fastest IP Multicast performance measured to date being able to forward IP Multicast packets between 3 and 13 times faster then previous core switch measures of similar 10GbE density.

The Extreme Networks BDX8 performed very well under cloud simulation conditions by delivering 100% aggregated throughput while processing a large combination of east-west and north-south traffic flows. Zero packet loss was observed as its latency stayed under 4.6 μs and 4.8 μs measured in cut through and store and forward modes respectively. This measurement also breaks all previous records as the BDX8 is between 2 and 10 times faster in forwarding cloud based protocols under load.

[..]

While these are the lowest Watts/10GbE port and highest TEER values observed for core switches, the Extreme Networks BDX8’s actual Watts/10GbE port is actually lower; we estimate approximately 5 Watts/10GbE port when fully populated with 768 10GbE or 192 40GbE ports. During the Lippis/Ixia test, the BDX8 was only populated to a third of its port capacity but equipped with power supplies, fans, management and switch fabric modules for full port density population. Therefore, when this full power capacity is divided across a fully populated BDX8, its WattsATIS per 10GbE Port will be lower than the measurement observed [which was 8.1W/port]

They also mention the cost of power, and the % of list price that cost is, so we can do some extrapolation. I suspect the list price of the product is not final, and I am assuming the prices they are naming are based on the configuration they are testing with rather than a fully loaded system (which as mentioned above the switch was configured with enough fabric and power for the entire chassis but only ~50% of the port capacity was installed).

Anyways, they say the price to power it over 3 years is $10,424.05 and say that is less than 1.7% of its list price. Extrapolating that a bit I can guesstimate that the list price of this system as tested with 352 10GbE ports (256 10GbE plus 24 40GbE counted as four 10GbE each) is roughly $612,000, or about $1,741 per 10GbE port.
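The extrapolation is simple enough to sketch in a few lines of Python. The function names are made up for this example, and it treats "less than 1.7%" as exactly 1.7%, so the result is really a floor on the list price and not a quoted figure.

```python
POWER_COST_3YR = 10_424.05    # USD, 3-year power cost quoted in the report
POWER_SHARE_OF_LIST = 0.017   # assumption: "less than 1.7%" taken as exactly 1.7%

def ten_gbe_equivalents(ports_10gbe, ports_40gbe):
    """Count each 40GbE port as four 10GbE ports."""
    return ports_10gbe + 4 * ports_40gbe

def implied_list_price(power_cost, share_of_list):
    """List price implied by a power cost that is a given share of it."""
    return power_cost / share_of_list

ports = ten_gbe_equivalents(256, 24)
list_price = implied_list_price(POWER_COST_3YR, POWER_SHARE_OF_LIST)
price_per_port = list_price / ports

print(f"10GbE-equivalent ports: {ports}")
print(f"Implied list price:     ${list_price:,.0f}")
print(f"Implied price per port: ${price_per_port:,.0f}")
```

The exact division lands a touch above my rounded numbers, which is fine for a guesstimate.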

The Broadcom technology is available to the competition. The real question is how long it will take the competition to develop something that can compete with this 20 Terabit switching fabric, which seems to be about twice as fast as anything else currently on the market.

HP has been working on some next generation stuff, I read about their optical switching technology earlier this year that their labs are working on, sounds pretty cool.

[..] Charles thinks this is likely to be sometime in the next 3-to-5 years.

So, nothing on the immediate horizon on that front.

18Oct/11Off

Cisco’s new 10GbE push – a little HP and Dell too

TechOps Guy: Nate

Just got done reading this from our friends at The Register.

More than anything else this caught my eye:

On the surface it looks pretty impressive, I mean it would be interesting to see exactly how Cisco configured the competing products as in which 60 Juniper devices or 70 HP devices did they use and how were they connected?

One thing that would have been interesting to call out in such a configuration is the number of logical devices needed for management. For example I know Brocade's VDX product is some fancy way of connecting lots of devices, sort of like more traditional stacking just at a larger scale for ease of management. I'm not sure whether or not the VDX technology extends to their chassis product, as Cisco's configuration above seems to imply using chassis switches. I believe Juniper's Qfabric is similar. I'm not sure if HP or Arista have such technology (I don't believe they do). I don't think Cisco does, but they don't claim to need it either with this big switch. So a big part of the question is managing so many devices, or just managing one. Cost of the hardware/software is one thing..

HP recently announced a revamp of their own 10GbE products, at least the 1U variety. I've been working off and on with HP people recently and there was a brief push to use HP networking equipment but they gave up pretty quick. They mentioned they were going to have "their version" of the 48-port 10-gig switch soon, but it turns out it's still a ways away - early next year is when it's supposed to ship, even if I wanted it  (which I don't) - it's too late for this project.

I dug into their fact sheet, which was really light on information, to see what, if anything, stood out with these products. I did not see anything that stood out in a positive manner. I did see this, which I thought was kind of amusing -

Industry-leading HP Intelligent Resilient Framework (IRF) technology radically simplifies the architecture of server access networks and enables massive scalability—this provides up to 300% higher scalability as compared to other ToR products in the market.

Correct me if I'm wrong - but that looks like what other vendors would call Stacking, or Virtual Chassis. An age-old technology, but the key point here was the up to 300% higher scalability. Another way of putting it is at least 50% less scalable - when you're comparing it to the Extreme Networks Summit X670V (which is shipping, I just ordered some).

The Summit X670 series is available in two models: Summit X670V and Summit X670. Summit X670V provides high density for 10 Gigabit Ethernet switching in a small 1RU form factor. The switch supports up to 64 ports in one system and 448 ports in a stacked system using high-speed SummitStack-V160*, which provides 160 Gbps throughput and distributed forwarding. The Summit X670 model provides up to 48 ports in one system and up to 352 ports in a stacked system using SummitStack-V longer distance (up to 40 km with 10GBASE-ER SFP+) stacking technology.

In short, it's twice as scalable as the HP IRF feature, because it goes up to 8 devices (56x10GbE each), and HP's goes up to 4 devices (48x10GbE each -- or perhaps they can do 56 too with breakout cables since both switches have the same number of physical 10GbE and 40GbE ports).
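Here's the stacking math as a quick Python sketch. The member counts and per-switch port counts are the ones I used above (8 x 56 for the X670V, 4 x 48 for HP's IRF, plus the what-if of HP doing 56 with breakout cables); the helper name is just for illustration.

```python
def stack_ports(members, ports_per_member):
    """Total 10GbE ports in a stack of identical switches."""
    return members * ports_per_member

extreme_stack = stack_ports(8, 56)     # Summit X670V with SummitStack-V160
hp_stack = stack_ports(4, 48)          # HP IRF, 48 ports per switch
hp_stack_breakout = stack_ports(4, 56) # HP IRF, if breakout cables get it to 56

print(f"Extreme: {extreme_stack} ports")
print(f"HP:      {hp_stack} ports ({extreme_stack / hp_stack:.2f}x fewer)")
print(f"HP (56): {hp_stack_breakout} ports ({extreme_stack / hp_stack_breakout:.2f}x fewer)")
```

The 448 matches the number in Extreme's own data sheet quote, and even the generous 56-port case for HP comes out at exactly half.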

The list price on the HP switches is WAY high too, The Register calls it out at $38,000 for a 24-port switch. The X670 from Extreme has a list price of about $25,000 for 48 ports (I see it on-line for as low as about $17k). There was no disclosure of HP's pricing for their 48-port switch.

Extreme has another 48-port switch which is cheaper (almost half the cost if I recall right - I see it on-line going for as low as $11,300) but it's for very specialized applications where latency is really important. If I recall right they removed the PHY (?) from the switch, which dramatically reduces functionality and introduces things like very short cable length limits but also slashes the latency (and cost). You wouldn't want to use those for your VMware setup (well, if you were really cost constrained these are probably better than some other alternatives, especially if you're considering this or 1GbE), but you may want them if you're doing HPC or something with shared memory or high frequency stock trading (ugh!).

The X670 also has (or will have? I'll find out soon) a motion sensor on the front of the switch which I thought was curious, but seems like a neat security feature, being able to tell if someone is standing in front of your switch screwing with it. It also apparently has the ability (or will have the ability) to turn off all of the LEDs on the switch when someone gets near it, and turn them back on when they go away.

(ok back on topic, Cisco!)

I looked at the Cisco slide above, and thought to myself, really, can they be that far ahead? I certainly do not go out on a routine basis and see how many devices, and how much connectivity between them, I need to achieve X number of line rate ports. I'll keep it simple: if you need a large number of line rate ports just use a chassis product (you may need a few of them). It is interesting to see though, assuming it's anywhere close to being accurate.

When I asked myself the question "Can they be that far ahead?" I wasn't thinking of Cisco, I think I'm up to 7 readers now -- you know me better than that! :)

I was thinking of the Extreme Networks Black Diamond X-Series which was announced (note not yet shipping...) a few months ago.

  • Cisco claims to do 768 x 10GbE ports in 25U (Extreme will do it in 14.5U)
  • Cisco claims to do 10W per 10GbE/port (Extreme will do it in 5W/port)
  • Cisco claims to do it with 1 device .. Well that's hard to beat but Extreme can meet them, it's hard to do it with less than one device.
  • Cisco's new top end taps out at very respectable 550Gbit per slot (Extreme will do 1.2Tb)
  • Cisco claims to do it with a list price of $1200/port. I don't know what Extreme's pricing will be but typically Cisco is on the very high end for costs.
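To make the bullets above a bit easier to compare, here's a small Python sketch that turns the claimed numbers into ports per rack unit, total port power, and per-slot fabric ratio. All inputs are vendor claims as listed above (nothing here is measured), and the class and field names are invented for this example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ChassisClaim:
    """Vendor-claimed figures for a 768-port 10GbE chassis (illustrative only)."""
    name: str
    ports_10gbe: int
    rack_units: float
    watts_per_port: float
    gbps_per_slot: float

    def ports_per_rack_unit(self):
        return self.ports_10gbe / self.rack_units

    def total_port_watts(self):
        return self.ports_10gbe * self.watts_per_port

cisco = ChassisClaim("Cisco", 768, 25.0, 10.0, 550.0)
extreme = ChassisClaim("Extreme X-Series", 768, 14.5, 5.0, 1200.0)

for claim in (cisco, extreme):
    print(f"{claim.name}: {claim.ports_per_rack_unit():.1f} ports/U, "
          f"{claim.total_port_watts():.0f} W for all ports")

slot_ratio = extreme.gbps_per_slot / cisco.gbps_per_slot
print(f"Per-slot fabric ratio (Extreme/Cisco): {slot_ratio:.2f}x")
```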

Though I don't know how Cisco gets to 768 ports, Extreme does it via 40GbE ports and breakout cables (as far as I know), so in reality the X-series is a 40GbE switch (and I think 40GbE only - to start with unless you use the break out cables to get to 10GbE).  It was a little over a year ago that Extreme was planning on shipping 40GbE at a cost of $1,000/port. Certainly the X-series is a different class of product than what they were talking about a while ago, but prices have also come down since.

X-Series is shipping "real soon now". I'm sure if you ask them they'll tell you more specifics.

It is interesting to me, and kind of sad, how far Force10 has fallen in the 10GbE area. I mean they seemed to basically build themselves on the back of 10GbE (or at least tried to), but I look at their current products on the very high end, and aside from the impressive little 40GbE switch they have, they seem to top out at 140 line rate 10GbE in 21U. Dell will probably do well with them, I'm sure it'll be a welcome upgrade to those customers using Procurve, uh I mean Powerconnect? That's what Dell call(ed) their switches right?

As much as it pains me I do have to give Dell some props for doing all of these acquisitions recently and beefing up their own technology base, whether it's in storage, or networking they've come a long way (more so in storage, need more time to tell in networking). I have not liked Dell myself for quite some time, a good chunk of it is because they really had no innovation, but part of it goes back to the days before Dell shipped AMD chips and Dell was getting tons of kick backs from Intel for staying an Intel exclusive provider.

In the grand scheme of things such numbers don't mean a whole lot, I mean how many networks in the world can actually push this kind of bandwidth? Outside of the labs I really think any organization would be very hard pressed to need such fabric capacity, but it's there -- and it's not all that expensive.

I just dug up an old price list I had from Extreme - from late November 2005. A 6-port 10GbE module for their Black Diamond 10808 switch (I had two at the time) had a list price of $36,000. For you math buffs out there that comes to $9,000 per line rate port.

That particular product was oversubscribed (hence it not being $6,000/port) as well as having a mere 40Gbps of switch fabric capacity per slot, or a total of 320Gbps for the entire switch (it was marketed as a 1.2Tb switch but hardware never came out to push the backplane to those levels. I had to dig into the depths of the documentation to find that little disclosure, and naturally I found it after I purchased. It didn't matter for us though, I'd be surprised if we pushed more than 5Gbps at any one point!). If I recall right the switch was 24U too. My switches were 1GbE only, cost reasons :)
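For anyone wondering where $9,000 comes from when $36,000 / 6 is obviously $6,000, here's the oversubscription math as a small Python sketch. The helper name is made up, and the slot count is simply what 320Gbps / 40Gbps implies.

```python
MODULE_LIST_PRICE = 36_000   # USD, 6-port 10GbE module, late 2005 list price
PHYSICAL_PORTS = 6
PORT_GBPS = 10
FABRIC_GBPS_PER_SLOT = 40
TOTAL_FABRIC_GBPS = 320

def line_rate_ports(fabric_gbps, port_gbps):
    """How many ports the slot's fabric can actually feed at full speed."""
    return fabric_gbps // port_gbps

usable_ports = line_rate_ports(FABRIC_GBPS_PER_SLOT, PORT_GBPS)
oversub_ratio = (PHYSICAL_PORTS * PORT_GBPS) / FABRIC_GBPS_PER_SLOT
price_per_physical_port = MODULE_LIST_PRICE / PHYSICAL_PORTS
price_per_line_rate_port = MODULE_LIST_PRICE / usable_ports
implied_slots = TOTAL_FABRIC_GBPS // FABRIC_GBPS_PER_SLOT

print(f"Line rate ports per module: {usable_ports} (oversubscribed {oversub_ratio:.1f}:1)")
print(f"Per physical port:  ${price_per_physical_port:,.0f}")
print(f"Per line rate port: ${price_per_line_rate_port:,.0f}")
print(f"Slots implied by total fabric: {implied_slots}")
```

Compare that $9,000 per line rate port to the roughly $1,200 to $1,750 per port numbers earlier in this post.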

How far we've come..

11May/11Off

2000+ 10GbE ports in a single rack

TechOps Guy: Nate

The best word I can come up with when I saw this was

oof

What I'm talking about is the announcement of the Black Diamond X-Series from my favorite switching company Extreme Networks. I have been hearing a lot about other switching companies coming out with new next gen 10 GbE and 40GbE switches, more than one using Broadcom chips (which Extreme uses as well), so have been patiently awaiting their announcements.

I don't have a lot to say so I'll let the specs do the talking

Extreme Networks Black Diamond X-Series

 

  • 14.5 U
  • 20 Tbps switching fabric (up ~4x from previous models)
  • 1.2 Tbps fabric per line slot (up ~10x from previous models)
  • 2,304 line rate 10GbE ports per rack (5 watts per port) (768 line rate per chassis)
  • 576 line rate 40GbE ports per rack (192 line rate per chassis)
  • Built in support to switch up to 128,000 virtual machines using their VEPA/ Direct Attach system
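The per-rack numbers in that list follow directly from the per-chassis ones. A minimal Python sketch, using only the figures above (the variable names are mine, and the power figure covers ports only at the quoted 5 watts each):

```python
CHASSIS_RACK_UNITS = 14.5
PORTS_10GBE_PER_CHASSIS = 768
PORTS_40GBE_PER_CHASSIS = 192
PORTS_10GBE_PER_RACK = 2304
WATTS_PER_10GBE_PORT = 5

chassis_per_rack = PORTS_10GBE_PER_RACK // PORTS_10GBE_PER_CHASSIS
rack_units_used = chassis_per_rack * CHASSIS_RACK_UNITS
ports_40gbe_per_rack = chassis_per_rack * PORTS_40GBE_PER_CHASSIS
port_watts_per_rack = PORTS_10GBE_PER_RACK * WATTS_PER_10GBE_PORT
# Each 40GbE port breaks out to four 10GbE ports
breakout_check = PORTS_40GBE_PER_CHASSIS * 4

print(f"Chassis per rack:     {chassis_per_rack}")
print(f"Rack units used:      {rack_units_used}")
print(f"40GbE ports per rack: {ports_40gbe_per_rack}")
print(f"Port power per rack:  {port_watts_per_rack} W")
print(f"10GbE via breakout:   {breakout_check} per chassis")
```

So it's three chassis in a rack, which is a lot of switch for one rack.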

 

 

 

This was fascinating to me:

Ultra high scalability is enabled by an industry-leading fabric design with an orthogonal direct mating system between I/O modules and fabric modules, which eliminates the performance bottleneck of pure backplane or midplane designs.

I was expecting their next gen platform to be a mid plane design (like that of the Black Diamond 20808). Their previous 10GbE high density enterprise switch, the Black Diamond 8800, was by contrast a backplane design (originally released about six years ago). The physical resemblance to the Arista networks chassis switches is remarkable. I would like to see how this direct mating system looks in a diagram of some kind to get a better idea on what this new design is.

Mini RJ21 adapters, 1 plug on the switch, goes to 6x1GbE ports

To put that port density in to some perspective, their older system (Black Diamond 8800), by comparison, has an option to use Mini RJ21 adapters to achieve 768 1GbE ports in a chassis (14U), so an extra inch of space gets you the same number of ports running at 10 times the speed, and line rate (the 768x1GbE is not quite to line rate but still damn fast). It's the only way to fit so many copper ports in such a small space.
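A quick sanity check on that comparison in Python. The only number here that is not from the text above is the 1.75 inch height of a rack unit, which is the standard figure; the helper name is made up.

```python
RACK_UNIT_INCHES = 1.75   # standard height of one rack unit

def plugs_needed(total_ports, ports_per_plug):
    """Number of Mini RJ21 plugs needed, rounding up."""
    return -(-total_ports // ports_per_plug)

extra_height_inches = (14.5 - 14) * RACK_UNIT_INCHES
rj21_plugs = plugs_needed(768, 6)
bandwidth_ratio = (768 * 10) / (768 * 1)

print(f"Extra chassis height: {extra_height_inches} inches")
print(f"Mini RJ21 plugs for 768 x 1GbE: {rj21_plugs}")
print(f"Aggregate port bandwidth: {bandwidth_ratio:.0f}x")
```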

 
 
 

It seems they have phased out the Black Diamond 10808 (I deployed a pair of these several years ago; first released in 2003), the Black Diamond 12804C (first released about 2007), the Black Diamond 12804R (also released around 2007) and the Black Diamond 20808 (this one is kind of surprising given how recent it was, though it didn't have anything approaching this level of performance of course; I think it was released around 2009). They also finally seemed to drop the really ancient Alpine series (10+ year old technology) as well.

Also they seem to have announced a new high density stackable 10GbE switch the Summit X670, the successor to the X650 which was already an outstanding product offering several features that until recently nobody else in the market was providing.

Extreme Networks Summit X670

  • 1U
  • 1.28 Tbps switching fabric (roughly double that of the X650)
  • 48 x 10Gbps line rate standard (64 x 10Gbps max)
  • 4 x 40Gbps line rate (or 16 x 10Gbps)
  • Long distance stacking support (up to 40 kilometers)
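The X670 numbers in that list hang together nicely if you assume the fabric figure counts both directions of every port (full duplex), which is how vendors usually quote it. That assumption is mine, not something from the data sheet. A short Python sketch:

```python
PORTS_10GBE = 48
PORTS_40GBE = 4
BREAKOUT_PER_40GBE = 4   # each 40GbE port can split into four 10GbE ports

# One-direction bandwidth across all front panel ports, in Gbps
one_way_gbps = PORTS_10GBE * 10 + PORTS_40GBE * 40
# Assumption: the quoted fabric number counts transmit and receive separately
full_duplex_gbps = one_way_gbps * 2
max_10gbe_ports = PORTS_10GBE + PORTS_40GBE * BREAKOUT_PER_40GBE

print(f"One-way port bandwidth: {one_way_gbps} Gbps")
print(f"Full duplex:            {full_duplex_gbps / 1000} Tbps")
print(f"Max 10GbE ports:        {max_10gbe_ports}")
```

Under that assumption the fabric is exactly enough to run every port at line rate, with nothing left over.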

The X670 from purely a port configuration standpoint looks similar to some of the other recently announced products from other companies, like Arista and Force10, both of whom are using the Broadcom Trident+ chipset. I assume Extreme is using the same. These days, given so many manufacturers are using the same type of hardware, you have to differentiate yourself in the software, which is really what drives me to Extreme more than anything else: their Linux-based easy-to-use Extremeware XOS operating system.

Neither of these products appears to be shipping. Not sure when they might ship, maybe sometime in Q3 or something.

40GbE has taken longer than I expected to finalize, they were one of the first to demonstrate 40GbE at Interop Las Vegas last year, but the parts have yet to ship (or if they have the web site is not updated).

For the most part, the number of companies that are able to drive even 10% of the performance of these new lines of networking products is really tiny. But the peace of mind that comes with everything being line rate really is worth something!

x86 or ASIC? I'm sure performance boosts like the ones offered here pretty much guarantee that x86 (or any general purpose CPU for that matter) will not be driving high speed networking for a very long time to come.

Myself I am not yet sold on this emerging trend in the networking industry that is trying to drive everything to be massive layer 2 domains. I still love me some ESRP! I think part of it has to do with selling the public on getting rid of STP. I haven't used STP in 7+ years so not using any form of STP is nothing new for me!

11Nov/10Off

Extreme VMware

TechOps Guy: Nate

So I was browsing some of the headlines of the companies I follow during lunch and came across this article (seems available on many outlets), which I thought was cool.

I've known VMware has been a very big happy user of Extreme Networks gear for a good long time now though I wasn't aware of anything that was public about it, at least until today. It really makes me feel good that despite VMware's partnerships with EMC and NetApp that include Cisco networking gear, at the end of the day they chose not to run Cisco for their own business.

But going beyond even that it makes me feel good that politics didn't win out here, obviously the people running the network have a preference, and they were either able to fight, or didn't have to fight to get what they wanted. Given VMware is a big company and given their big relationship with Cisco I would kind of think that Cisco would try to muscle their way in. Many times they can succeed depending on the management at the client company, but fortunately for the likes of VMware they did not.

SYDNEY, November 12. Extreme Networks, Inc., (Nasdaq: EXTR) today announced that VMware, the global leader in virtualisation and cloud infrastructure, has deployed its innovative enterprise, data centre and Metro Ethernet networking solutions.

VMware’s network features over 50,000 Ethernet ports that deliver connectivity to its engineering lab and supports the IT infrastructure team for its converged voice implementation.

Extreme Networks met VMware’s demanding requirements for highly resilient and scalable network connectivity. Today, VMware’s thousands of employees across multiple campuses are served by Extreme Networks’ leading Ethernet switching solutions featuring 10 Gigabit Ethernet, Gigabit Ethernet and Fast Ethernet, all powered by the ExtremeXOS® modular operating system.

[..]

“We required a robust, feature rich and energy efficient network to handle our data, virtualised applications and converged voice, and we achieved this through a trusted vendor like Extreme Networks, as they help it to achieve maximum availability so that we can drive continuous development,” said Drew Kramer, senior director of technical operations and R&D for VMware. “Working with Extreme Networks, from its high performance products to its knowledgeable and dedicated staff, has resulted in a world class infrastructure.”

Nice to see technology win out for once instead of back room deals which often end up screwing the customer over in the long run.

Since I'm here I guess I should mention the release of the X460 series of switches which came out a week or two ago, intended to replace the now 4-year-old X450 series (both "A" and "E"). Notable differences & improvements include:

  • Dual hot swap internal power supplies
  • User swappable fan tray
  • Long distance stacking over 10GbE - up to 40 kilometers
  • Clear-Flow now available when the switches are stacked (prior hardware switches could not be stacked to use Clear-Flow)
  • Stacking module is now optional (X450 it was built in)
  • Standard license is Edge license (X450A was Advanced Edge) - still software upgradable all the way to Core license (BGP etc). My favorite protocol ESRP requires Advanced Edge and not Core licensing.
  • Hardware support for IPFIX, which they say is complementary to sFlow
  • Lifetime hardware warranty with advanced hardware replacement (X450E had lifetime, X450A did not)
  • Layer 3 Virtual Switching (yay!) - I first used this functionality on the Black Diamond 10808 back in 2005, it's really neat.

The X460 seems to be aimed at the mid to upper range of GbE switches, with the X480 being the high end offering.