(here is a link to an in-depth analysis of the issue)
Fortunately, I didn't notice any direct impact on anything I personally use. I first got a notification from one of the data center providers we use saying they were having network problems. They traced it down to memory errors and frantically started planning emergency memory upgrades across their facilities. My company does not rely, and never has relied, on this data center for network connectivity, so it never affected us.
A short time later I noticed a new monitoring service that I am using sent out an outage email saying their service providers were having problems early this morning and they had migrated customers away from the affected data center(s).
Then I contacted one of the readers of my blog, whom I met a few months ago, and told him about the issue at my data center, which sounded similar to a story he told me at the time about his own data center provider. He replied with a link to this Reddit article, which talks about how the internet routing table exceeded 512,000 routes for the first time today. That is a hard limit in some older equipment, which causes it to either fail or perform really slowly, as some routes have to be processed in software instead of hardware.
I also came across this article (which I commented on), which mentions similar problems but makes no reference to BGP or routing tables (outside of my comments at the bottom).
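To make the failure mode concrete, here is a rough sketch of what happens once a table outgrows a fixed hardware limit. The 512,000 figure is the one quoted above; the function name and the sample table sizes are made up for illustration, and real equipment behaves in more complicated (and vendor-specific) ways.

```python
# Sketch only: 512,000 is the limit quoted in the post. The function and
# its names are hypothetical, not any vendor's API.
HARDWARE_ROUTE_LIMIT = 512_000


def split_routes(total_routes, hardware_limit=HARDWARE_ROUTE_LIMIT):
    """Return (routes that fit in hardware, routes left for software)."""
    in_hardware = min(total_routes, hardware_limit)
    in_software = max(total_routes - hardware_limit, 0)
    return in_hardware, in_software


if __name__ == "__main__":
    # Made-up sample table sizes, just to show the overflow.
    for total in (500_000, 512_000, 515_000):
        hw, sw = split_routes(total)
        print(f"{total:>7} routes: {hw} in hardware, {sw} in software")
```

Anything in the second number is what ends up on the slow path, which is where the "perform really slowly" part comes from.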
[..]as part of a widespread issue impacting major network providers including Comcast, AT&T, Time Warner and Verizon.
One of my co-workers said he was poking around and could find no references to what has been going on today other than the aforementioned Reddit article. I too am surprised that, with so many providers having issues, this hasn't made more news.
(UPDATE - here is another article from zdnet)
I looked at the BGP routing capacity of some core switches I had literally a decade ago, and they could scale up to 1 million unique BGP4 routes in hardware and 2 million non-unique (I'm not quite sure what the difference is; anything beyond static routing has never been my thing). I recall seeing routers, again many years ago, that could hold probably 10 times that. (I think the main distinction between a switch and a router is the CPU and memory capacity, at least for the bigger boxes with dozens to hundreds of ports?)
So it's honestly puzzling to me how any service provider could be impacted by this today, and how any equipment not capable of handling 512k routes is still in use in 2014 (I can understand it for smaller orgs, but not service providers). I suppose this also goes to show that there is a widespread lack of monitoring of these sorts of metrics. The Reddit article mentions that talk about this had been going on for months and people knew it was coming. Well, apparently not everyone.
Someone wasn't watching the graphs.
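Watching the graphs doesn't have to be fancy. Below is a minimal sketch of the kind of check I mean: compare the current route count against the box's capacity and estimate how long until it fills up. The thresholds, the growth rate, and the function names are all assumptions of mine for illustration, not numbers from any real router or monitoring product.

```python
import math


def route_table_status(route_count, capacity, warn_at=0.80, crit_at=0.95):
    """Classify table utilization; the thresholds are arbitrary examples."""
    utilization = route_count / capacity
    if utilization >= crit_at:
        return "critical"
    if utilization >= warn_at:
        return "warning"
    return "ok"


def days_until_full(route_count, capacity, routes_per_day):
    """Linear projection of days left; None if the table isn't growing."""
    if routes_per_day <= 0:
        return None
    remaining = capacity - route_count
    if remaining <= 0:
        return 0
    return math.ceil(remaining / routes_per_day)


if __name__ == "__main__":
    # Hypothetical readings: 500,000 routes on a 512,000-route box,
    # growing by 100 routes per day.
    print(route_table_status(500_000, 512_000))
    print(days_until_full(500_000, 512_000, 100))
```

A straight-line projection is crude, but even that would have raised a flag months ahead of time.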
I'm also planning on writing a blog post soon about the aforementioned monitoring service I recently started using. I've literally spent probably five thousand hours over the past 15 years doing custom monitoring stuff, and this thing just makes me want to cry, it's so amazingly powerful and easy to use. In fact, just yesterday I had someone email me about an MRTG document I wrote 12 years ago and how it's still listed on the MRTG site even today. (I asked the author to remove the link more than a year ago, which was the last time someone asked me about it; that site has been offline for 10 years but is still available in the Internet Archive.)
This post was just a quickie inspired by my co-worker who said he couldn't find any info on this topic, so hey maybe I'm among the first to write about it.
The best word I can come up with when I saw this was
What I'm talking about is the announcement of the Black Diamond X-Series from my favorite switching company, Extreme Networks. I have been hearing a lot about other switching companies coming out with new next gen 10GbE and 40GbE switches, more than one using Broadcom chips (which Extreme uses as well), so I have been patiently awaiting their announcements.
I don't have a lot to say, so I'll let the specs do the talking:
- 14.5 U
- 20 Tbps switching fabric (up ~4x from previous models)
- 1.2 Tbps fabric per line slot (up ~10x from previous models)
- 2,304 line rate 10GbE ports per rack (5 watts per port) (768 line rate per chassis)
- 576 line rate 40GbE ports per rack (192 line rate per chassis)
- Built in support to switch up to 128,000 virtual machines using their VEPA/ Direct Attach system
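The per-rack and per-chassis numbers above hang together if you do the arithmetic. Here is a quick sketch that derives the rest from the quoted specs; the only inputs are figures from the list, and everything else (chassis per rack, rack units used, power for the 10GbE ports, one-way port bandwidth) is my own derivation rather than anything from the spec sheet.

```python
# Figures quoted in the spec list above.
CHASSIS_HEIGHT_U = 14.5
PORTS_10GBE_PER_CHASSIS = 768
PORTS_40GBE_PER_CHASSIS = 192
PORTS_10GBE_PER_RACK = 2304
PORTS_40GBE_PER_RACK = 576
WATTS_PER_10GBE_PORT = 5


def derive_rack_figures():
    """Derive per-rack numbers from the quoted per-chassis specs."""
    chassis_per_rack = PORTS_10GBE_PER_RACK // PORTS_10GBE_PER_CHASSIS
    return {
        "chassis_per_rack": chassis_per_rack,
        "rack_units_used": chassis_per_rack * CHASSIS_HEIGHT_U,
        "ports_40gbe_per_rack": chassis_per_rack * PORTS_40GBE_PER_CHASSIS,
        # Port power only, at the quoted 5 watts per 10GbE port.
        "port_watts_per_chassis": PORTS_10GBE_PER_CHASSIS * WATTS_PER_10GBE_PORT,
        "port_watts_per_rack": PORTS_10GBE_PER_RACK * WATTS_PER_10GBE_PORT,
        # One direction only, in Gbps.
        "gbps_10gbe_per_chassis": PORTS_10GBE_PER_CHASSIS * 10,
        "gbps_40gbe_per_chassis": PORTS_40GBE_PER_CHASSIS * 40,
    }


if __name__ == "__main__":
    for name, value in derive_rack_figures().items():
        print(f"{name}: {value}")
```

So the rack figures assume three chassis, and a chassis full of 40GbE ports carries the same one-way bandwidth as a chassis full of 10GbE ports.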
This was fascinating to me:
Ultra high scalability is enabled by an industry-leading fabric design with an orthogonal direct mating system between I/O modules and fabric modules, which eliminates the performance bottleneck of pure backplane or midplane designs.
I was expecting their next gen platform to be a midplane design (like that of the Black Diamond 20808). Their previous high density 10GbE enterprise switch, the Black Diamond 8800, was by contrast a backplane design (originally released about six years ago). The physical resemblance to the Arista Networks chassis switches is remarkable. I would like to see how this direct mating system looks in a diagram of some kind to get a better idea of what this new design is.
To put that port density into some perspective, their older system (Black Diamond 8800) has an option to use Mini RJ21 adapters to achieve 768 1GbE ports in a chassis (14U), which is the only way to fit so many copper ports in such a small space. So an extra inch of space gets you the same number of ports running at 10 times the speed, and at line rate (the 768x1GbE is not quite line rate, but still damn fast).
It seems they have phased out the Black Diamond 10808 (first released in 2003; I deployed a pair of these several years ago), the Black Diamond 12804C (first released around 2007), the Black Diamond 12804R (also released around 2007) and the Black Diamond 20808. That last one is kind of surprising given how recent it was (I think it was released around 2009), though of course it didn't have anything approaching this level of performance. They also finally seem to have dropped the really ancient Alpine series (10+ year old technology).
They also seem to have announced a new high density stackable 10GbE switch, the Summit X670. It is the successor to the X650, which was already an outstanding product offering several features that until recently nobody else in the market was providing.
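For the curious, the comparison works out roughly like this. It is a back-of-the-envelope sketch using only the port counts and chassis heights mentioned above, plus the standard 1.75 inches per rack unit; the function name is just something I made up.

```python
RACK_UNIT_INCHES = 1.75  # standard height of 1U


def gbps_per_rack_unit(ports, gbps_per_port, height_u):
    """One-way port bandwidth per rack unit of chassis height."""
    return ports * gbps_per_port / height_u


# Black Diamond 8800 with Mini RJ21: 768 x 1GbE in 14U.
old_density = gbps_per_rack_unit(768, 1, 14)
# Black Diamond X-Series: 768 x 10GbE in 14.5U.
new_density = gbps_per_rack_unit(768, 10, 14.5)
# The half rack unit of extra height, in inches.
extra_height_inches = (14.5 - 14) * RACK_UNIT_INCHES

if __name__ == "__main__":
    print(f"old: {old_density:.1f} Gbps per U")
    print(f"new: {new_density:.1f} Gbps per U")
    print(f"extra height: {extra_height_inches} inches")
```

The "extra inch" is really a bit under an inch, and the density per rack unit comes out a little under 10x because of that extra half U.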
- 1.28 Tbps switching fabric (roughly double that of the X650)
- 48 x 10Gbps line rate standard (64 x 10Gbps max)
- 4 x 40Gbps line rate (or 16 x 10Gbps)
- Long distance stacking support (up to 40 kilometers)
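The X670 numbers above are consistent with each other if you assume the fabric figure counts both directions (full duplex), which is how switching capacity is commonly quoted. That doubling is my assumption, not something stated in the announcement. A quick sketch:

```python
# Port counts from the spec list above.
PORTS_10G = 48
PORTS_40G = 4
LANES_PER_40G_PORT = 4  # each 40Gbps port can instead run as 4 x 10Gbps


def x670_totals():
    """Derive max 10G port count and fabric capacity from the port list."""
    max_10g_ports = PORTS_10G + PORTS_40G * LANES_PER_40G_PORT
    one_way_gbps = PORTS_10G * 10 + PORTS_40G * 40
    # Assumption: the quoted fabric figure is full duplex (both directions).
    full_duplex_gbps = one_way_gbps * 2
    return max_10g_ports, one_way_gbps, full_duplex_gbps


if __name__ == "__main__":
    ports, one_way, full_duplex = x670_totals()
    print(f"max 10Gbps ports: {ports}")
    print(f"one-way: {one_way} Gbps, full duplex: {full_duplex} Gbps")
```

That gives 64 ports at 10Gbps and 1,280 Gbps, which lines up with the 64 x 10Gbps max and 1.28 Tbps figures.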
The X670, purely from a port configuration standpoint, looks similar to some other recently announced products from other companies, like Arista and Force10, both of whom are using the Broadcom Trident+ chipset. I assume Extreme is using the same. These days, given that so many manufacturers are using the same type of hardware, you have to differentiate yourself in the software, which is really what drives me to Extreme more than anything else: their Linux-based, easy-to-use Extremeware XOS operating system.
Neither of these products appears to be shipping yet. I'm not sure when they might ship, maybe sometime in Q3 or something.
40GbE has taken longer than I expected to finalize. Extreme was one of the first to demonstrate 40GbE, at Interop Las Vegas last year, but the parts have yet to ship (or if they have, the web site is not updated).
For the most part, the number of companies that are able to drive even 10% of the performance of these new lines of networking products is really tiny. But the peace of mind that comes with everything being line rate really is worth something!
x86 or ASIC? I'm sure performance boosts like the ones offered here pretty much guarantee that x86 (or any general purpose CPU for that matter) will not be driving high speed networking for a very long time to come.
Myself I am not yet sold on this emerging trend in the networking industry that is trying to drive everything to be massive layer 2 domains. I still love me some ESRP! I think part of it has to do with selling the public on getting rid of STP. I haven't used STP in 7+ years so not using any form of STP is nothing new for me!