TechOpsGuys.com Diggin' technology every day

August 12, 2014

Some internet routers ran out of memory today

Filed under: Networking — Nate @ 5:03 pm

(here is a link to an in-depth analysis of the issue)

Fortunately I didn’t notice any direct impact on anything I personally use. But I first got notification from one of the data center providers we use that they were having network problems, which they traced down to memory errors, and they frantically started planning emergency memory upgrades across their facilities. My company does not, and never has, relied on this data center for network connectivity, so it never impacted us.

A short time later I noticed a new monitoring service that I am using sent out an outage email saying their service providers were having problems early this morning and they had migrated customers away from the affected data center(s).

Then I contacted one of my blog readers, whom I met a few months ago, and told him the story of the data center having this issue, since it sounded similar to a story he had told me at the time about his own data center provider. He replied with a link to this Reddit article, which talks about how the internet routing table exceeded 512,000 routes for the first time today. That is a hard limit in some older equipment, which causes it either to fail or to perform really slowly, as some routes have to be processed in software instead of hardware.

I also came across this article (which I commented on) which mentions similar problems but no reference to BGP or routing tables (outside my comments at the bottom).

[..]as part of a widespread issue impacting major network providers including Comcast, AT&T, Time Warner and Verizon.

One of my co-workers said he was just poking around and could find no references to what has been going on today other than the aforementioned Reddit article. I too am surprised that, if so many providers are having issues, this hasn’t made more news.

(UPDATE – here is another article from ZDNet)

I looked at the BGP routing capacity of some core switches I had literally a decade ago and they could scale up to 1 million unique BGP4 routes in hardware, and 2 million non-unique (not quite sure what the difference is; anything beyond static routing has never been my thing). I recall seeing routers many years ago that could hold probably 10 times that (I think the main distinction between a switch and a router is the CPU and memory capacity, at least for the bigger boxes with dozens to hundreds of ports?).

So it’s honestly puzzling to me how any service provider could be impacted by this today, and how any equipment not capable of handling 512k routes is still in use in 2014 (I can understand it for smaller orgs, but not service providers). I suppose this also goes to show that there is widespread lack of monitoring of these sorts of metrics. In the Reddit article there is mention of talks going on for months; people knew this was coming. Well, apparently not everyone, obviously.

Someone wasn’t watching the graphs.
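
To make that concrete, here is a minimal sketch (Python) of the kind of threshold check that would have flagged this weeks in advance. Everything in it is hypothetical: get_prefix_count() is a stand-in for however you already collect the global table size (SNMP, a router CLI, an existing monitoring gauge), and 512,000 is simply the hardware limit that bit people today; use whatever your oldest box actually holds.

  import smtplib
  from email.message import EmailMessage
  from typing import Optional

  # Hardware forwarding table (TCAM/FIB) capacity of the oldest box in the path.
  FIB_CAPACITY = 512000
  WARN_RATIO = 0.90   # start alerting at 90% of capacity

  def get_prefix_count() -> int:
      """Placeholder: return the current global IPv4 BGP table size.
      In a real deployment this would come from SNMP, a router's
      'show ip bgp summary', or a gauge your monitoring system already graphs."""
      raise NotImplementedError("wire this up to your own data source")

  def check(prefixes: int) -> Optional[str]:
      """Return a warning message if the table is close to (or past) capacity."""
      if prefixes >= FIB_CAPACITY:
          return "BGP table (%d) EXCEEDS hardware capacity (%d)" % (prefixes, FIB_CAPACITY)
      if prefixes >= FIB_CAPACITY * WARN_RATIO:
          return "BGP table (%d) is at %.0f%% of hardware capacity" % (
              prefixes, 100.0 * prefixes / FIB_CAPACITY)
      return None

  def alert(subject: str) -> None:
      """Bare-bones email alert; assumes a local MTA listening on port 25."""
      msg = EmailMessage()
      msg["Subject"] = subject
      msg["From"] = "bgp-watch@example.com"
      msg["To"] = "noc@example.com"
      with smtplib.SMTP("localhost") as smtp:
          smtp.send_message(msg)

  if __name__ == "__main__":
      warning = check(get_prefix_count())
      if warning:
          alert(warning)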

I’m also planning to write a blog post soon on the aforementioned monitoring service I recently started using. I’ve literally spent probably five thousand hours over the past 15 years doing custom monitoring stuff, and this thing just makes me want to cry, it’s so amazingly powerful and easy to use. In fact just yesterday I had someone email me about an MRTG document I wrote 12 years ago and how it’s still listed on the MRTG site even today (I asked the author to remove the link more than a year ago, the last time someone asked me about it; that site has been offline for 10 years but is still available in the Internet Archive).

This post was just a quickie inspired by my co-worker who said he couldn’t find any info on this topic, so hey maybe I’m among the first to write about it.

August 21, 2013

More IPv6 funnies…

Filed under: Networking, Random Thought — Nate @ 5:56 pm

Random, off-topic, boring post, but I felt compelled to write it after reading a fairly absurd comment on Slashdot from another hard-core IPv6 fan.

Internet hippies at it again!

I put the original comments in italics; the non-italic text is the IPv6 person responding. I mean honestly, I can’t help but laugh.

I was a part of the internet when it started and was the wild wild west.  Everyone had nearly unlimited ip addresses and NOBODY used them for several reasons. First nobody put everything on the internet.

That was then. Now is now. The billion people on Facebook, Twitter, Flickr don’t put anything online? Sure, it’s all crap, but it sure is not nothing.

It’s just Dumb to put workstations on the internet… Sally in accounting does not need a public IP and all it does is make her computer easier to target and attack. Hiding behind that router on a separate private network is far more secure. Plus it is easier to defend a single point of entry than it is to defend a 255.255.0.0 address space from the world.

Bullsh*t. If in IPv4 your internal network would be 192.168.10.0/24, you can define an IPv6 range for that as well, e.g. 2001:db8:1234:10::/72. And then you put in your firewall:

2001:db8:1234:10::/72 Inbound: DENY ALL

Done. Hard? No. Harder than IPv4? No. Easier? Yes. Sally needs direct connection to Tom in the other branch (for file transfer, video conference, etc):

2001:db8:1234:10::5411/128 Inbound: ALLOW ALL FROM 2001:db8:1234:11::703/128

Good luck telling your IPv4 CGN ISP you need a port forwarded.

Second I have yet to have someone give me a real need for having everything on the internet with a direct address. you have zero need to have your toaster accessible from the internet.

Oh yeah? Sally might need that 30 GB Powerpoint presentation of her coworker in the other branch. Or that 100 MB customer database. Well, you know, this [xkcd.com]. How much easier would that be with a very simple app that even you could hack together that sends a file from one IP address to the other. Simple and fast, with IPv6. Try it with IPv4.

It’s amazing to me how folks like this think that everything should just be directly connected to the internet. Apparently this IPv6 person hasn’t heard of a file server before, or a site to site VPN. Even with direct accessibility I would want to enforce VPN between the sites, if nothing else so I don’t have to worry about communications going unencrypted (or, in some cases, not being WAN optimized). Same goes for remote workers: if you’re at a remote location and want to talk to a computer on the corporate LAN or data center, get on the VPN. I don’t care if you have a direct route to it or not (in fact I would ensure you did not, so you have no choice).

The problems this person cites have been solved for over a decade.

I’m sorry, but anyone who argues that 2001:db8:1234:10::5411/128 is simpler than 192.168.10.0/24 is just… not all there.

The solutions may not be as clean as something more native, though moving 30GB of data over anyone’s office internet connection would be a very bad thing to do without arranging something with IT first (do it off hours, throttle it, something).

The point is the solutions exist, and they work. The fact remains that if you go native IPv6 you’re going to have MUCH MORE PAIN than with any of the hacks you may have to do with IPv4 today. IPv6 fans fail to acknowledge that up front. They attack IPv4/NAT/etc and just want the world to turn off IPv4 and flip everyone over.

I have said for years that I don’t look forward to IPv6 myself (mainly because of the numbering scheme, which sucks hard). If the time comes where I need IPv6 for myself or the organization I work for, there are other means to get it (e.g. NAT, at the load balancer level in my case) that will work for years to come (until perhaps there is some sort of mission-critical mass of outbound IPv6 connectivity that I need; I don’t see that in the next 5-8 years, and beyond that, who knows, maybe I won’t be doing networking anymore so I won’t care).

I’m sure people like me are the kind of folks IPv6 people hate. I don’t blame ’em I suppose.

There is nothing – absolutely nothing that bugs me about IPv4 today. Not a damn thing hinders me or the organizations I have worked for. At one point SSL virtual hosting was an issue, but even that is solved with SNI (which I just started using fairly recently actually).

The only possibility of having an issue, I think, is if my organization merged with another and there was some overlapping IP space. I haven’t personally encountered that problem in a very long time though (9 years; even then we just set up a bunch of 1:1 NATs I think, and I wasn’t the network engineer at the time so it wasn’t my problem).

I remember one company I worked for 13 years ago: they registered their own /24 network back in the early 90s, because the people at the time believed they had to in order to run an internal network. The IP space never got used (to my knowledge) and it was just lingering around; the contact info was out of date and we didn’t have any access to it (not that we needed it, it was more a funny story to tell).

When I set this server up at Hurricane Electric, one of the things they asked me was whether I wanted IPv6 connectivity, since they do it natively I believe (one of the biggest IPv6 providers out there globally, I think?). I thought about it for a few seconds and declined; I don’t need it.

IPv6 fans need to come up with better justification for the world to switch than “the internet is peer to peer and everyone needs a unique address” (because that reason doesn’t cut it for folks like me, and given the world’s glacial pace of migration I think my view is the norm rather than the exception). I’ve never really cared about peer to peer anything. The internet in general has been client-server and will likely remain so for some time (especially given the average gap between download and upload bandwidth on your typical broadband connection).

Given that I have a server with ~3.6TB of usable space on a 100Mbps unlimited-bandwidth connection less than 25 milliseconds from my home, I’d trade download bandwidth for upload bandwidth in a HEARTBEAT. I’d love to be able to get something like 25/25Mbps; unfortunately the best upload I can get is 5Mbps (while I can get 150Mbps down), and my current plan is more like 2Mbps up and 16Mbps down.

Speedtest.net results for this server. I had to try several different test servers before I found one that was fast enough to handle me.

ANYWAY…….. I had a good laugh at least.

Back to your regularly scheduled programming..

August 7, 2013

Nth Symposium 2013 Keynote: SDN

Filed under: Networking — Nate @ 9:11 am

Travel to HP Storage Tech Day/Nth Generation Symposium was paid for by HP; however, no monetary compensation is expected nor received for the content that is written in this blog.

“So, SDN solves a problem for me which doesn’t exist, and never has.”

– Nate (techopsguys.com)

(I think the above quote sums up my thoughts very well, so I put it at the top; it also appears down below.)

One of the keynotes of the Nth Generation Symposium last week was from Martin Casado, who is currently a Chief Architect at VMware, and one of the inventors of OpenFlow and the SDN concept in general.

I have read bits and pieces of what Martin has said in the past; he seems like a really smart guy and his keynote was quite good. It was nice to hear him confirm many of the feelings I have about SDN in general. There are some areas where I disagree with him, mainly based on my own personal experience in the environments I have worked in, but the differences are minor; my bigger beef with SDN is not even inside the scope of SDN itself. More on that in a bit.

First off, I was not aware that the term Software Defined Networking was created on the spot by a reporter from the MIT Technology Review. Apparently this reporter, who was interviewing Martin, had just done an article on Software Defined Radio; the reporter asked Martin what they should call this thing he created. He didn’t know, so the reporter suggested Software Defined Networking since that term was still fresh in the reporter’s head. He agreed, and the term was born.

Ripping from one of his slides:

What does SDN Promise?

  • Enable rapid innovation in Networking
  • Enable new forms of network control
  • It’s a mechanism for implementers
  • Not a solution for customers

I did not notice that last bit until a few moments ago; that is great to see as well.

He says network virtualization is all about operational simplification.

Martin’s view of Network Virtualization

What Network Virtualization is

  • Decoupling of the services provided by a virtualized network from the physical network
  • Virtual network is a container of network services (L2-L7) provisioned by software
  • Faithful reproduction of services provided by physical network

He showed an interesting stat claiming that half of all server access ports are already virtualized, and we’re on track to get to 67% in 2 years. Also apparently 40% of virtualization admins also manage virtual switching.

Here is an interesting slide showing a somewhat complex physical network design and how that can be adapted to be something more flexible with SDN and network virtualization:

The migration of physical to virtual

Top three reasons for deploying software defined networks

  1. Speed
  2. Speed
  3. Speed

(from another of Martin’s slides – and yes, he had #1, #2, and #3 all the same; anything beyond speed was viewed as a distant reason relative to speed)

Where I stand on Martin’s stuff

So first off, let me preface this by saying I am a customer. I have managed L2-L7 networks off and on for the past 12 years now, on top of all of my other stuff. I have designed and built a few networks from the ground up. Networking has never been my primary career path. I couldn’t tear apart an IP packet and understand it if my life depended on it. That being said, I have been able to go toe to toe with every “Network Engineer” I have worked with (on almost everything except analyzing packet dumps beyond the most basic of things). I don’t know if that says something about me, or them, or both.

I have worked in what you might consider nothing but “web 2.0” stuff for the past decade. I have never had to support big legacy applications; everything has been modern web-based stuff. In two cases it was a three-tier application (web+app+db); the others were two-tier. I have supported Java, PHP, Ruby and Perl apps (always on Linux).

None of the applications I supported were “web scale” (and I will argue till I am blue in the face that most (99%) organizations will never get to web scale). The biggest-scaling application was also my first application: I calculated the infrastructure growth as 1,500% (based on raw CPU capacity) over roughly 3 years. To think that those ~30 racks of servers could today fit into a single blade enclosure with room to spare..

What does SDN solve?

Going briefly to another keynote, by someone at Intel: they had this slide, which goes to show some of the pain they have –

Intel’s network folks take 2-3 weeks to provision a service

Intel’s own internal IT estimates say it takes them 2-3 weeks to provision a new service. This really makes no sense to me, but there is no description of what is involved in configuring a new service.

So going back to SDN. From what I read, SDN operates primarily at L2-L3. The firewalls/load balancers etc. are less SDN and more network virtualization, and seem to be outside the scope of core SDN (OpenFlow). To date I have not seen a single mention of the term SDN when it comes to these services from any organization. It’s all happening at the switch/routing layer.

So I have to assume here for a moment that it takes Intel 2-3 weeks to provision new VLANs, perhaps deploy some new switches, or update some routes or something like that (they must use Cisco if it takes that long!).

My own network designs

Going to my own personal experience: keeping things simple. Here is a recent sample network design of mine:

Basic Network Zoning architecture

There is one major zone for the data center itself, which is a /16 (leveraging Extreme’s Layer 3 Virtual Switching). Within that, at the moment, are three smaller zones (I think supernet may be the right word to describe them), and within those supernets are sub zones (aka subnets, aka VLANs) in a couple of different sizes for different purposes. Some of the sub zones have jumbo frames enabled; most do not. There is a dedicated sub zone for vMotion (this VLAN has no router interface on it, in part for improved security perhaps), another for infrastructure management interfaces, etc. Each zone (A-C) has a sub zone dedicated to load balancer virtual IPs for internal load balancing. The load balancer is directly connected to all of the major zones. Routing to this data center (over VPN, either site to site or end user VPN) is handled by a simple /16 route, and individual WAN-based ACLs are handled by the VPN appliance.
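
For illustration, here is a rough sketch of that hierarchy using Python’s ipaddress module. The prefixes and sizes are made up (the real ones differ); the point is just the nesting: one /16 for the whole data center, a few supernets inside it, /24 sub zones (VLANs) inside those, and a single aggregate route handed to the VPN appliance.

  import ipaddress

  # Hypothetical addressing plan; the real prefixes and sizes are different.
  DATACENTER = ipaddress.ip_network("10.20.0.0/16")   # one route covers the whole DC

  # Carve the /16 into /19 "supernets": zones A-C plus spares.
  supernets = list(DATACENTER.subnets(new_prefix=19))
  zones = dict(zip("ABC", supernets))

  # Each zone is then carved into /24 sub zones (VLANs): app, DB, LB VIPs, etc.
  for name, supernet in zones.items():
      vlans = list(supernet.subnets(new_prefix=24))
      print("Zone %s: %s -> %d possible /24 sub zones (e.g. LB VIPs on %s)"
            % (name, supernet, len(vlans), vlans[0]))

  # Site to site / end user VPN routing only ever needs the one aggregate:
  print("Route advertised toward the VPN appliance: %s" % DATACENTER)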

There are a few misc zones in the middle for various purposes; these have no access restrictions on them at all. Well, except the VPN client stuff: the ACLs for those are handled by the VPN appliance, not by the rest of the network.

This specific network design is not meant to be extremely high security, as that need does not exist in this organization (realistically; I have seen on several occasions network engineers over-engineer something for security when it really was not required and as a result introduce massive bottlenecks into the network – this became an even greater concern for me with all servers running multiple 10GbE links). The access controls are mainly to protect against casual mistakes. Internet-facing services in all zones have the same level of security, so if you happen to be able to exploit one of them (I’ve never seen this happen at any company on anything I’ve been responsible for – not that I go to paranoid lengths to secure things either), there’s nothing stopping you from exploiting the others in the exact same way. Obviously nothing is directly connected to the internet other than the load balancer (which runs a hardened operating system) and a site to site VPN appliance (also hardened).

The switch blocks TCP SYN & UDP packets between the respective zones above, since it is not stateful. The switch operates at line-rate 10GbE with ASIC-based ACLs; performing this function in a hardware (or software) firewall would, I figured, add too much complexity and reduce performance (not to mention the potential cost of a firewall capable of line rate at 10Gbps+ – given multiple servers each with multiple 10GbE ports, throughput could far exceed 10Gbps, whereas the switch is line rate on every port, up to 1.2Tbps on this switching platform. How much is that firewall again?).
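
To show what those stateless ACLs amount to, here is the intent expressed in Python (my reading of the design, not any vendor’s switch syntax, and using the made-up prefixes from the sketch above):

  from itertools import permutations

  # Hypothetical zone prefixes, matching the earlier sketch.
  ZONES = {"A": "10.20.0.0/19", "B": "10.20.32.0/19", "C": "10.20.64.0/19"}

  def interzone_rules():
      """Stateless deny rules for each ordered pair of zones: drop packets that
      would open a new TCP session (SYN set, ACK clear) and drop UDP, while
      leaving established TCP traffic alone."""
      for (src_name, src), (dst_name, dst) in permutations(ZONES.items(), 2):
          yield "deny tcp %s -> %s syn,!ack   # no new sessions %s->%s" % (src, dst, src_name, dst_name)
          yield "deny udp %s -> %s            # no UDP %s->%s" % (src, dst, src_name, dst_name)

  for rule in interzone_rules():
      print(rule)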

There are four more VLANs related to IP-based storage: two for production and two for non-production, though the non-production ones have never really been used to date. I have the 3PAR iSCSI on these VLANs, with jumbo frames (the reason for the VLANs), though all of the client systems at the moment use standard frame sizes (iSCSI runs on top of TCP, which negotiates the maximum segment size between endpoints, so the mixed frame sizes still work).

There is a pair of hardware load balancers, each with a half dozen or so VLANs; each zone has a dedicated load balancer VLAN for services in that zone. The LBs are also connected to the internet, of course, in a two-armed configuration.

Sample two-arm configuration for a LB, from Citrix documentation

I have a similar configuration in another data center using a software load balancer of the same type; however, the inability to support more than 4 NICs (4 VLANs, at least in vSphere 4.1 – not sure if this has increased in 5.x) limits the flexibility of that configuration relative to the physical appliances, so I had to make a few compromises in the virtual appliance’s case.

So I have all these VLANs, a fully routed layer 3 switching configuration, some really basic ACLs to prevent certain types of communication, load balancers to route traffic from the internet as well as distribute load in some cases.

Get to the point already!

The point of all of this is that things were designed up front and provisioned up front, and as a result, over the past 18 months we have not had to make any changes to this configuration despite more than doubling in size during that time. We could double again and not have a problem. Doubling again beyond that, I may need to add one or two VLANs (sub zones), though I believe the zones as they exist today could continue to exist; I would not have to expand them. I really do not think the organization running this will ever EVER get to that scale. If they do, then they’re doing many billions in revenue a year and we can adapt the system if needed (and probably at that point we’d have one or more dedicated network engineers who’d likely promptly replace whatever I have built with something significantly more (overly so) complicated, because they can).

If we are deploying a new application, or a new environment we just tell VMware where to plop the VM. If it is QA/Dev then it goes in that zone, if it is testing, it goes in another, production etc.. blah blah…

More complexity outside switching+routing

From a network infrastructure perspective, the complexity when deploying a new network service really lies in the load balancer. Not that it is complicated, but that stuff is not pre-provisioned up front. Tasks include:

  • Configuring server name to IP mappings (within the LB itself)
  • Creating Service group(s) & adding servers to the service groups
  • Creating virtual server(s) & assigning IPs + DNS names to them
  • Creating content switching virtual server(s) & assigning IPs + DNS names to them
  • Configuring content switching virtual server(s) – (adding rules to parse HTTP headers and route traffic accordingly)
  • Importing SSL cert(s) & assigning them to the virtual servers & cs virtual servers

The above usually takes me maybe 5-20 minutes depending on the number of things I am adding. Some of it I may do via GUI, some I may do via CLI.

None of this stuff is generic; unless we know specifically what is coming, we can’t provision it in advance (I’m a strong believer in solid naming conventions – which means no random names!!!).

The VMs, by contrast, are always very generic (other than the names of course); there’s nothing special about them. Drop them in the VLAN they need to be in and they are done. We have no VMs that I can think of with more than one vNIC, other than the aforementioned software load balancers. Long gone are the days (for me) where a server was bridged between two different networks – that’s what routers are for.

Network is not the bottleneck for deploying a new application

In fact, in my opinion the most difficult part of getting a new application up and running is getting the configuration into Chef. That is by far the longest part of any aspect of the provisioning process. It can take me, or even us, hours to days to get it properly configured and tested. VMs take minutes; the load balancer takes minutes. Obviously a tool like Chef makes it much easier to scale an existing application, since the configuration is already done. This blog post is all about new applications or network services.

Some of the above could be automated using the APIs on the platform (they’ve been there for years) and some sort of dynamic DNS or whatever. The amount of work involved in building such a system for an operation of our scale isn’t worth the investment.
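
As a rough illustration of what that automation could look like against a NetScaler-style REST interface (the NITRO API), here is a minimal sketch in Python. The host, names, resource paths and payload fields below are assumptions abbreviated from memory; treat them as illustrative, not as a reference for the real API.

  import requests

  NSIP = "https://lb-mgmt.example.com"    # hypothetical management address
  HEADERS = {
      "X-NITRO-USER": "nsroot",           # credentials would really come from a vault
      "X-NITRO-PASS": "changeme",
      "Content-Type": "application/json",
  }

  def nitro_post(resource, payload):
      # POST one config object and fail loudly if the LB rejects it.
      # Sketch only: don't disable TLS verification in production.
      r = requests.post("%s/nitro/v1/config/%s" % (NSIP, resource),
                        json={resource: payload}, headers=HEADERS, verify=False)
      r.raise_for_status()

  # 1. server name -> IP mapping
  nitro_post("server", {"name": "app01", "ipaddress": "10.20.1.11"})

  # 2. service group and its member
  nitro_post("servicegroup", {"servicegroupname": "sg_app_http", "servicetype": "HTTP"})
  nitro_post("servicegroup_servicegroupmember_binding",
             {"servicegroupname": "sg_app_http", "servername": "app01", "port": 8080})

  # 3. virtual server plus binding; cs vservers and SSL certs follow the same pattern
  nitro_post("lbvserver", {"name": "vs_app_http", "servicetype": "HTTP",
                           "ipv46": "10.20.2.10", "port": 80})
  nitro_post("lbvserver_servicegroup_binding",
             {"name": "vs_app_http", "servicegroupname": "sg_app_http"})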

The point here is, the L2/L3 stuff is trivial – at least for an operation of the scale we run today – and that goes for all of the companies I have worked at for the past decade. The L2/L3 stuff flat out doesn’t change very often and doesn’t need to. Sometimes, if there are firewalls involved, perhaps some new holes need to be poked in them, but that just takes a few minutes, and from what I can tell is outside the scope of SDN anyway.

I asked Martin a question on that specific topic. It wasn’t well worded, but he got the gist of it. My pain when it comes to networking is not in the L2/L3 area – it is in the L7 area. Well, if we made extensive use of firewalls, then L3 firewalling would be an issue as well. So I asked him how SDN addresses that (or whether it does). He liked the question and confirmed that SDN does not in fact address it. That area should be addressed by a “policy management tool” of some kind.

I really liked his answer – it just confirms my thoughts on SDN are correct.

Virtual Network limitations

I do like the option of being able to have virtual network services, whether it is a load balancer or a firewall or something. But those do have limitations that need to be accounted for, whether it is performance, flexibility (number of VLANs, etc.), or dependency (you may not want your VPN device in a VM: if your storage shits itself you may lose VPN too!). Managing 30 different load balancers may in fact be significantly more work than managing a single load balancer that supports 30 applications (I’d wager it is; the one exception is the service provider model, where you are delegating administrative control to others – which still means more work is involved, it is just being handled by more staff).

Citrix Netscaler Cluster Traffic Flow

Above is a diagram from Citrix, from an earlier blog post I wrote last year. At the time their clustering tech scaled to 32 systems; if that still holds true today, at the top end of 120Gbps/system that’d be nearly 4Tbps of theoretical throughput. Maybe cut that in half to be on the safe side, so roughly 2Tbps.. that is quite a bit.

Purpose-built hardware network devices have long provided really good performance and flexibility. Some of them even provide some layer of virtualization built in; this is pretty common in firewalls. More than one load balancing company has appliances that can run multiple instances of their software as well, in the event that is needed. I think the number of instances that would be required (outside of a service provider giving each customer their own LB) is quite limited.

Design the network so when you need such network resources you can route to them easily – it is a network service after all, addressable via the network – it doesn’t matter if it lives in a VM or on a physical appliance.

VXLAN

One area that I have not covered with regards to virtualization is something that VXLAN offers, which is making the L2 network more portable between data centers and such. This is certainly an attractive feature for some customers, especially if you rely on something like VMware’s SRM to provide failover.

My own personal experience says VXLAN is not required, nor is SRM. Application configurations for the most part are already in a configuration management platform. Building new resources at a different data center is not difficult (again, in my experience, for most of the applications I have supported this could even be done in advance), in the different IP space and with slightly different host names (I leverage the common airportcode.domain for each DC to show where each system is physically located). Replicate the data that is needed (use application-based replication where available, e.g. internal database replication; obviously that does not include running VM images) and off you go. Some applications are more complex; most web-era applications are not, though.

So, SDN solves a problem for me which doesn’t exist, and never has.

I don’t see it existing in the future for most smaller scale (sub hyper scale) applications, unless your network engineers are crazy about over-engineering things. I can’t imagine what is involved that takes 2-3 weeks to provision a new network service at Intel. I really can’t. Other than perhaps procuring new equipment, which can be a problem regardless.

Someone still has to buy the hardware

Which leads me into a little tangent. Just because you have cloud doesn’t mean you automatically have unlimited capacity. Even if you’re Intel: if someone internally built something on their cloud platform (assuming they have one) and said “I need 100,000 VMs, each with 24 CPUs, and I plan to drive them to 100% utilization 15 hours a day”, even with cloud I think it is unlikely they have that much capacity provisioned as spare just sitting around (and if they do, that is fairly wasteful!).

Someone has to buy and provision the hardware, whether it is in a non-cloud setup or a cloud setup. Obviously, once provisioned into a pool of “cloud” (ugh), it is easier to adapt that system to be used for multiple purposes. But the capacity has to exist in advance of the service using it. Which means someone is going to spend some $$ and there is going to be some lead time to get the stuff in and set it up. An extreme case for sure, but consider that if you need to deploy on the order of tens of thousands of new servers, that lead time may be months just for the floor space/power/cooling alone.

I remember a story I heard from SAVVIS many years ago: a data center they operated in the Bay Area had a few tens of thousands of square feet available, and it was growing slow and steady. One day Yahoo! walks in and says, I want all of your remaining space. Right now. And poof, it was given to them. There was also a data center company Microsoft bought (I forget who now) with one or more facilities up in the Seattle area where (I believe) they kicked out the tenants of the company they bought so they could take over the facility entirely (I don’t recall how much time they granted the customers to GTFO, but I don’t recall hearing they were polite about it).

So often, practically all the time, when I see people talk about cloud, they think the stuff is magical and that no matter how much capacity you need it just takes minutes to be made available (see the Intel slide above). Now, if you are a massive service provider like Amazon, Microsoft or Google, you probably do have 100,000 systems available at any given time. Though the costs of public cloud are.. not something I will dive into again in this post; I have talked about that many times in the past.

Back to Martin’s Presentation

Don’t get me wrong — I think Martin is a really smart guy and created a wonderful thing. My issue isn’t with SDN itself, it’s much more with the marketing and press surrounding it, making it sound like everyone needs this stuff! Buy my gear and get SDN!! You can’t build a network today without SDN!! Even my own favorite switching company Extreme Networks can’t stop talking about SDN.

Networking has been boring for a long time, and SDN is giving the industry something exciting to talk about. Except that it’s not exciting – at least not to me, because I don’t need it.

Anyway one of Martin’s last slides is great as well

Markitechture war with SDN

Self-explanatory; I especially like the SDN/Python point.

 

 Conclusion

I see SDN as being of great value primarily for service providers and large scale operations at this point, especially in situations where providers are provisioning dedicated network resources for each customer (network virtualization works great here too).

At some point, perhaps when SDN matures more and becomes more transparent, mere mortals will probably find it more useful. As Martin says in one of his first slides, SDN is not for customers (me?), it’s for implementers (that may be me too, depending on what he means there, but I think it’s more for the tool builders, people who make things like cloud management interfaces, vCenter programmers, etc.).

Don’t discount the power/performance benefits of ASICs too much. They exist for a reason: if network manufacturers could build 1U switches to shift 1+Tbps of data around with nothing more than x86 CPUs and a reasonable power budget, I have no doubt they would. Keep this in mind when you think about a network running in software.

If you happen to have a really complicated network, then SDN may provide some good value there. I haven’t worked in such an organization, though my first big network (my biggest) was a bit complicated (though still simpler than the network it replaced); I learned some good things from that experience and adapted future designs accordingly.

I’ll caveat all of this by saying the network design work I have done has, again, been built for modern web applications. I don’t cover ultra-security things like, say, processing credit cards (that, IMO, would call for a completely physically separate infrastructure for that subsystem, to limit the impact of PCI and other compliance things – that being said, my first network again did process credit cards directly, though this was before PCI compliance existed, and there were critical flaws in the application with regards to credit card security at the time as well). Things are simple and fairly scalable (it is not difficult to get to low thousands of systems easily, and that already eclipses the networks of most organizations out there by a big margin).

I believe if you’re constantly making changes to your underlying L2/L3 network (other than, say, adding physical devices to support more capacity), then you probably didn’t design it right to begin with (maybe not your fault). If you need to deploy a new network service, just plug it in and go..

For myself, my role has always been a hybrid of server/storage/network/etc. management, so I have visibility into all layers of the application running on the network. Perhaps that makes me better equipped to design things than someone who is in a silo and has no idea what the application folks are doing.

Maybe it’s an extreme example, but now that I wrote that I remember, back many years ago, we had a customer that was a big telco, and their firewall rule change process was: once a month, a dozen or more people from various organizations (internal and external) get on a conference call to coordinate firewall rule changes (and to test connectivity post-change). It was pretty crazy to see. You probably would have had to get the telco’s CEO’s approval to get a firewall change in outside that window!

Before I go let me give a shout out to my favorite L3 switching fault tolerance protocol: ESRP.

I suppose the thing I hesitate about most with this post is paranoia about missing some detail which invalidates every network design I’ve ever done and makes me look like even more of an idiot than I already am!! Though I have talked with enough network people over the years that I don’t believe that will happen…

If you’re reading this and are intimately familiar with an organization that takes 2-3 weeks to spin up a network service, I’d like to hear from you (publicly or privately) as to what specifically takes the bulk of that time. Anonymous is fine too; I won’t write anything about it if you don’t want me to. I suspect the bulk of the time is red tape – processes, approvals, etc. – and not related to the technology.

So, thanks Martin for answering my questions at the conference last week! (I wonder if he will read this… some folks have Google Alerts for things that are posted and such.) If you are reading this and wondering – yes, I really have been a VMware customer for 14 years, going back to the pre-1.0 days when I was running VMware on top of Linux. I still have my CD of VMware 1.0.2 around here somewhere; I think that was the first physical media distributed. Though my loyalty to VMware has eroded significantly in recent years for various reasons.

May 7, 2013

Internet Hippies at it again

Filed under: Networking — Nate @ 8:50 am

I was just reading a discussion on Slashdot about IPv6 again. Apparently BT has announced plans to deploy carrier-grade NAT (CGN) for some of their lower-tier customers, which is of course just a larger-scope, higher-scale deployment of NAT.

I knew how the conversation would go, but I found it interesting regardless. The die-hard IPv6 folks came out crying foul:

Killing IPv4 is the only solution. This is a stopgap measure like carpooling and congestion charges that don’t actually fix the original problem of a diminishing resource.

(disclaimer – I walk to work)

[..]how on earth can you make IPv6 a premium option if you don’t make IPv4 unbearably broken and inconvenient for users?

These same folks often cry out about how NAT will break the internet because they can’t do peer to peer stuff (as easily, in some cases; other things may not be possible at all). At the same time they advocate a solution (IPv6) that will break FAR more things than NAT could ever hope to break. At least an order of magnitude more.

They feel the only way to make real progress is essentially to tax the usage of IPv4 highly enough that people are discouraged from using it, thus somehow bringing immediate global change to the internet and getting everyone to switch to IPv6. Which brings me to my next, somewhat related, topic.

Maybe they are right – I don’t know. I’m in no hurry to get to IPv6 myself.

Stop! Tangent time.

The environmentalists are of course doing the same thing. Not long ago a law took effect here in the county I am in banning plastic bags at grocery stores and the like. You can still get paper bags at a cost of $0.10/bag, but no more plastic. I was having a brief discussion on this with a friend last week and he was questioning the stores for charging folks; he didn’t know it was the law that mandated it. I have absolutely not a shred of doubt that if the environmentalists could have their way they would have banned all disposable bags. That is their goal: the tax is only $0.10 now, but it will go up in the future; they will push it as high as they can for the same reason, to discourage use. Obviously customers were already paying for plastic and paper bags before – the cost was built into the margins of the products they buy – just like they were paying for the electricity to keep the dairy products cool.

In Washington state I believe there were one or two places that actually tried to ban ALL disposable bags. I don’t remember if the laws passed or not, but I remember thinking that I wanted to just go to one of their grocery stores, load up a cart full of stuff, go to checkout, and then, when they told me I had to buy bags, just walk out. I wanted to so badly, though I am more polite than that, so I didn’t.

Safeway gave me 3 “free” reusable bags the first time I was there after the law passed, and I have bought one more since. I am worried about contamination more than anything else; there have been several reports of the bags being contaminated, mainly by meat, because people don’t clean them regularly.

I’ll admit (as much as it pains me) that there is one good reason to use these bags over the disposable ones that didn’t really hit me until I went home that first night: they are a lot stronger, so they hold more. I was able to get a full night’s shopping into 3 bags, and those were easier to carry than the ~8 or so disposable bags that would otherwise have been used.

I think it’s terrible to have the tax on paper, since that is relatively much greener than plastic. I read an article one time that talked about paper vs plastic across the various regions of our country, at least, and which is greener. The answer was that it varied: on the coastlines, like where I live, paper is greener; in the middle parts of the country, plastic was greener. I forget the reasons given, but they made sense at the time. I haven’t been able to dig up the article; I have no idea where I read it.

I remember living in China almost 25 years ago now and noticing how everyone was using reusable bags, similar to what we have now, but they were, from what I remember, more like knitted plastic. They used them, I believe, mainly because they didn’t have an alternative: they didn’t have the machines and such to cheaply mass-produce disposable bags. I believe I remember reading at some point that the usage of disposable bags really went up in the following years before reversing course again toward reusables.

Myself I have recycled my plastic bags (at Safeway) for a long time now, as long as I can remember.  Sad to see them go.

I’ll end with a quote from Cartman (probably not a direct quote; I tried checking):

Hippies piss me off

(Hey hippies – go ban bottled water now too while you’re at it. I go through about 60 bottles a week myself; I’ve been stocking up recently because it was cheap(er than normal), and I think I have more than 200 bottles in my apartment now – I like the taste of Arrowhead water. I don’t drink much soda at home these days, having basically replaced it with bottled water, so I think cost-wise it’s an improvement 🙂 )

(Same goes for those die-hard IPv6 folks – you can go ahead and slap CGNAT on my internet connection at home; I don’t care. I already have CGNAT on my cell phone (it has a 10.x IP), and when it is in hotspot mode I notice nothing is broken. The only thing I do that is peer to peer is Skype (for work; I don’t use it otherwise), everything else is pure client-server.) I have a server (a real server, the one this blog is hosted on) in a data center (a real data center, not my basement) with 100Mbps and unlimited bandwidth to do things that I can’t do on my home connection (mainly due to bandwidth constraints and a dynamic IP).

I proclaim IPv6 die hards as internet hippies!

My home network has a site to site VPN with the data center, and if I need to access my home network remotely, I just VPN to the data center and access it that way. If you don’t want to host a real server (it’s not cheap), there are other, cheaper solutions like a VPS that are available for pennies a day.

April 30, 2013

OpenFlow inventor admits SDN is hype

Filed under: Networking — Nate @ 8:25 am

The whole SDN thing has bugged me for a long time now. The amount of hype behind it has driven me mad. I have asked folks to explain to me what SDN is, and I have never really gotten a good answer. I have a decent background in networking but it’s never been my full time responsibility (nor do I wish it to be).

I am happy to report that it seems I am not crazy. Yesterday I came across an article on Slashdot from the inventor of OpenFlow, the same guy who sold his little networking startup Nicira for a cool $1.2B (and people thought HP paid too much for 3PAR).

He admits he doesn’t know what SDN is either anymore.

Reading that made me feel better 🙂

On top of that our friends over at El Reg recently described SDN as an industry “hype gasm”. That too was refreshing to see. Finally more people are starting to cut through the hype.

I’ve always felt that the whole SDN thing that is going on is entirely too narrow in vision, seemingly focused almost entirely on switching & routing. Most of the interesting stuff happens higher up, in advanced layer 7 load balancing, where you have more insight into what is actually traversing the wire from an application perspective.

I have no doubt that the concepts behind SDN will be/are very useful for massive scale service providers and such (though somehow they managed without it, at least as it is being defined now). I don’t see it as very useful for most other organizations, unlike, say, virtualized storage.

I cringed badly when I first saw the term software defined storage last year; it just makes me shudder to think of the amount of hype people might try to pump into that. HP seems to be using this term more and more often. I believe others are too, though I can’t bring myself to google the term.

November 13, 2012

100GbE: Still a very hefty premium

Filed under: Networking — Nate @ 11:22 am

UPDATED

Big Switch Networks decloaked today, and released their new OpenFlow controller, in partnership with many different networking vendors.

Arista Networks, Dell, Brocade Communications, Juniper Networks, and Extreme Networks have all partnered with Big Switch, and their OpenFlow-enabled switches are certified to be control-freaked by Big Network Controller. Switches from IBM and HP have been tested for interoperability, but there are no formal partnerships.

All of this SDN stuff really is sort of confusing to me (it really seems like the whole software-defined thing is riding on a big hype cloud). One thing that stands out to me here is that this OpenFlow stuff seems to only cover switching and routing. I don’t see any mention of things like firewalls or, more importantly, load balancers. Maybe those folks will integrate with OpenFlow at some point, in some way.

In this article A10 Networks (a load balancing company) is mentioned as a partner, but running a search for either OpenFlow or Big Switch on the A10 site reveals no results.

For me, if I’m going to be moving workloads between data centers, at least those that deal with internet connectivity, I certainly want that inbound connectivity to move to the new data center as well, and not incur the costs/latency of forwarding such traffic over a back end connection. The only exception would be a fault at the new data center severe enough that I would want to route internet traffic to it from another facility. I suppose at the same time the fault would likely have to block the ability to move the workload to another (non-faulty) facility.

F5 Networks put out a demo of long distance vMotion almost three years ago. Using their WAN Optimization, their Global Traffic Managers (global DNS), and their Local Traffic Managers (load balancers), it was a pretty cool setup. Of course this was ages before VMware had such a solution in house, and I believe this solution (for the niche that it serves) can cover a significantly longer distance than what you get with VMware today.

Anyway, that’s not the topic of the post. At the same time I noticed Extreme announced their first 100GbE offering (per usual it looks like it won’t be available to ship for at least 6 months – they like to announce early for some strange reason). It is on their X-8 platform, which has 1.2Tbps of throughput per line card and up to 20Tbps (15Tbps non-blocking, even with a fabric failure) per chassis. I say “up to” because there are multiple fabric modules, at two different speeds (2.5Tbps and 5Tbps).

The card is a combo 4-port 100GbE card. They also announced a newer, larger scale 12-port 40GbE line card. What struck me (still) was the cost distinction between the two:

NTE list pricing includes: 40GbE 12 port XL module at US $6,000.00 per port; 100GbE 4 port XL module at US $35,000 per port.

I think I recall hearing/reading last year that 100GbE was going for around $100,000/port; if so, this would be a great discount, but it is still pretty crazy expensive compared to 40GbE, obviously!
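
Running those list prices out per gigabit makes the premium clearer (a quick back-of-the-envelope in Python):

  # Back-of-the-envelope on the list prices quoted above.
  price_100g_port = 35000.0   # USD per 100GbE port
  price_40g_port = 6000.0     # USD per 40GbE port

  per_gbit_100g = price_100g_port / 100   # $350 per Gbps
  per_gbit_40g = price_40g_port / 40      # $150 per Gbps

  print("100GbE: $%.0f/Gbps, 40GbE: $%.0f/Gbps (%.1fx premium per bit)"
        % (per_gbit_100g, per_gbit_40g, per_gbit_100g / per_gbit_40g))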

UPDATE – It seems my comment was lost in the spam filter; the lack of approval wasn’t intentional.

While I’m here, let me rag on Extreme a bit: I posted a comment on one of their blog posts (about 3 weeks ago) where they said they had moved away from designing their own ASICs with the X-8 platform.

They never approved the comment.

My comment was basically asking them when their last ASIC design was. To my knowledge their last ASIC was the 4GNSS (they called it a programmable ASIC – I assume that meant something more like an FPGA, but who knows), which originally showed up in the Black Diamond 10808 back in 2003 (I had a pair of those boxes in 2005). I believe they re-used it, and perhaps refined it a bit in the following years, but I don’t believe any new ASICs have been designed since (sure, I could be wrong, but they haven’t clarified). So I’d say their last ASIC design was more than a decade ago, and only now does this blogger come out and say they don’t do ASICs any more. Before that the last one I know of was their Inferno chipset, a much better name, which was present in their older platforms running the original ExtremeWare operating system; the last such switches to be sold were the Alpine series and the Summit 48si (I still have one of these at home, but it doesn’t do much today – too loud for home use).

Anyway, shame on you for not approving my reasonable response to your post!

By the way, I approve all posts here, even those that try to attack me/my posts. If for some reason your post is not immediately visible, contact me (see blurb on right), because your post may have been caught by the spam filter. I don’t go through the caught posts often (there are a lot), maybe 2-3 times a year.

October 2, 2012

Cisco drops price on Nexus vSwitch to free

Filed under: Networking, Virtualization — Nate @ 10:02 am

I saw news yesterday that Cisco dropped the price of their vSwitch to free; they still have a premium version which has a few more features.

I’m really not all that interested in what Cisco does, but what got me thinking again is the lack of participation by other vendors in making a similar vSwitch, integrating their stack down to the hypervisor itself.

Back in 2009, Arista Networks launched their own vSwitch (though now that I read more on it, it wasn’t a “real” vSwitch), but you wouldn’t know that by looking at their site today. I tried a bunch of different search terms; I thought they still had it, but it seems the product is dead and buried. I have not heard of any other manufacturers making a software vSwitch of any kind (for VMware at least). I suppose the customer demand is not there.

I asked Extreme back then if they would come out with a software vSwitch, and at the time at least they said there were no plans; instead they were focusing on direct attach, a strategy that, at least for VMware, appears to be dead for the moment, as the manufacturer of the NICs used to make it happen is no longer making NICs (as of about 1-2 years ago). I don’t know why they still have the white paper on their site; I guess to show the concept, since you can’t build it today.

Direct attach, at least taken to its logical conclusion, is a method of forcing all inter-VM switching out of the host and into the physical switch layer. I was told that this is possible with Extreme (and possibly others too) on KVM today (I don’t know the details), just not with VMware.

They do have a switch that runs in VMware, though it’s not a vSwitch, more of a demo type thing where you can play with the commands. Their switching software has run on Intel CPUs since its initial release in 2003 (and they still have switches today that use Intel CPUs), so I imagine the work involved in making a vSwitch would not be herculean if they wanted to do it.

I have seen other manufacturers (Brocade at least, if I remember right) that were also looking to direct attach as the approach to take instead of a vSwitch. I can never remember the official networking name for the direct attach technology…

With VMware’s $1.2B purchase of Nicira it seems they believe the future is not direct attach.

Myself I like the concept of switching within the host, though I have wanted to have an actual switching fabric (in hardware) to make it happen. Some day..

Off topic, but it seems the global economic cycle has now passed its peak and is for sure headed downhill? One of my friends said yesterday the economy is “complete garbage”; I see tech company after tech company missing or warning, and layoffs abound, whether it’s the massive layoffs at HP or the smaller layoffs at Juniper announced this morning. Meanwhile the stock market is hitting new highs quite often.

I still maintain we are in a great depression. Lots of economists try to dispute that, though if you took away the social safety nets that we did not have in the ’20s and ’30s during the last depression, I am quite certain you’d see massive numbers of people lined up at soup kitchens and the like. I think the economists dispute it more because they fear a self-fulfilling prophecy than out of a willingness to have a serious talk on the subject. Whether or not we can get out of the depression, I don’t know. We need a catalyst; last time it was WWII. At least the last two major economic expansions were bubbles, and it’s been a long time since we’ve had a more normal economy. If we don’t get a catalyst then I see stagnation for another few years, perhaps a decade, while we drift downwards toward a more serious collapse (something that would make 2008 look trivial by comparison).

August 13, 2012

Freakish performance with Site to Site VPN

Filed under: Networking — Nate @ 6:07 pm

UPDATED I’ll be the first to admit: I’m not a network engineer. I do know networking, and can do the basics of switching, static routing, load balancing, firewalls, etc. But it’s not my primary background. I suppose you could call me a network engineer if you base my talents on some past network engineers I’ve worked with (which is kinda sad, really).

I’ve used quite a few different VPNs over the years, all of them without any special WAN optimization, though the last “appliance”-based VPN I was directly responsible for was a VPN between two sites connected by Cisco PIXs about 4-5 years ago. Since then my VPN experience has been limited to either using OpenVPN on my own personal stuff or relying on other, dedicated network engineer(s) to manage it.

In general, my experience has told me that site to site VPN performance generally equates to internet performance; you may get some benefit from the TCP compression and such, but without specialized WAN optimization / protocol optimization / caching, throughput is limited by latency.

I conferred with a colleague on this and his experience was similar – he expects site to site VPN performance to about match that of Internet site to site performance when no fancy WAN Opt is in use.

So imagine my surprise when, a few weeks ago, I hooked up a site to site VPN between Atlanta and Amsterdam (~95ms of latency between the two) and got a 10-30 fold improvement in throughput over the VPN compared to the raw internet.

  • Internet performance = ~600-700 Kilobytes/second sustained using HTTPS
  • Site to site VPN performance = ~5 Megabytes/second using NFS, ~12 Megabytes/second sustained using SCP, and 20 Megabytes/second sustained using HTTP

The links on each end of the connection are 1Gbps: a tier 1 ISP on the Atlanta side, and I would guesstimate a tier 2 ISP (with tons of peering connections) on the Amsterdam side.

It’s possible the performance could be even higher; I noticed that speed continued to increase the longer the transfer ran. My initial tests were limited to ~700MB files – 46.6 seconds for a 697MB file with SCP. Toward the end of the SCP it was running at ~17MB/sec (at the beginning, only 2MB/sec).

A network engineer who I believe is probably quite a bit better than I am told me:

By my calculation – the max for a non-jittered 95ms connection is about 690KB/s so it looks like you already have a clean socket.
Keep in mind that bandwidth does not matter at this point since latency dictates the round-trip-time.

I don’t know what sort of calculation was done, but the number matches the throughput I see over the raw internet.
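
Presumably it is just the classic window-size over round-trip-time ceiling for a single TCP stream: a plain 64KB receive window divided by 95ms works out to roughly 690 kilobytes/second, which lines up with the raw-internet numbers above. A quick check, plus how much data would have to be in flight to sustain the VPN speeds:

  # Single-stream TCP throughput ceiling: window size / round-trip time.
  window_bytes = 64 * 1024     # classic 64KB window, no window scaling
  rtt_seconds = 0.095          # ~95ms Atlanta <-> Amsterdam

  ceiling = window_bytes / rtt_seconds
  print("ceiling: %.0f bytes/sec (~%.0f KB/s)" % (ceiling, ceiling / 1000))

  # To sustain the ~12 MB/sec seen over the VPN, the effective window
  # (data in flight) would need to be on the order of:
  needed = 12 * 1024 * 1024 * rtt_seconds
  print("needed in-flight data: ~%.1f MB" % (needed / (1024 * 1024)))

However they do it, something is keeping on the order of a megabyte of data in flight per connection; the latency itself obviously isn’t going anywhere.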

These are all single-threaded transfers, really basic tests. In all cases the files being copied are highly compressed (in my main test case the 697MB file uncompresses to 14GB), and in the case of the SCP test the data stream is encrypted as well. I’ve done multiple tests over several weeks and the data is consistent.

It really blew my mind; even with fancy WAN optimization I did not expect this sort of performance using something like SCP. Obviously they are doing some really good TCP windowing and other optimizations; despite there still being ~95ms of latency between the two sites within the VPN itself, the throughput is just amazing.

I opened a support ticket to try to get support to explain to me what more was going on, but they couldn’t answer the question. They said that because there are no hops in the VPN it’s faster. There may be no hops, but there’s still 95ms of latency between the systems even within the VPN.

I mean, just a few years ago I wrote a fancy distributed multi-threaded file replication system for a company I was at, to try to get around the limits on throughput between our regional sites caused by the latency. I could have saved myself a bunch of work had we known at the time (and we had a dedicated network engineer then) that this sort of performance was possible without really high end gear or specialized protocols (I was using rsync over HPN-SSH). I remember trying to set up OpenVPN between two sites at that company for a while to test throughput there, and performance was really terrible (much worse than the existing Cisco VPN that we had on the same connection). For a while we had Cisco PIXs or ASAs, I don’t recall which, but they had a 100Mbit limit on throughput; we tapped them out pretty quickly and had to move on to something faster.

I ran a similar test between Atlanta and Los Angeles, where the VPN endpoint in Los Angeles was a Cisco ASA (the other sites are all Sonic Wall), and the performance was high there too. I’m not sure what the link speed is in Los Angeles, but throughput was around 8 Megabytes/second for a compressed/encrypted data stream, easily 8-10x faster than over the raw Internet. I tested another VPN link between a pair of Cisco firewalls and their performance was the same as the raw Internet (15ms of latency between the two); I think the link was saturated in those tests (not my link so I couldn’t check it directly at the time).

I’m sure if I dug into the raw TCP packets the secrets would be there, but really, even after doing all the networking stuff I have been doing for the past decade+, I still can’t make heads or tails of 90% of what is in a packet (I haven’t tried to either; it hasn’t been a priority of mine and isn’t something that really interests me).
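
For anyone who does feel like digging, my hunch is the secret would show up in the TCP options and the advertised window during the transfer. A minimal sketch of what I’d look at, assuming a capture taken on one of the endpoints (the file name here is made up, and it needs the scapy library installed):

    # Print the MSS and window-scale options from the connection setup packets
    # in a capture file (take the capture with tcpdump -w on one of the hosts).
    from scapy.all import rdpcap, TCP

    for pkt in rdpcap("vpn-scp-test.pcap"):
        if TCP in pkt and pkt[TCP].flags & 0x02:        # SYN / SYN-ACK carry the options
            opts = dict((name, val) for name, val in pkt[TCP].options
                        if name in ("MSS", "WScale"))
            print(pkt.sprintf("%IP.src%:%TCP.sport% -> %IP.dst%:%TCP.dport%"), opts)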

But sustaining 10+ megabytes/second over a 95 millisecond link over 9 internet routers on a highly compressed and encrypted data stream without any special WAN optimization package is just amazing to me.

Maybe this is common technology now, I don’t know; I’d sort of expect marketing material to advertise this kind of thing. If you can get 10-30x faster throughput over a VPN, without high end WAN optimization, versus the regular internet, I’d be really interested in that technology. If you’ve seen similar massive increases in performance without special WAN optimization on a site to site VPN, I’d be interested to hear about it.

In this particular case, the products I’m using are Sonic Wall NSA3500s. The only special feature licensed is high availability; other than that it’s a basic pair of units on each end of the connection (WAN Optimization is a software option but is NOT licensed). These are my first Sonic Walls. I had some friends trying to push me to use Juniper (SRX I think) or, in one case, Palo Alto Networks, but Juniper is far too complicated for my needs, and Palo Alto Networks is not suitable for site to site VPNs with their cost structure (the quote I had for 4 devices was something like $60k). So I researched a few other players, met with Sonic Wall about a year ago, was satisfied with their pitch, verified some of their claims with some other folks, and settled on them. So far it’s been a good experience, very easy to manage, and I’m still just shocked by this throughput. By contrast I had really terrible experiences managing those Cisco PIXs a few years back. OpenVPN is a real pain as well (once it’s up and going it’s alright, but configuring and troubleshooting are a bitch).

Sonic Wall claimed they were the only ones (aside from Palo Alto Networks) with true deep packet inspection built into their firewalls (vs having other devices do the work). That claim interested me, as I am not well versed in the space. I bounced the claim off of a friend I trust (who knows Palo Alto inside and out) and he said it was probably true; Palo Alto’s technology is better (fewer false positives), but nobody else offers that tech. Not that I need it, this is for a VPN, but it was nice to know we have the option to use it in the future. Sonic Wall’s claims go beyond that as well, saying they are better than Palo Alto in some cases due to size limitations on Palo Alto’s side (not sure if that is still true or not).

Going far beyond simple stateful inspection, the Dell® SonicWALL® Reassembly-Free Deep Packet Inspection™ (RFDPI) engine scans against multiple application types and protocols to ensure your network is protected from internal and external attacks as well as application vulnerabilities. Unlike other scanning engines, the RFDPI engine is not limited by file size or the amount of concurrent traffic it can scan, making our solutions second to none.

SonicWall Architecture - looked neat, adds some needed color to this page

The packet capture ability of these devices is really nice too; it makes it very easy to troubleshoot connections. In the past, on Cisco devices at least, I recall having to put the device in some sort of debug mode and it would spew stuff to the console (my Cisco experience is not current of course). With these Sonic Walls I can set up filters really easily to capture packets, it shows them in a nice UI, and I can export the data to Wireshark or plain text if needed.

My main complaint on these Sonic Walls, I guess, is that they don’t support link aggregation (some other models do though). I don’t need it for performance; I wanted it for reliability, so that if a switch fails the Sonic Wall can stay connected and not trigger a failover of its own. As-is, I had to configure them so each Sonic Wall is logically connected to a single switch (though they have physical connections to both – I learned of the limitation after I wired them up). Not that failures happen often of course, but it’s too bad this isn’t supported on this model (which has 6x1Gbps ports on it).

The ONLY thing I’ve done on these Sonic Walls is VPN (site to site mainly, but I have done some SSL VPN stuff too), so beyond that I don’t know how well they work. Sonic Wall has traditionally had a “SOHO” feel to it, though it seems in recent years they have tried to shake that off, with their high end reaching as high as 240 Gbps in an active-active cluster. Nothing to sneeze at.

UPDATE – I ran another test, and this time I captured a sample of the CPU usage on the Sonic Wall as well as the raw internet throughput as reported by my router, I mean switch, yeah switch.

A 2,784MB gzip’d file copied in 3 minutes 35 seconds using SCP. If my math is right that comes to an average of roughly 12.9 Megabytes/second. This is for a single stream, basic file copy.
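
Checking that math, and converting to bits for comparison with the raw throughput number below:

    # Sanity check on the transfer above
    size_mb = 2784.0
    seconds = 3 * 60 + 35                  # 3 minutes 35 seconds
    mb_per_sec = size_mb / seconds
    print(mb_per_sec)                      # ~12.95 MB/sec
    print(mb_per_sec * 8)                  # ~103.6 Mbit/sec of payload, before tunnel/TCP overhead

The gap between that ~104 Mbit/sec of payload and the ~130 Mbit/sec raw figure below would be the tunnel and TCP/IP overhead plus whatever other traffic was on the link.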

The firewall has a quad core 550 MHz MIPS64 Octeon processor (I assume it’s quad core and not four individual processors). CPU usage snapshot here:

SonicWall CPU usage snapshot across cores during big file xfer

The highest I saw was CPU core #1 going to about 45% usage, core #2 maybe 35%, and core #3 maybe around 20%, with core #0 idle (maybe that one is reserved for management, given its low usage during the test? Not sure).

Raw network throughput topped out at 135.6 Megabits/second (some of that was other traffic, so call it roughly 130 Megabits for the VPN itself).

Raw internet throughput for VPN file transfer

Apparently this post found its way to Dell themselves and they were pretty happy to see it. I’m sorry, I just can’t get over how bitchin’ fast this thing is! I’d love for someone at Dell/SonicWALL who knows more than the tier 1 support person I talked with a few weeks ago to explain it better.

July 27, 2012

FCoE: stillborn 3 years later

Filed under: Networking — Tags: — Nate @ 9:44 am

At least the hype has, for the most part, died off, as the market has not really transitioned much to FCoE since its launch a few years ago. I mentioned it last year, and griped about it in one of my early posts back in 2009, around when Cisco was launching their UCS and, along with NetApp, proclaiming FCoE was going to take over.

Brocade has been saying for some time that FCoE adoption was lacking; a few short months ago Emulex came out and said about the same, and more recently Qlogic chimed in with another me-too story.

FCoE – the emulated version of Fibre Channel running over Ethernet – is not exactly selling like hot cakes and is not likely to do so anytime soon, so all that FCoE-flavoured Ethernet development is not paying off yet.

More and more switches out there support the Data Center Bridging protocols, but those die hard Fibre Channel users aren’t showing much interest. I imagine the problem is more political than anything else at many larger organizations. The storage group doesn’t trust the networking group and would rather have control over their own storage network, and not share anything with the network group. I’ve talked to several folks over recent years whose storage divisions won’t even consider something that is exclusively iSCSI, for example, because it means the networking folks have to get involved and that’s not acceptable. Myself, I have had a rash of issues with certain Qlogic 10GbE network cards over the past 7 months, which makes me really glad I’m not reliant on ethernet-based storage (there is some of it, but all of the critical stuff is good ol’ Fibre Channel – on entirely Qlogic infrastructure, again). The rash of issues finally resurrected a bad set of memories of trying to troubleshoot network issues on some Broadcom NICs a few years ago with regards to something buggy called MSI-X. It took about six months to track that problem down; the symptoms were just so bizarre. My current issues with 10GbE NICs aren’t all that critical because of the level of redundancy that I have and the fact that storage runs over regular ol’ FC.

I know Qlogic is not alone in their issues with 10GbE; a little company by the name of Clearwire in Seattle had what amounted to something like a 14 hour outage a year or two ago on their Cisco UCS platform because of bugs in the Cisco gear they had (I think it was bugs around link flapping or something). I know others have had issues too. It sort of surprises me how long 10GbE has been around and we still seem to have quite a few issues with it, at least on the HBA side.

iSCSI has had its issues too over the years, at least iSCSI in the HBAs. I was talking to one storage company late last year who has an iSCSI-only product, and they said iSCSI is ready for prime time; after further discussion, though, they clarified that you really should only use it with offloading NIC X or Y or software stack Z. iSCSI was a weak point for a long time on the 3PAR platform; they’ve addressed it to some extent on the new V-series, but I wouldn’t be surprised if they still don’t support anything other than pure software initiators.

TCP is very forgiving of networking issues; storage, of course, is not. In the current world of virtualization, with people consolidating onto fewer, larger systems, the added cost of FC really isn’t that much. I wouldn’t be slapping FC cards into swaths of $3-5k servers, but most servers that run VMs have gobs of memory, which of course drives the price quite a bit higher than that anyway.

Data center bridging really does nothing when your NIC decides to stop forwarding jumbo frame packets, or when the link starts flapping, or when the firmware crashes, or if the ASIC overheats. The amount of time it often takes for software to detect a problem with the link and fail over to a backup link is by itself big enough to cause major issues with storage if it’s a regular occurrence. All of the networks I’ve worked on, at least in the past decade or so, have operated at a tiny fraction of their capacity; the bottlenecks are typically things like firewalls between zones (and whenever possible I prefer to rely on switch ACLs to handle that).

June 1, 2012

London Internet Exchange downed by Loop

Filed under: Networking — Tags: , — Nate @ 8:08 am

This probably doesn’t happen very often at these big internet exchanges, but I found the news sort of interesting.

I had known for a few years that the LINX was a dual vendor environment: one side was Foundry/Brocade, the other was Extreme. They are one of the few places that go out of their way to advertise what they use; I’m sure it gets them a better discount :)  It seems the LINX replaced the Foundry/Brocade gear with Juniper at some point since I last checked (less than a year ago), though their site still mentions usage of EAPS (Extreme’s ring protocol) and MRP (Foundry’s ring protocol). I assume Juniper has not adopted MRP, though they probably have something similar. Looking at the design of the Juniper LAN vs the Extreme LAN (and the Brocade LAN before Juniper), the Juniper one looks a lot more complicated. I wonder if they are using Juniper’s new protocol(s) to manage it – Qfabric I think it’s called? It seems LINX still has some Brocade in one of their edge networks.

Apparently the Juniper side is what suffered the loop –

“Linx is trying to determine where the loop originated and we are also addressing why the protection on Juniper’s LAN didn’t work.”

I wanted to point out again, since it’s been a while since I covered it (and even then it was buried in the post, not part of the title), that Extreme has a protocol that can detect, and in some cases recover from, loops automatically. As far as I know it is unique – let me know if there is another vendor or protocol that is similar (and no, I am not referring to anything like STP). I’ve only used it in detect mode to date. I was also telling someone about this protocol who was learning the ropes on Extreme gear after coming from a Juniper background, so I thought I would mention it again.

The protocol is the Extreme Loop Recovery Protocol (ELRP). The documentation does a better job at explaining it than I can.

The Extreme Loop Recovery Protocol (ELRP) is used to detect network loops in a Layer 2 network. A switch running ELRP transmits multicast packets with a special MAC destination address out of some or all of the ports belonging to a VLAN. All of the other switches in the network treat this packet as a regular, multicast packet and flood it to all of the ports belonging to the VLAN.

When the packets transmitted by a switch are received back by that switch, this indicates a loop in the Layer 2 network. After a loop is detected through ELRP, different actions can be taken such as blocking certain ports to prevent loop or logging a message to system log. The action taken is largely dependent on the protocol using ELRP to detect loops in the network.

The design seems simple enough to me; I’m not sure why others haven’t come up with something similar (or if they have, let me know!).
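
To illustrate the idea, here is a toy sketch of the concept from the docs above in plain Python. It is just my reading of how the detection works; the topology, names and the hop cap are all made up for illustration and have nothing to do with Extreme’s actual implementation:

    # Toy model of the ELRP concept: the origin switch floods a probe out its
    # ports, every other switch refloods it like a normal multicast frame, and
    # if the probe ever arrives back at the origin there is a layer 2 loop.
    from collections import deque

    def elrp_probe_detects_loop(cables, origin, max_hops=16):
        """cables: list of (switch_a, switch_b) links. Returns True if a probe
        flooded by `origin` would come back to it, i.e. there is a loop."""
        neighbors = {}
        for a, b in cables:
            neighbors.setdefault(a, []).append(b)
            neighbors.setdefault(b, []).append(a)

        # Each queued entry: (switch the probe just arrived at, where it came from, hops)
        queue = deque((nbr, origin, 1) for nbr in neighbors.get(origin, []))
        while queue:
            here, came_from, hops = queue.popleft()
            if here == origin:
                return True                        # our own probe came back: loop!
            if hops >= max_hops:                   # give up on very long paths
                continue
            for nxt in neighbors[here]:
                if nxt != came_from:               # flood out every port except the ingress one
                    queue.append((nxt, here, hops + 1))
        return False

    # A small network with an accidental extra cable between sw2 and sw3
    cables = [("core", "sw1"), ("core", "sw2"), ("core", "sw3"), ("sw2", "sw3")]
    print(elrp_probe_detects_loop(cables, "core"))        # True: the probe loops back to core
    print(elrp_probe_detects_loop(cables[:-1], "core"))   # False: without the extra cable, no loop

In the real protocol the marker is a special multicast destination MAC rather than a graph search, and the action taken on detection (logging, blocking the offending port, etc.) is configurable, but the principle is the same: if your own probe comes back in, something is looped.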

It’s rare to have a loop in a data center environment, but I do remember a couple of loops I came across in an office environment many years ago that ELRP helped trace down. I’m not sure what method one would use to trace down a loop without something like ELRP – perhaps just looking at port stats, trying to determine where the bulk of the traffic is, and disabling ports or unplugging cables until it stops.

[Tangent]

I remember an outage one company I was at took one time to upgrade some of our older 10/100 3COM switches to gigabit Extreme switches. It was a rushed migration; I was working with the network engineer we had, and the switches were installed in a centralized location with tons of cables, none of which were labeled. So I guess it comes as little surprise that during the migration someone (probably me) happened to plug the same cable back into one of the switches, causing a loop. It took a few minutes to track down; at one point our boss was saying get ready to roll back. The network engineer and I looked at each other and laughed: there was no rollback, or at least not one that was going to be smooth, as it would have taken another hour of downtime to remove the Extreme switches, re-install the 3COMs and re-cable everything. Fortunately I found the loop. This was about a year or so before I was aware of the existence of ELRP. We discovered the loop mainly after all the switch lights started blinking in sequence, normally a bad thing. Then users reported they lost all connectivity.

One of my friends, another network engineer, told me a story when I was in Atlanta earlier in the year about a customer of his, a university or something. They had major network performance problems but could not track them down; these problems had been going on for literally months. My friend went out as a consultant, and when they brought him into their server/network room his jaw dropped: they had probably two dozen switches and ALL of them were blinking in sequence. He knew what the problem was right away and informed the customer. But the customer was adamant that the lights were supposed to blink that way and that the problem was elsewhere (not kidding here). The customer had other issues too, like running overlapping networks on the same VLAN. My friend had a lot of suggestions for them, but the customer felt insulted by him telling them their network had so many problems, so they kicked him out and told the company not to send him back. A couple of months later the customer went through some sort of audit process, failed miserably, and grudgingly asked (begged) to get my friend back, since he was the only one they knew who seemed to know what he was doing. He went back and fixed the network, I assume (I forget that last bit of the story).

[End Tangent]

ELRP can detect a loop immediately and give a very informative system log entry showing the port(s) the loop is occurring on, so you can take action. It works best, of course, if it is running on all ports, so you can pinpoint down to the edge port itself. But if for some reason the edge is not an Extreme switch, at least you can catch it at a higher layer and isolate it further from there.

You can either leave it running periodically, sending a probe out every X seconds, or run it on demand for a real time assessment. There is also integration with ESRP, which I wrote about a while ago, although I don’t use the integrated mode (see the original post as to how that works and why). I normally leave it running, sending requests out at least once every 30 seconds or so.

LINX had another outage a couple of years ago (which was the last time I looked at their vendor stats; that one affected me since my company had gear hosted in London at the time and our monitors were tripped by the event), though there was no mention of which LAN the outage occurred on. One user wrote:

It wasn’t a port upgrade, a member’s session was being turned up and due to miscommunication between the member’s engineer and the LINX engineer a loop was somehow introduced in to the network which caused a broadcast storm and a switches CPU to max out cue packet loss and dropped BGP sessions.

That was given as the cause of the outage two years ago. So I guess it was another loop! For all I know, LINX is not running ELRP in their environment either.

It’s not exactly advertised by Extreme if you talk to them; it’s one of those things that’s buried in the docs. Same goes for ESRP. Two really useful protocols that Extreme almost never mentions, two things that make them stand out in the industry, and they don’t talk about them. I’m told one reason could be that they are proprietary (vs EAPS, which is not; Extreme touts EAPS a lot, but EAPS is layer 2 only!), though as I have mentioned in the past, ESRP doesn’t require any software at the edge to function and can support managed and unmanaged devices. So you don’t need an Extreme-only network to run it (just Extreme at the core, like most any other protocol). ELRP is even less stringent – it can be run on any Extreme switch, with no interoperability issues. If there were open variants of the protocols that would be better of course, but again, these seem to be unique in the industry, so tout what you’ve got! Customers don’t have to use them if they don’t want to, and they can make a network administrator’s life vastly simpler in many cases by leveraging what you already have available to you. Good luck integrating Extreme or Cisco or Brocade into Juniper’s Qfabric, or into Force10’s distributed core setup. Interoperability issues abound with most of the systems out there.
