TechOpsGuys.com Diggin' technology every day

November 28, 2011

Info-tech report on data center switching solutions

Filed under: Networking — Tags: — Nate @ 10:28 pm

I came across this report on Extreme’s site, which seems to be from an “independent 3rd party” of sorts, but I’ve not heard of them so I can’t vouch for them.

I’d like to consider myself at least somewhat up to date on what is out there so when things like this come out I do find it interesting to see what they say.

The thing that stands out to me the most: Arista Networks has only 75 employees?!? Wow, they’ve been able to do all of that work with only 75 employees? Really? Good job. That is very surprising to me; most of the companies I have worked at have had more than 75 employees and they’ve accomplished (in my opinion) a fraction of what Arista seems to have, at least from a technology standpoint (revenue wise is a different story, assuming again the report is accurate).

The thing that made me scratch my head the most: Cisco allows you to run virtual machines on their top of rack switches? Huh? Sounds like EMC wanting you to run VMs on their VMAX controllers? I recall at one point Citrix and Arista teamed up to allow some sort of VM to run an embedded Netscaler on the Arista switches, though I never heard of anyone using it and never heard Citrix promoting it over their own gear. It seemed like an interesting concept, though I don’t think there is any real advantage to doing it (the main advantage I can think of is non blocking access to the switch fabric, which really isn’t a factor with lower end load balancers since they are CPU bound, not network bound).

The report takes a hypothetical situation where a fairly large organization is upgrading its global network, then goes to each of the vendors and asks for a proposal. They leave out the specifics of each proposed solution, which is disappointing.

They said HP was good because it was cheap, which is pretty much what I’ve heard in the field; it seems nobody that is serious runs HP ProCurve.

They reported that Juniper and Brocade were the most “affordable” (having Juniper and affordable together makes no sense), and Arista and Force10 being least affordable (which seems backwards too – they are not clear on what they used to judge costs, because I can’t imagine a Force10 solution costing more than a Juniper one).

They placed some value on line cards that offered both copper and fiber at the same time, which again doesn’t make a lot of sense to me since you can get copper modules to plug into SFP/SFP+ slots fairly easily. The ability to “run VMs on your switch” also seemed iffy at best; they say you can run “WAN optimization” VMs on the switches, which for a report titled “Data center networking” really should be a non issue as far as features go.

The report predicts Brocade will suffer quite a bit since Dell now has Force10, and that Brocade doesn’t have products as competitive as they otherwise could be.

They tout Juniper’s ability to have multiple power supplies, switch fabrics, routing modules as if it was unique to Juniper, which makes no sense to me either. They do call out Juniper for saying their 640-port 10GbE switch is line rate only to 128 ports.

They believe Force10 will be forced into developing lower end solutions to fill out Dell’s portfolio rather than staying competitive on the high end, time will tell.

Avaya? Why bother? They say you should consider them if you’ve previously used Nortel stuff.

They did include the sample scenario that they sent to the vendors and asked for solutions for. I really would have liked to have seen the proposals that came back.

A four-site organization with 7850 employees located at a Canadian head office facility, and three branch offices located in the US, Europe, and Canada. The IT department consists of 100 FTE, and are located primarily at the Canadian head office, with a small proportion of IT staff and systems located at the branch offices.

The organization is looking at completing a data center refurbish/refresh:

The organization has 1000 servers, 50% of which are virtualized (500 physical). The data center currently contains 40 racks with end-of-row switches. None of the switching/routing layers have any redundancy/high availability built in, leaving many potential single points of failure in the network (looking for 30% H/A).

A requirements gathering process has yielded the need for:

  • A redundant core network, with capacity for 120 x 10Gbps SFP+ ports
  • Redundant top of rack switches, with capacity for 48 x 1Gbps ports in each rack
  • 1 ready spare ToR switch and 1 ready spare 10Gbps chassis card
  • 8x5xNBD support & maintenance
  • Nameplate data – power consumption – watts/amps
  • 30% of the servers to be highly available

It is unclear how redundant they expect the network to be, would a single chassis with redundant fabrics and power supplies be enough or would you want two? They are also not clear as to what capabilities their ToR switches need other than the implied 10Gbps uplinks.

If I were building this network with Extreme gear I would start out with two pairs of stacked X670Vs at the core (each stack having 96x10GbE); the switches within each stack would be connected by 2x40GbE with passive copper cabling. The two stacks would be linked together with 4x40GbE connections, also passive copper, running (of course) ESRP as the fault tolerance protocol of choice between the two. This would provide 192x10GbE ports between the two stacks, with half being active and half being passive.
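
Just to put numbers on that core design, here is a quick back-of-the-envelope sketch in Python (purely illustrative; the port counts are the ones described above, not pulled from a datasheet):

    # Port budget for the proposed ESRP core: two stacks of two X670Vs,
    # 48x10GbE per switch, with ESRP keeping one stack active and one passive.

    PORTS_10G_PER_SWITCH = 48
    SWITCHES_PER_STACK = 2
    STACKS = 2
    REQUIRED_CORE_PORTS = 120  # the report's requirement for 10Gbps SFP+ ports

    ports_per_stack = PORTS_10G_PER_SWITCH * SWITCHES_PER_STACK  # 96x10GbE per stack
    total_ports = ports_per_stack * STACKS                       # 192x10GbE overall
    active_ports = total_ports // 2                              # half active under ESRP

    print(f"Per stack: {ports_per_stack} x 10GbE")
    print(f"Total:     {total_ports} x 10GbE ({active_ports} active, {active_ports} passive)")
    print(f"Raw port count meets the 120-port requirement: {total_ports >= REQUIRED_CORE_PORTS}")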

Another, simpler approach would be to just stack three of the X670V switches together for 168x10GbE active-active ports. Though you know I’m not a big fan of stacking (any more than I am of running off a single chassis); if I am connecting 1,000 servers I want a higher degree of fault tolerance.

Now if you really could not tolerate an active/passive network, if you really needed that extra throughput, then you can use M-LAG to go active-active at layer 2, but I wouldn’t do that myself unless you were really sure you needed that ability. I prefer the reduced complexity of active/passive.

As for the edge switches, they call for redundant 48 port 1GbE switches. Again they are not clear as to their real requirements, but what I would do (what I’ve done in the past) is use two stand alone 48-port 1GbE switches, each with 2x10GbE (Summit X460) or 4x10GbE (Summit X480) connections to the core. These edge switches would NOT be stacked, they would be stand alone devices. You could go lower cost with the Summit X450e, or even the Summit X350, though I would not go with the X350 for this sort of data center configuration. Again I assume you’re using these switches in an active-passive way for the most part (as in 1 server is using 1 switch at any given time), though if you needed a single server to utilize both switches then you could go the stacking approach at the edge; it all depends on what your own needs are – which is why I would have liked to have seen more detail in the report. Or you could do M-LAG at the edge as well, but ESRP’s ability to eliminate loops is hindered if you link the edge switches together since there is a path in the network that ESRP cannot deal with directly (see this post with more in depth info on the how, what, and why for ESRP).
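
To give a rough feel for the uplink trade-off between those two edge options, here is the trivial oversubscription arithmetic (again just a sketch; it assumes all 48 edge ports could in theory run at line rate, which in practice they never will):

    # Edge uplink oversubscription for a 48-port 1GbE ToR switch with either
    # 2x10GbE (X460-style) or 4x10GbE (X480-style) uplinks to the core.

    EDGE_CAPACITY_GBPS = 48 * 1  # 48 x 1GbE of server-facing capacity

    for uplinks in (2, 4):
        uplink_gbps = uplinks * 10
        ratio = EDGE_CAPACITY_GBPS / uplink_gbps
        print(f"{uplinks}x10GbE uplinks -> {ratio:.1f}:1 oversubscription")

    # 2x10GbE works out to 2.4:1, 4x10GbE to 1.2:1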

I would NOT go with a Black Diamond solution (or any chassis-based approach) unless cost was really not an issue at all. Despite this example organization having 1,000 servers it’s still a small network they propose building, and the above approach would scale seamlessly and non disruptively to, say, 4 times that number while providing sub second layer 2 and layer 3 fail over. It is also seamlessly upgradeable to a chassis approach with zero downtime (well, sub second), should the organization’s needs grow beyond 4,000 hosts. The number of organizations in the world that have more than 4,000 hosts I think is pretty small in the grand scheme of things. If I had to take a stab at a guess I would say easily less than 10%, maybe less than 5%.

So in all, an interesting report, not very consistent in its analysis and lacking some detail that would have been nice to see, but still interesting to see someone else’s thought patterns.

November 22, 2011

Traveling Geek with remote monitoring

Filed under: Random Thought — Tags: — Nate @ 9:08 pm

[warning: non technical post, read at your own risk]

I told one of my friends this story yesterday, wow was it only yesterday, and he loved it so much he insisted that I write about it.

Maybe one and a half years ago I bought one of these, an IP-connected camera, which has an embedded computer, a RJ45 port and a wireless connection. So it’s completely self contained and does not rely on any other software or computer connection to function. The main purpose was to spy on my cats when I was away on trips. I let the camera sit for about a year, basically until I moved to California since I was too lazy to set it up.

One of my cats captured by motion activity

On Sunday morning I went through the process of configuring my firewall so I could access the camera remotely, I had originally set it up so I could access it through an Apache proxy but turns out that didn’t work and I had to rely entirely on port forwarding. So I set that up, routing the camera connection from my colo server through my site to site VPN to my home cable modem where the camera lives. I ended up having to connect to my win2k3 server on my colo internal network(which has access to my home internal network) in order to troubleshoot the connectivity issues from outside my network since I had never tried to access this camera from remote before. I want to set up OpenVPN on my laptop so I can just connect directly to my colo server and basically sit on my internal network but haven’t gotten round to doing that yet.

I fired up my browser and looked around — no cats. I had previously configured the camera so that if it detected motion it would start taking pictures and upload them to my server, but I had disabled that functionality for a while since it resulted in a large number of false positives; I think the changes in the light level triggered the motion sensors.

Anyways, I pointed the camera at my cat tree, which the cats love to play in on Saturday afternoon, and turned on motion detection.

On Monday morning(doesn’t seem like yesterday since I was up pretty much all night) I checked and there was no activity, which I thought was strange. Maybe the cats were dead? Not likely but very unusual for them not to spend any time on the cat tree during the night.

I panned the camera around the room again, no activity, no signs. The lights were off so it was dark, though the camera has limited night vision capabilities so I could still see some stuff.

So I figured I needed to get their attention somehow. I looked around for some cat noises, and came across this, which looked promising. So I uploaded it to my system at home, cranked up the volume to the max, and played it.

They came out, very slowly. I’m sure they were freaked out after having nothing but silence for two days and then all of a sudden cat noises coming from my computer speakers. They were very cautious and I could not see their bodies on the camera, but I saw their eyes glowing. One of the cats came forward and looked around. When I attempted this again a few minutes later they did not respond quite as well; I suppose maybe it was too loud, so they were more scared and stayed further away.

So there you have it, my semi hi-tech way of keeping tabs on my cats while I’m out. It’s not formalized yet but it does the job. I am considering getting another couple of cameras to put in other areas.

The biggest downside to the camera is I don’t have a way to view live video from my tablet or phone since it relies on Java.

My trip and thoughts about moving to Cali

As for my trip, it was a good one, I did more of many things in the 4 days and 3 nights I was there than I have done in 4 months in California. I hit pretty much all of the venues I wanted to go to, and met up with most of the people that I wanted to see. Not as many folks showed up to party at Cowgirls as I was expecting. I’m constantly amazed as to the excuses people dig up to not go out and party a lot, whether at Cowgirls or elsewhere. They may be fine to go out for an hour or so after work, but if it’s 9-10PM on any night all of a sudden it gets real complicated.

I was so pumped up about going that I managed to stay up for 42 hours straight without any sleep whatsoever(Thurs morning -> Sat Morning). I wasn’t even that tired when I went to bed but I did fall asleep very fast when the time came. My drunk friend who crashed at my hotel on Friday night tried to wake me up at one point but could not.

I think the closest I’ve come to 42 hours was probably in the range of 34-35 hours before and that was being on the verge of passing out for a good 4-5 hours before that.

While I do miss most of the people that I know so well in the area, as well as my places to hang out, from a career perspective at this point it was a good move to leave Seattle for now anyways. I have friends that are still trying to recruit me back up there, and so far they haven’t come up with anything worthwhile. The economy in general is significantly weaker than it was even a year ago. Most of the tech companies are really not doing well or not interesting places to work for.  I interviewed with at least a dozen before I left and I was excited about exactly none of them. Some of them sounded cool initially, but then I learned more about what they did, how they did it, and in some cases inside information as to who is there, and in every case it was a turn off. Nothing interesting.

Speaking of companies, some folks from HP came out to visit today, and one of them had a local publication of some sort. I didn’t catch the name, but he asked me if I knew who they rated as the fastest growing company in the Bay Area (I assume that was the scope of the article). I really had no idea; turns out I had never heard of the company. But the SECOND fastest growing company is the company I am at! Woohoo! But if I had not known people who worked there I would never have heard of them either. Of all the companies I have worked for, the one I am at now has the most name recognition, at least in casual conversations with folks I knew in Seattle. I was shocked when I mentioned the name and some folks would instantly know who it was, even though in the grand scheme of things I think it’s still a small company; I guess it is having a big impact among the target audience.

Looking more I think this publication was the San Francisco Business Times, with the article being here; to get the real stuff you need to be a subscriber though. I saw some PDFs of previous versions of the list and the format was identical, and the article shows the company I’m at as #2, though it doesn’t talk about the revenues and stuff like the full article does.

My Car

Another minor update about my car: a large rock managed to hit my windshield just before I got to my old stomping grounds in Bellevue, WA. Of the dozen or so trips between Cali and WA over the years, it had to happen to my new car. I also blew out the rear speakers on my stereo. I had Car Toys fix them, though either they fixed only one, or one of them blew out again between Saturday and this morning (I didn’t test till this morning, I assumed they worked, and really I can’t tell if the volume is cranked up unless I specifically fade the volume to the rear to test).

All in all I drove roughly 37 hours 15 minutes, averaging 52 MPH for a total of 1,954 miles over the span of four days. I think I averaged in the neighborhood of around 21 MPG, which was lower than I was expecting.

I got significant usage out of the all wheel drive on both the trip up and down as Oregon had some of the worst rain storms I have ever driven through. It was quite scary at times even with the AWD, in the middle of the night in such a storm, especially driving by big trucks, which I normally zoomed past at around 80 just to get by them quicker. Last night the rain must’ve lasted a good 400 miles from somewhere in WA to about Redding, CA. It took a lot of concentration to maintain control, very mentally draining. I ended up sleeping at a rest stop north of Redding for two hours in the pounding rain because I just couldn’t go on any more. Managed to get in at around 7AM this morning, in time to meet with HP at 11 (from which I learned a bunch of cool stuff that I can’t talk about!). So I was on the road for about 16 hours. If I hadn’t been planning on this meeting for weeks I would have just taken a longer break and come back to work tomorrow.

The traction wasn’t as good as I was expecting, I have never driven an AWD car before so I have nothing to compare it with. The one thing I did notice, is my car has a LCD where it shows in real time where the power is being applied to the wheels. If I even touch the accelerator it clearly shows more power going to all wheels, and if I corner real hard it shows power going away from one wheel and more to the others. But in cruise control I never saw the status change. Even if I accelerated from 60 to 75 MPH using nothing but cruise control the indicator never showed more power going to the wheels. I asked Nissan about that when I got to WA and either they didn’t know what they were talking about, or I wasn’t asking the question right but they basically said “trust it, it’s working”. I will talk to the local Nissan place here again when I get my oil change (I just got an oil change last Friday, and I think I will be up for another this Saturday). I really couldn’t tell if I was completely hydroplaning at some points or if only one or two wheels were and the AWD was correcting very quickly for it. It certainly felt like I had better traction when I was not in cruise control and the AWD indicator showed more power going to the wheels.

Maybe traction tires would be a good fit, I’ve been wanting to get those just for better grip, even if they do wear out faster.

One thing is certain though — I think four months between trips might be too long.

It looks as if my trip to Atlanta was pushed out a couple of weeks to the week of the 19th. I imagine I will fly in on the 17th, do some work on the 18th-20th and leave on 21st or 22nd.

November 15, 2011

LSI quietly smothers Onstor

Filed under: Storage — Nate @ 8:35 pm

About a year ago I was in the market for a new NAS box to hook up to my 3PAR T400, something to eventually replace the Exanet cluster that was hooked up to it since Exanet as a company went bust.

There weren’t many options left. Onstor had been bought by LSI, I really couldn’t find anything on the Ibrix offering from HP at the time (at least for a SAN-attached Ibrix rather than a scale-out Ibrix), and then there was of course the NetApp V-Series. I could not find any trace of PolyServe, which HP acquired as well, other than something related to SQL Server.

Old 3PAR graphic that I dug up from the Internet Archive

3PAR was suggesting I go with Onstor(this was, of course before the HP/Dell bidding war), claiming they still had a good relationship with Onstor through LSI. I think it was less about the partnership and more about NetApp using the V-series to get their foot in the door and then try to replace the back end disk with their own, a situation understandably 3PAR (or any other competition) doesn’t like to be in.

My VAR on the other hand had another story to tell: after trying to reach out to LSI/Onstor they determined that Onstor was basically on its death bed, with only a single reseller in the country authorized to sell the boxes, and it seemed like there were maybe a half dozen employees left working on the product.

So, I went with NetApp, then promptly left the company and left things in the hands of the rest of the team(there’s been greater than 100% turnover since I left both in the team and in the management).

One of my other friends who used to work for Exanet was suggesting to me that LSI bought Onstor with the possible intention of integrating the NAS technology into their existing storage systems, to be able to offer a converged storage option to the customers, and that the stand alone gateway would probably be going away.

Another product I had my eyes on at the time and 3PAR was working hard with to integrate was the Symantec Filestore product. I was looking forward to using it, other companies were looking to Filestore to replace their Exanet clusters as well. Though I got word through unofficial channels that Symantec planned to kill the software-only version and go the appliance route. It took longer than I was expecting but they finally did it, I was on their site recently and noticed that the only way to get it now is with integrated storage from their Chinese partner.

I kept tabs on Onstor now and then, wondering if it would come back to life in some form. The current state of the product, at least from a SAN connectivity perspective, seemed to be very poor – in a lot of cases you couldn’t do things like live software upgrades on a cluster; the system had to take a full outage. But no obvious updates ever came.

Then LSI sold their high end storage division to NetApp. I suppose that was probably the end of the road for Onstor.

So tonight, I was hitting some random sites and decided to check in on Onstor again, only to find most traces of the product erased from LSI’s site.

The only things I ever really heard about Onstor were how the likes of BlueArc and Exanet were replacing Onstor clusters. I talked to one service provider who had an Onstor system (I think connected to 3PAR too); I talked with them briefly while I was thinking about what NAS gateway to move to about a year ago, and they seemed fairly content with it, no major complaints. Though it seemed like if they were to buy something new (at that time) they probably wouldn’t buy Onstor due to the uncertainty around it.

It seemed to be an interesting design – using dual processor quad core MIPS CPUs of all things.

RIP Onstor, another one bites the dust.

Hopefully LSI doesn’t do the same to 3ware. I always wondered why 3ware technology was never integrated(as far as I know anyways) into server motherboards, even after LSI acquired them, given that a lot of integrated RAID controllers (Dell etc) are LSI. I think for the most part the 3ware technology is better (if not why did they get acquired and continue to develop products?). I’ve been a 3ware user for what seems like 12 years now, and really have no complaints.

I really hope the HP X9000 NAS gateway works out, the entry level pricing for it as-is seems quite high to me though.

Dell’s distributed core

Filed under: Networking — Tags: — Nate @ 9:59 am

Dell’s Force10 unit must be feeling the heat from the competition. I came across this report which the testing outfit Tolly did on behalf of Dell/Force10.

Normally I think Tolly reports are halfway decent although they are usually heavily biased towards the sponsor (not surprisingly). This one though felt light on details. It felt like they rushed this to market.

Basically what Force10 is talking about is a distributed core architecture with their 32-port 40GbE Z9000 switches as what they call the spine(though sometimes they are used as the leaf), and their 48-port 10 GbE S4810 switches as what they call the leaf (though sometimes they are used as the spine).

They present 3 design options:

Force10 Distributed Core Design

I find three things interesting about these options they propose:

  • The minimum node count for spine is 4 nodes
  • They don’t propose an entirely non blocking fabric until you get to “large”
  • The “large” design is composed entirely of Z9000s, yet they keep the same spine/leaf configuration; what’s keeping them from being entirely spine?

The distributed design is very interesting, though it would be a conceptual hurdle I’d have a hard time getting over if I was in the market for this sort of setup. It’s nothing against Force10 specifically; I just feel safer with a less complex design (I mentioned before I’m not a fan of stacking for this same reason), with fewer things talking to each other in such a tightly integrated fashion.

That aside though, a couple of other issues I have with the report: while they do provide the configuration of the switches (that IOS-like interface makes me want to stab my eyes with an ice pick), I’m by no means familiar with Force10 configuration and they don’t talk about how the devices are managed. Are the spine switches all stacked together? Are the spine and leaf switches stacked together? Are they using something along the lines of Brocade’s VCS technology? Are the devices managed independently, relying on other protocols like MLAG? The web site mentions using TRILL at layer 2, which would be similar to Brocade.

The other issue I have with the report is the lack of power information; specifically I would be interested (slightly, in the grand scheme of things I really don’t think this matters all that much) in the power per usable port (ports that aren’t being used for uplinks or cross connects). They do rightly point out that power usage can vary depending on the workload, so it would be nice to get power usage based on the same workload. Though conversely it may not matter as much: the specs for the Extreme X670V (48x10GbE + 4x40GbE) show only 8 watts of difference between 30% traffic load and 100% traffic load on that particular switch, which seems like a trivial amount.

Extreme Networks X670V Power Usage

As far as I know the Force10 S4810 switch uses the same Broadcom chipset as the X670V.

On their web site they have a nifty little calculator where you input your switch fabric capacity and it spits out power/space/unit numbers. The numbers there don’t sound as impressive:

  • 10Tbps fabric = 9.6kW / 12 systems / 24RU
  • 15Tbps fabric = 14.4kW / 18 systems / 36RU
  • 20Tbps fabric = 19.2kW / 24 systems / 48RU

The many-times-aforementioned Black Diamond X-Series comes in at somewhere around 4kW (or, if you want to be really conservative, 6.1kW assuming 8.1W/port, a figure from their report that was likely high considering the system configuration) in a single system to get up to 20Tbps of fabric (you could perhaps technically say it has 15Tbps of fabric since the last 5Tbps is there for redundancy; 192 x 80Gbps = 15.36Tbps). It takes 14.5RU worth of rack space too.
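
To make the comparison a bit more concrete, here is the watts-per-terabit arithmetic using only the figures quoted above (the Force10 calculator numbers, plus the 8.1W per 10GbE port from the Lippis test spread across a fully populated 768-port BDX8); treat it as a rough sketch, not a benchmark:

    # Rough kW-per-Tbps comparison from the numbers quoted in this post only.

    force10 = {  # fabric Tbps: (kW, rack units) from the Force10 calculator above
        10: (9.6, 24),
        15: (14.4, 36),
        20: (19.2, 48),
    }

    for tbps, (kw, ru) in force10.items():
        print(f"Force10 {tbps}Tbps: {kw / tbps:.2f} kW per Tbps, {ru} RU")

    # Black Diamond X8, conservatively: 768 x 10GbE ports at 8.1W each
    # (roughly the 6.1kW figure above), for ~20Tbps of fabric in 14.5RU.
    bdx8_kw = 768 * 8.1 / 1000
    print(f"BDX8: ~{bdx8_kw:.1f} kW total, {bdx8_kw / 20:.2f} kW per Tbps, 14.5 RU")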

Dell claims non-blocking scalability up to 160Tbps, which is certainly a lot! Though I’m not sure what it would take for me to make the leap into a distributed system such as TRILL. Given TRILL is a layer 2 only protocol (which I complained about a while ago), I wonder how they handle layer 3 traffic: is it distributed in a similar manner? What is the performance at layer 3? Honestly I haven’t read much on TRILL at this point (mainly because it hasn’t really interested me yet), but one thing that is not clear to me (maybe someone can clarify): is TRILL just a traffic management protocol, or does it also include more transparent system management (e.g. managing multiple devices as one), or does that system management part require more secret sauce from the manufacturer?

My own, biased (of course), thoughts on this architecture, innovative as it is:

  • Uses a lot of power / consumes a lot of space
  • Lots of devices to manage
  • Lots of connections – complicated physical network
  • Worries over resiliency of TRILL (or any tightly integrated distributed design – getting this stuff right is not easy)
  • On paper at least seems to be very scalable
  • The Z9000 32-port 40GbE switch certainly seems to be a nice product from a pure hardware/throughput/form factor perspective. I just came across Arista’s new 1U 40GbE switch, and I think I’d prefer the Force10 design, at twice the size and twice the ports, purely for more line rate ports in a single unit.

It would be interesting to read a bit more in depth about this architecture.

I wonder if this is going to be Force10’s approach going forward, the distributed design, or if they are going to continue to offer more traditional chassis products for customers who prefer that type of setup. In theory it should be pretty easy to do both.

November 14, 2011

NetApp challenge falls without winners

Filed under: Storage — Tags: — Nate @ 9:52 am

This is just too funny.

Nobody won a million pounds of kit from NetApp because data centre nerds thought the offer was unbelievable or couldn’t be bothered with the paperwork.

NetApp UK offered an award of up to £1m in NetApp hardware, software, and services to a lucky customer that managed to reduce its storage usage by 50 per cent through using NetApp gear. The competition’s rules are still available on NetApp’s website.

For years, I have been bombarded with marketing from NetApp (indirectly from the VARs and consultants they have bribed over the years) about the efficiency of the NetApp platform, especially de-dupe, how it will save you tons of money on storage etc.

Indirectly because I absolutely refused to talk directly to the NetApp team in Seattle after how badly they treated me when I was interested in being a customer a few years ago. It seemed just as bad as EMC was (at the time). I want to be treated as a customer, not a number.

Ironically enough it was NetApp that drove me into the arms of 3PAR, long before I really understood what the 3PAR technology was all about. It was NetApp’s refusal to lend me an evaluation system for any length of time which is what sealed the deal for my first 3PAR purchase. Naturally, at the time 3PAR was falling head over heels at the opportunity to give us an eval, and so we did evaluate their product, an E200 at the time. Which in the end directly led to 4 more array purchases(with a 5th coming soon I believe), and who knows how many more indirectly as a result of my advocacy either here or as a customer reference.

Fortunately my boss at the time was kind of like me. When the end of the road came for NetApp, and they dropped their pants (prices) so low that most people could not have ignored it, my boss was to the point where NetApp could have given us the stuff for free and he would have still bought 3PAR. We asked for an eval for weeks and they refused us every time until the last minute.

I have no doubt that de-dupe is effective, how effective is dependent on a large number of factors. Suffice to say I don’t buy it yet myself, at least not as a primary reason to purchase primary storage for online applications.

Anyways, I think this contest, more than anything else, is a perfect example. You would think that the NetApp folks out there would have jumped on this, but there were no winners. That is too bad, but very amusing.

I can’t stop giggling.

I guess I should thank NetApp for pointing me in the right direction in the beginning of my serious foray into the storage realm.

So, thanks NetApp! 🙂

Four posts in one morning! And it’s not even 9AM yet! You’d think I have been up since 5AM writing and you’d be right.

AMD Launches Opteron 6200s

Filed under: General — Tags: , , — Nate @ 9:06 am

UPDATED I have three words:

About damn time.

I’ve been waiting for a long time for these, was expecting them months ago, had to put in orders with Opteron 6100s a few weeks ago because I couldn’t wait any longer for the 6200s. Sigh. I’m half hoping I can get HP to exchange my 6100s for 6200s since the 6100s are still sitting in boxes. Though that may be being too hopeful given my time line for deployment. One thing’s for sure though, if HP can pull it off they’ll make my decision on which version of vSphere to go with pretty easy since vSphere 4 tops out at 12 cores.

AMD has finally launched the 6200, which everyone knows is the world’s first 16-core x86-64 processor, and is socket compatible with the 6100 processor which launched over a year ago providing an easy upgrade path.

I’m just running through some of the new stuff now; one feature which is nice, and I believe I mentioned it a while ago, is the TDP cap, which allows a user to set the maximum power usage of the processor, basically more granular control than the technologies used previously. I don’t believe it has the ability to dynamically turn cores on and off based on this value though, which is unfortunate – maybe next time (excluding the new Turbo Core support, which is a different technology).

AMD Turbo Core

I thought this was pretty cool, I was just reading about it in their slide deck. At first I thought it was going to be similar to the Intel Turbo or IBM Turbo technology where, if I recall right (don’t quote me), the system can more or less shut off all the other cores on the socket and turbo charge a single core to super sonic speeds. AMD Turbo Core instead boosts all cores simultaneously by between 300-500Mhz if the workload fits within the power envelope of the processor. It can do the same for half of the on board cores, but instead of 300-500Mhz it boosts the frequency by up to 1Ghz.

Memory Enhancements

It also supports higher performance memory as well as something called LR-DIMMs, which I had never heard of before. Load Reduced DIMMs seem to allow you to add more memory to the system. Even after reading the stuff on Micron’s site I’m not sure of the advantage.

I recall on the 6100 there was a memory performance hit when you utilized all 12 memory slots per CPU socket (vs using only 8/socket). I don’t see whether this is different on the 6200 or not.

Power and Performance

The highest end, lowest power Opteron 6100 seems to be the 6176 (not to be confused with the 6176 SE). The 6176 (by itself) is not even mentioned on AMD’s site (though it is on HP’s site and my recent servers have it). It is a 2.3Ghz 12-core 80W (115W TDP) processor. It seems AMD has changed their power ratings from the ACP they were using before to the TDP (what Intel uses). If I recall right ACP was something like average processor power usage, vs TDP is peak usage(?).

The 6276 is the new high end lower power option, which is a 16-core 2.3Ghz processor with the same power usage. So they managed to squeeze in an extra 9.2Ghz worth of processing power in the same power envelope. That’s pretty impressive.
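
For what it’s worth, the 9.2Ghz figure is just aggregate clock arithmetic (cores times frequency), which says nothing about real-world performance, but it does show what now fits in the same power envelope:

    # Aggregate core clock: 12-core 6176 vs 16-core 6276, both 2.3Ghz, same power.
    # Cores x frequency is not a real performance metric, just an illustration.

    cores_6176, cores_6276, ghz = 12, 16, 2.3

    agg_6176 = cores_6176 * ghz  # 27.6 Ghz aggregate
    agg_6276 = cores_6276 * ghz  # 36.8 Ghz aggregate

    print(f"6176: {agg_6176:.1f} Ghz aggregate")
    print(f"6276: {agg_6276:.1f} Ghz aggregate")
    print(f"Extra in the same power envelope: {agg_6276 - agg_6176:.1f} Ghz")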

There’s not a lot of performance metrics out at this stage, but here’s something I found on AMD’s site:

SPEC Int rate_base2006 Mainstream CPUs

That’s a very good price/performance ratio. This graph is for “mainstream CPUs”, that is CPUs with “normal” power usage, not ultra high end CPUs which consume a lot more power. Those are four socket systems, so the CPUs alone on the high end would run $8,236 from Intel and $3,152 from AMD. Then there is the motherboard+chipset from Intel, which will carry a premium over AMD as well since Intel has different price/scalability bands for their processors between their two socket and four socket systems (where AMD does not; though with Intel you can now get two socket versions of servers with the latest processors, they still seem to carry a decent premium since I believe they use the same chipsets as the four socket boxes, and the two socket versions are made more for memory capacity bound workloads rather than CPU bound ones).
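
Breaking those CPU prices down per socket makes the gap even clearer; this little sketch only uses the two totals quoted above, everything else is arithmetic:

    # Per-socket CPU cost for the four-socket systems in the chart above.

    SOCKETS = 4
    intel_total = 8236.0  # high end "mainstream" Intel CPUs, four sockets
    amd_total = 3152.0    # high end "mainstream" Opteron 6200s, four sockets

    print(f"Intel: ${intel_total / SOCKETS:,.0f} per socket")
    print(f"AMD:   ${amd_total / SOCKETS:,.0f} per socket")
    print(f"Intel premium on the CPUs alone: {intel_total / amd_total:.1f}x "
          "(before any chipset/motherboard premium)")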

They have floating point performance numbers too, though for the stuff I do floating point doesn’t really matter; it’s probably more useful for SGI and Cray and their super computers.

It’s not the 3.5Ghz that AMD was talking about but I trust that is coming..at some point. AMD has been having some manufacturing issues recently which probably was the main cause for the delays of the 6200, hopefully they get those worked out in short order.

HP has already updated their stuff to reflect support for the latest processors in their existing platforms.

From HP’s site, here are the newest 16 core processors:

  • 6282SE (2.6GHz/16-core/16MB/140W TDP) Processor
  • 6276 (2.3GHz/16-core/16MB/115W TDP) Processor
  • 6274 (2.2GHz/16-core/16MB/115W TDP) Processor
  • 6272 (2.1GHz/16-core/16MB/115W TDP) Processor
  • 6262HE (1.6GHz/16-core/16MB/85W TDP) Processor

Few more stats –

  • L1 CPU Cache slashed from 128kB to 48kB (total 1,536kB to 768kB)
  • L2 CPU Cache increased from 512kB to 1,024kB (total 6,144kB to 16,384kB)
  • L3 CPU Cache increased from 12,288 kB to 16,384 kB (1,024kB per core for both procs)
  • Memory controller clock speed increased from 1.8Ghz to 2Ghz
  • CMOS process shrunk from 45nm to 32nm
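
As a quick sanity check on those numbers (assuming 12 cores on the 6100 and 16 on the 6200, with the per-core figures from the list above):

    # Multiply the per-core cache figures by the core counts and compare to the totals.

    cores_6100, cores_6200 = 12, 16

    print(f"L1 totals: 6100 = {cores_6100 * 128}kB, 6200 = {cores_6200 * 48}kB")
    print(f"L3 per core: 6100 = {12288 // cores_6100}kB, 6200 = {16384 // cores_6200}kB")
    # L1: 1,536kB down to 768kB; L3: 1,024kB per core on both generations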

Interesting how they shifted focus away from the L1 cache to the L2 cache.

Anyone know how many transistors are on this thing? And how many were on the 6100 ? How about on some of the recent Intel chips?

Now to go figure out how much these things actually cost and what the lead times are.

UPDATE – I know pricing at least now, the new 16 core procs are, as the above graph implies actually cheaper than the 12-core versions! That’s just insane, how often does that happen?!?!

Bottom line

With so many things driving virtualization these days, and with such high consolidation ratios, especially with workloads that are not CPU constrained(which are most), myself I like the value that the 6000-series AMD chips give, especially the number of raw cores without hyperthreading. The AMD 6000 platform is the first AMD platform I have really, truly liked I want to say going back a long, long ways. I’ll admit I was mistaken in my ways for a few years when I was on the Intel bandwagon. Though I have been on the ‘give me more cores’ bandwagon ever since the first Intel quad core processor. Now that AMD has the most cores, on a highly efficient platform, I suppose I gravitate towards them now. There are limits to how far you go to get cores of course, I’m not sure what my limit is. I’ve mentioned in the past I wouldn’t be interested in something like a 48x200Mhz CPU for example. The Opteron 6000 has a nice balance of per-core performance (certainly can’t match Intel’s per core performance but it’s halfway decent especially given the price), and many, many cores.

Three blog posts in one morning, busy morning!

Oracle throws in Xen virtualization towel?

Filed under: Virtualization — Tags: , — Nate @ 7:03 am

This just hit me a few seconds ago and it gave me something else to write about so here goes.

Oracle recently released Solaris 11, the first major rev to Solaris in many many years. I remember using Solaris 10 back in 2005, wow it’s been a while!

They’re calling it the first cloud OS. I can’t say I really agree with that; vSphere, and even ESX before it, has been more cloudy than Solaris for many years now, and remains so today.

While their Xen-based Oracle VM is still included in Solaris 11, the focus clearly seems to be Solaris Zones, which, as far as I know is a more advanced version of User mode linux (which seems to be abandoned now?).

Zones and UML are nothing new, Zones having been first released more than six years ago. It’s certainly a different approach from a full hypervisor, so it has less overhead, but overall I believe it is an outdated approach to utility computing (using the term cloud computing makes me feel sick).

Oracle Solaris Zones virtualization scales up to hundreds of zones per physical node at a 15x lower overhead than VMware and without artificial limits on memory, network, CPU and storage resources.

It’s an interesting strategy, and a fairly unique one in today’s world, so it should give Oracle some differentiation.  I have been following the Xen bandwagon off and on for many years and never felt it a compelling platform, without a re-write. Red Hat, SuSE and several other open source folks have basically abandoned Xen at this point and now it seems Oracle is shifting focus away from Xen as well.

I don’t see many new organizations gravitating towards Solaris zones that aren’t Solaris users already (or at least have Solaris expertise in house), if they haven’t switched by now…

New, integrated network virtualization allows customers to create high-performance, low-cost data center topologies within a single OS instance for ultimate flexibility, bandwidth control and observability.

The terms ultimate flexibility and single OS instance seem to be in conflict here.

The efficiency of modern hypervisors is to the point now where the overhead doesn’t matter in probably 98% of cases. The other 2% can be handled by running jobs on physical hardware. I still don’t believe I would run a hypervisor on workloads that are truly hardware bound, ones that really exploit the performance of the underlying hardware. Those are few and far between outside of specialist niches these days though; I had one about a year and a half ago, but haven’t come across one since.

 

Travel, Data centers, cars

Filed under: Random Thought — Nate @ 5:03 am

It’s about 85 hours until I start my next road trip up to Seattle, I have been thinking about it and I think this is my most anticipated “vacation” I’ve gone on in as long as I can remember. I will be having a lot of fun during the 3 short days I will be in town for.

My next trip is going to be to Atlanta, which is in less than a month. I’m going to visit one of the largest data centers in the world, to install some equipment for my company. I visited a smaller data center outside of Seattle that weighed in at around 500,000 square feet, though I only saw a small portion of it, never really got to see the scale of the place because it was very closed off(the portion I was visiting was sub leased by Internap). Maybe they expanded it recently since they claim 1.2M square feet of data center space now….

So if you have any suggestions for places to eat or drink or hang out while I’m in Atlanta let me know, I’m not sure yet how much spare time I’ll have or how long I’ll be in town for. I have one friend in Atlanta that I plan to see while I am there, he said he thinks he has some equipment in the same facility.

Now a short update on my car. One of my friends said I should post an update on my car situation given that I bought a new car earlier this year, an uncommon one at that. It’s been almost 9 months and 10,000 miles.

10,000 miles on the odometer

The Good

It’s still very fun to drive, the torque vectoring all wheel drive corners like nobody’s business, really grips the road good. It’s very easy to park, being smaller than my previous SUV, and the rear parking camera helps a ton as well in that department. I routinely park in spots I would never even CONSIDER even TRYING to park in with my previous vehicle (which was a 2001 Nissan Pathfinder). I have had no issues with the car to-date.

The Bad

Nothing major. I suppose my biggest complaint for day-to-day use is the user interface to the stereo system. It’s only crashed once, which is nice. My main complaints revolve around MP3 meta data. The stereo does not remember any of the meta data between restarts; it takes about 3-4 minutes to re-generate the meta data from the ID3 tags after a restart (it does pick up where it left off when it starts as far as music goes though — although song ordering is messed up at that point). The user interface also becomes completely unresponsive for a good four seconds when a new song loads so it can load the meta data for that song. I don’t know why this is, I mean my $40 portable MP3 players do a better job than the car stereo does at this point. The Garmin-powered navigation works quite well though. I also cannot access the video inputs when the car is not in park, which is annoying. I read on some sites it’s state law that you can’t do things like watch DVDs while driving, so they have a connection to the parking brake that disables the video when the car is not in park. What I’d like more though is access to the rear camera while on the road; I think it would give a good viewing angle for changing lanes, but I can’t get to that either.

Another minor complaint is my subwoofer in the trunk. It sounds awesome, it’s just a whole lot bigger than I was expecting when I asked Car Toys to install it. I really thought it was going to be flush with the floor, allowing near full use of the trunk. But it is not flush; it sticks up quite a bit, and thus I lose quite a bit of trunk capacity (the trunk as-is is already really small). For the most part it’s not a big deal though, I rarely need the trunk. I have gotten used to using the back seats for my shopping.

I was really worried when I was moving to California, if I would have enough space for the last of my crap after the movers left(and my two cats). With the back seats down though I had more than enough space(was I relieved!).

The sound and navigation system(and backup camera) was all after market(which I paid a lot for), so it’s not really related to the car directly since it wasn’t included with the car.

Another minor complaint remains the lack of an armrest, but I often rest my elbow on the passenger seat. Also, the “tongue” of my sneakers has a habit of triggering my gas cap to open when I leave the car on occasion.

The biggest drawback to buying a new car and moving to California less than a year after I bought it was .. buying a car and moving to California less than a year after I bought it.

I took my sweet time to register my car in this state, I didn’t check to see what the laws were but thought I had something like 60-90 days or whatever. In any case I didn’t register it until later, the last week of October to be precise. Technically I have 20 days to register the car I learned.

So what were my registration fees? I asked my sister and she said “$49?” Not $49, not $99, not $299, not $599.

About $2,200. I think the actual registration was around $200. Then I had another $200 in late fees, THEN I had $1,500 in sales taxes (?!) Apparently the law in California now says that if you buy a new car and move here less than a year afterwards you have to pay CA state sales tax on the car (minus any taxes you paid in the origin state). A YEAR. The DMV person said it used to be 90 days, but they recently extended it to a year. That, of course I was not expecting at all. I showed them my original receipt and apparently I paid about $400 in WA state sales tax, and I owed CA nearly four times that. That certainly seems unfair, both to me and to WA. Not only that but they backdated my registration to August.

Apparently my CA drivers license which I had when I moved to WA in 2000 expired in 2003, so when I went to get a new drivers license they said it was technically a renewal instead of a new license, and the test to take was shorter (which I barely passed, missing 3 questions, the maximum allowed; another person in front of me missed quite a bit more, about 10 of 20 questions total).

I’ve kept my WA state plates on for now since I’m going back to WA this week for a few days.

I suppose the gas mileage isn’t as high as I was expecting it was going to be; with an official rating of 27 city, 32 highway I think I get closer to 20-21 city, and maybe 25 highway. It’s not a big deal though, I didn’t buy this car thinking it would be a hybrid, and it still gets quite a bit better mileage than my Pathfinder, which on an absolutely perfect day on the highway would get about 19.9, city I’m thinking more in the 10-12 range (premium gas for both). I push my Juke much harder than my Pathfinder, getting up to 3-4,500 RPM on a regular basis, sometimes even 5,000+. I don’t think I ever pushed my Pathfinder beyond 4,000 RPM (didn’t try). I waste quite a bit of gas as the Juke encourages me to drive faster; that’s fine for me though, I don’t mind.

The Funny/Strange

I’ve run into probably 15 other Jukes in my travels in the past 9 months. It’s strange because when I have someone in the car with me and I see a Juke I point it out, and get pretty excited. I see others do the same to me quite often as well (assuming they are in a Juke too).  I haven’t seen any in California for a month or so. One day I saw at least two the same day.

Lots o Miles

So even though I walk to work every day (I live 0.4 miles from the office), I still managed to put on quite a few miles so far, and about to put a whole lot more on. With about 2,000 miles to/from Seattle, and may be making a 2nd trip next week to Orange County which is about another 1,000 miles total.

So much for that ‘short update’ huh!

November 10, 2011

World’s fastest switch

Filed under: Networking — Tags: — Nate @ 8:42 pm

I came across this yesterday which is both a video, and more importantly an in-depth report on the about-to-be-released Black Diamond X-series switch. I have written a few times on this topic, I don’t have much that is really new, but then I ran across this PDF which has something I have been looking for – a better diagram on how this new next generation fabric is hooked up.

Up until now, most (all?) chassis switches have relied on backplanes, or in more modern systems mid planes, to transmit their electrical signals between the modules in the chassis.

Something I learned a couple of years ago (I’m not an electrical engineer) is that there are physical limits as to how fast you can push those electrons over those back and mid planes. There are serious distance limitations which makes the engineering ever more complicated the faster you push the system. Here I was thinking just crank up the clock speeds and make it go faster, but apparently it doesn’t work quite that way 🙂

For the longest time all of Extreme’s products were backplane based. Then they released a mid plane based product, the Black Diamond 20808, a couple of years ago. This product was discontinued earlier this year when the X-series was announced. The 20808 had (in my simple mind) a similar design to what Force10 had been doing for many years – basically N+1 switch fabric modules (I believe the 20808 could go to something like 5 fabric modules). All of Extreme’s previous switches had what they called MSMs, or Management Switch Modules. These were combination switch fabric and management modules, with a maximum of two per system, each providing half of the switch’s fabric capacity. Some other manufacturers like Cisco separated out their switch fabric from their management module. Having separate fabric modules really doesn’t buy you much when you only have two modules in the system. But if your architecture can go to many more (I seem to recall Force10 at one point having something like 8), then of course you can get faster performance. Another key point in the design is having separate slots for your switch fabric modules so they don’t consume space that would otherwise be used by ethernet ports.

Anyways, on the Black Diamond 20808 they did something else they had never done before: they put modules on both the front AND the back of the chassis. On top of that the modules were criss-crossed. The modules on the front were vertical, the modules on the back were horizontal. This is pure guesswork on my part, but I speculate the reason for that is, in part, to cut the distance signals need to travel between the fabric and the switch ports. HP’s c-Class Blade enclosure has a similar mid plane design with criss crossed components. Speaking of which, I wonder if the next generation 3PAR will leverage the same “non stop” midplane technology of the c-Class. The 5 Terabits of capacity on the c-Class is almost an order of magnitude more than what is even available on the 3PAR V800. Whether or not the storage system needs that much fabric is another question.

Black Diamond 20808 (rear)

The 20808 product seemed to be geared more towards service providers and not towards high density enterprise or data center computing(if I remember right the most you could get out of the box was 64x10GbE ports which you can now get in a 1U X670V).

Black Diamond 20808 (front)

Their (now very old) Black Diamond 8000 series (with the 8900 model which came out a couple of years ago being the latest incarnation) has been the enterprise workhorse for them for many years, with a plethora of different modules and switch fabric options. The Black Diamond 8900 is a backplane based product.  I remember when it came out too – it was just a couple months after I bought my Black Diamond 10808s, in the middle of 2005. Although if I remember right the Black Diamond 8800, as it was originally released, did not support the virtual router capability that the 10808 supported that I intended to base my network design on. Nor did it support the Clear Flow security rules engine. Support for these features was added years later.

You can see the impact distance has on the Black Diamond 8900 for example, with the smaller 6-slot chassis getting at least 48Gbps more switching capacity per line card than the 10-slot chassis simply because it is smaller. Remember this is a backplane designed probably seven years ago, so it doesn’t have as much fabric capacity as a modern mid plane based system.

Anyways, back on topic, the Black Diamond X-series. Extreme’s engineers obviously saw the physics (?) limits they were likely going to hit when building a next generation platform and decided to re-think how the system works, resulting, in my opinion, in a pretty revolutionary way of building a switch fabric (at least I’m not aware of anything else like it myself). While much of the rest of the world is working with mid planes for their latest generation of systems, here we have the Direct Orthogonal Data Path Mating System or DOD PMS (yeah, right).

Black Diamond X-Series fabric

What got me started down this path, was I was on the Data Center Knowledge web site, and just happened to see a Juniper Qfabric advertisement. I’ve heard some interesting things about Qfabric since it was announced, it sounds similar to the Brocade VCS technology. I was browsing through some of their data sheets and white papers and it came across as something that’s really complicated. It’s meant to be simple, and it probably is, but the way they explain it to me at least makes it sound really complicated. Anyways I went to look at their big 40GbE switch which is at the core of their Qfabric interconnect technology. It certainly looks like a respectable switch from a performance stand point – 128 40GbE ports, 10 Terabits of switching fabric, weighs in at over 600 pounds(I think Juniper packs their chassis products with lead weights to make them feel more robust).

So back to the report that they posted. The networking industry doesn’t have anything like the SPC-1 or SpecSFS standardized benchmarks to measure performance, and most people would have a really hard time generating enough traffic to tax these high end switches. There is standard equipment that does it, but it’s very expensive.

So, to a certain extent you have to trust the manufacturer as to the specifications of the product. One way many manufacturers try to prove their claims of performance or latency is to hire “independent” testers to run tests on the products and give reports. This is one of those reports.

Reading it made me smile, seeing how well the X-Series performed but in the grand scheme of things it didn’t surprise me given the design of the system and the fabric capacity it has built into it.

The BDX8 breaks all of our previous records in core switch testing from performance, latency, power consumption, port density and packaging design. The BDX8 is based upon the latest Broadcom merchant silicon chipset.

For the Fall Lippis/Ixia test, we populated the Extreme Networks BlackDiamond® X8 with 256 10GbE ports and 24 40GbE ports, thirty three percent of its capacity. This was the highest capacity switch tested during the entire series of Lippis/Ixia cloud network test at iSimCity to date.

We tested and measured the BDX8 in both cut through and store and forward modes in an effort to understand the difference these latency measurements offer. Further, latest merchant silicon forward packets in store and forward for smaller packets, while larger packets are forwarded in cut-through making this new generation of switches hybrid cut-through/store and forward devices.

Reading through the latency numbers, they looked impressive, but I really had nothing to compare them with, so I don’t know how good they are. Surely for any network I’ll ever be on it’d be way more than enough.

The BDX8 forwards packets ten to six times faster than other core switches we’ve tested.

[..]

The Extreme Networks BDX8 did not use HOL blocking which means that as the 10GbE and 40GbE ports on the BDX8 became congested, it did not impact the performance of other ports. There was no back pressure detected. The BDX8 did send flow control frames to the Ixia test gear signaling it to slow down the rate of incoming traffic flow.

Back pressure? What an interesting term for a network device.

The BDX8 delivered the fastest IP Multicast performance measured to date being able to forward IP Multicast packets between 3 and 13 times faster then previous core switch measures of similar 10GbE density.

The Extreme Networks BDX8 performed very well under cloud simulation conditions by delivering 100% aggregated throughput while processing a large combination of east-west and north-south traffic flows. Zero packet loss was observed as its latency stayed under 4.6 μs and 4.8 μs measured in cut through and store and forward modes respectively. This measurement also breaks all previous records as the BDX8 is between 2 and 10 times faster in forwarding cloud based protocols under load.

[..]

While these are the lowest Watts/10GbE port and highest TEER values observed for core switches, the Extreme Networks BDX8’s actual Watts/10GbE port is actually lower; we estimate approximately 5 Watts/10GbE port when fully populated with 768 10GbE or 192 40GbE ports. During the Lippis/Ixia test, the BDX8 was only populated to a third of its port capacity but equipped with power supplies, fans, management and switch fabric modules for full port density population. Therefore, when this full power capacity is divided across a fully populated BDX8, its WattsATIS per 10GbE Port will be lower than the measurement observed [which was 8.1W/port]

They also mention the cost of power, and the percentage of list price that cost represents, so we can do some extrapolation. I suspect the list price of the product is not final, and I am assuming the prices they are naming are based on the configuration they tested rather than a fully loaded system (as mentioned above, the switch was configured with enough fabric and power for the entire chassis, but only about 46% of its 10GbE-equivalent port capacity was installed: 352 of 768 ports).

Anyways, they say the price to power it over 3 years is $10,424.05 and that this is less than 1.7% of its list price. Extrapolating that a bit, I can guesstimate that the list price of this system as tested, with 352 10GbE-equivalent ports, is roughly $613,000, or about $1,740 per 10GbE port.
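To make that back-of-the-envelope math explicit, here is a minimal sketch of the extrapolation. The $10,424.05 power cost and the “less than 1.7% of list price” figure come from the report as quoted above; the 352-port count is the 10GbE-equivalent total of the tested configuration (256 x 10GbE plus 24 x 40GbE). The result is a rough floor on list price, not an official number.

```python
# Back-of-the-envelope extrapolation of the BDX8 list price from the
# Lippis/Ixia power figures. Inputs are taken from the report as quoted
# above; the output is a rough floor, not an official price.

three_year_power_cost = 10424.05   # USD, 3-year cost to power, per the report
power_share_of_list = 0.017        # report says power is *less than* 1.7% of list

# 10GbE-equivalent ports in the tested config: 256x10GbE + 24x40GbE (4x10GbE each)
ports_10gbe_equiv = 256 + 24 * 4   # = 352

estimated_list_price = three_year_power_cost / power_share_of_list
price_per_port = estimated_list_price / ports_10gbe_equiv

print(f"Estimated list price (floor): ${estimated_list_price:,.0f}")
print(f"Estimated price per 10GbE-equivalent port: ${price_per_port:,.0f}")
# Prints roughly $613,000 and ~$1,742 per port, in line with the
# guesstimate in the text.
```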

The Broadcom technology is available to the competition; the real question is how long it will take for the competition to develop something that can compete with this 20 Terabit switching fabric, which seems to be about twice as fast as anything else currently on the market.

HP has been working on some next generation stuff; I read earlier this year about the optical switching technology their labs are working on, which sounds pretty cool.

[..] Charles thinks this is likely to be sometime in the next 3-to-5 years.

So, nothing on the immediate horizon on that front.

November 8, 2011

EMC and their quad core processors

Filed under: Storage — Tags: — Nate @ 8:48 am

I first heard that Fujitsu had storage maybe one and a half years ago, when someone told me that Fujitsu was one company seriously interested in buying Exanet at the time. That caused me to go look at their storage; I had no idea they had storage systems. Even today I really never see anyone mention them anywhere, and my 3PAR reps say they never encounter Fujitsu in the field (at least in these territories; they suspect that over in Europe they go head to head more often).

Anyways, EMC folks seem to be trying to attack the high end Fujitsu system, saying it’s not “enterprise”. In the end, the main leg EMC has to stand on for what in their eyes is “enterprise” is mainframe connectivity, a myth Fujitsu rightly tries to debunk, since there are a lot of organizations that consider themselves “enterprise” that don’t have any mainframes. It’s just stupid, but EMC doesn’t really have any other excuses.

What prompted me to write this, more than anything else, was this:

One can scale from one to eight engines (or even beyond in a short timeframe), from 16 to 128 four-core CPUs, from two to 16 backend- and front-end directors, all with up to 16 ports.

The four-core CPUs are what get me. What a waste! I have no doubt that in EMC’s “short timeframe” they will be migrating to quad socket 10 core CPUs, right? After all, unlike someone like 3PAR, who can benefit from a purpose built ASIC to accelerate their storage, EMC has to rely entirely on software. After seeing SPC-1 results for HDS’s VSP, I suspect the numbers for VMAX wouldn’t be much more impressive.

My main point is, and this just drives me mad: these big manufacturers banging the Intel CPU drum and then not exploiting the platform to its fullest extent. Quad core CPUs came out in 2007. When EMC released the VMAX in 2009, apparently Intel’s latest and greatest was still quad core. But here we are, practically 2012, and they’re still not onto at LEAST hex core yet? This is Intel architecture, it’s not that complicated. I’m not sure which quad core CPUs specifically are in the VMAX, but the upgrade from Xeon 5500 to Xeon 5600 for the most part was:

  1. Flash the BIOS (if needed to support the new CPU)
  2. Turn the box off
  3. Pull out the old CPU(s)
  4. Put in the new CPU(s)
  5. Turn the box on
  6. Get back to work

That’s the point of using general purpose CPUs!! You don’t need to pour 3 years of R&D into something to upgrade the processor.

What I’d like to see, something I mentioned in a comment recently, is a quad socket design for these storage systems. Modern CPUs have had integrated memory controllers for a long time now (well, on the Intel side only since the Xeon 5500), so as you add more processors you add more memory too. (Side note: the documentation for VMAX seems to imply a quad socket design for a VMAX engine, but I suspect it is two dual socket systems, since the Intel CPUs EMC is likely using are not quad-socket capable.) This page claims the VMAX uses the ancient Intel 5400-series processors, which if I remember right were the generation of quad cores I had in my HP DL380 G5s many eons ago. If true, it’s even more obsolete than I thought!

Why not 8 sockets, or more? Well, cost mainly. The R&D involved in an 8-socket design is, I believe, quite a bit higher, and the amount of physical space required is high as well. With quad socket blades commonplace, and even some vendors offering quad socket 1U systems, the price point and physical size of quad socket designs are well within reach of storage systems.

So the point is, on these high end storage systems you start out with a single socket populated on a quad socket board, with associated memory. Want to go faster? Add another CPU and its associated memory. Faster still? Add two more CPUs and memory (though I think it’s technically possible to run 3 CPUs, and there have been 3-CPU systems in the past, it seems common/standard to add them in pairs). You’re spending probably at LEAST a quarter million for this system initially, probably more than that; the incremental R&D cost to go quad socket, given this is Intel after all, is minimal.

Currently the VMAX goes to 8 engines, and they say they will expand that further. 3PAR took the opposite approach: while their system is not as clustered as a VMAX (not their words), they feel such a tightly integrated system (theirs included) becomes more vulnerable to “something bad happening” that impacts the system as a whole; more controllers means more complexity. Which makes some sense. EMC’s design is even more vulnerable, being so tightly integrated with the shared memory and such.

3PAR V-Class Cluster Architecture with low cost high speed passive backplane with point to point connections totalling 96 Gigabytes/second of throughput

3PAR goes even further in their design to isolate things, like completely separating the control cache (used for the operating system that powers the controllers and for the control data on top of it) from the data cache, which as you can see in the diagram below is only connected to the ASICs, not to the Intel CPUs. On top of that they separate the control data flow from the regular data flow as well.

One reason I have never been a fan of “stacking” or “virtual chassis” on switches is the very same reason: I’d rather have independent components that are not so tightly integrated that “something bad” can take down the entire “stack”. Now if you’re running two independent stacks, so that one full stack can fail without an issue, that works around the problem, but most people don’t seem to do that. The chances of such a failure happening are low, but they are higher than the chances of something taking down all of the switches if they were not stacked.
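To show the shape of that argument, here is a toy sketch with completely made-up failure probabilities (none of these numbers come from any vendor or study); the point is only that stacking adds a shared failure mode on top of the individual ones.

```python
# Toy comparison of failure modes for stacked vs. independent switches.
# All probabilities are invented purely for illustration -- the structure
# of the comparison is the point, not the specific numbers.

p_switch = 0.01      # chance an individual switch fails in some period (made up)
p_stack_bug = 0.002  # chance a stack-wide fault takes out the whole stack (made up)

# Two independent, non-stacked switches: you lose everything only if both fail.
p_lose_both_independent = p_switch * p_switch            # 0.0001

# Two switches in one stack: either both fail individually, or a stack-wide
# fault takes them both down at once.
p_lose_both_stacked = p_switch * p_switch + p_stack_bug  # ~0.0021

print(f"Both down, independent switches: {p_lose_both_independent:.4%}")
print(f"Both down, single stack:         {p_lose_both_stacked:.4%}")
# With these made-up numbers the stack is roughly 20x more likely to suffer
# a total outage, even though any single failure remains unlikely.
```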

One exception might be problems related to STP, which some people may feel they need when operating multiple switches. I’ll answer that by saying I haven’t used STP in more than 8 years; there have been ways to build a network with lots of devices without STP for a very long time now, even though the networking industry has recently made it sound like this is something new.

Same with storage.

So back to 3PAR. 3PAR changed their approach with their V-series of arrays: for the first time in the company’s history they decided to include TWO ASICs in each controller, effectively doubling the I/O processing abilities of the controller. Fewer, more powerful controllers. A 4-node V400 will likely outperform an 8-node T800. Given the older system’s age, I suspect a 2-node V400 would probably be on par with an 8-node S800 (released around 2003 if I remember right).

3PAR V-Series ASIC/CPU/PCI/Memory Architecture

EMC is not alone, and not the worst abuser here though. I can cut them maybe a LITTLE slack given the VMAX was released in 2009. I can’t cut any slack to NetApp though. They recently released some new SPEC SFS results which, among other things, disclosed that their high end 6240 storage system is using quad core Intel E5540 processors. So basically a dual proc, quad core system. And their lower end system is (wait for it) dual proc, dual core.

Oh, I can’t describe how frustrated that makes me: these companies touting general purpose CPUs and then going out of their way to cripple their systems. It would cost NetApp all of maybe $1,200 to upgrade their low end box to quad cores? Maybe $2,500 for both controllers? But no, they’d rather you spend an extra, what, $50,000-$100,000 to get that functionality?

I have to knock NetApp more in one respect, since these storage systems are significantly newer than the VMAX, but less in another, because they don’t champion the Intel CPUs as much as EMC does, at least from what I have seen.

3PAR is not a golden child either; their latest V800 storage system uses (wait for it) quad core processors as well, which is just as disgraceful. I can cut 3PAR more slack because their ASIC is what provides the horsepower on their boxes, not the Intel processors, but still that is no excuse for not using at LEAST 6-core processors. While I cannot determine precisely which Intel CPUs 3PAR is using, I know they are not the ultra low power variants, since the clock speeds are 2.8GHz.

Storage companies aren’t alone here; load balancing companies like F5 Networks and Citrix do the same thing. Citrix is better than F5 in that they offer software “upgrades” on their platform that unlock additional throughput. Even without the upgrade you have free rein over all of the CPU cores on the box, which allows you to run the more expensive software features that would otherwise impact CPU performance. To do the same on F5 you have to buy the next bigger box.

Back to Fujitsu storage for a moment: their high end box certainly seems like a very respectable system, with regard to paper numbers anyways. I found very interesting the comment on the original article that mentioned Fujitsu can put the system’s maximum capacity behind a single pair of controllers if the customer wants to. Of course the controllers couldn’t drive all of the I/O, but it is nice to see the capacity not so tightly coupled to the controllers like it is on the VMAX or even on the 3PAR platform. Especially when it comes to SATA drives, which aren’t known for high amounts of I/O, higher end storage systems such as the recently mentioned HDS, 3PAR and even VMAX tap out in “maximum capacity” long before they tap out in I/O if you’re loading the system with tons of SATA disks. It looks like Fujitsu can get up to 4.2PB of space, leaving, again, HDS, 3PAR and EMC in the dust. (Capacity utilization is another story of course.)
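As a rough illustration of that point, here is a minimal sketch assuming, for the sake of argument, 2TB SATA drives at roughly 75 random IOPS each and a hypothetical array that tops out at 800TB raw and 200,000 back-end IOPS; none of these figures are any vendor’s actual specs.

```python
# Rough illustration of why SATA-heavy configurations hit an array's
# capacity ceiling long before they hit its I/O ceiling. All numbers are
# ballpark assumptions for illustration, not any vendor's actual specs.

sata_drive_tb = 2.0        # capacity per SATA drive (assumed)
sata_drive_iops = 75       # random IOPS per 7200rpm SATA drive (rule of thumb)

array_max_raw_tb = 800.0   # hypothetical array capacity limit
array_max_iops = 200_000   # hypothetical controller I/O limit

drives_at_capacity_limit = int(array_max_raw_tb / sata_drive_tb)     # 400 drives
iops_at_capacity_limit = drives_at_capacity_limit * sata_drive_iops  # 30,000 IOPS

print(f"Drives at the capacity ceiling: {drives_at_capacity_limit}")
print(f"Aggregate SATA IOPS at that point: {iops_at_capacity_limit:,}")
print(f"Fraction of the controllers' I/O headroom used: "
      f"{iops_at_capacity_limit / array_max_iops:.0%}")
# With these assumptions you run out of allowed capacity at ~15% of the
# controllers' I/O capability -- capacity, not I/O, is the binding limit.
```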

With Fujitsu’s ability to scale the DX8700 to 8 controllers, 128 Fibre Channel interfaces, 2,700 drives and 512GB of cache, it is quite a force to be reckoned with. No sub-disk distributed RAID, no ASIC acceleration, but I can certainly see how someone would be willing to put the DX8700 up against a VMAX.

EMC was way late to the 2+ controller hybrid modular/purpose built game and is still playing catch up. As I said to Dell last year, put your money where your mouth is and publish SPC-1 results for your VMAX, EMC.

With EMC so in love with Intel, I have to wonder how hard they had to fight off Intel’s encouragement to use the Itanium processor in their arrays instead of Xeons. Or has Intel given up completely on Itanium now? (Which, again, we have to thank AMD for: without AMD’s x86-64 extensions the Xeon processor line would have been dead and buried many years ago.)

For insight into how a 128-core Intel-based storage system may perform in SPC-1, you can look to this system from China.

(I added a couple of diagrams; I don’t have enough graphics on this site.)
