I first heard that Fujitsu had storage maybe a year and a half ago. Someone told me that Fujitsu was one company seriously interested in buying Exanet at the time, which caused me to go look at their storage; I had no idea they had storage systems. Even today I really never see anyone mention them anywhere, and my 3PAR reps say they never encounter Fujitsu in the field (at least in these territories; they suspect that over in Europe they go head to head more often).
Anyways, EMC folks seem to be trying to attack the high end Fujitsu system, saying it's not "enterprise". In the end, the main leg EMC has to stand on for what in their eyes is "enterprise" is mainframe connectivity, a myth Fujitsu rightly tries to debunk, since there are a lot of organizations that consider themselves "enterprise" that don't have any mainframes. It's just stupid, but EMC doesn't really have any other excuses.
What prompted me to write this, more than anything else, was this:
One can scale from one to eight engines (or even beyond in a short timeframe), from 16 to 128 four-core CPUs, from two to 16 backend- and front-end directors, all with up to 16 ports.
The four core CPUs are what get me. What a waste! I have no doubt that in EMC's "short timeframe" they will be migrating to quad socket 10 core CPUs, right? After all, unlike someone like 3PAR who can benefit from a purpose built ASIC to accelerate their storage, EMC has to rely entirely on software. After seeing SPC-1 results for HDS's VSP, I suspect the numbers for VMAX wouldn't be much more impressive.
My main point is this, and it just drives me mad: these big manufacturers beat the Intel CPU drum and then don't exploit the platform to its fullest extent. Quad core CPUs came out in 2007. When EMC released the VMAX in 2009, apparently Intel's latest and greatest was still quad core. But here we are, practically 2012, and they're still not on at LEAST hex core yet? This is Intel architecture, it's not that complicated. I'm not sure what quad core CPUs specifically are in the VMAX, but the upgrade from Xeon 5500 to Xeon 5600 for the most part was:
- Flash bios (if needed to support new CPU)
- Turn box off
- Pull out old CPU(s)
- Put in new CPU(s)
- Turn box on
- Get back to work
That's the point of using general purpose CPUs!! You don't need to pour 3 years of R&D into something to upgrade the processor.
What I'd like to see, something I mentioned in a comment recently is a quad socket design for these storage systems. Modern CPUs have had integrated memory controllers for a long time now (well only available on Intel since the Xeon 5500). So as you add more processors you add more memory too. (Side note: the documentation for VMAX seems to imply a quad socket design for a VMAX engine but I suspect it is two dual socket systems since the Intel CPUs EMC is likely using are not quad-socket capable). This page claims the VMAX uses the ancient Intel 5400 processors, which if I remember right was the first generation quad cores I had in my HP DL380 G5s many eons ago. If true, it's even more obsolete than I thought!
Why not 8 socket? Or more? Well, cost mainly. The R&D involved in an 8-socket design I believe is quite a bit higher, and the amount of physical space required is high as well. With quad socket blades commonplace, and even some vendors having quad socket 1U systems, the price point and physical size of quad socket designs are well within reach of storage systems.
So the point is on these high end storage systems you start out with a single socket populated on a quad socket board with associated memory. Want to go faster? Add another CPU and associated memory. Go faster still? Add two more CPUs and associated memory (though I think it's technically possible to run 3 CPUs, and there have been 3 CPU systems in the past, it seems common/standard to add them in pairs). You're spending probably at LEAST a quarter million for this system initially, probably more than that, so the incremental cost of R&D to go quad socket, given this is Intel after all, is minimal.
Currently VMAX goes to 8 engines, they say they will expand that to more. 3PAR took the opposite approach, saying while their system is not as clustered as a VMAX is (not their words), they feel such a tightly integrated system (theirs included) becomes more vulnerable to "something bad happening" that impacts the system as a whole, more controllers is more complexity. Which makes some sense. EMC's design is even more vulnerable being that it's so tightly integrated with the shared memory and stuff.
3PAR goes even further in their design to isolate things - like completely separating the control cache, which is used for the operating system that powers the controllers and for the control data on top of it, from the data cache, which as you can see in the diagram below is only connected to the ASICs, not to the Intel CPUs. On top of that they separate the control data flow from the regular data flow as well.
One reason I have never been a fan of "stacking" or "virtual chassis" on switches is the very same reason: I'd rather have independent components that are not tightly integrated, in the event "something bad" takes down the entire "stack". Now if you're running two independent stacks, so that one full stack can fail without an issue, then that works around the problem, but most people don't seem to do that. The chances of such a failure happening are low, but they are higher than the chances of something causing all of the switches to fail if the switches were not stacked.
One exception might be some problems related to STP which some people may feel they need when operating multiple switches. I'll answer that by saying I haven't used STP in more than 8 years, so there have been ways to build a network with lots of devices without using STP for a very long time now. The networking industry recently has made it sound like this is something new.
Same with storage.
So back to 3PAR. 3PAR changed their approach in their V-series of arrays, for the first time in the company's history they decided to include TWO ASICs in each controller, effectively doubling the I/O processing abilities of the controller. Fewer, more powerful controllers. A 4-node V400 will likely outperform an 8-node T800. Given the system's age, I suspect a 2-node V400 would probably be on par with an 8-node S800 (released around 2003 if I remember right).
EMC is not alone, and not the worst abuser here though. I can cut them maybe a LITTLE slack given the VMAX was released in 2009. I can't cut any slack to NetApp though. They recently released some new SPEC SFS results, which among other things, disclosed that their high end 6240 storage system is using quad core Intel E5540 processors. So basically a dual proc quad core system. And their lower end system is -- wait for it -- dual proc dual core.
Oh I can't describe how frustrated that makes me, these companies touting general purpose CPUs and then going out of their way to cripple their systems. It would cost NetApp all of maybe what, $1200 to upgrade their low end box to quad cores? Maybe $2500 for both controllers? But no, they'd rather you spend an extra, what, $50,000-$100,000 to get that functionality?
I have to knock NetApp more to some extent since these storage systems are significantly newer than the VMAX, but I knock them less because they don't champion the Intel CPUs as much as EMC does, that I have seen at least.
3PAR is not a golden child either, their latest V800 storage system uses -- wait for it -- quad core processors as well. Which is just as disgraceful. I can cut 3PAR more slack because their ASIC is what provides the horsepower on their boxes, not the Intel processors, but still that is no excuse for not using at LEAST 6 core processors. While I cannot determine precisely which Intel CPUs 3PAR is using, I know they did not pick them for being ultra low power, since the clock speeds are 2.8GHz.
Storage companies aren't alone here, load balancing companies like F5 Networks and Citrix do the same thing. Citrix is better than F5 in that they offer software "upgrades" on their platform that unlock additional throughput. Without the upgrade you have free rein over all of the CPU cores on the box, which allows you to run more expensive software features that would normally otherwise impact CPU performance. To do this on F5 you have to buy the next bigger box.
Back to Fujitsu storage for a moment, their high end box certainly seems like a very respectable system, with regards to paper numbers anyways. I found very interesting the comment on the original article that mentioned Fujitsu can run the system's maximum capacity behind a single pair of controllers if the customer wanted to. Of course the controllers couldn't drive all the I/O, but it is nice to see the capacity not so tightly integrated to the controllers like it is on the VMAX or even on the 3PAR platform. Especially when it comes to SATA drives, which aren't known for high amounts of I/O, higher end storage systems such as the recently mentioned HDS, 3PAR and even VMAX tap out in "maximum capacity" long before they tap out in I/O if you're loading the system with tons of SATA disks. It looks like Fujitsu can get up to 4.2PB of space, leaving, again, HDS, 3PAR and EMC in the dust. (Capacity utilization is another story of course).
With Fujitsu's ability to scale the DX8700 to 8 controllers, 128 fibre channel interfaces, 2,700 drives and 512GB of cache that is quite a force to be reckoned with. No sub-disk distributed RAID, no ASIC acceleration, but I can certainly see how someone would be willing to put the DX8700 up against a VMAX.
EMC was way late to the 2+ controller hybrid modular/purpose built game and is still playing catch up. As I said to Dell last year, put your money where your mouth is and publish SPC-1 results for your VMAX, EMC.
With EMC so in love with Intel I have to wonder how hard they had to fight off Intel from encouraging EMC to use the Itanium processor in their arrays instead of Xeons. Or has Intel given up completely on Itanium now? (Which, again, we have to thank AMD for - without AMD's x86-64 extensions the Xeon processor line would have been dead and buried many years ago.)
For insight to what a 128-CPU core Intel-based storage system may perform in SPC-1, you can look to this system from China.
(I added a couple diagrams, I don't have enough graphics on this site)
Just another one of my random thoughts I have been having recently.
Chuck wrote a blog not too long ago how he believes everyone is going to go to Intel (or x86 at least) processors in their systems and move away from ASICs.
He illustrated his point by saying some recent SPEC NFS results showed the Intel based system outperforming everything else. The results were impressive, the only flaw in them is that the costs are not disclosed for SPEC. An EMC VMAX with 96 EFDs isn't cheap. And the better your disk subsystem is the faster your front end can be.
Back when Exanet was still around they showed me some results from one of their customers testing SPEC SFS on the Exanet LSI (IBM OEM'd) back end storage vs 3PAR storage, and for the same number of disks the SPEC SFS results were twice as high on 3PAR.
But that's not my point here or question. A couple of years ago NetApp posted some pretty dismal results for the CX-3 with snapshots enabled. EMC doesn't do SPC-1 so NetApp did it for them. Interesting.
After writing up that Pillar article where I illustrated the massive efficiency gains on the 3PAR architecture (which are in part driven by their own custom ASICs), it got me thinking again, because as far as I can tell Pillar uses x86 CPUs.
Pillar offers multiple series of storage controllers to best meet the needs of your business and applications. The Axiom 600 Series 1 has dual-core processors and supports up to 24GB cache. The Axiom 600 Slammer Series 2 has quad-core processors and double the cache providing an increase in IOPS and throughput over the Slammer Series 1.
Now I can only assume they are using x86 processors. For all I know I suppose they could be using Power, or SPARC, but I doubt they are using ARM.
Anyways back to the 3PAR architecture and their micro RAID design. I have written in the past about how you can have tens to hundreds of thousands of mini RAID arrays on a 3PAR system depending on the amount of space that you have. This is, of course to maximize distribution of data over all resources to maximize performance and predictability. When running RAID 5 or RAID 6, there are of course parity calculations involved. I can't help but wonder what sort of chances in hell a bunch of x86 CPU cores have in calculating RAID in real time for 100,000+ RAID arrays, with 3 and 4TB drives not too far out, you can take that 100,000+ and make it 500,000.
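For anyone who hasn't looked at what a parity calculation actually is, here is a minimal sketch of single-parity (RAID 5 style) XOR across one stripe. This is the textbook technique only, not 3PAR's ASIC logic or any vendor's implementation; the function names `xor_parity` and `rebuild_missing` and the 4-byte chunks are made up for illustration.

```python
from functools import reduce


def xor_parity(chunks):
    """Return the byte-wise XOR of equally sized chunks (RAID 5 style parity)."""
    if not chunks:
        raise ValueError("need at least one chunk")
    size = len(chunks[0])
    if any(len(c) != size for c in chunks):
        raise ValueError("all chunks must be the same size")
    # zip(*chunks) walks the stripe column by column; each column is XORed down
    # to a single parity byte.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


def rebuild_missing(surviving_chunks, parity):
    """Recover one lost chunk by XORing the survivors with the parity chunk."""
    return xor_parity(list(surviving_chunks) + [parity])


# Hypothetical 3+1 stripe with tiny 4-byte chunks (real chunks are far larger).
data = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
parity = xor_parity(data)

# Pretend the second chunk was on a failed disk and rebuild it.
rebuilt = rebuild_missing([data[0], data[2]], parity)
```

Every write to a parity-protected stripe has to redo this math, and every rebuild has to run it across all the surviving members, which is the work being multiplied by those hundreds of thousands of arrays.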
Taking the 3PAR F400 SPC-1 results as an example, here is my estimate on the number of RAID arrays on the system, fortunately it's mirrored so math is easier:
- Usable capacity = 27,053 GB (27,702,272 MB)
- Chunklet size = 256MB
- Total Number of RAID-1 arrays = ~ 108,212
- Total Number of data chunklets = ~216,424
- Number of data chunklets per disk = ~563
- Total data size per disk = ~144,128 MB (140.75 GB)
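The arithmetic behind that estimate can be sketched in a few lines. The usable capacity and chunklet size come from the list above; the disk count of 384 is my assumption, back-solved from the ~563 chunklets per disk figure, and the variable names are just for illustration.

```python
# Rough estimate of RAID-1 "micro array" counts on the 3PAR F400 SPC-1 config.
USABLE_GB = 27_053      # usable capacity from the list above
CHUNKLET_MB = 256       # chunklet size from the list above
MIRROR_COPIES = 2       # RAID-1: every chunklet is stored twice
DISK_COUNT = 384        # ASSUMPTION: back-solved from ~563 chunklets per disk

usable_mb = USABLE_GB * 1024

# One RAID-1 array per usable chunklet, each array made of two chunklets.
raid1_arrays = usable_mb // CHUNKLET_MB
data_chunklets = raid1_arrays * MIRROR_COPIES

# Spread evenly over all spindles (rounded down to whole chunklets).
chunklets_per_disk = data_chunklets // DISK_COUNT
data_mb_per_disk = chunklets_per_disk * CHUNKLET_MB

print(f"usable MB:          {usable_mb:,}")
print(f"RAID-1 arrays:      {raid1_arrays:,}")
print(f"data chunklets:     {data_chunklets:,}")
print(f"chunklets per disk: {chunklets_per_disk:,}")
print(f"data per disk:      {data_mb_per_disk:,} MB ({data_mb_per_disk / 1024:.2f} GB)")
```

Swap in a bigger usable capacity for 3 and 4TB drives and the array count climbs in direct proportion, since the chunklet size stays fixed.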
For legacy RAID designs it's probably not a big deal, but as disk drives grow ever bigger I have no doubt that everyone will have to move to a distributed RAID architecture, to reduce disk rebuild times and lower the chances of a double/triple disk failure wiping out your data. It is unfortunate (for them) that Hitachi could not pull that off in their latest system.
3PAR does use Intel CPUs in their systems as well, though they aren't used too heavily, on the systems I have had even at peak spindle load I never really saw CPU usage above 10%.
I think ASICs are here to stay for some time, on the low end you will be able to get by with generic CPU stuff, but on the higher end it will be worth the investment to do it in silicon.
Another place to look for generic CPUs vs ASICs is in the networking space. Network devices are still heavily dominated by ASICs because generic CPUs just can't keep up. Now of course generic CPUs are used for what I guess could be called "control data", but the heavy lifting is done by silicon. ASICs often draw a fraction of the power that generic CPUs do.
Yet another place to look for generic CPUs vs ASICs is in the HPC space - the rise of GPU-assisted HPC allowing them to scale to what was (to me anyways) unimaginable heights.
Generic CPUs are of course great to have and they have come a long way, but there is a lot of cruft in them, so it all depends on what you're trying to accomplish.
The fastest NAS in the world is still BlueArc, which is powered by FPGAs, though their early cost structures put them out of reach for most folks, their new mid range looks nice, my only long time complaint about them has been their back end storage - either LSI or HDS, take it or leave it. So I leave it.
The only SPEC SFS results posted by BlueArc are for the mid range, nothing for their high end (which they tested on the last version of SFS, nothing yet for the current version).
This story makes me sick. Everyone in the industry knew it was going on: several years ago Intel was paying off their customers to stick to their products, and not deploy the superior Opteron processors. Intel's strategy to convert the world to Itanium was going down in flames thanks to AMD's extension of x86, x86-64, combined with a superior hardware architecture derived from the Alpha (and created by many former engineers who worked on the Alpha - AMD hired many of them). Whether it was the HyperTransport design, the integrated memory controllers, or the multi core designs, in so many areas AMD was showing such massive innovations that the only way Intel could respond at the time was by paying their customers to not use AMD's stuff.
In no place was it more obvious than Dell, a company that I've never had respect for, for other reasons (the biggest being that they don't innovate at all outside of their supply chain). Dell was the only big OEM that did not use AMD processors at all for the longest time.
What upsets me more than anything else is not the fact that this went on, but the pocket change of penalties that resulted. Intel paid AMD $1.25 billion to settle all outstanding legal cases last year, a small fraction of what otherwise should have been paid. Dell pays only $100M; maybe that's enough for the SEC, but on anti trust grounds it should be far more. $100M is not a deterrent. It amounts to a small fraction of what Intel paid them!
Same goes for the settlement Intel paid to AMD. I have absolutely no doubt, as you should have none as well, that Intel benefited FAR more than the $1.25 billion paid to AMD. It should have been $10 billion, if not higher. Intel wrote that settlement off in one quarter!
It really is depressing to see these big companies get away with this sort of thing. Whether it's Dell, or Intel, or the recent Goldman Sachs SEC settlement. The penalties are pocket change compared to what they should be to make it a real deterrent. And moreover, individuals are not punished in a lot of cases, the company takes the hit, and of course in all cases nobody ever admits any wrong doing. Goldman, like Intel wrote their settlement off in one quarter!
Dell wasn't alone, we all knew it, but no other OEM was being so blatantly obvious in their strategy.
Intel's rebates amounted to 38 per cent of Dell's operating profit in the fiscal year 2006, and rose to 76 per cent (or $720m) in one quarter alone, Q1 2007. While almost all of the Intel funds were incorporated into Dell's component costs, Dell did not disclose the existence, much less the magnitude, of the Intel exclusivity payments.
New York State's lawsuit suggests that the reach of the funding was wide indeed. It alleges that IBM benefited by $130m from Intel simply for not launching an AMD product. HP benefited by almost $1bn. Again, you might suppose Intel might have found better use for such resources - such as R&D.
A lot of the big companies do this sort of thing; it's a wonder that tech startups even bother to start up anymore when there is really nobody keeping the playing field fair. One other similarly despicable business deal, which I was told about by two different people on both sides of the table, was a networking deal at AT&T that Cisco was competing for along with some other vendors. AT&T was (and probably still is) the largest user and re-seller of Cisco gear. The competition was the obvious players, there's only so many out there! Anyways the deal went down, and Cisco lost hands down on many counts. Their technology just isn't competitive in so many areas. So how did Cisco respond? They came back to AT&T with 95% off list pricing. They bought the business. They didn't win on any real merits, and they took a major loss on the deal, which will result in all of their other customers having to continue to pay more to compensate for that. That just makes me sick.
But nothing seems to be on such a grand scale as what Intel did to keep AMD at bay. It was shocking to me seeing the pundits saying "oh well the consumer wasn't hurt by those practices", not taking into account how close AMD came to the brink, with the massive (still massive!) amount of debt they have incurred over the years. An incredible market opportunity was there for them for several years, something Intel kept small by throwing cash at their customers because they had nothing else to offer.
Intel can't afford to lose AMD from an anti trust standpoint, but they also don't want them to succeed too much, a pretty fine line they walk.
- Expand the amount of memory available to the system
- Be able to "connect" two dual socket blades to form a single quad socket system
Pretty creative, though the end result wasn't quite as impressive as it sounded up front. Their standard blade chassis is 9U and has 14 slots on it.
- Each blade is dual socket, maximum 16 cores, and 16 DIMMs
- Each memory extender offers 24 additional DIMMs
So for the chassis as a whole you're talking about 7 dual socket systems with 40 DIMMs each. Or 3 quad socket systems with 80 DIMMs each, and 1 dual socket with 40.
Compare that to an Opteron 6100 system, where you can get 8 quad socket systems with 48 DIMMs each in a single enclosure (granted, such a system has not been announced yet, but I am confident it will be).
- Intel 7500-based system: 112 CPU cores (1.8GHz), 280 DIMM slots - 9U
- Opteron 6100-based system: 384 CPU cores (2.2GHz), 384 DIMM slots - 10U
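Here is a small sketch of how those totals fall out, plus the per-rack-unit density that the raw numbers hide. The IBM figures come straight from the blade and memory extender specs above (16 cores per blade, 16 + 24 DIMMs per blade/extender pair, 14 slots shared between blades and extenders). The 12 cores per Opteron socket is my assumption (it is what 384 cores across 8 quad socket systems implies), and the `Chassis` class is just an illustrative name.

```python
from dataclasses import dataclass


@dataclass
class Chassis:
    name: str
    systems: int            # independent servers in the enclosure
    cores_per_system: int
    dimms_per_system: int
    rack_units: int

    @property
    def total_cores(self) -> int:
        return self.systems * self.cores_per_system

    @property
    def total_dimms(self) -> int:
        return self.systems * self.dimms_per_system

    def per_rack_unit(self, total: int) -> float:
        return total / self.rack_units


# 14 slots, each system takes 2 (one blade + one memory extender).
ibm = Chassis("Intel 7500-based", systems=14 // 2,
              cores_per_system=16, dimms_per_system=16 + 24, rack_units=9)

# ASSUMPTION: 4 sockets x 12 cores per system in the hypothetical enclosure.
amd = Chassis("Opteron 6100-based", systems=8,
              cores_per_system=4 * 12, dimms_per_system=48, rack_units=10)

for c in (ibm, amd):
    print(f"{c.name}: {c.total_cores} cores, {c.total_dimms} DIMMs, "
          f"{c.per_rack_unit(c.total_cores):.1f} cores/U, "
          f"{c.per_rack_unit(c.total_dimms):.1f} DIMMs/U")
```

Even giving up one extra rack unit, the Opteron enclosure comes out at roughly three times the cores per U in this sketch.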
And the price of the IBM system is even less impressive -
In a base configuration with a single four-core 1.86 GHz E7520 processor and 8 GB of memory, the BladeCenter HX5 blade costs $4,629. With two of the six-core 2 GHz E7540 processors and 64 GB of memory, the HX5 costs $15,095.
They don't seem to show pricing for the 8 core 7500-based blade, and say there is no pricing or ETA on the arrival of the memory extenders.
They do say this which is interesting (not surprising) -
The HX5 blade cannot support the top-end eight-core Xeon 7500 parts, which have a 130 watt thermal design point, but it has been certified to support the eight-core L7555, which runs at 1.86 GHz, has 24 MB of L3 cache, and is rated at 95 watts.
I only hope AMD has enough manufacturing capacity to keep up with demand, Opteron 6100s will wipe the floor with the Intel chips on price/performance (for the first time in a while).