I've been waiting to see these final results for a while, and now they are out! The numbers (performance + cost + latency) are actually better than I was expecting.
You can see a massive write up I did on this platform when it was released last year.
(last minute edits to add a new Huawei result that was released yesterday)
(more last minute edits to add an HP P6500 EVA SPC-1E result)
I'll say this again in case this happens to be read by someone who is new here. Myself, I see value in the SPC-1 as it provides a common playing field for reporting on performance in random transactional workloads (the vast majority of workloads are transactional). On top of the level playing field, the more interesting stuff comes in the disclosures of the various vendors. You get to see things like:
- Cost (SpecSFS, for example, doesn't provide this; the resulting claims from vendors showing high performance relative to others at a massive cost premium, without disclosing the costs, are very sad)
- Utilization (SPC-1 minimum protected utilization is 55%)
- Configuration complexity (only available in the longer full disclosure report)
- Other compromises the vendor might have made (see the note about disabling cache mirroring)
- 3 year 24x7 4 hour on site hardware support costs
There is a brief executive summary as well as what is normally a 50-75 page full disclosure report with the nitty gritty details.
SPC-1 also has maximum latency requirements - no I/O request can take longer than 30ms to serve or the test is invalid.
There is another test suite, SPC-2, which tests throughput by various means. Far fewer systems participate in that test (3PAR never has, though I'd certainly like them to).
Having gone through several storage purchases over the years, I can say from personal experience it is a huge pain to try to evaluate stuff under real workloads; oftentimes vendors don't even want to give evaluation gear (that is in fact in large part why I am a 3PAR customer today). Even if you do manage to get something in house to test, there are many things out there, with wide-ranging performance / utilization ratios. At least with something like SPC-1 you can get some idea how the system performs relative to other systems at non-trivial utilization rates. This example is rather extreme but is a good illustration.
I have no doubt the test is far from perfect, but in my opinion at least it's far better than the alternatives, like people running 100% read tests with IOMeter to show they can get 1 million IOPS.
I find it quite strange that none of the new SSD startups have participated in SPC-1. I've talked to a couple of different ones and they don't like the test; they give the usual line: it's not real world, customers should take the gear and test it out themselves. Typical stuff. Usually that means they would score poorly, especially those that leverage SSD as a cache tier: with the high utilization rates of SPC-1 you are quite likely to blow out that tier, and once that happens performance tanks. I have heard reports of some of these guys getting their systems yanked out of production because they fail to perform after utilization goes up. The system shines like a star during a brief evaluation; then, after several months of usage and increasing utilization, performance no longer holds up.
One person said their system is optimized for multiple workloads and SPC-1 is a single workload. I don't really agree with that: SPC-1 does a ton of reads and writes all over the system, usually from multiple servers simultaneously. I look back to 3PAR specifically, who have been touting multiple workload (and mixed workload) support since their first array was released more than a decade ago. They have participated in SPC-1 for over a decade as well, so arguments that testing is too expensive don't hold water either. They did it when they were small, on systems that are designed from the ground up for multiple workloads (not just riding a wave of fast underlying storage and hoping that can carry them); these new small folks can do it too. If they can come up with a better test with similar disclosures, I'm all ears.
The one place where I think SPC-1 could be improved is in failure testing. Testing a system in a degraded state to see how it performs.
The below results are from what I could find on all-SSD SPC-1 results. If there are one or more I have missed (other than TMS, see note below), let me know. I did not include the IBM servers with SSD, since those are... servers.
| System Name | Results Published |
|---|---|
| HP 3PAR 7400 | May 23, 2013 |
| HP P6500 EVA (SPC-1E) | February 17, 2012 |
| IBM Storwize V7000 | June 4, 2012 |
| HDS Unified Storage 150 | March 26, 2013 |
| Huawei OceanStor Dorado2100 G2 | May 22, 2013 |
| Huawei OceanStor Dorado5100 | August 13, 2012 |
I left out the really old TMS (now IBM) SPC-1 results as they were from 2011, too old for a worthwhile comparison.
Performance / Latency
| System Name | SPC-1 IOPS | Avg Latency (all utilization levels) | Latency (100% load) | # of times above 1ms | # of SSDs |
|---|---|---|---|---|---|
| HP 3PAR 7400 | 258,078 | 0.66ms | 0.86ms | 0 / 15 | 32 |
| HP P6500 EVA (SPC-1E) | 20,003 | 4.01ms | 11.23ms | 13 / 15 | 8 |
| IBM Storwize V7000 | 120,492 | 2.6ms | 4.32ms | 15 / 15 | 18 |
| HDS Unified Storage 150 | 125,018 | 0.86ms | 1.09ms | 12 / 15 | 20 |
| Huawei OceanStor Dorado2100 G2 | 400,587 | 0.60ms | 0.75ms | 0 / 15 | 50 |
| Huawei OceanStor Dorado5100 | 600,052 | 0.87ms | 1.09ms | 7 / 15 | 96 |
A couple of my own data points:
- Avg latency (All utilization levels) - I just took aggregate latency of "All ASUs" for each of the utilization levels and divided it by 6 (the number of utilization levels)
- Number of times above 1ms of latency - I just counted the number of cells in the I/O throughput table for each of the ASUs (15 cells total) that the test reported above 1ms of latency
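For clarity, here is a minimal sketch of how those two derived columns were computed. The latency values below are made-up placeholders, not figures from any actual report:

```python
# Sketch of the two derived columns; these latency values are
# hypothetical placeholders, not numbers from a real disclosure.

# "All ASUs" aggregate latency (ms) at each of the six reported
# utilization levels (10% ... 100%).
all_asu_latency_ms = [0.50, 0.55, 0.60, 0.65, 0.75, 0.90]
avg_latency_ms = sum(all_asu_latency_ms) / len(all_asu_latency_ms)

# The I/O throughput table reports 15 latency cells across the ASUs;
# count how many exceed 1ms.
asu_cells_ms = [0.8, 0.9, 1.1, 1.3, 0.7, 0.9, 1.0, 1.2,
                0.6, 0.8, 1.4, 0.9, 1.1, 0.7, 1.5]
above_1ms = sum(1 for v in asu_cells_ms if v > 1.0)

print(f"avg latency: {avg_latency_ms:.2f}ms, above 1ms: {above_1ms}/15")
# -> avg latency: 0.66ms, above 1ms: 6/15
```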
Cost

| System Name | Total Cost | Cost per SPC-1 IOP | Cost per Usable TB |
|---|---|---|---|
| HP 3PAR 7400 | $148,737 | $0.58 | $133,019 |
| HP P6500 EVA (SPC-1E) | $130,982 | $6.55 | $260,239 |
| IBM Storwize V7000 | $181,029 | $1.50 | $121,389 |
| HDS Unified Storage 150 | $198,367 | $1.59 | $118,236 |
| Huawei OceanStor Dorado2100 G2 | $227,062 | $0.57 | $61,186 |
| Huawei OceanStor Dorado5100 | $488,617 | $0.81 | $77,681 |
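The cost-per-IOP figures are straightforward arithmetic from the other columns; spot-checking a couple of rows:

```python
# Cost per SPC-1 IOP = total tested configuration price / SPC-1 IOPS.
# Figures taken from the cost and performance tables above.
rows = {
    "HP 3PAR 7400":                   (148_737, 258_078),
    "Huawei OceanStor Dorado2100 G2": (227_062, 400_587),
}
for name, (total_cost, iops) in rows.items():
    print(f"{name}: ${total_cost / iops:.2f}/IOP")
# -> HP 3PAR 7400: $0.58/IOP
# -> Huawei OceanStor Dorado2100 G2: $0.57/IOP
```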
Capacity Utilization

| System Name | Physical Capacity | ASU Capacity | Protected Application Utilization |
|---|---|---|---|
| HP 3PAR 7400 | 3,250 GB | 1,159 GB | 70.46% |
| HP P6500 EVA (SPC-1E) | 1,600 GB | 515 GB | 64.41% |
| IBM Storwize V7000 | 3,600 GB | 1,546 GB | 84.87% |
| HDS Unified Storage 150 | 3,999 GB | 1,717 GB | 85.90% |
| Huawei OceanStor Dorado2100 G2 | 10,002 GB | 3,801 GB | 75.97% |
| Huawei OceanStor Dorado5100 | 19,204 GB | 6,442 GB | 67.09% |
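For what it's worth, the utilization column can be roughly reproduced from the other two if you assume the data is mirrored (RAID-1, so 2x protection overhead); the small differences are presumably rounding and spare capacity in the reports:

```python
# Protected application utilization ~= (ASU capacity * copies) / physical.
# Assumes RAID-1 mirroring (2 copies); figures from the Dorado2100 G2 row.
asu_gb, physical_gb = 3_801, 10_002
utilization_pct = asu_gb * 2 / physical_gb * 100
print(f"{utilization_pct:.2f}%")  # -> 76.00% (table says 75.97%)
```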
The new utilization charts in the latest 3PAR/Huawei tests are quite nice to see, really good illustrations as to where the space is being used. They consume a full 3 pages in the executive summary. I wish SPC would go back and revise previous reports so they have these new easier forms of disclosure in them. The data is there for users to compute on their own.
This is an SPC-1E result rather than SPC-1; I believe the workload is the same(?), they just measure power draw in addition to everything else. The stark contrast between the new 3PAR and the older P6500 is remarkable from every angle, whether it is cost, performance, capacity, or latency. Any way you slice it (well, except power; I am sure the 3PAR draws more power).
It is somewhat interesting that in the power results for the P6500 there is only a 16 watt difference between 0% load and 100% load.
I noticed that the P6500 is no longer being sold (the P6550 was released to replace it, and the 3PAR 7000-series was released to replace the P6550, which is still being sold).
While I don't expect Huawei to be a common rival for the other three (outside of China perhaps), I find their configuration very curious. On the 5100, with such a large number of apparently low cost SLC(!) SSDs and "short stroking" (even though there are no spindles, I guess the term can still apply), they have managed to provide a significant amount of performance at a reasonable cost. I am confused though: they claim SLC, yet they have so many disks (you'd think you'd need fewer with SLC), at the same time at a much lower cost. Doesn't compute...
Huawei appears to have absolutely no software options for these products - no thin provisioning, no snapshots, no replication, nothing. Usually vendors don't include any software options as part of the testing since they are not used. In this case the options don't appear to exist at all.
They seem to be more in line with something like the LSI/NetApp E-series, or Infortrend, rather than an enterprise storage system. Though looking at Infortrend's site earlier this morning shows them supporting thin provisioning, snapshots, and replication on some arrays. Even NetApp seems to include thin provisioning on their E-series.
3PAR's utilization in this test is hampered by (relatively) excessive metadata. The utilization results show only a 7% unused storage ratio, which on the surface is an excellent number, but this number excludes metadata, which in this case is 13% (418GB) of the system. Given the small capacity of the system this has a significant impact on utilization (compared to 3PAR's past results). They are working to improve this.
The next largest metadata size among the above systems is IBM's, which has only 1GB of metadata (about 99.8% less than 3PAR). I would be surprised if 3PAR was not able to significantly slash the metadata size in the future.
In the grand scheme of things this problem is pretty trivial. It's not as if the metadata scales linearly with the system.
Only quad controller system
3PAR is the only SSD solution above tested with 4 controllers(totalling 4 Gen4 ASICs, 24 x 1.8Ghz Xeon CPU cores, 64GB of data cache, and 32GB of control cache), meaning with their persistent cache technology(which is included at no extra cost) you can lose a controller and keep a fully protected and mirrored write cache. I don't believe any of the other systems are even capable of such a configuration regardless of cost.
The 7400 managed to stay below 1 millisecond response times even at maximum utilization which is quite impressive.
Thin provisioning built in
The new license model of the 3PAR 7000 series means this is the first SPC-1 result to include thin provisioning for a 3PAR system, at least. I'm sure they did not use thin provisioning (no point when you're driving to max utilization), but from a cost perspective it is something good to keep in mind. In the past thin provisioning would add significant costs onto a 3PAR system. I believe thin provisioning is still a separate license on the P10000-series (though I would not be surprised if that changes as well).
Low cost model
They managed to do all of this while remaining a lower cost offering than the competition - the economics of this new 7000 series are remarkable.
IBM's poor latency
IBM's V7000 latency is really terrible relative to HDS and HP. I guess that is one reason they bought TMS. Though it may take some time for them to integrate TMS technology (assuming they even try) to have similar software/availability capabilities as their main enterprise offerings.
With these results I believe 3PAR is showing well that they too can easily compete in the all-SSD market opportunities without requiring excessive amounts of rack space or power circuits as some of their previous systems required. All of that performance (only 32 of the 48 drive bays are occupied!) in a small 4U package. Previously you'd likely be looking at an absolute minimum of half a rack!
I don't know whether or not 3PAR will release performance results for the 7000 series on spinning rust, it's not too important at this point though. The system architecture is distributed and they have proven time and again they can drive high utilization, so it's just a matter of knowing the performance capacity of the controllers (which we have here), and just throwing as much disk as you want at it. The 7400 series tops out at 480 disks at the moment - even if you loaded it up with 15k spindles you wouldn't come close to the peak performance of the controllers.
It is, of course nice to see 3PAR trouncing the primary competition in price, performance and latency. They have some work to do on utilization as mentioned above.
I got curious when I read the news article, so I did some quick math: the Dorado5100 is powered by 96 x 200GB SSDs and 96GB of cache in a dual controller active-active configuration, putting out an impressive 600,000 IOPS with the lowest latency (by far) that I have seen. They also had a somewhat reasonable unused storage ratio of 32.35% (I would have liked to have seen much better given the performance of the box, but I'll take what I can get).
But the numbers aren't too surprising; I mean, SSDs are really fast, right? What got me curious though is the # of IOPS coming out of each SSD to the front end, which in this case comes to about 6,250 IOPS per SSD. Compared to some of the fastest disk-based systems, this is about 25x faster per disk than spinning rust. There is no indication that I can see of what specific sort of SSD technology they are using (other than SLC). But 6,250 per disk seems like a far cry from the tens of thousands of IOPS many SSDs claim to be able to do.
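The per-SSD math is just front-end IOPS divided by drive count; running the same calculation for the V7000 (covered next) lands in the same ballpark:

```python
# Front-end IOPS per SSD = reported SPC-1 IOPS / number of SSDs tested.
results = {
    "Huawei Dorado5100":  (600_052, 96),
    "IBM Storwize V7000": (120_492, 18),
}
for name, (iops, ssds) in results.items():
    print(f"{name}: {iops / ssds:,.0f} IOPS per SSD")
# -> Huawei Dorado5100: 6,251 IOPS per SSD
# -> IBM Storwize V7000: 6,694 IOPS per SSD
```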
I'm not trying to say it's bad or anything but I found the stat curious.
I went ahead and looked at another all-SSD solution, the IBM V7000: this time 18 x 200GB SSDs provide roughly 120,000 IOPS, also with really good latency, with 16GB of data cache between the pair of controllers. Once again the numbers come to roughly 6,700 IOPS per SSD. IBM ran at an even better unused storage ratio of just under 15%; hard to get much better than that.
Texas Memory Systems (recently acquired by IBM) posted results for their RamSan-630 about a year ago, with 20 x 640GB SSDs pushing out roughly 400,000 IOPS with pretty good latency. This time, however, the numbers change: around 20,000 IOPS per SSD, and as far as I can tell there is no RAM cache either. The TMS system came in at a 20% unused storage ratio.
While there are no official results, HP did announce not long ago an "all SSD" variant of the P10000 (just realized it is kind of strange to have two sub-models, the V400 and V800, which were the original 3PAR model names, of the larger P10000 line), which they said would get the same 450,000 IOPS on 512 SSDs. The difference here is pretty stark, with each SSD theoretically putting out only 878 IOPS (so roughly 3.5x faster than spinning rust).
I know 3PAR originally chose a slower STEC Mach8IOPS SSD primarily due to cost (it was something like 60% cheaper). STEC's own website shows the same SSD getting 10,000 IOPS (on a read test, whereas the disk they compared it to seemed to give around 250 IOPS). Still, you can tap out the 8 controllers with almost 1/4th the number of disks supported when using these SSDs. I don't know whether or not the current generation of systems uses the same SSD.
I'll be the first to admit an all-SSD P10000 doesn't make a lot of sense to me, though it's nice that customers have that option if that's what they want (I never understood why all-SSD was not available before; that didn't make sense either). HP says it is 70% less expensive than an all-disk variant, though they are not specific about whether they are using 100GB SSDs (I assume they are) vs 200GB SSDs.
Both TMS and Huawei advertise their respective systems as being "1 million IOPS"; I suppose if you took one of each and striped them together, that's about what you'd get! Sort of reminds me of a slide show presentation I got from Hitachi right before their AMS2000-series launched: one of the slides showed the # of IOPS from cache (they did not have a number for IOPS from disk at the time), which didn't seem like a terribly useful statistic.
So here you have individual SSDs providing anywhere from 900 to 20,000 IOPS per disk on the same test...
I'd really love to see SPC-1 results for the likes of Pure Storage, Nimble Storage, Nimbus Storage, and perhaps even someone like Tintri, just to see how they measure up on a common playing field with a non-trivial utilization rate. Especially with claims like this from Nimbus saying they can do 20M IOPS per rack: does that mean at 10% of usable capacity, or greater than 50%? I really can't imagine what sort of workload would need that kind of I/O, but there's probably one or two folks out there that can leverage it.
We now take you back to your regularly scheduled programming..
A few days ago I came across an article on Datacenter Knowledge that was talking about Flash reliability. As much as I'd love to think that just because it's solid state that it will last much longer, real world tests to-date haven't shown that to be true in many cases.
As NAND Flash devices age with use, the capability of the media to retain a programmed value begins to deteriorate. This deterioration is affected by the number of times a particular memory cell is programmed and subsequently erased. When a device is new, it has a powered off data retention capability of up to ten years. With use the retention capability of the device is reduced. Temperature also has an effect on how long a Flash component can retain its programmed value with power removed. At high temperature the retention capabilities of the device are reduced. Data retention is not an issue with power applied to the SSD. The SSD drive contains firmware and hardware features that can monitor and refresh memory cells when power is applied.
I am of course not an expert in this kind of stuff, so I was operating under the assumption that if the data is written then it's written, and it won't get "lost" if the drive is turned off for an extended period of time.
Seagate rates their Pulsar to retain data for up to one year without power at a temperature of 25 C (77 F).
Compare that to what tape can do: 15-30 years of data retention.
Not that I think that SSD is a cost effective method to do backups!
I don't know what other manufacturers can do, I'm not picking on Seagate, but found the data tidbit really interesting.
(I originally had the manual open to try to find reliability/warranty specs on the drive to illustrate that many SSDs are not expected to last multiple decades as the original article suggested).
About damn time! I read earlier in the year on their forums that they were planning on ESX support for their next release of code, originally expected sometime in March/April or something. But that time came and went and saw no new updates.
I would be very interested to see how performance could be boosted and VM density increased by leveraging local Fusion IO storage for swap in ESX. I know of a few 3PAR customers that say they get double the VM density per host vs other storage because of the better I/O they get from 3PAR, though of course Fusion IO is quite a bit snappier.
With VMware's ability to set swap file locations on a per-host basis, it's pretty easy to configure; in order to take advantage of it, though, you'd have to disable memory ballooning in the guests, I think, in order to force the host to swap. I don't think I would go so far as to try to put individual swap partitions on the local Fusion IO for the guests to swap to directly, at least not when I'm using a shared storage system.
I just checked again, and as far as I can tell, from a blade perspective at least, still the only player offering Fusion IO modules for their blades is the HP c-Class, in the form of their IO Accelerator. With up to two expansion slots on the half width and three on the full width blades, there's plenty of room for the 80 or 160 GB SLC models or the 320GB MLC model. And if you were really crazy I guess you could use the "standard" Fusion IO cards with the blades by using the PCI Express expansion module, though that seems more geared towards video cards, as upcoming VDI technologies leverage hardware GPU acceleration.
FusionIO claims to be able to write 5TB per day for 24 years, even if you cut that to 2TB per day for 5 years, it's quite an amazing claim.
From what I have seen (can't speak with personal experience just yet), the biggest advantage Fusion IO has over more traditional SSDs is write performance, of course to get optimal write performance on the system you do need to sacrifice space.
Unlike drive form factor devices, the ioDrive can be tuned to achieve a higher steady-state write performance than what it is shipped with from the factory.
Fusion IO does it again, another astonishing level of performance in such an efficient design, from the case study:
LLNL used Fusion’s ioMemory technology to create the world’s highest performance storage array. Using Fusion’s ioSANs and ioDrive Duos, the cluster achieves an unprecedented 40,800,000 IOPS and 320GB/s aggregate bandwidth.
Incredibly, Fusion’s ioMemory allowed LLNL to accomplish this feat in just two racks of appliances– something that would take a comparable hard disk-based solution over 43 racks. In fact, it would take over 100 of the SPC-1 benchmark’s leading all-flash vendor systems combined to match the performance, at a cost of over $300 million.
40.8 Million IOPS @ ~250 IOPS per 15K RPM disk: you're talking more than 160,000 disk drives.
Not all flash is created equal, of course; many people don't understand that. They just see: ooh, this one is cheap, this one is not, without having any clue (shocker).
It's just flat out irresponsible to ignore such an industry-changing technology, especially for workloads that deal with small (sub-TB) amounts of data.
Grid Iron Systems seems to have left stealth mode somewhat recently; they are another startup that makes an accelerator appliance that sits in between your storage and your server(s). Kind of like what Avere does on the NAS side, Grid Iron does on the SAN side with their "TurboCharger".
Certainly looks like an interesting product, but it appears they make it "safe" by caching only reads. I want an SSD system that can cache writes too! (Yes, I know that wears the SSDs out faster, but just do warranty replacement.) I look forward to seeing some SPC-1 numbers on how Grid Iron can accelerate systems, and at the same time I look forward to SPC-1 numbers on how automatic storage tiering can accelerate systems as well.
I'd also be interested in seeing how Grid Iron can accelerate NetApp systems vs using NetApp's own read-only PAM (since Grid Iron specifically mentions NetApp in their NAS accelerator, although yes I'm sure they just used NetApp as an example).
I don't visit the MySQL performance blog too often, but today I happened to run across a very interesting post there comparing a Fusion IO card to an 8-disk 15k RPM RAID 1+0 array. Myself, I've been interested in Fusion IO since I first heard about it; very interesting technology, though I have not used it personally yet.
The most interesting numbers to me were the comparably poor sequential write performance vs random write performance on the same card; random write was upwards of 3 times faster.
Just thought it was kind of funny timing. Xiotech came to my company a few weeks ago touting their ISE systems and the raw IOPS they can deliver (apparently they do something special with the SCSI protocol that gets them 25% more IOPS than you can normally get). I asked them about SSD and they knocked it, saying it wasn't reliable enough for them (compared to their self healing ISEs).
Well apparently that wasn't the case, because it seems they might be using STEC SSDs in the near future according to The Register. What? No Seagate? As you may or may not be aware Xiotech's unique features come with an extremely tight integration with the disk drives, something they can only achieve by using a single brand, which is Seagate(who helped create the technology and later spun it out into Xiotech). Again, The Register has a great background on Xiotech and their technology.
My own take on their technology is it certainly looks interesting; their Emprise 5000 looks like a great little box as a standalone unit. It scales down extremely well. I'm not as convinced with how well it can scale up with the Emprise 7000 controllers though: they tried to extrapolate SPC-1 numbers from a single ISE 5000 to the same number of drives as a 3PAR T800, which I believe still holds the SPC-1 record, at least for spinning disks anyways. Myself, I'd like to see them actually test a high end 64-node ISE 7000 system for SPC-1 and show the results.
If you're an MS shop you might appreciate Xiotech's ability to integrate with MS Excel; as a Linux user myself I did not, of course. I prefer something like perl. Funny that they said their first generation products integrated with perl, but their current ones do not at the moment.
This sort of about-face with regards to SSD in such a short time frame reminds me of when NetApp put out a press release touting their de-duplication technology as being the best for customers, only to come out a week later and say they were trying to buy Data Domain because Data Domain had better de-duplication technology. I mean, I would have expected Xiotech to say something along the lines of "we're working on it" or something. Perhaps the STEC news was an unintentional leak, or maybe their regional sales staff here was just not informed or something.
I was poking around again and came across a new product from Fusion IO which looked really cool: their new ioDrive Octal, which packs 800,000 IOPS on a single card with 6 Gigabytes/second sustained bandwidth. To put this in perspective, a typical high end 15,000 RPM SAS/Fibre Channel disk drive can do about 250 IOPS, so as far as I/O goes this is roughly the same as 3,200 drives. The densest high performance storage I know of is 3PAR, who can pack 320 15,000 RPM drives in a rack in their S-class and T-class systems (others can do high density SATA; I'm not personally aware of others that can do high density 15,000 RPM drives for online data processing).
But anyways, in 3PAR's case that is 10 racks of drives, and three more racks for disk controllers (24 controllers): roughly 25,000 pounds of equipment (performance wise) in the palm of your hand with the ioDrive Octal. Most other storage arrays top out at between 200 and 225 disks per rack.
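The drive-equivalence math above, spelled out (assuming ~250 IOPS per 15K drive and 3PAR's 320-drives-per-rack density, both figures from the text):

```python
# How many 15K RPM drives (and racks of them) the ioDrive Octal's
# 800,000 IOPS would be equivalent to, using the assumptions above.
octal_iops = 800_000
iops_per_15k_disk = 250   # typical high end 15K RPM drive
drives_per_rack = 320     # 3PAR S/T-class density

drive_equivalent = octal_iops / iops_per_15k_disk
racks = drive_equivalent / drives_per_rack
print(f"{drive_equivalent:,.0f} drives ~= {racks:.0f} racks")
# -> 3,200 drives ~= 10 racks
```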
The Fusion IO solutions aren't for everyone, of course; they are targeted mostly at specialized applications with smaller data sets that require massive amounts of I/O, or those that are able to distribute their applications amongst several systems using the PCIe cards.