Diggin' technology every day


New record holder for inefficient storage – VMware VSA

TechOps Guy: Nate

I came across this article last night and was honestly pretty shocked. It talks about the limitations of the new VMware Virtual Storage Appliance that was released alongside vSphere 5. I think it is the second VSA to receive full VMware certification, after the HP/Lefthand P4000.

The article states

Plus, this capacity will be limited by a 75% storage overhead requirement for RAID data protection. Thus, a VSA consisting of eight 2 TBs would have a raw capacity of 16 TB, but the 75% redundancy overhead would result in a maximum usable capacity of 4 TB.

VMware documentation cites high availability as the reason behind VSA’s capacity limitations: “The VSA cluster requires RAID10 virtual disks created from the physical disks, and the vSphere Storage Appliance uses RAID1 to maintain the VSA datastores’ replicas,” resulting in effective capacity of just 25% of the total physical hard disk capacity.
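A quick sketch of where that 25% comes from, assuming (as the quoted documentation implies) that each of the two protection layers keeps two full copies of the data. The function and constant names are made up for illustration.

```python
# Assumption: each protection layer stores two full copies, so each
# layer halves the capacity that is left.
RAID10_EFFICIENCY = 0.5  # RAID10 virtual disks built from the physical disks
RAID1_EFFICIENCY = 0.5   # RAID1 replica of each VSA datastore


def vsa_usable_tb(disk_count: int, disk_tb: float) -> float:
    """Usable capacity after both mirroring layers are applied."""
    raw_tb = disk_count * disk_tb
    return raw_tb * RAID10_EFFICIENCY * RAID1_EFFICIENCY


raw = 8 * 2.0
usable = vsa_usable_tb(8, 2.0)
print(f"raw={raw} TB usable={usable} TB ({usable / raw:.0%} of raw)")
```

Mirroring on top of mirroring is what gets you there: half of a half.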

That's pretty pathetic! Some folks bang on NetApp for being inefficient in space, I've ragged on a couple of other folks for the same, but this VSA sets a new standard. Well there is this NEC system at roughly 7%, though in NEC's case that was by choice. The current VSA architecture forces the low utilization on you whether you want it or not.

I don't doubt that VMware released the VSA "because they could". I'm sure they designed it primarily for their field reps to show off the shared storage abilities of vSphere from laptops and stuff like that (that was their main use of the Lefthand VSA when it first came out at least). Given how crippled the VSA is (it doesn't stop at low utilization, see the article for more), I can't imagine anyone wanting to use it - at any price.

The HP Lefthand VSA seems like a much better approach - it's more flexible, has more fault tolerance options, and appears to have an entry level price of about half that of the VMware VSA.

The only thing less efficient that I have come across is utilization in Amazon EC2 - where disk utilization rates in the low single digits are very common due to the broken cookie cutter design of the system.


More inefficient storage

TechOps Guy: Nate

Another random thought, got woken up this morning and wound up checking what's new on SPC-1, and a couple weeks ago the Chinese company Huawei posted results for their Oceanspace 8100 8-node storage system. This system seems to be similar to the likes of HDS USP/VSP, IBM SVC in that it has the ability to virtualize other storage systems behind it. The system is powered by 32 quad core processors or 128 CPU cores.

The thing that caught my eye is a paragraph that appears in every SPC-1 disclosure:

Unused Storage Ratio: Total Unused Capacity (XXX GB) divided by Physical Storage Capacity (XXX GB) and may not exceed 45%.

So what is Huawei's Unused storage ratio? - 44.77%

I wonder how hard it was for them to get under the 45% limit, I bet they were probably at 55-60% and had to yank a bunch of drives out or something to decrease their ratio.

From their full disclosure document it appears their tested system has roughly 261TB of unused storage on it. That's pretty bad, 3PAR F400 has a mere 75GB of unused capacity (0.14%) by contrast. The bigger T800 has roughly 21TB of unused capacity (15%).
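To put those three systems side by side, here is a minimal sketch of the ratio as the disclosure paragraph defines it. It uses only the rounded figures quoted in this post, so the physical capacities it backs out are approximations, not numbers taken from the disclosure documents. The helper names are my own.

```python
SPC1_UNUSED_LIMIT = 0.45  # "may not exceed 45%" per the disclosure text


def unused_storage_ratio(unused_gb: float, physical_gb: float) -> float:
    """Total Unused Capacity divided by Physical Storage Capacity."""
    return unused_gb / physical_gb


def implied_physical_gb(unused_gb: float, ratio: float) -> float:
    """Back out physical capacity from a quoted unused figure and ratio."""
    return unused_gb / ratio


# (unused GB, unused ratio) as quoted in the post, rounded.
systems = {
    "Huawei Oceanspace 8100": (261_000, 0.4477),
    "3PAR F400": (75, 0.0014),
    "3PAR T800": (21_000, 0.15),
}

for name, (unused_gb, ratio) in systems.items():
    physical_tb = implied_physical_gb(unused_gb, ratio) / 1000
    headroom = SPC1_UNUSED_LIMIT - ratio
    print(f"{name}: ~{physical_tb:.0f} TB physical, {headroom:.2%} under the limit")
```

Huawei clears the limit by less than a quarter of a percentage point.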

One would think that Huawei would be better off using 146GB disks instead of the 300GB, 450GB and 600GB disks. Another question is what the point of mismatched disks is for this test. Maybe they didn't have enough of one drive type, which would be odd for a drive array manufacturer. Or maybe they mixed drive types to drive down the unused capacity, perhaps after having started with nothing but 600GB disks.

Speaking of drive sizes, one company I know well has a lot of big Oracle databases and is more I/O bound than space bound, so it benefits them to use smaller disk drives. Their current array manufacturer no longer offers 146GB disk drives, so they are forced to pay quite a bit more for the bigger disks.

Lots of IOPS to be sure, 300,000 of them (260 IOPS per drive) and 320GB of cache (see note below!), but it certainly seems that you could do this a better way.

Looking deeper into the full disclosure documents (Appendix C, page 64) for the Huawei system reveals this little gem:

The creatlun command creates a LUN with a capacity 1,716,606 MiB. The -p 0 parameter, in the creatlun command sets the read cache policy as no prefetch and the -m 0 parameter sets the write cache policy as write cache with no mirroring.

So they seem to be effectively disabling the read cache and disabling cache mirroring making all cache a write back cache that is not protected? I would imagine they ran the test and found their read cache ineffective so disabled it and devoted it to write cache and re-ran the test.

Submitting results without mirrored cache seems, well misleading to say the least. Glad there is full disclosure!

The approximate cost of the Huawei system seems to be about $2.2 million according to the Google exchange rate.

While I am here, what is it with 8 node storage systems? What is magical about that number? I've seen a bunch of different ones both SAN and NAS that top out at eight. Not 10? not 6? Seems a strange coincidence, and has always bugged me for some reason.


Capacity Utilization: Storage

TechOps Guy: Nate

So I was browsing through that little drop down address bar in Firefox hitting the sites I usually hit, and I decided hey let's go look at what Pillar is doing. I've never used their stuff but I dig technology you know, so I like to try to keep tabs on companies and products that I haven't used, and may never consider using, good to see what the competition is up to, because you never know they may come out with something good.

Tired of the thin argument

So the CEO of Pillar has a blog, and he went on a mini rant about how 3PAR^H^H^H^HHP is going around telling people you can get 100TB of capacity in one of their 70TB arrays. I haven't read too deep into what the actual claim they are making is, but being so absolutely well versed in 3P..HP technology I can comment with confidence in what their strategy is and how they can achieve those results. Whether or not they are effective at communicating that is another story, I don't know because well I don't read everything they say.

Pillar notes that HP is saying that due to the 3PAR technologies you can get by with less and he's tired of hearing that old story over and over.

Forget about thin for the moment!

So let me spray paint another angle for everyone to see. As you know I do follow SPC-1 numbers pretty carefully. Again not that I really use them to make decisions, I just find the numbers and disclosure behind them very interesting and entertaining at times. It is "fun" to see what others can do with their stuff in a way that can be compared on a level playing field.

I wrote what I consider a good article on SPC-1 benchmarks a while back. EMC gave me some flak because they don't believe SPC-1 is a valid test, when I believe EMC just doesn't like the disclosure requirements, but I'm sure you won't ever hear EMC say that.

SPC-1 Results

So let's take the one and only number that Pillar published, because, well that's all I have to go on, I have no personal experience with their stuff, and don't know anyone that uses it. So if this information is wrong it's wrong because the results they submitted were wrong.

So, the Pillar Axiom 600's results have not stood the test of time well at all, as you would have noticed in my original article, but to highlight:

  • System tested: January 12, 2009
  • SPC-1 IOPS performance: 64,992
  • SPC-1 Usable space: 10TB
  • Disks used: 288 x 146G 15k RAID 1
  • IOPS per disk: 226 IOPS/disk
  • Average Latency at peak load: 20.92ms
  • Capacity Utilization (my own metric I just thought of): 34.72 GB/disk
  • Cost per usable TB (my own metric extrapolated from SPC-1): $57,097 per TB
  • Cost per IOP (SPC-1 publishes this): $8.79

The 3PAR F400 by contrast was tested just 105 days later and absolutely destroyed the Pillar numbers, and unlike the Pillar numbers the F400 has held up very well against the test of time all the way to present day even:

  • System tested: April 27, 2009
  • SPC-1 IOPS performance: 93,050
  • SPC-1 Usable space: 27 TB
  • Disks used: 384 x 146G 15k RAID 1
  • IOPS per disk: 242 IOPS/disk
  • Average Latency at peak load: 8.85ms
  • Capacity Utilization: 70.432 GB/disk
  • Cost per usable TB: $20,312 per TB
  • Cost per IOP: $5.89
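For anyone who wants to check my homemade metrics, here is a rough sketch of how they fall out of the published numbers. It assumes decimal units (1 TB = 1000 GB) and estimates total price as IOPS times cost per IOP. Because the inputs are the rounded values from the lists above, it lands close to the listed figures rather than exactly on them. The class and property names are purely illustrative.

```python
from dataclasses import dataclass


@dataclass
class Spc1Result:
    name: str
    iops: int
    usable_tb: float
    disks: int
    cost_per_iop: float

    @property
    def iops_per_disk(self) -> float:
        return self.iops / self.disks

    @property
    def usable_gb_per_disk(self) -> float:
        # Assumption: decimal units, 1 TB = 1000 GB.
        return self.usable_tb * 1000 / self.disks

    @property
    def cost_per_usable_tb(self) -> float:
        # Assumption: total price is approximated as IOPS x cost per IOP.
        return self.iops * self.cost_per_iop / self.usable_tb


pillar = Spc1Result("Pillar Axiom 600", 64_992, 10, 288, 8.79)
f400 = Spc1Result("3PAR F400", 93_050, 27, 384, 5.89)

for r in (pillar, f400):
    print(f"{r.name}: {r.iops_per_disk:.0f} IOPS/disk, "
          f"{r.usable_gb_per_disk:.1f} GB/disk, "
          f"${r.cost_per_usable_tb:,.0f} per usable TB")
```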

Controller Capacity

Now in my original post I indicated stark differences in some configurations that tested substantially fewer physical disks than the controllers supported. There are a couple of possibilities I can think of for this:

  • The people running the test didn't have enough disks to test (less likely)
  • The controllers on the system couldn't scale beyond the configuration tested, so to illustrate the best bang for your $ they tested with the optimal number of spindles to maximize performance (more likely)

So in Pillar's case I think the latter is the case as they tested with a pretty small fraction of what their system is advertised as being capable of supporting.


So taking that into account, the 3PAR gives you 27TB of usable capacity, note here we aren't even taking into account the thin technologies, just throw those out the window for a moment, let's simplify this.

The Pillar system gives you 10TB of usable capacity. The 3PAR system gives you about 2.7 times the space (170% more) and about 1.4 times the performance (43% more) for less money.
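It is easy to blur "N times as much" with "N% more" in comparisons like this, so here is a small sketch that computes both from the published SPC-1 figures above. The function names are illustrative.

```python
def times_as_much(new: float, old: float) -> float:
    """How many times larger `new` is than `old`."""
    return new / old


def percent_more(new: float, old: float) -> float:
    """How much larger `new` is than `old`, as a percentage of `old`."""
    return (new / old - 1) * 100


# Published SPC-1 figures quoted earlier in this post.
space = (27, 10)          # usable TB: F400 vs Axiom 600
perf = (93_050, 64_992)   # SPC-1 IOPS: F400 vs Axiom 600

print(f"space: {times_as_much(*space):.1f}x, {percent_more(*space):.0f}% more")
print(f"perf:  {times_as_much(*perf):.2f}x, {percent_more(*perf):.0f}% more")
```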

What would a Pillar system look like (or systems I guess I should say, since we need more than one) that could give us 27TB usable capacity and 93,000 SPC-1 IOPS using 146G 15k RPM disks (again trying to keep a level playing field here)?

Well I can only really guess, to reach the same level of performance Pillar would need an extra 124 disks, so 412 spindles. Maintaining the same level of short stroking that they are doing(34.7GB/disk), those extra 124 spindles only get you to roughly 14.3TB.
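The guess above can be sketched as simple linear scaling, assuming (and it is a big assumption) that IOPS per disk and usable GB per disk stay the same as in the tested Pillar configuration. Variable names are illustrative, and the disk count is rounded to the nearest disk to match the figure in the text.

```python
# Published Pillar Axiom 600 result and the F400 performance target.
PILLAR_IOPS = 64_992
PILLAR_DISKS = 288
PILLAR_USABLE_GB = 10_000   # 10 TB usable, decimal units assumed
TARGET_IOPS = 93_050

# Assumption: per-disk performance and per-disk usable space scale linearly.
iops_per_disk = PILLAR_IOPS / PILLAR_DISKS
gb_per_disk = PILLAR_USABLE_GB / PILLAR_DISKS

disks_needed = round(TARGET_IOPS / iops_per_disk)  # nearest whole disk
extra_disks = disks_needed - PILLAR_DISKS
usable_tb = disks_needed * gb_per_disk / 1000

print(f"{disks_needed} disks ({extra_disks} extra), ~{usable_tb:.1f} TB usable")
```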

And I'm assuming here, because of my comments earlier about the optimal number of disks to achieve performance, that if you wanted to get those extra 124 spindles in you'd need a 2nd Axiom 600, and all the costs with the extra controllers and stuff. Controllers obviously carry a hefty premium over the disk drives. While the costs are published in Pillar's results I don't want to spend the time to try to extrapolate that angle.

And if you do in fact need more controllers, the system was tested with two controllers, if you have to go to four (tested 3PAR F400 has four), 3PAR has another advantage completely unrelated to SPC-1, the ability to maintain performance levels under degraded conditions (controller failure, software upgrade, whatever) with Persistent Cache. Run your same SPC-1 test, and yank a controller out from each system (3PAR and Pillar) and see what the results are. The numbers would be even more embarrassingly in 3PAR's favor thanks to their architecture and this key caching feature. Unlike most of 3PAR's feature add-ons, this one comes at no cost to the customer, the only requirement is you must have at least 4 controllers on the box.

So you still need to get to 27 TB of usable capacity. From here it can get really fuzzy, because you need to add enough spindles to get that high but then you need to adjust the level of short stroking you're doing to use more of the space per drive. It wouldn't surprise me if this wasn't even possible on the Pillar system (not sure if any system can do it really, but I don't know).

If Pillar can't adjust the size of the short stroking then the numbers are easy, at 34.7GB/disk they need 778 drives to get to 27TB of usable capacity, roughly double what 3PAR has.
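Sizing for capacity instead of performance looks like this, again assuming the usable space per disk stays fixed at what the tested Pillar system delivered. Names are illustrative.

```python
import math

# Assumption: usable GB per disk is locked at the tested Pillar ratio.
PILLAR_GB_PER_DISK = 10_000 / 288   # ~34.7 GB/disk
TARGET_USABLE_GB = 27_000           # match the F400's 27 TB usable
F400_DISKS = 384

# Round up: a partial disk still has to be a whole disk.
disks_for_capacity = math.ceil(TARGET_USABLE_GB / PILLAR_GB_PER_DISK)
ratio_vs_f400 = disks_for_capacity / F400_DISKS

print(f"{disks_for_capacity} disks, {ratio_vs_f400:.2f}x the F400's spindle count")
```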

Of course a two-system Axiom 600 setup with 778 drives will likely outperform a 384-disk F400 (I should hope so at least), but you see where I'm going.

I'm sure Pillar engineers could come up with a way to configure the system more optimally. My 778 drive solution is crazy, but from a math perspective it's the easiest and quickest thing I could come up with, with the data I have available to me.

This is also a good illustration of why, when I go looking at what Xiotech posts, I really can't compare them against 3PAR or anybody else, because they only submit results for ~16 drive systems. To me, it is not valid to compare a ~16 drive system to something that has hundreds of drives and try to extrapolate results. Xiotech really does give stellar results as far as capacity utilization and IOPS/disk and stuff, but they haven't yet demonstrated that those numbers are scalable beyond a single ISE enclosure, let alone to several hundred disks.

I also believe the 3PAR T800 results could be better too. The person at 3PAR who was responsible for running the test was new to the company at the time and the way he laid out the system was odd, to say the least. The commands he used were even deprecated. But 3PAR isn't all that interested in re-testing, they're still the record holder for spinning rust in a single system (more than two years running now no doubt!).

Better response times to boot

You can see the 3PAR system performs with less than half the latency of the Pillar system, despite the Pillar system short stroking their disks. Distributed RAID with full mesh architecture at work baby. I didn't even mention it but the Pillar system has double the cache of the F400. I mean the comparison really almost isn't fair.

I'm sure Pillar has bigger and better things out now since they released the SPC-1 numbers for the Axiom, so this post has the obvious caveat that I am reporting based on what is published. They'd need to pull more than a rabbit out of a hat to make up these massive gaps though I think.

Another Angle

We could look at this another way as well. Assuming for simplicity's sake for a moment that both systems scale linearly up or down, we can configure a 3PAR F400 with the same performance specs as the Pillar that was tested.

You'd need 268 disks on the 3PAR F400 to match the performance of the Pillar system. With those 268 disks you'd get 18.4 TB of usable space: same performance, fewer disks, and 84% more usable capacity. And scaling the cost down like we scaled the performance down, the cost would drop to roughly $374,000, nearly $200,000 less than Pillar for the same performance and more space.
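Here is a rough sketch of that scale-down. It leans on several assumptions: linear scaling as stated above, my own per-disk and per-TB metrics from the F400 list, Pillar's price approximated as IOPS times cost per IOP, and GB converted to TB by dividing by 1024, which is the conversion that appears to reproduce the 18.4 TB figure (decimal units would give closer to 18.9 TB).

```python
# Figures quoted earlier in this post.
F400_IOPS, F400_DISKS = 93_050, 384
F400_GB_PER_DISK = 70.432
F400_COST_PER_USABLE_TB = 20_312
PILLAR_IOPS, PILLAR_COST_PER_IOP = 64_992, 8.79

# Assumption: F400 performance scales linearly with disk count.
disks = round(PILLAR_IOPS / (F400_IOPS / F400_DISKS))

# Assumption: 1 TB = 1024 GB here, to line up with the 18.4 TB in the text.
usable_tb = disks * F400_GB_PER_DISK / 1024

scaled_cost = usable_tb * F400_COST_PER_USABLE_TB
pillar_cost = PILLAR_IOPS * PILLAR_COST_PER_IOP  # approximate total price
savings = pillar_cost - scaled_cost

print(f"{disks} disks, {usable_tb:.1f} TB usable, "
      f"~${scaled_cost:,.0f} vs ~${pillar_cost:,.0f} (saves ~${savings:,.0f})")
```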


So hopefully this answers the question with more clarity: why you can buy less raw storage with the 3PAR F400 and still get the same or better performance and usable capacity than going with a Pillar Axiom 600. At the end of the day 3PAR drives higher capacity utilization and delivers superior results for significantly less greenbacks. And I didn't even take 3PAR's thin technologies into account, the math there can become even more fuzzy depending on the customer's actual requirements and how well they can leverage thin built in.

You may be able to understand why HP was willing to go to the end of the earth to acquire 3PAR technology. And you may be able to understand why I am so drawn to that very same technology. And here I'm just talking about performance, something that unlike other things (ease of use etc) is really easy to put hard numbers on.

The numbers are pretty simple to understand, and you can see why the big cheese at HP responsible for spearheading the 3PAR purchase said:

The thin provisioning, automatic storage tiering, multi-tenancy, shared-memory architecture, and built-in workload management and load balancing in the 3PAR arrays are years ahead of the competition, according to Donatelli, and therefore justify the $2.4bn the company paid to acquire 3PAR in a bidding war with rival Dell.

Maybe if I'm lucky I can trigger interest in The Register again by starting a blog war or something and make tech news! woohoo! that would be cool. Of course now that I said that it probably won't happen.

I'm told by people who know the Pillar CEO that he is "raw", much like me, so it will be interesting to see the response. I think the best thing they can do is post new SPC-1 numbers with whatever the latest technology they have is, preferably on 146G 15k disks!

Side note

It was in fact my 3PAR rep that inspired me to write about this SPC-1 stuff, I was having a conversation with him earlier in the year where he didn't think the F400 was as competitive against the HDS AMS2500 as he felt it needed to be. I pointed out to him that despite the AMS2500 having similar SPC-1 IOPS and similar cost, the F400 offered almost twice the usable capacity. And the cost per usable TB was far higher on the 2500. He didn't realize this.  I did see this angle so felt the need to illustrate it. Hence my Cost per SPC-1 Usable TB. It's not a performance metric, but in my opinion from a cost perspective a very essential metric, at least for highly efficient systems.

(In case it wasn't obvious, I am by no means compensated by 3PAR in any way for anything I write, I have a deep passion for technology and they have some really amazing technology, and they make it easy to use and cost effective to boot)


How inefficient can you get?

TechOps Guy: Nate

[ the page says the system was tested in Jan 2010, so not recent, but I don't recall seeing it on the site before now, in any case it's still crazy]

I was about to put my laptop down when I decided hey let's go over to SPEC and see if there are any new NFS results posted.

So I did, you know me I am into that sort of stuff. I'm not a fan of NFS but for some reason the SPEC results still interest me.

So I go and see that NEC has posted some results. NEC isn't a very well known server or even IT supplier in the U.S. at least as far as I know. I'm sure they got decent market share over in Asia or something.

But anyways they posted some results, and I have to say I'm shocked. Either there is a glaring typo or that is just the worst NAS setup on the face of the planet.

It all comes down to usable capacity. I don't know how you can pull this off but they did - they apparently have 284 300GB disks on the system but only have 6.1 TB of usable space! That is roughly 83TB of raw storage and they only manage to get something like 7% capacity utilization out of the thing?

Why even bother with disks at all if you're going to do that? Just go with a few SSDs.

But WAIT! .. WAIT! It gets better. That 6.1 TB of space is spread across -- wait for it -- 24 file systems.

12 filesystems were created and used per node. One of 24 filesystems consisted of 8 disks which were divided into two 4-disk RAID 1+0 pools, and each of the other 23 filesystems consisted of 12 disks which were divided into two 6-disk RAID 1+0 pools. There were 6 Disk Array Controllers. One Disk Array Controller controlled 47 disks, and each one of the other 5 controlled 48 disks.

I mean the only thing I can hope for is that the usable capacity is in fact a big typo.

Total Exported Capacity 6226.5GB

But if it's not I have to hand it to them for being brave enough to post such terrible results. That really takes some guts.
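For the record, here is the back-of-the-envelope math on the NEC numbers as a short sketch. It assumes the disk size and the exported capacity are in the same units, and that RAID 1+0 halves the raw space. Variable names are illustrative.

```python
DISKS = 284
DISK_GB = 300
EXPORTED_GB = 6226.5  # "Total Exported Capacity" from the SPEC disclosure

raw_gb = DISKS * DISK_GB
mirrored_gb = raw_gb / 2  # assumption: RAID 1+0 keeps two copies of everything

# Assumption: raw and exported figures share the same GB definition.
utilization_of_raw = EXPORTED_GB / raw_gb
utilization_after_raid = EXPORTED_GB / mirrored_gb

print(f"raw: {raw_gb / 1024:.0f} TB, "
      f"exported: {utilization_of_raw:.1%} of raw, "
      f"{utilization_after_raid:.1%} of post-RAID space")
```

Even after giving RAID 1+0 its half, most of the remaining space is simply not exported.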