Diggin' technology every day

March 23, 2012

Hitachi trounces XIV in SPC-2 Costs

Filed under: Storage — Tags: , , — Nate @ 9:37 am

This really sort of surprised me. I came across a HP storage blog post which mentioned some new SPC-2 results for the P9500 aka Hitachi VSP, naturally I expected the system to cost quite a bit, and offer good performance but I was really not expecting these results.

A few months ago I wrote about what seemed like pretty impressive numbers from IBM XIV (albeit at a high cost), I didn’t realize how high of a cost until these latest results came out.

Not that any of my workloads are SPC-2 related (which is primarily throughput). I mean if I have a data warehouse I’d probably run HP Vertica (which slashes I/O requirements due to it’s design), negating the need for such a high performing system, if I was streaming media I would probably be running some sort of NAS – maybe Isilon or DDN, BlueArc – I don’t know. I’m pretty sure I would not be using one of these kinds of arrays though.

Anyways, the raw numbers came down to this:


  • 7.4GB/sec throughput
  • $152.34 per MB/sec of throughput (42MB/sec per disk)
  • ~$7,528 per usable TB (~150TB Usable)
  • Total system cost – $1.1M for 180 x 2TB SATA disks and 360GB cache

HP P9500 aka Hitachi VSP

  • 13.1GB/sec throughput
  • $88.34 per MB/sec of throughput (26MB/sec per disk)
  • ~$9,218 per usable TB (~126TB Usable)
  • Total system cost – $1.1M for 512 x 300GB 10k SAS disks and 512GB cache

The numbers are just startling to me, I never really expected the cost of the XIV to be so high in comparison to something like the P9500. In my original post I suspected that any SPC-1 numbers coming out of XIV(based on the SPC-2 configuration cost anyways) would put the XIV as the most expensive array on the market(per IOP), which is unfortunate given it’s limited scalability to 180 disks, 7200RPM-only and RAID 10 only. I wonder what, if anything(other than margin) keeps the price so high on XIV.

I’m sure a good source for getting the cost lower on the P9500 side was the choice to use RAID 5 instead of RAID 10. The previous Hitachi results, released in 2008 for the previous generation USP platform was mirroring. And of course XIV only supports mirroring.

It seems clear to me that the VSP is the winner here, I suspect the XIV probably includes more software out of the box, while the VSP is likely not an all-inclusive system.

IBM gets some slack cut to them since they were doing a SPC-2/E energy efficiency test, though not too much since if your spending $1M on a storage system the cost of energy isn’t going to be all that important(at least given average U.S. energy rates). I’m sure the P9500 with it’s 2.5″ drives are pretty energy efficient on their own anyways.

Where XIV really fell short was on the last test for Video on Demand, for some reason the performance tanked, less than 50% of the other tests( a full 10 Gigabytes/second less than VSP). I’m not sure what the weightings are for each of the tests but if IBM was lucky and the VOD test wasn’t there it would of helped them a lot.

The XIV as tested is maxed out, so any expansion would require an additional XIV. The P9500 is nowhere close to maxed out (though throughput could be maxed out, I don’t know).

March 20, 2012

Storage Reclamation under Linux

Filed under: Storage — Nate @ 1:32 pm

UPDATED – Oh the good ‘ol days, I remember when I first started using thin provisioning in late 2006, learning the ropes, learning ins and outs.. I can’t count how many times I did data migrations between volumes to reclaim space at the time.

Today times have changed, a few years ago many companies started introducing thin reclamation technologies which provides a means for the computer to communicate to the storage which blocks are not being used and can be reclaimed by the shared storage system for use in other volumes.

Initially software support for this was almost non existent, short of a Microsoft tool called sdelete, which was never designed with storage reclamation in mind (I assume ..) it would be the tool of choice for early reclamation systems since it had the ability to zero out deleted space in a volume. Of course it only worked on Windows boxes.

Later came support from Symantec in their Veritas products, and VMware in their VAAI technologies (though in VMware’s case it seems they suggest you disable support because storage arrays can’t keep up which drives latency up – wonder if that includes 3PAR ?). Then Oracle announced their own reclamation system which the built in co-operation with 3PAR. Myself at least, I have not seen many other announcements for such technology.

A few days ago I came across this post from Compellent which talks about the support of the SCSI UNMAP command in Red Hat Enterprise 6, apparently integrated into the ext4 file system (or somewhere else in the layer so that it is transparent, no special tool needed to reclaim space). That sounded really cool, my company is an Ubuntu shop at the moment, and as far as I can tell there is no such support in Ubuntu at this time (at least not with 10.04 LTS).

One of my co-workers pointed me to another tool called fstrim, which seems to do something similar as sdelete.

Fstrim is used on a mounted filesystem to discard (or “trim”) blocks which are not in use by the filesystem. This is useful for solid-state drives (SSDs) and thinly-provi-sioned storage.

I have not yet tried it. YAT (Yet Another Tool) I came across is zerofree, but this tool is not really too useful since it needs the file system to be in read only mode or totally unmounted.

fstrim is part of the util-linux package in newer Debian-based distributions (Ubuntu 10.04 excluded), so it should be supported by your distro if your on a current release. It is also part of the latest versions of Red Hat Enterprise.

Any other thin reclamation tools out there (for any platform), or any other thin reclamation support built into other software stacks ?

UPDATE – Upon further investigation I believe I determined that this new Linux functionality was introduced into the kernel as something called FITRIM which Ubuntu says was part of 2.6.36 kernel. I see an XFS page that mentions  real time reclamation (what RHEL 6 does) is apparently part of the 3.0 kernel, so I assume RHEL back ported those changes.

March 19, 2012

Apple and their dividends

Filed under: Random Thought — Nate @ 4:20 pm

I watch quite a bit of CNBC despite never having invested a dime in the markets. I found this news of Apple doing a stock buyback and offering a dividend curious.

People have been clamoring for a dividend from Apple for a while now, wanting Apple to return some of that near $100B cash stock pile to investors.  I’m no fan of Apple(to put it mildly), never really have been, but for some reason I always thought Apple should NOT offer a dividend nor do a stock buyback.

The reason? Just look at the trajectory of the stock, investors are getting rewarded by a large amount already, from about $225 two years ago to near $600 now. Keep the cash, sooner or later their whole iOS ecosystem will start to fizzle and they will go on a decline again,  when that is I don’t know but the cash could be used to sustain them for quite some time to come. As long as the stock keeps going up at a reasonable amount year over year (I hardly think what has happened to their stock is reasonable but that’s why I’m not an investor), with some room for corrections here and there, keep the cash.

Same goes for stock buybacks, these seem to be tools for companies that really don’t have any other way to boost their stock price, a sort of last resort, they are all out of ideas. Apple doesn’t seem to be near that situation.

Even with the dividend I saw people complaining this morning that it was too low.

Apple could do something crazy with their cash stockpile too – perhaps buy HP (market cap of $48B). Or Dell ($30B market cap), if for nothing else then for diversification.  Not that I’d be happy with an Apple acquisition of HP.. Apple wouldn’t have to do anything with the companies just let them run like they are now. For some reason I thought both HP and Dell’s market caps were much higher(I suppose they were, just a matter of time frame).

What I think Apple should do with their stock? is a massive split (10:1 ?). Something apparently they are considering. Not for any other reason than to allow the little guy to get in on the action easier, having a $600/share price is kind of excessive, not Berkshire Hathaway excessive ($122k/share) but still excessive.

With the Nasdaq hitting new 10+ year highs in recent days/weeks, I would be curious to see the charts of such indexes that have Apple in them, how they would look without the ~50 fold increase in Apple’s stock over the past 10 years). I seem to recall seeing/hearing that Dow won’t put Apple in their DJI index because it would skew the numbers too much due to how massive it is. One article here says that if Apple had been put in the DJI in 2009 instead of Cisco the Dow would be past 15,000 now.

CNBC reported that Steve Jobs was against dividends.

10GbaseT making a comeback ?

Filed under: Networking — Tags: — Nate @ 12:20 pm

Say it’s true… I’ve been a fan of 10GbaseT for a while now. Though it hasn’t really caught on in the industry, off the top of my head I can only think of Arista and Extreme who have embraced the standard from a switching perspective, with everyone else going with SFP+, or XFP or something else. Both Arista and Extreme obviously have SFP+ products as well, maybe XFP too, though I haven’t looked into why someone would use XFP over SFP+ or vise versa.

From what I know, the biggest thing that has held back adoption of 10GbaseT has been power usage. Also I think other industry organizations had given up waiting for 10GbaseT to materialize. Also cost was somewhat of a factor too, I recall at least with Extreme their 24-port 10GbaseT switch was about $10k more than their SFP+ switch (without any SFP+ adapters or cables), so it was priced similarly to an optical switch that was fairly fully populated with modules, making entry level pricing if you only needed say 10 ports initially quite a bit higher.

But I have read two different things (and heard a third)  recently, which I’m sure are related which hopefully points to a turning point in 10GbaseT adoption.

The first was a banner on Arista’s website.

The second  is this blog post talking about a new 10GbaseT chip from Intel.

Then the third thing I probably can’t talk about, so I won’t 🙂

I would love to have 10GbaseT over the passive copper cabling that most folks use now, that stuff is a pain to work with. While there are at least two switching companies that have 10GbaseT (I recall a Dell blade switch that had 10GbaseT support too), the number of NICs out there that support it is just about as sparse.

Not only that but I do like to color code my cables, and while CAT6 cables are obviously easy to get in many colors, it’s less common and harder to get those passive 10GbE cables in multiple colors, seems most everyone just has black.

Also, cable lengths are quite a bit more precise with CAT6 than with passive copper. For example from Extreme at least (I know I could go 3rd party if I wanted), their short cables are 1 meter and 3 meters. There’s a massive amount of distance between those two. CAT6 can be easily customized to any length and pre-made cables(I don’t make my own), can be fairly easily found to be in 1 foot (or even half a foot) increments.

SFP Passive copper 10GbE cable

I wonder if there are (or will there be) 10GbaseT SFP+ GBICs (so existing switches could support 10GbaseT without wholesale replacement) ? I know there are 1GbE SFP+ GBICs.


March 17, 2012

Who uses Legacy storage?

Filed under: Random Thought,Storage — Tags: — Nate @ 3:34 pm

Still really busy these days haven’t had time to post much but I was just reading someone’s LinkedIn profile who works at a storage company and it got me thinking.

Who uses legacy storage? It seems almost everyone these days tries to benchmark their storage system against legacy storage.  Short of something like maybe direct attached storage which has no functionality, legacy storage has been dead for a long time now. What should the new benchmark be? How can you go about (trying to) measuring it?  I’m not sure what the answer is.

When is thin, thin?

One thing that has been in my mind a lot on this topic recently is how 3PAR constantly harps on about their efficient allocation at 16kB blocks. I think I’ve tried to explain this in the past but I wanted to write about it again. I wrote a comment on it in a HP blog recently I don’t think they published the comment though (haven’t checked for a few weeks maybe they did). But they try to say they are more efficient (by dozens or hundreds of times) than other platforms because of this 16kB allocation thing-a-ma-bob.

I’ve never seen this as an advantage to their platform. Whether you allocate in 16kB chunks or perhaps 42MB chunks in the case of Hitachi, it’s still a tiny amount of data in any case and really is a rounding error. If you have 100 volumes and they all have 42MB of slack hanging off the back of them, that’s 4.2GB of data, it’s nothing.

What 3PAR doesn’t tell you is this 16kB allocation unit is what a volume draws from a storage pool (Common Provisioning Group in 3PAR terms – which is basically a storage template or policy which defines things like RAID type, disk type, placement of data, protection level etc). They don’t tell you up front how much these storage pools provision storage on, which is in-part based on the number of controllers in the system.

If your volumes max out a CPG’s allocated space and it needs more, it won’t grab 16kB, it will grab (usually at least) 32GB, this is adjustable. This is – I believe in part how 3PAR addresses minimizing impact of thin provisioning with large amounts of I/O, because it allocates these pools with larger chunks of data up front. They even suggest that if you have a really large amount of growth that you increase the allocation unit even higher.

Growth Increments for CPGs on 3PAR

I bet you haven’t heard HP/3PAR say their system grows in 128GB increments recently 🙂

It is important to note, or to remember, that a CPG can be home to hundreds of volumes, so it’s up to the user, if they only have one drive type for example maybe they only want 1 CPG.  But I think as they use the system they will likely go down a similar path that I have and have more.

If you only have one or two CPGs on the system it’s probably not a big deal, though the space does add up. Still I think for the most part even this level of allocation can be a rounding error. Unless you have a large number of CPGs.

Myself, on my 3PAR arrays I use CPGs not just for determining data characteristics of the volumes but also for organizational purposes / space management. So I can look at one number and see all of the volumes dedicated to development purposes are X in size, or set an aggregate growth warning on a collection of volumes. I think CPGs work very well for this purpose. The flip side is you can end up wasting a lot more space. Recently on my new 3PAR system I went through and manually set the allocation level of a few of my CPGs from 32GB down to 8GB because I know the growth of those CPGs will be minimal. At the time I had maybe 400-450GB of slack space in the CPGs, not as thin as they may want you to think (I have around 10 CPGs on this array). So I changed the allocation unit and compacted the CPGs which reclaimed a bunch of space.

Again, in the grand scheme of things that’s not that much data.

For me 3PAR has always been more about higher utilizations which are made possible by the chunklet design and the true wide striping, the true active-active clustered controllers, one of the only(perhaps one of if not the first?) storage designs in the industry that goes beyond two controllers, and the ASIC acceleration which is at the heart of the performance and scalability. Then there is the ease of use and stuff, but I won’t talk about that anymore I’ve already covered it many times. One of my favorite aspects of the platform is the fact that they use the same design on everything from the low end to the high end, the only difference really is scale. It’s also part of the reason why their entry level pricing can be quite a bit higher than entry level pricing from others since there is the extra sauce in there that the competition isn’t willing or able to put on their low end box(s).

Sacrificing for data availability

I was talking to Compellent recently learning about some of their stuff for a project over in Europe and they told me their best practice (not a requirement) is to have 1 hot spare of each drive type (I think drive type meaning SAS or SATA, I don’t think drive size matters but am not sure) per drive chassis/cage/shelf.

They, like many other array makers don’t seem to support the use of low parity RAID (like RAID 50 3+1, or 4+1), they (like others) lean towards higher data:parity ratios I think in part because they have dedicated parity disks(they either had a hard time explaining to me how data is distributed or I had a hard time understanding, or both..), and dedicating 25% of your spindles to parity is very excessive, but in the 3PAR world dedicating 25% of your capacity  to parity is not excessive(when compared to RAID 10 where there is a 50% overhead anyways).

There are no dedicated parity, or dedicated spares on a 3PAR system so you do not lose any I/O capacity, in fact you gain it.

The benefits to a RAID 50 3+1 configuration are a couple fold – you get pretty close to RAID 10 performance, and you can most likely (depending on the # of shelves) suffer a shelf failure w/o data loss or downtime(downtime may vary depending on your I/O requirements and I/O capacity after those disks are gone).

It’s a best practice (again, not a requirement) in the 3PAR world to provide this level of availability (losing an entire shelf), not because you lose shelves often but just because it’s so easy to configure and is self managing. With a 4, or 8-shelf configuration I do like RAID 50 3+1. In an 8-shelf configuration maybe I have some data volumes that don’t need as much performance so I could go with a 7+1 configuration and still retain shelf-level availability.

Or, with CPGs you could have some volumes retain shelf-level availability and other volumes not have it, up to you. I prefer to keep all volumes with shelf level availability. The added space you get with a higher data:parity ratio really has diminishing returns.

Here’s a graphic from 3PAR which illustrates the dimishing returns(at least on their platform, I think the application they used to measure was Oracle DB):

The impact of RAID on I/O and capacity

3PAR can take this to an even higher extreme on their lower end F-class series which uses daisy chaining in order to get to full capacity (max chain length is 2 shelves). There is a availability level called port level availability which I always knew was there but never really learned what it truly was until last week.

Port level availability applies only to systems that have daisy chained chassis and protects the system from the failure of an entire chain. So two drive shelves basically. Like the other forms of availability this is fully automated, though if you want to go out of your way to take advantage of it you need to use a RAID level that is compliant with your setup to leverage port level availability otherwise the system will automatically default to a lower level of availability (or will prevent you from creating the policy in the first place because it is not possible on your configuration).

Port level availability does not apply to the S/T/V series of systems as there is no daisy chaining done on those boxes (unless you have a ~10 year old S-series system which they did support chaining – up to 2,560 drives on that first generation S800 – back in the days of 9-18GB disks).

Powered by WordPress