Diggin' technology every day

December 4, 2012

3PAR: The Next Generation

Filed under: Storage — Tags: , — Nate @ 12:40 am

(Cue Star Trek: The Next Generation theme music)

[Side note: I think this is one of my most popular post ever with nearly 3,000 hits to it so far (excluding my own IPs). Thanks for reading!]

[I get the feeling I will get lots of people linking to this since I suspect what is below will be the most complete guide as to what was released – for those of you that haven’t been here before I am in no way associated with HP or 3PAR – or compensated by them in any way of course! Just been using it for a long time and it’s one of the very few technologies that I am passionate about – I have written a ton about 3PAR over past three years]

HP felt their new storage announcements were so ground breaking that they decided to have a special event a day before HP Discover is supposed to start. They say it’s the biggest announcement for storage from HP in more than a decade.

I first got wind of what was coming last Fall, though there wasn’t much information available at the time other than a picture and some thoughts as to what might happen. Stuff wasn’t nailed down yet. I was fortunate enough to finally visit 3PAR HQ a couple of months ago and get a much more in depth briefing as to what was coming, and I’ll tell you what it’s been damn hard to contain my excitement.

HP announced a 75% year over year increase in 3PAR sales, along with more than 1,200 new customers in 2012 alone. Along with that HP said that their StoreOnce growth is 45% year over year.

By contrast HP did not reveal any growth numbers for either their Lefthand  StoreVirtual platform nor their IBRIX StoreAll platforms.

David Scott, former CEO of 3PAR tried to set the tone as a general storage product launch, they have enhancements to primary storage, to file/object scale-out storage as well as backup/archive storage.

You know I’m biased, I don’t try to hide that. But it was obvious to me at the end of the presentation this announcement was all about one thing: David’s baby – 3PAR.

Based on the web site, I believe the T-class of 3PAR systems is finally retired now. Replaced last year by the V-Class (aka P10000 or 10400 and 10800)

Biggest changes to 3PAR in at least six years

The products that are coming out today are in my opinion, the largest set of product (AND policy) enhancements/changes/etc from 3PAR in at least the past six years that I’ve been a customer.

First – a blast from the past.

The first mid range 3PAR system – the E200

Hello 2006!

There is some re-hashing of old concepts, specifically the concept of mid range. 3PAR introduced their first mid range system back in 2006, which was the system I was able to deploy – the E200. The E200 was a dual node system that went up to 4GB data cache per controller and up to 128 drives or 96TB of usable capacity whichever came first. It was powered by the same software and same second generation ASIC (code named Eagle if I remember right) that was in the high end S-class at the time.

The E200 was replaced by the F200, and the product line extended to include the first quad controller mid range system the F400 in 2009. The F-class, along with the T-class (which replaced the S-class) had the third generation ASIC in it (code named Osprey if I remember right?? maybe I have those reversed). The V-class which was released last year, along with what came out today has the 4th generation ASIC (code named Harrier).

To-date – as far as I know the F400 is still the most efficient SPC-1 result out there, with greater than 99% storage utilization – no other platforms (3PAR included) before or since have come close.

These systems, while coined mid range in the 3PAR world were still fairly costly. The main reason behind this was the 3PAR architecture itself. It is a high end architecture. Where other vendors like EMC and HDS chose radically different designs for their high end vs. their mid range, 3PAR aimed a shrink ray at their system and kept the design the same. NetApp on the other hand was an exception – they too have a single architecture that scales from the bottom on up. Though as you might expect – NetApp and 3PAR architectures aren’t remotely comparable.

Here is a diagram of the V-series controller architecture, which is very similar to the 7200 and 7400, just at a much larger scale:

3PAR V-Series ASIC/CPU/PCI/Memory Architecture

Here is a diagram of the inter-node communications on an 8-node P10800, or T800 before it, again similar to the new 7000-series just larger scale:

3PAR Cluster Architecture with low cost high speed passive backplane with point to point connections totalling 96 Gigabytes/second of throughput

Another reason for the higher costs was the capacity based licensing (& associated support). Some things were licensed per controller pair, some things based on raw capacity, some things licensed per system, etc. 3PAR licensing was not very friendly to the newbie.

Renamed Products

There was some basic name changes for 3PAR product lines:

  • The HP 3PAR InServ is now the HP 3PAR StorServ
  • The HP 3PAR V800 is now the HP 3PAR 10800
  • The HP 3PAR V400 is now the HP 3PAR 10400

The 3PAR 7000-series – mid range done right

The 3PAR 7000-series leverages all of the same tier one technology that is in the high end platform and puts it in a very affordable package, starting at roughly $25,000 for a two-node 7200 system, and $32,000 for an entry level two-node 7400 system(which can later be expanded to four nodes, non disruptively).

I’ve seen the base 7200 model (2 controllers, no disks, 3 year 24×7 4-hour on site support “parts only”) online for as low as $10,000.

HP says this puts 3PAR in a new $11 Billion market that it was previously unable to compete.

This represents roughly a 55-65% discount over the previous F-class mid range 3PAR solution. More on this later.

Note that it is not possible to upgrade in place a 7200 to a 7400. So you still have to be sure if you want a 4-node capable system to choose the 7400 up front (you can, of course purchase a two-node 7400 and add the other two nodes later).

Dual vs quad controller

The controller configurations are different between the two and the 7400 has extra cluster cross connects to unify the cluster across enclosures. The 7400 is the first 3PAR system that is not leveraging a passive backplane for all inter-node communications. I don’t know what technology 3PAR is using to provide this interconnect over a physical cable – it may be entirely proprietary. They use their own custom light weight protocols on the connection, so from a software standpoint it is their own stuff. Hardware – I don’t have that information yet.

A unique and key selling point for having a 4-node 3PAR system is persistent cache, which keeps the cache in write back mode during planned or unplanned controller maintenance.

3PAR Persistent Cache mirrors cache from a degraded controller pair to another pair in the cluster automatically.

The 3PAR 7000 series is based on what I believe is the Xyratex OneStor SP-2224 enclosure, the same one IBM uses for their V7000 StorWize system (again, speculation). Speaking of the V7000 I learned tonight that this IBM system implemented RAID 5 in software resulting in terrible performance. 3PAR RAID 5 is well – you really can’t get any faster than 3PAR RAID, that’s another topic though.

3PAR 7000 Series StorServs

3PAR 7000 Series StorServs

3PAR has managed to keep it’s yellow color, and not go to the HP beige/grey. Somewhat surprising though I’m told it’s because it helps the systems stand out in the data center.

The 7000 series comes in two flavors – a two node 7200, and a two or four node 7400. Both will be available starting December 14.

2.5″ or 3.5″ (or both)

There is also a 3.5″ drive enclosure for large capacity SAS (up to 3TB today). There are also 3.5″ SSDs but their capacities are unchanged from the 2.5″ variety – I suspect they are just 2.5″ drives in a caddy. This is based, I believe on the Xyratex OneStor SP-2424.

Xyratex OneStor SP-2424

This is a 4U, 24-drive enclosure for disks only(controllers go in the 2U chassis). 3PAR kept their system flexible by continuing to allow customers to use large capacity disks, however do keep in mind that for the best availability you do need to maintain at least two (RAID 10),  three (RAID 5), or six (RAID 6)  drive enclosures. You can forgo cage level availability if you want, but I wouldn’t recommend it – that provides an extra layer of protection from hardware faults, at basically no cost of complexity on the software side (no manual layouts of volumes etc).

HP has never supported the high density 3.5″ disk chassis on the mid range systems I believe primarily for cost, as they are custom designed. By contrast the high end systems only support the high density enclosures at this time.

3PAR High Density 3.5" Disk Chassis - not available on mid range systems

The high end chassis is designed for high availability. The disks are not directly accessible with this design. In order to replace disks the typical process is to run a software task on the array which then migrates all of the data from the disks in that particular drive sled (pack of four drives), to other disks on the system(any disks of the same RPM), once the drive sled is evacuated it can be safely removed. Another method is you can just pull the sled, the system will go into logging mode for writes for those disks(sending the writes elsewhere), and you have roughly seven minutes to do what you need to do and re-insert the sled before the system marks those drives as failed and begins the rebuild process.

The one thing that HP does not allow on SP-2424-based 3.5″ drive chassis is high performance (10 or 15K RPM) drives. So you will not be able to build a 7000-series with the same 650GB 15k RPM drives that are available on the high end 10000-series. However they do have a nice 900GB 10k RPM option in a 2.5″ form factor which I think is a good compromise.  Or you could go with a 300GB 15k RPM 2.5″. I don’t think there is a technical reason behind this, so I imagine if enough customers really want this sort of setup and yell about it, then HP will cave and start supporting it. Probably won’t be enough demand though.

Basic array specifications

72002250TB144Up to 12x8Gbps FC OR
4x8Gbps FC AND 4x10Gbps iSCSI
74004864TB480Up to 24x8Gbps FC OR
8x8Gbps FC AND 8x10Gbps iSCSI
104004800TB960Up to 96x8Gbps FC ports
Up to 16x10Gbps iSCSI
1080081600TB1920Up to 192x8Gbps FC ports
Up to 32x10Gbps iSCSI

(Note: All current 3PAR arrays have dedicated gigabit network ports on each controller for IP-based replication)

In a nut shell, vs the F-class mid range systems, the new 7000-series:

  • Doubles the data cache per controller to 12GB compared to F200, almost triple if you compare the 7400 to the F200/F400)
  • Doubles the control cache per controller to 8GB, The control cache is dedicated memory for the operating system completely isolated from the data cache.
  • Brings PCI-Express support to the 3PAR mid range allowing for 8Gbps Fibre Channel and 10Gbps iSCSI
  • Brings the mid range up to spec with the latest 4th generation ASIC, and latest Intel processor technology.
  • Nearly triples the raw capacity
  • Moves from an entirely Fibre channel based system to a SAS back end with a Fibre front end
  • Moves from exclusively 3.5″ drives to primarily 2.5″ drives with a couple 3.5″ drive options
  • Brings FC0E support to the 3PAR mid range (in 2013) for the four customers who use FCoE.
  • Cuts the size of the controllers by more than half
  • Obviously dramatically increases the I/O and throughput of the system with the new ASIC with PCIe, faster CPU cores, more CPU cores(in 7400)  and the extra cache.

Where’s the Control Cache?

Control cache is basically dedicated memory associated with the Intel processors to run the Debian Linux operating system which is the base for 3PAR’s own software layer.

HP apparently has removed all references to the control cache in the specifications, I don’t understand why. I verified with 3PAR last night that there was no re-design in that department, the separated control cache still exists, and as previously mentioned is 8GB on the 7000-series. It’s important to note that some other storage platforms share the same memory for both data and control cache and they give you a single number for how much cache there is – when in reality the data cache can be quite a bit less.

Differences between the 7200 and 7400 series controllers

Unlike previous generations of 3PAR systems, where all controllers for a given class of system were identical, the new controllers for the 104800 vs 10800, as well as the 7200 vs 7400 are fairly different.

  • 7200 has quad core 1.8Ghz CPUs, 7400 has hex core 1.8Ghz CPUs.
  • 7200 has 12GB cache/controller, 7400 has 16GB/controller.
  • 7200 supports 144 disks/controller pair, 7400 is 240 disks.
  • Along that same note 7200 supports 5 disk enclosures/pair, 7400 supports nine.
  • 7400 has extra cluster interconnects to link two enclosures together forming a mesh active cluster.

iSCSI No longer a second class citizen

3PAR has really only sort of half heartily embraced iSCSI over the years, their customer base was solidly fibre channel. When you talk to them of course they’ll say yes they do iSCSI as well as anyone else but the truth is they didn’t. They didn’t because the iSCSI HBA that they used was the 4000 series from Qlogic. The most critical failing of this part is it’s pathetic throughput. Even though it has 2x1Gbps ports, the card itself is only capable of 1Gbps of throughput. So you look at your 3PAR array and make a decision:

  • I can install a 4x4Gbps Fibre channel card and push the PCI-X bus to the limit
  • I can install a 2x1Gbps iSCSI card and hobble along with less capacity than a single fibre channel connection

I really don’t understand why they did not go back and re-visit alternative iSCSI HBA suppliers since they kept the same HBA for a whole six years. I would of liked to have seen at least a quad port 1Gbps card that could do 4Gbps of throughput. I hammered on them for years it just wasn’t a priority.

But no more! I don’t know what card they are using now, but it is PCIe and it is 10Gbps! Of course the same applies to the 10000-series – I’d assume they are using the same HBA in both but I am not certain.

Lower cost across the board for the SME

For me these details are just as much, if not more exciting than the new hardware itself. These are the sorts of details people don’t learn about until you actually get into the process of evaluating or purchasing a system.

Traditionally 3PAR has all been about margin – at one point I believe they were known to have the highest margins in the industry (pre acquisition). I don’t know where that point stands today, but from an up front standpoint they were not a cheap platform to use. I’ve always gotten a ton of value out of the platform, making the cost from my standpoint trivial to justify. But to less experienced management out there they often see cost per TB or cost per drive or support costs or whatever, compared to other platforms at a high level they often cost more. How much value you derive from those costs can very greatly.

Now it’s obvious that HP is shifting 3PAR’s strategy from something that is entirely margin focused to most likely lower margins but orders of magnitude more volume to make up for it.

I do not know if any of these apply to anything other than the 7000-series, for now assume they do not.

Thin licensing included in base software

Winning the no brainer of the year award in the storage category HP is throwing in all thin licensing as part of the  array with the base license. Prior to this there were separate charges to license thin functionality based on how much written storage was used for thin provisioning. You could license only 10TB on a 100TB array if you want, but you lose the ability to provision new thin provisioned volumes if you exceed that license (I believe there is no impact on existing volumes, but the system will pester you on a daily basis that you are in violation of the license). This approach often caught customers off guard during upgrades – they sometimes thought they only needed to buy disks – but they needed software licenses for those disks, as well as support for those software licenses.

HP finally realized that thin provisioning is the norm rather than the exception. HP is borrowing a page from the Dell Compellent handbook here.

Software License costs capped

Traditionally, most of 3PAR’s software features are based upon some measure of capacity of the system, in most cases it is raw capacity, for thin provisioning it is a more arbitrary value.

HP is once again following the Dell Compellent handbook which caps license costs at a set value(in Dell’s case I believe it is 96 spindles). For the 3PAR 7000-series the software license caps are:

  • 7200: 48 drives (33% of array capacity)
  • 7400: 168 drives (35% of array capacity)

Easy setup with Smart Start

Leveraging technology from the EVA line of arrays, HP has radically simplified the installation process of a 7000-series array, so much so that the customer can now perform the installation on their own without professional services. This is huge for this market segment. The up front professional services to install a mid range F200 storage system had a list price of $10,000 (as of last year anyway).

User serviceable components

Again for the first time in 3PAR’s history a customer will be allowed to replace their own components (disks at least, I assume controllers as well though). This again is huge – it will slash the entry level pricing for support for organizations that have local support staff available.

The 7000-series comes by default with a 24x7x365 4-hour on site support (parts only). I believe software support and higher end on site services are available for an additional charge.

All SSD 7000 series

Like the 10000-series, the 7000-series can run on 100% SSDs, a configuration that for some reason was not possible on the previous F-series of midrange systems (also I think T-class could not as well).

HP claims that with a maximum configuration, a 4-node 7400 maxed out with 240 x 100 or 200GB SSDs the system can achieve 320,000 IOPS, a number which HP claims is a 2.4x performance advantage to their closest priced competitor. This number is based on a 100% random read test with 8kB block sizes @ 1.6 milliseconds of latency. SPC-1 numbers are coming – I’d guesstimate that SPC-1 for the 7400 will be in the ~110,000 IOPS range since it’s roughly 1/4th the power of a 10800 (half the nodes, and each node has half the ASICs & CPUs and far less data cache).

HP is also announcing their intention to develop a purpose built all-SSD solution based on 3PAR technology.

Other software announcements

Most of them from here.

Priority Optimization

For a long time 3PAR has touted it’s ability to handle many workloads of different types simultaneously, providing multiple levels of QoS on a single array. This was true, to a point.

3PAR: Mixed quality of service in the same array

While it is true that you can provide different levels of QoS on the same system, 3PAR customers such as myself realized years ago that it could be better. A workload has the potential to blow out the caches on the controllers (my biggest performance headache with 3PAR – it doesn’t happen often, all things considered I’d say it’s probably a minor issue compared to competing platforms but for me it’s a pain!). This is even more risky in a larger service provider environment where the operator has no idea what kind of workloads the customers will be running. Sure you can do funky things like carve the system up so less of it is impacted when that sort of event happens but there are trade offs there as well.

Priority Optimization

The 3PAR world is changing – with Priority Optimization – a feature that essentially beta at this point, allows the operator to set thresholds both on an IOPS as well as bandwidth perspective. The system reacts basically in real time. Now on a 3PAR platform you can guarantee a certain level of performance to a workload. Whereas in the past, there was a lot more hope involved.  Correct me if I’m wrong but I thought this sort of QoS was exactly the sort of thing that Oracle Pillar used to tout. I’m not sure if they had knobs like this, but I do recall them touting QoS a lot.

Priority Optimization will be available sometime in 2013 – I’d imagine it’d be early 2013 but not sure.

Autonomic Replication

As I’ve said before – I’ve never used 3PAR replication – never needed it. I’ve tended to build things so that data is replicated via other means, and low level volume-based replication is just overkill – not to mention the software licensing costs.

3PAR Synchronous long distance replication: unique in the mid range

But many others I’m sure do use it, and this industry first as HP called it is pretty neat. Once you have your arrays connected, and your replication policies defined, when you create a new volume on the source array, all details revolving around replication are automatically configured to protect that volume according to the policy that is defined. 3PAR replication was already a breeze to configure, this just made it that much easier.

Autonomic Rebalance

3PAR has long had the ability to re-stripe data across all spindles when new disks were added, however this was always somewhat of a manual process, and it could take a not insignificant amount of time because your basically reading and re-writing every bit of data on the system. It was a very brute force approach. On top of that you had to have a software license for Dynamic Optimization in order to use it.

Autonomic rebalance is now included in the base software license and will automatically re-balance the system when resources change, new disks, new controllers etc. It will try, whenever possible, to move the least amount of data – so the brute force approach is gone, the system has the ability to be more intelligent about re-laying out data.

I believe this approach also came from the EVA storage platform.

Persistent Ports

This is a really cool feature as well – it gives the ability to provide redundant connectivity to multiple controllers on a 3PAR array without having to have host-based multipathing software. How is this possible? Basically it is NPIV for the array. Peer controllers can assume the world wide names for the ports on their partner controller. If a controller goes down, it’s peer assumes the identities of that controller’s ports, instantaneously providing connectivity for hosts that were (not directly) connected to the ports on the downed controller. This eliminates pauses for MPIO software to detect faults and fail over, and generally makes life a better place.

HP claims that some other tier 1 vendors can provide this functionality for software changes, but they do not today, provide it for hardware changes. 3PAR provides this technology for both hardware and software changes – on all of their currently shipping systems!

Peer Persistence

This is basically a pair of 3PAR arrays acting as a transparent fail over cluster for local or metro distances. From the PDF

The Peer Persistence software achieves this key enhancement by taking advantage of the Asymmetric Logical Unit Access (ALUA) capability that allows paths to a SCSI device to be marked as having different characteristics.

Peer persistence also allows for active-active to maximize available storage I/O under normal conditions.

Initially Peer Persistence is available for VMware, other platforms to follow.

3PAR Peer Persistence

Virtualized Service Processor

All 3PAR systems have come with a dedicated server known as the Service Processor, this acts as a proxy of sorts between the array and 3PAR support. It is used for alerting as well as remote administration. The hardware configuration of this server was quite inflexible and it made it needlessly complex to deploy in some scenarios (mainly due to having only a single network port).

The service processor was also rated to consume a mind boggling 300W of power (it may of been a legacy typo but that’s the number that was given in the specs).

The Service processor can now be deployed as a virtual machine!

Web Services API

3PAR has long had a CIM API (never really knew what that was to be honest), and it had a very easy-to-use CLI as well (used that tons!), but now they’ll have a RESTful Web Services API that uses JSON (ugh, I hate JSON as you might recall! If it’s not friends with grep or sed it’s not friends with me!). Fortunately for people like me we can keep using the CLI.

This API is, of course, designed to be integrated with other provisioning systems, whether it’s something off the shelf like OpenStack, or custom stuff organizations write on their own.

Additional levels of RAID 6

3PAR first introduced RAID 6 (aka RAID DP) with the aforementioned last major software release three years ago, with that version there were two options for RAID 6:

  • 6+2
  • 14+2

The new software adds several more options:

  • 4+2
  • 8+2
  • 10+2

Thick Conversion

I’m sure many customers have wanted this over the years as well. The new software will allow you to convert a thin volume to a thick (fat) volume. The main purpose of this of course is to save on licensing for thin provisioning when you have a volume that is fully provisioned (along with the likelihood of space reclamation on that volume being low as well). I know I could of used this years ago.. I always shook my fist at 3PAR when they made it easy to convert to thin, but really impossible to convert back to thick (without service disruption anyway). Basically all that is needed is to flip a bit in the OS (I’m sure the nitty gritty is more complicated).

Online Import

This basically allows EVA customers to migrate to 3PAR storage without disruption (in most cases).

System Tuner now included by default

The System Tuner package is now included in the base operating system (at least on 7000-series). System Tuner is a pretty neat little tool written many years ago that can look at a 3PAR system in real time, and based on thresholds that you define recommend dynamic movement of data around the system to optimize the data layout. From what I recall it was written in response to a particular big customer request to prove that they could do such data movement.

3PAR System Tuner moves chunklets around in real time

It is important to note that this tool is an on demand tool, when running it gathers tens of thousands of additional performance statistics from the chunklets on the system. It’s not something that can(or should be) run all the time. You need to run it when the workload you want to analyse is running in order to see if further chunklet optimization would benefit you.

System Tuner will maintain all existing availability policies automatically.

In the vast majority of cases the use of this tool is not required. In fact in my experience going back six years I’ve used it on a few different occasions, and in all cases it didn’t provide any benefit. The system generally does a very good job of distributing resources. But if your data access patterns change significantly, System Tuner may be for you – and now it’s included!

3PAR File Services

This announcement was terribly confusing to me at first. But I got some clarification. The file services module is based on the HP StoreEasy 3830 storage gateway.

  • Hardware platform is a DL380p Gen8 rack server attached to the 3PAR via Fibre Channel
  • Software platform is Microsoft Windows Storage Server 2012 Standard Edition
  • Provides NFS, CIFS for files and iSCSI for block
  • SMB 3.0 supported (I guess that is new, I don’t use CIFS much)
  • NFS 4.1 supported (I’ll stick to NFSv3, thanks – I assume that is supported as well)
  • Volumes up to 16TB in size
  • Integrated de-duplication (2:1 – 20:1)
  • VSS Integration – I believe that means no file system-based snapshots (e.g. transparent access of the snapshot from within the same volume) ?
  • Uses Microsoft clustering for optional HA
  • Other “Windowsey” things

The confusion comes from them putting this device under the 3PAR brand. It doesn’t take a rocket scientist to look at the spec sheets and see there are no Ethernet ports on the arrays for file serving. I’d be curious to find out the cost of this file services add-on myself, and what it’s user interface is like. I don’t believe there is any special integration between this file services module and 3PAR – it’s just a generic gateway appliance.

For someone with primarily a Linux background I have to admit I wouldn’t feel comfortable relying on a Microsoft implementation of NFS for my Linux boxes (by the same token I feel the same way about using Samba for serious Windows work – these days I wouldn’t consider it – I’d only use it for light duty simple stuff).

Oh while your at it HP – gimme a VSA of this thing too.

Good-bye EVA and VSP, I never knew thee

Today I think was one of the last nails in the coffin for EVA. Nowhere was EVA present on the presentation other than providing tools to seamlessly migrate off of EVA onto 3PAR. Well that and they have pulled some of the ease of use from EVA into 3PAR.

Literally nowhere was Hitachi VSP (aka HP P9500). Since HP acquired 3PAR the OEM’d Hitachi equipment has been somewhat of a fifth wheel in the HP storage portfolio. Like the EVA, HP had customers who wanted the VSP for things that 3PAR simply could not or would not do at the time. Whether it was mainframe connectivity, or perhaps ultra high speed data warehousing. When HP acquired 3PAR, the high end was still PCI-X based and there wasn’t a prayer it was going to be able to dish out 10+ GB/second. The V800 changed that though. HP is finally making inroads into P9500 customers with the new 3PAR gear. I personally know of two shops that have massive deployments of HP P9500 that will soon have their first 3PAR in their respective data centers. I’m sure many more will follow.

Time will tell how long P9500 sticks around, but I’d be shocked – really shocked if HP decided to OEM whatever came next out of Hitachi.

What’s Missing

This is a massive set of announcements, the result of blood sweat and tears of many engineers work, assuming it all works as advertised they did an awesome job!


There’s always a BUT isn’t there.

There is one area that I have hammered on 3PAR for what feels like three years now and haven’t gotten anywhere, the second area is more of a question/clarification.

SSD-Accelerated write caching

Repeat after me – AO (Adaptive Optimization) is not enough. Sub LUN auto tiering is not enough. I brought this up with David Scott himself last year, and I bring it up every time I talk to 3PAR. Please, I beg you please, come out with SSD-accelerated write caching technology. The last time I saw 3PAR in person I gave them two examples – EMC FastCache which is both a read and a write back cache. The second is Dell Compellent’s Data Progression technology. I’ve known about Compellent’s storage technology for years but there was one bit of information that I was not made aware of until earlier this year. That is their Data Progression technology by default automatically sends ALL writes (regardless of what tier the blocks live on), to the highest tier. On top of that, this feature is included in the base software license, it is not part of the add-on automatic tiering software.

The key is accelerating writes. Not reads, though reads are nice too. Reads are easy to accelerate compared to writes. The workload on my 3PAR here at my small company is roughly 92% write (yes you read that right). Accelerating reads on the 3PAR end of things won’t do anything for me!

If they can manage to pull themselves together and create a stable product, the Mt. Rainier technology from Qlogic could be a stop gap. I believe NetApp is partnered with them already for those products. Mt. Rainier, other than being a mountain near Seattle, is a host-based read and write acceleration technology for fibre channel storage systems.

Automated Peer Motion

HP released this more than a year ago – however to-date I have not noticed anything revolving around automatic movement of volumes. Call it what you want, load balancing, tiering, or something, as far as I know at this point any actions involving peer motion are entirely manual. Another point is I’m not sure how many peers an array can have. HP tries to say it’s near limitless – could you have 10 ? 20 ? 30 ? 100 ?  I don’t know the answer to that.

Again going back to Dell Compellent (sorry) their Live Volume software has automatic workload distribution. I asked HP about this last year and they said it was not in place then – I don’t see it in place yet.

That said – especially with the announcements here I’m doubling down on my 3PAR passion. I was seriously pushing Compellent earlier in the year(one of the main drivers was cost – one reseller I know calls them the Poor Man’s 3PAR) but where things stand now, their platform isn’t competitive enough at this point, from either a cost or architecture standpoint. I’d love to have my writes going to SSD as Compellent’s Data Progression does things, but now that the cost situation is reversed, it’s a no brainer to stick with 3PAR.

More Explosions

HP needs to take an excursion and blow up some 3PAR storage to see how fast and well it handles disaster recovery, take that new Peer Persistence technology and use it in the test.

Other storage announcements

As is obvious by now, the rest of the announcements pale in comparison to what came out of 3PAR. This really is the first major feature release of 3PAR software in three years (the last one being 2.3.1 which my company at the time participated in the press event and I was lucky enough to be the first production customer to run it in early January 2010 (had to for Exanet support – Exanet was going bust and I wanted to get on their latest code before they went *poof*)).

StoreOnce Improvements

The StoreOnce product line was refreshed earlier in the year and HP made some controversial performance claims. From what I see the only improvement here is they brought down some performance enhancements from the high end to all other levels of the StoreOnce portfolio.

I would really like to see HP release a VMware VSA with StoreOnce, really sounds like a no brainer, I’ll keep waiting..

StoreAll Improvements

StoreAll is the new name for the IBRIX product line, HP’s file and object storage offering. The main improvement here is something called Express Query which I think is basically a meta data search engine that is 1000s of times faster than using regular search functions for unstructured data. For me I’d rather just structure the data a bit more, the example given is tagging all files for a particular movie  to make it easier to retrieve later. I’d just have a directory tree and put all the files in the tree – I like to be organized. I think this new query tool depends on some level of structure – the structure being the tags you can put on files/objects in the system.

HP Converged storage growth - 38% YoY - notice no mention of StoreAll/IBRIX! Also no mention of growth for Lefthand either

HP has never really talked a whole lot about IBRIX – and as time goes on I’m understanding why. Honestly it’s not in the same league (or sport for that matter) for quality and reliability as 3PAR is, not even close. It lacks features, and according to someone I know who has more than a PB on HP IBRIX storage (wasn’t his idea it’s a big company)  it’s really not pleasant to use. I could say more but I’ll end by saying it’s too bad that HP does not have a stronger NAS offering. IBRIX may scale well on paper, but there’s a lot more to it than the paper specs of course. I went over the IBRIX+3PAR implementation guide, for using 3PAR back end storage on a IBRIX system and wasn’t impressed with some of the limitations.

Like everything else, I would like to see a full IBRIX cluster product deployable as a VMware VSA. It would be especially handy for small deployments(e.g. sub 1TB). The key here is the high availability.

HP also announced integration between StoreAll ExpressQuery and Autonomy software. When the Autonomy guy came on the stage I really just had one word to describe it: AWKWARD – given what happened recently obviously!


This was known as the P4000, or Lefthand before that. It was also refreshed earlier in the year. Nothing new announced today. HP is trying to claim the P4000 VSA as Software Defined Storage (ugh).


Make no mistake people – this storage announcement was all about 3PAR. David Scott tried his best to share the love, but there just wasn’t much exciting to talk about outside of 3PAR.

6,000+ words ! Woohoo. That took a lot of time to write, hopefully it’s the most in depth review of what is coming out.

December 3, 2012

The final countdown

Filed under: Storage — Tags: — Nate @ 6:31 am


It’s 5:30 AM ..

I got paged this morning for something someone else broke, so I was up already, I know HP was going to announce something soon (was expecting tomorrow, HP Discover is Dec 4-6th and it’s still December 3 in Germany), but it seems like it is today instead, and as I write this we’re about 30-minutes away.

I’m not all sure what is being announced vs what I have learned already, so am excited to see this news will finally get released.

UPDATE – It seems the new 7200 and 7400 arrays have been announced, waiting to see if there is more or not. Entry level pricing for 3PAR just got cut by about 2/3rds to about $20-25,000 with the introduction of these arrays in the mid range. There’s a bunch more though, once I get more clarification as to what I can talk about then I’ll have something else to write..


October 5, 2012

HP Releases new 3PAR vCenter Plugin

Filed under: Storage — Tags: , — Nate @ 3:31 pm

[I haven’t written a story about 3PAR in the past five minutes so I suppose I’m due..]

Well it’s not that new, to be honest I don’t know how old it is(maybe it’s 6 months old!). I was complaining to 3PAR recently about the lack of functionality in their vCenter plugin and was told that they had a newer version that had some good stuff in it.

The only caveat is this version couldn’t be downloaded from the HP website (no versions can, I looked as recently as yesterday afternoon). It’s only available in the media kit, aka CDROM. I didn’t remember which version was the newer one and when I was told about the newer one I didn’t know which version I had. So I asked the Seattle account team what the current version is because the version I was handed with our array which was installed in December was 2.2.0. It had some marginal improvements in the VMware Recovery Manager (I don’t need the recovery manager), but the vCenter plugin itself was sorely lacking, it felt like it had gone nowhere since it was first released what seems like three years ago (maybe it was two).

I track 3PAR pretty closely as you might imagine, and if I had absolutely no idea there was a new version then I suspect there are a lot of customers out there that have no idea. I never noticed any notifications, there’s no “upgrade checker” on the software side etc.

Anyways, sure enough they get back to me and say 2.2.3 is the latest and sent me a electronic copy of the ISO, and I installed it. I can’t say it’s massively better but it does address two basic sets of functionality that was lacking previously:

  • Ability to cache user credentials to the array in vCenter itself (before you had to re-login to the array every time you loaded the vCenter client)
  • Ability to provision storage from vCenter (tried this – it said I had to configure a storage template before it would function – I’ve never needed templates on 3PAR before so not sure why i do now – I suppose it just makes it more simple, though it’d be nice if there was an advanced check box to continue without a template)

There may be other things too that I haven’t noticed. I don’t think it is top notch yet, I’m fairly certain both EMC and NetApp’s integration packages are much more in depth. Though it wouldn’t surprise me if 3PAR now has the resources to fix the situation on their end, client side software was never really a strong point of theirs. For all I know they are busy re-writing it in a better language – to run on the new vCenter web console.

HP 3PAR vCenter Plugin

Based on the UI, I didn’t get the impression that the plugin could export storage to the whole cluster, since the provision storage option was available under each server but wasn’t visible in the cluster. But who knows, maybe if I took the time to make a template I’d see that it could export to the whole cluster at once..

Not that I needed to provision storage from vCenter, for me it’s much simpler to just ssh in and do it –

  • Create the volume of whatever size I want
  • Export the volume to the cluster (all servers with 1 command)

It really is just two commands. Well three if you count the ssh command line itself to login to the system. I can see the value for less technical folks though so I think it’s important functionality to have. I can accomplish that in a fraction of the amount of time it takes me to login to vCenter, fire up silver light and go through a wizard.

Something I have wanted to see is more integration from the performance monitoring/management standpoint. I don’t know what all hooks are available in vCenter for this sort of thing.

The 3PAR plugin is built using Microsoft Silverlight which was another thorn in my side earlier this year – because Silverlight did not support 64-bit windows. So I couldn’t run the plugin from the vCenter server itself (normally I just remote desktop to the vCenter server and run the client locally – the latency running it over the WAN can get annoying). But to my surprise Microsoft released an update at some point in the past several months and Silverlight now works in 64bit!

So if you happen to want this newer version of software (the plugin is free), contact your HP account team or file a support ticket to get it. Be sure to tell them to make that available for download, there’s no reason to not make it available to download. The VMware Recovery Manager is not free by contrast (both are distributed together), however the Recovery manager checks the license status on the array, so you can install it, but it won’t work unless the array has the license key.

On a somewhat related note I installed a Qlogic management plugin in vCenter a couple of months back, among other things it allows you to upgrade the firmware of their cards from vCenter itself. The plugin isn’t really high quality though, the documentation is poor and it was not too easy to get up and going(unlike the 3PAR plugin the Qlogic plugin cannot be installed on the vCenter server – I tried a dozen times). But it is sort of neat to see what it has, it shows all of the NICs and HBAs and what they are connected to. I think I have so many paths and connections that it seems to make the plugin go unresponsive and hang the vCenter client much of the time (eventually it unfreezes). Because of that I have not trusted it to do firmware upgrades.

Qlogic vCenter Plugin

The Qlogic plugin requires software to be installed on each physical server that you want Qlogic information for(which also requires a reboot). The host software, from what I remember, is also not compatible with VMware Update Manager, so at least I had to install it from the CLI. You can download the Qlogic plugin from their website, here is one link.

Both plugins need a lot of work, Qlogic’s is pretty much unusable, I have a small environment here and it’s dog slow. 3PAR’s well it is more usable now, performance is fine, and at least the two new features above bring it out of the unusable territory for myself (I probably still won’t use it but it provides at least some value now for less technical folks where before it did not).

August 24, 2012

3PAR: Helping and Hurting HP ?

Filed under: Storage — Tags: — Nate @ 8:59 am

Here I go, another blog post starting with question. Yet another post sparking speculation on my part due to an article by our friends at The Register who were kind enough to do some number crunching of HP’s latest quarterly numbers were storage revenues were down about 5%.

Apparently a big chunk of the downward slide for revenues was declines in EVA and tape, offset to some degree by 3PAR and StoreOnce de-dupe products.

I suppose this thought could apply to both scenarios, but I’ll focus on the disk end, since I have no background on StoreOnce.

Before HP acquired 3PAR, obviously EVA was a juicy target to go after to replace EVAs with 3PARs. The pitch was certainly you can get a hell of a lot more done on less 3PAR than you can with more EVA. So you’ll end up saving money. I’ve never used EVA before myself, heard some good aspects of it and some really bad aspects of it, I don’t think I’d ever want to use EVA regardless.

I am sure that 3PAR reps (those that haven’t left anyways – I’ve heard from numerous sources they outclass their HP counterparts by leagues and leagues), who are now responsible for pitching HP’s entire portfolio obvious have a strong existing bias towards 3PAR and away from the other HP products. They try to keep a balanced viewpoint but I’m sure that’s hard to do, especially after they’ve been spending so much time telling the world how much these other products are bad and why the customer should use 3PAR instead. Can’t blame them, its a tough switch to make.

So, assuming you can get a hell of a lot more done on a smaller/fewer 3PAR system(s) than EVA  – which I think is totally true, (with some caveat as to what sort of discounts some may be able to score on EVA, 3PAR has traditionally had some strict margin rules where they have no problem walking away from a deal if the margin is too low), add to that the general bias of at least part of the sales force, as well as HP’s general promotion that 3PAR is the future, and you can quite possibly get lower overall revenue while the customers are saving money by having to buy fewer array resources to accomplish the same (or more) tasks.

3PAR revenue was up more than 60% apparently, on top of the previous gains made since the acquisition.

It would be very interesting to me to see how much consolidation some of these deals end up being – traditionally NetApp I think has been the easiest target for 3PAR, I’ve seen some absolutely massive consolidation done in the past with those products, it was almost comical in some cases. I bet EVA is similar.

Now the downside to the lower revenues, and I’ve seen this at both Dell and HP – both companies are feeling tremendous pressure to try to outperform, they haven’t been able to do it on the revenue side, so they’ve been squeezing on internal costs, which really can degrade services. Overall quality of the sales forces at the likes of HP and Dell have traditionally been terrible, compared to the smaller company counterparts (at least in storage). Add to that the internal politics and region limitations that the companies place on their sales forces further complicates and frustrates the quality people internally as well as customers externally. Myself I was unable to get anything out of a local HP/3PAR account team for months in the Bay Area, so I reached out to my friends in Seattle and they turned some stuff around for me in a matter of hours no questions asked, and they didn’t get any credit (from HP) for it either. Really sad situation for both sides.

I don’t have much hope that HP will be willing or able to retain the top quality 3PAR folks at least on the sales side over the medium term, they, like Dell seem focused on driving down costs rather than keeping quality high, which is a double edged sword. The back end folks will probably stick around for longer, given that 3PAR is one of the crown jewels in HP’s enterprise portfolio.

For some reason I’m immediately reminded of this quote from Office Space:

[..] "that is not right, Michael. For five years now, you've worked your ass  off at Initech, hoping for a promotion or some kind of profit sharing  or something. Five years of your mid-20s now, gone. And you're gonna go  in tomorrow and they're gonna throw you out into the street. You know  why? So Bill Lumbergh's stock will go up a quarter of a point."


One of 3PAR’s weak points has been at the low end of the market, say sub $100k deals, is a space 3PAR has never tried to compete in.  Apparently according to The Register the HP P4000/Lefthand side of things is not doing so hot, and also seemed to be HP’s go-to product for this price range. This product range is what HP used to be excited about, before 3PAR, I attended a storage briefing at a VMware User group meeting just before HP bought 3PAR, expecting some sort of broad storage overview, but it was entirely Lefthand focused. While Lefthand has some interesting tech (the network RAID is pretty neat), for the most part I’d rather pay more and use 3PAR obviously.

I wonder what will happen to Lefthand in the future, will the best of it’s tech get rolled up into 3PAR? or vise versa? Or maybe it will just stay where it’s at, the one good thing Lefthand has is the VSA, it’s not as complete as I’d like to see it, but it’s one of the very few VSAs out there.

Dell has been busy trying to integrate their various storage acquisitions whether it’s Compellent, Ocarnia, Exanet, and I think there was one or two more that I don’t remember. Storage revenues there down as well. I’m not sure how much of the decline has to do with Dell terminating the EMC reselling stuff at this point, but it seems like a likely contributor to the declines in their case.

June 11, 2012


Filed under: Storage — Tags: , — Nate @ 7:20 am

I was invited to a little preview of some of the storage things being announced at HP Discover last week, just couldn’t talk about it until the announcement. Since I was busy in Amsterdam all last week I really didn’t have a lot of time to think about blogging here.

But I’m back and am mostly adjusted to the time zone differences I hope. HP had at least two storage related announcements they made last Monday, one related to scaling of their StoreOnce dedupe setup and another related to 3PAR. The StoreOnce announcement seemed to be controversial, since I really have a minimal amount of exposure to that sort of product I won’t talk about it much, on the surface it sounded pretty impressive but if the EMC claims are true than it’s unfortunate.

Anyways onto the 3PAR announcement which while it had a ton of marketing around it, it basically comes down to three words:

3PAR Supports NPIV (finally)

NPIV in a nutshell the way I understand it is a way of virtualizing connections between points in a fibre channel network, most often in the past it seems to have been used to present storage directly to VM hosts, via FC switches. NPIV is also used by HP’s VirtualConnect technology on the FC side to connect the VC modules to a NPIV-aware FC switch (which is pretty much all of them these days?), and then the switch connected to the storage(duh). I assume that NPIV is required by Virtual Connect because the VC module isn’t really a switch it’s more of a funky bridge.

Because 3PAR did not support NPIV (for what reason I don’t know I kept asking them about it for years but never got a solid response as to why not or when they might support it) there was no way to directly connect a Virtual Connect module (either the new Flex Fabric or the older dedicated FC VC modules) to a 3PAR array, you had to have a switch as a middleman. Which just seemed like a waste. I mean here you have a T or now a V-class system with tons of ports, you have these big blade chassis with a bunch of servers in them, with the VC modules acting like a switch (acting as in aggregating points) and you can’t directly connect it to the 3PAR storage! It was an unfortunate situation. Even going back to the 3cV, which was a bundle of sorts of 3PAR, HP c-Class Blades and VMware (long before HP bought 3PAR of course), I would have thought getting NPIV support would of been a priority but it didn’t happen, until now (well last Monday I suppose).

So at scale you have up to 96 host fibre channel ports on a V400 or 192 FC ports on a V800 operating at 8Gbps. At a maximum you could get by with 48 blade enclosures (2 FC/VC modules each with a single connection) on a V400 or of course double that to 96 on a V800. Cut it in half if you want higher redundancy with dual paths on each FC/VC module. That’s one hell of a lot of systems directly connected to the array. Users may wish to stick to a single connection per VC module allowing the 2nd connection to be connected to something else, maybe another 3PAR array. You still have full redundancy with two modules and one path per module. 3PAR 4Gbps HBAs (note the V-class has 8Gbps) have queue depths of something like 1,536 (not sure what the 8Gbps HBAs have). If your leveraging full height blades you get 8 per chassis, absolute worst case scenario you could set a queue depth of 192/server (I use 128/server on my gear). You could probably pretty safely go quite a bit higher though more thought may have to be had in certain circumstances. I’ve found 128 has been more than enough for my own needs.

It’s cost effective today to easily get 4TB worth of memory per blade chassis, memory being the primary driver of VM density, so your talking anywhere from 96 – 384 TB of memory hooked up to a single 3PAR array. From a CPU perspective anywhere from 7,680 CPU cores all the way up to 36,684 CPU cores in front of a single storage system, a system that has been tested to run at over 450,000 SPC-1 IOPS. The numbers are just insane.

All we need now is a flat ethernet fabric to connect the Virtual Connect switches to, oh wait we have that too, though it’s not from HP. A single pair of Black Diamond X-Series switches could scale to the max here as well, supporting a full eight 10Gbit/second connections per blade chassis with 96 blade chassis directly connected – which, guess what – is the maximum number of 10GbE ports on a pair of FlexFabric Virtual Connect modules (assuming your using two ports for FC). Of course all of the bandwidth is non blocking. I don’t know what the state of interoperability is but Extreme touts their VEPA support in scaling up to 128,000 VMs in an X-series, and Virtual Connect appears to tout their own VEPA support as well. Given the lack of more traditional switching functionality in the VC modules it would probably be advantageous to leverage VEPA (whether or not this extends to the Hypervisor I don’t know – I suspect not based on what I last heard at least from VMware, I believe it is doable in KVM though) to route that inter-server traffic through the upstream switches in order to gain more insight into it and even control it. If you have upwards of 80Gbps of connectivity per chassis anyways it seems there’d be abundant bandwidth to do it. All HP needs to do now is follow the Dell and revise their VC modules to natively support 40GbE (the Dell product is a regular Blade Ethernet switch by contrast and is not yet shipping).

You’d have to cut at least one chassis out of that configuration(or reduce port counts) in order to have enough ports on the X-Series to uplink to other infrastructure. (When I did the original calculations I forgot there would be two switches not one, so there’s more than enough ports to support 96 blade chassis between a pair of X-8s going full bore with 8x10GbE/chassis and you could even use M-LAG to go active-active. if you prefer). I’m thinking load balancers, and some sort of scale-out NAS for file sharing, maybe the interwebs too.

Think about that, up to 30,000 cores, more than 300 TB of memory, sure you do have a bunch of bridges, but all of it connected by only two switches, and one storage array (perhaps two). Just insane.

One HP spokesperson mentioned that even a single V800 isn’t spec’d to support their maximum blade system configuration of 25,000 VMs. 25k VMs on a single array does seem quite high(that comes to an average of 18 SPC-1 IOPS/VM), but it really depends on what those VMs are doing. I don’t see how folks can go around tossing solutions about saying X number of VMs when workloads and applications can vary so widely.

So in short, the announcement was simple – 3PAR supports NPIV now – the benefits of that simple feature addition are pretty big though.

April 12, 2012

3PAR Zero detection and snapshots

Filed under: Storage — Tags: , — Nate @ 9:00 am


I’ve been holding onto this post to get confirmation from HP support, now that I have it, here goes something’




A commenter Karl made a comment that made my brain think about this another way, and it turns out I was stupid in my original assessment as for the system storing the data for the snapshot. For some reason I was thinking the system was storing the zeros, but rather it was storing the data the zeros were replacing.

So I’m a dumbass for thinking that. homer moment *doh*

BUT the feature still appears broken in that what should happen is if there is in fact 200GB of data written to the snapshot that implies that zeros overwrote 200GB worth of non zero’d data – and that data should of been reclaimed from the source volume. In this case it was not, only a tiny fraction (1,536MB of logical storage or 12x128MB chunks of data). So at the very least the bulk of the data usage should of been moved from the source volume to the snapshot (snapshot space is allocated separate from the source volume so it’s easy to see which is using what). CURRENTLY the volume is showing 989GB of reserved space on the array with 120GB of written snapshot data and 140GB of file system data or around 260GB of total data which should come out to around 325GB of physical data in RAID 5 3+1, not 989thGB. But that space reclaiming technology is another feature thin copy reclamation. Which reclaims space from deleted snapshots.

So, sorry for being a dumbass for the original premise of the post, for some reason my brain got confused by the results of the tests, and it wasn’t until Karl’s comment that it made me think about it from the other angle.

I am talking to some more technical / non support people on Monday about this.

And thanks Karl 🙂



I got some good information from senior technical folks at 3PAR and it turns out the bulk of my problems are related to bugs in how one part reports raw space utilization (resulting in wildly inaccurate info), and a bug with regards to a space reclamation feature that was specifically disabled by a software patch on my array in order to fix another bug with space reclamation. So the fix for that is to update to a newer version of code which has that particular problem fixed for good(I hope?). I think I’d never get that kind of information out of the technical support team.

So in the end not much of a big issue after all, just confused by some bugs and functionality that was disabled and me being stupid.





A lot of folks over the years have tried to claim I am a paid shill for 3PAR, or Extreme or whatever. All I can say is I’m not compensated by them for my posts in any direct way (maybe I get better discounts on occasion or something I’m not sure how that stuff factors in but in any case those benefits go to my companies rather than me).

I do knock them when I think they need to be knocked though. Here is something that made me want to knock 3PAR, well more than knock, more like kick in the butt, HARD and say W T F.

I was a very early adopter of the T-class of storage system, getting it in house just a couple months after it was released. It was the first system from them which had the thin built in – the thin reclamation and persistence technology integrated into the ASIC – only I couldn’t use it because the software didn’t exist at the time.


3PAR "Stay Thin" strategy - wasn't worth $100,000 in licensing for the extra 10% of additional capacity savings for my first big array.


That was kind of sad but it could of been worse – the competition that we were evaluating was Hitachi who had just released their AMS2000-series of products, literally about a month after the T-class was released. Hitachi had no thin provisioning support what-so-ever on the AMS2000 when it was launched. That came about seven months later. If you required thin provisioning at the time you had to buy a USP or (more common for this scenario due to costs at the time) a USP-V, which supported TP, and put the AMS2000 behind it. Hitachi refused to even give us a ballpark price as to the cost of TP on the AMS2000 whenever it was going to be released. I didn’t need an exact price, just tell me is it going to be $5,000, or $25,000 or maybe $100,000 or more ? Should be a fairly simple process, at least from a customer perspective especially given they already had such licensing in place on their bigger platform. In the end I took that bigger platform’s licensing costs(since they refused to give that to me too) and extrapolated what the cost might look like on the AMS line. I got the info from Storage Mojo‘s price list and basically took their price and cut it in half to take into account discounts and stuff. We ended up obviously not going for HDS so I don’t know what it would of really cost us in the end.

OK, steering the tangent closer to the topic again..bear with me.

Which got me wondering – given it is an ASIC – and not a FPGA they really have to be damn sure it works when they ship product otherwise it can be an expensive proposition to replace the ASICs if there is a design problem with the chip, after all the CPUs aren’t really in the data path of the stuff flowing through the system so it would be difficult to work around ASIC faults in software(if it was possible at all).

So I waited, and waited for the new thin stuff to come out, thinking since I had thin provisioning licensed already I would just install the new software and get the new features.

Then it was released – more than a year after I got the T400 but it came with a little surprise – additional licensing costs associated with the software – something nobody ever told me of (sounds like it was a last minute decision). If I recall right, for the system I had at the time if we wanted to fully license thin persistence it was going to be an extra $100,000 in software. We decided against it at the time, really wasn’t worth the price for what we’d reclaim. Later on 3PAR offered to give us the software for free if we bought another storage array for disaster recovery (which we were planning to) – but the disaster recovery project got canned so we never got it.

Another licensing feature of this new software was in order to get to the good stuff, the thin persistence you had to license another product  – Thin Conversion whether you wanted it or not (I did not – really you might only need Thin Conversion if your migrating from a non thin storage system).

Fast forward almost two years and I’m at another company with another 3PAR, there was a thin provisioning licensing snafu with our system so for the past few months(and for the next few) I’m operating on an evaluation license which basically has all the features unlocked – including the thin reclamation tech. I had noticed recently that some of my volumes are getting pretty big – per the request of the DBA we have I agreed to make these volumes quite large – 400GB each, what I normally do is create the physical volume at 1 or 2TB (in this case 2TB), then I create a logical volume that is more in line with what the application actually needs(which may be as low as say 40GB for the database), then grow it on line as the space requirements increase.

3PAR’s early marketing at least tried to communicate that you can do away with volume management altogether.  While certainly technically possible, I don’t recommend that you take that approach. Another nice thing about volume management is being able to name the volumes with decent names, which is very helpful when working with moving snapshots between systems, especially with MPIO and multiple paths and multiple snapshots on one system, with LVM it’s simple as can be, without – I really don’t want to think about it. Only downside is you can’t easily mount a snapshot back to the originating system because the LVM UUID will conflict and changing that ID is not (or was not, been a couple years since I looked into it) too easy, blocking access to the volume. Not a big deal though the number of times I felt I wanted to do that was once.

This is a strategy I came up with going back almost six years to my original 3PAR box and has worked quite well over the years. Originally, resizing was an off line operation since the kernel that we had at the time (Red Hat Enterprise 4.x) did not support on line file system growth, it does (and has) for a while now, I think since maybe 4.4/4.5 and certainly ever since v5.

Once you have a grasp as to the growth pattern of your application it’s not difficult to plan for. Getting the growth plan in the first place could be complex though given the dedicate on write technology, you had to (borrowing a term from Apple here) think different. It obviously wasn’t enough to just watch how much disk space was being consumed on average, you had to take into account space being written vs being deleted and how effective the file system was at re-utilizing deleted blocks. In the case of MySQL – being as inefficient as it is, you had to also take into account space utilization required by things like ALTER TABLE statements, in which MySQL makes a copy of the entire table with your change then drops the original. Yeah, real thin friendly there.

Given this kind of strategy it is more difficult to gauge just exactly how much your saving with thin provisioning, I mean on my original 3PAR I was about 300-400% over subscribed(which at the time was considered extremely high – I can’t count the number of hours I spent achieving that), I think I recall at that conference I was at David Scott saying the average customer was 300% oversubscribed. On my current system I am 1300% over subscribed. Mainly because I got a bunch of databases and I make them all 2TB volumes, I can say with a good amount of certainty that they will probably never get to remotely 2TB in size but it doesn’t affect me otherwise so I give it what I can (all my boxes on this array are VMware ESX 4.1 which of course has a 2TB limit – the bulk of these volumes are raw device mapped to leverage SAN-based snapshots as well as, to a lesser extent individually manage and monitor space and i/o metrics).

At the time my experience was compounded by the fact that I was still very green when it came to storage (I’d like to think I am more blue now at least whatever that might mean). Never really having dabbled much in it prior, choosing instead to focus on networking and servers. All big topics, I couldn’t take them all on at once 🙂

So my point is – even though 3PAR has had this technology for a while now – I really have never tried it. In the past couple months I have run the Microsoft sdelete tool on the 3 windows VMs I do have to support my vCenter stuff(everything else is Linux) – but honestly I don’t think I bothered to look to see if any space was reclaimed or not.

Now back on topic

Anyways, I have this one volume that was consuming about 300GB of logical space on the array when it had maybe 140GB of space written to the file system (which is 400GB). Obviously a good candidate for space reclamation, right? I mean the marketing claims you can gain 10% more space, in this case I’m gaining a lot more than that!

So I decided – hey how bout I write a basic script that writes out a ton of zeros to the file system to reclaim this space (since I recently learned that the kernel code required to do fancier stuff like fstrim [updated that post with new information at the end since I originally wrote it] doesn’t exist on my systems). So I put a basic looping script in to write 100MB files filled with zeros from /dev/zero.

I watched it as it filled up the file system over time (I spaced out the writing as to not flood my front end storage connections), watching it reclaim very little space – at the end of writing roughly 200GB of data it reclaimed maybe 1-2GB from the original volume. I was quite puzzled to say the least. But that’s not the topic of this post now is it.

I was shocked, awed, flabbergasted by the fact that my operation actually CONSUMED an additional 200GB of space on the system (space filled with zeros). Why did it do this? Apparently because I created a snapshot of the volume earlier in the day and the changes were being kept track of thus consuming the space. Never mind the fact that the system is supposed to drop the zeros even if it doesn’t reclaim space – it doesn’t appear to do so when there is a snapshot(s) on the volume, so the effects were a double negative – didn’t reclaim any space from the original, and actually consumed a ton more space (more than 2x the original volume size) due to the snapshot.

Support claims minimal space was reclaimed by the system because I wrote files in 100MB blocks instead of 128MB blocks. I find it hard to believe out of 200GB of files I wrote that there was not more 128MB contiguous blocks of space of zeros. But I will try the test again with 128MB files on that specific volume after I can contact the people that are using the snapshot to delete the snapshot and re-create it to reclaim that 200GB of space. Hell I might as well not even use the snapshot and create a full physical copy of the volume.

Honestly I’m sort of at a loss for words as to how stupid this is. I have loved 3PAR through thick and thin for a long time (and I’ve had some big thicks over the years that I haven’t written about here anyways..), but this one I felt compelled to. A feature so heavily marketed, so heavily touted on the platform is rendered completely ineffective when a basic function like snapshots is in use. Of course the documentation has nothing on this, I was looking through all the docs I had on the technology when I was running this test on Thursday and basically what it said was enable zero detection on the volume (disabled by default) and watch it work.

I’ve heard a lot of similar types of things (feature heavily touted but doesn’t work under load or doesn’t work period) on things like NetApp, EMC etc. This is a rare one for 3PAR in my experience at least. My favorite off the top of my head was NetApp’s testing of an EMC CX-3 performance with snapshots enabled. That was quite a shocker to me when I first saw it. Roughly a 65% performance drop over the same system without snapshots.

Maybe it is a limitation of the ASIC itself – going back to my speculation about design issues and not being able to work around them in software. Maybe this limitation is not present in the V-class which is the next generation ASIC. Or maybe it is, I don’t know.

HP Support says this behavior is as designed. Well I’m sure more than one person out there would agree it is a stupid design if so. I can’t help but think it is a design flaw, not an intentional one – or a design aspect they did not have time to address in order to get the T-series of arrays out in a timely manor(I read elsewhere that the ASIC took much longer than they thought to design, which I think started in 2006 – and was at least partially responsible for them not having PCI express support when the ASIC finally came out). I sent them an email asking if this design was fixed in the V-Class, will update if they respond. I know plenty of 3PAR folks (current and former) read this too so they may be able to comment (anonymously or not..).

As for why more space was not reclaimed in the volume, I ran another test on Friday on another volume without any snapshots which should of reclaimed a couple hundred gigs but according to the command line it reclaimed nothing, support points me to logs saying 24GB was reclaimed, but that is not reflected in the command line output showing the raw volume size on the system. Still working with them on that one. My other question to them is why 24GB ? I wrote zeros to the end of the file system, there was  0 bytes left. I have some more advanced logging things to do for my next test.

While I’m here I might as well point out some of the other 3PAR software or features I have not used, let’s see

  • Adaptive optimization (sub LUN tiering – licensed separately)
  • Full LUN-based automated tiering (which I believe is included with Dynamic optimization) – all of my 3PAR arrays to-date have had only a single tier of storage from a spindle performance perspective though had different tiers from RAID level perspectives
  • Remote Copy – for the situations I have been in I have not seen a lot of value in array-based replication. Instead I use application based. The one exception is if I had a lot of little files to replicate, using block based replication is much more efficient and scalable. Array-based replication really needs application level integration, and I’d rather have real time replication from the likes of Oracle(not that I’ve used it in years, though I do miss it, really not a fan of MySQL) or MySQL then having to co-ordinate snapshots with the application to maintain consistency (and in the case of MySQL there really is no graceful way to take snapshots, again, unlike Oracle – I’ve been struggling recently with a race condition somewhere in an App or in MySQL itself which pretty much guarantees MySQL slaves will abort with error code 1032 after a simple restart of MySQL – this error has been shown to occur upwards of 15 minutes AFTER the slave has gotten back in sync with the master – really frustrating when trying to deal with snapshots and getting those kinds of issues from MySQL). I have built my systems, for the most part so they can be easily rebuilt so I really don’t have to protect all of my VMs by replicating their data, I just have to protect/replicate the data I need in order to reconstruct the VM(s) in the event I need to.
  • Recovery manager for Oracle (I licensed it once on my first system but never ended up using it due to limitations in it not being able to work with raw device maps on vmware – I’m not sure if they have fixed that by now)
  • Recovery manager for all other products (SQL server, Exchange, and VMware)
  • Virtual domains (useful for service providers I think mainly)
  • Virtual lock (used to lock a volume from having data deleted or the volume deleted for a defined period of time if I recall right)
  • Peer motion

3PAR Software/features I have used (to varying degrees)

  • Thin Provisioning (for the most part pretty awesome but obviously not unique in the industry anymore)
  • Dynamic Optimization (oh how I love thee) – the functionality this provides I think for the most part is still fairly unique, pretty much all of it being made possible by the sub disk chunklet-based RAID design of the system. Being able to move data around in the array between RAID levels, between tiers, between regions of the physical spindles themselves (inner vs outer tracks), really without any limit as to how you move it (e.g. no limitations like aggregates in the NetApp world), all without noticeable performance impact is quite amazing (as I wrote a while back I ran this process on my T400 once for four SOLID MONTHS 24×7 and nobody noticed).
  • System Tuner (also damn cool – though never licensed it only used it in eval licenses) – this looks for hot chunklets and moves them around automatically. Most customers don’t need this since the system balances itself so well out of the box. If I recall right, this product was created in response to a (big) customer’s request mainly to show that it could be done, I am told very few license it since it’s not needed. In the situations where I used it it ended up not having any positive(or negative) effect on the situation I was trying to resolve at the time.
  • Virtual Copy (snapshots – both snapshots and full volume copies) – written tons of scripts to use this stuff mainly with MySQL and Oracle.
  • MPIO Software for MS windows – worked fine – really not much to it, just a driver. Though there was some licensing fee 3PAR had to pay for MS for the software or development efforts they leveraged to build it – otherwise the drivers could of been free.
  • Host Explorer (pretty neat utility that sends data back through the SCSI connection from the server to the array including info like OS version, MPIO version, driver versions etc – doesn’t work on vSphere hosts because VMware hasn’t implemented support for those SCSI commands or something)
  • System Reporter – Collects a lot of data, though from a presentation perspective I much prefer my own cacti graphs
  • vCenter Plugin for the array – really minimal set of functionality compared to the competition – a real weak point for the platform. Unfortunately it hasn’t changed much in the almost two years since it was released – hoping it gets more attention in the future, or even in the present. As-is, I consider it basically useless and don’t use it. I haven’t taken advantage of the feature on my own system since I installed the software to verify that it’s functional.
  • Persistent Cache – an awesome feature in 4+ node systems that allows re-mirroring of cache to another node in the system in the event of planned or unplanned downtime on one or more nodes in the cluster (while I had this feature enabled – it was free with the upgrade to 2.3.1 on systems with 4 or more nodes I never actually had a situation where I was able to take advantage of it before I left the company with that system).
  • Autonomic Groups – group volumes and systems together and make managing mappings of volumes to clusters of servers very easy. The GUI form of this is terrible and they are working to fix it. I literally practically wiped out my storage system when I first tried this feature using the GUI. It was scary the damage I did in the short time I had this(even more so given the number of years I’ve used the platform for). Fortunately the array that I was using was brand new and had really no data on it (literally). Since then – CLI for me, safer and much more clear as to what is going on. My friends over at 3PAR got a lot of folks involved over there to drive a priority plan to fix this functionality which they admit is lacking. What I did wipe out were my ESX boot volumes, so I had to re-create the volumes and re-install ESX. Another time I wiped out all of my fibre channel host mappings and had to re-establish those too. Obviously on a production system this would of resulted in massive data loss and massive downtime. Fortunately, again it was still at least 2 months from being a production system and had a trivial amount of data. When autonomic groups first came out I was on my T400 with a ton of existing volumes, migrating to use existing volumes to groups likely would of been disruptive so I only used groups for new resources, so I didn’t get much exposure to the feature at the time.

That turned out to be A LOT longer than I expected.

This is probably the most negative thing I’ve said about 3PAR here. This information should be known though. I don’t know how other platforms behave – maybe it’s the same. But I can say in the nearly three years I have been aware of this technology this particular limitation has never come up in conversations with friends and contacts at 3PAR. Either they don’t know about it either or it’s just one of those things they don’t want to admit to.

It may turn out that using SCSI UNMAP to reclaim space, rather than writing zeros is much more effective thus rendering the additional costs of thin licensing worth while. But not many things support that yet. As mentioned earlier, VMware specifically recommends disabling support for UNMAP in ESX 5.0 and has disabled it in subsequent releases because of performance issues.

Another thing that I found interesting, is that on the CLI itself, 3PAR specifically reccomends keeping Zero detection disabled unless your doing data migration because under heavy load it can cause issues –

Note: there can be some performance implication under extreme busy systems so it is recommended for this policy to be turned on only  during Fat to Thin and re-thinning process and be turned off during normal operation.

Which to some degree defeats the purpose? Some 3PAR folks have told me that information is out of date and only related to legacy systems. Which didn’t really make sense since there are no legacy systems that support zero detection as it is hard wired into the ASIC. 3PAR goes around telling folks that zero detection on other platforms is no good because of the load it introduces but then says that their system behaves in a similar way. Now to give them credit I suspect it is still quite likely a 3PAR box can absorb that hit much better than any other storage platform out there, but it’s not as if your dealing with a line rate operation, there clearly seems to be a limit as to what the ASIC can process. I would like to know what an extremely busy system looks like – how much I/O as a percentage of controller and/or disk capacity?

Bottom line – at this point I’m even more glad I didn’t license the more advanced thinning technologies on my bigger T400 way back when.

I suppose I need to go back to reclaiming space the old fashioned way – data migration.

4,000+ words woohoo!

March 17, 2012

Who uses Legacy storage?

Filed under: Random Thought,Storage — Tags: — Nate @ 3:34 pm

Still really busy these days haven’t had time to post much but I was just reading someone’s LinkedIn profile who works at a storage company and it got me thinking.

Who uses legacy storage? It seems almost everyone these days tries to benchmark their storage system against legacy storage.  Short of something like maybe direct attached storage which has no functionality, legacy storage has been dead for a long time now. What should the new benchmark be? How can you go about (trying to) measuring it?  I’m not sure what the answer is.

When is thin, thin?

One thing that has been in my mind a lot on this topic recently is how 3PAR constantly harps on about their efficient allocation at 16kB blocks. I think I’ve tried to explain this in the past but I wanted to write about it again. I wrote a comment on it in a HP blog recently I don’t think they published the comment though (haven’t checked for a few weeks maybe they did). But they try to say they are more efficient (by dozens or hundreds of times) than other platforms because of this 16kB allocation thing-a-ma-bob.

I’ve never seen this as an advantage to their platform. Whether you allocate in 16kB chunks or perhaps 42MB chunks in the case of Hitachi, it’s still a tiny amount of data in any case and really is a rounding error. If you have 100 volumes and they all have 42MB of slack hanging off the back of them, that’s 4.2GB of data, it’s nothing.

What 3PAR doesn’t tell you is this 16kB allocation unit is what a volume draws from a storage pool (Common Provisioning Group in 3PAR terms – which is basically a storage template or policy which defines things like RAID type, disk type, placement of data, protection level etc). They don’t tell you up front how much these storage pools provision storage on, which is in-part based on the number of controllers in the system.

If your volumes max out a CPG’s allocated space and it needs more, it won’t grab 16kB, it will grab (usually at least) 32GB, this is adjustable. This is – I believe in part how 3PAR addresses minimizing impact of thin provisioning with large amounts of I/O, because it allocates these pools with larger chunks of data up front. They even suggest that if you have a really large amount of growth that you increase the allocation unit even higher.

Growth Increments for CPGs on 3PAR

I bet you haven’t heard HP/3PAR say their system grows in 128GB increments recently 🙂

It is important to note, or to remember, that a CPG can be home to hundreds of volumes, so it’s up to the user, if they only have one drive type for example maybe they only want 1 CPG.  But I think as they use the system they will likely go down a similar path that I have and have more.

If you only have one or two CPGs on the system it’s probably not a big deal, though the space does add up. Still I think for the most part even this level of allocation can be a rounding error. Unless you have a large number of CPGs.

Myself, on my 3PAR arrays I use CPGs not just for determining data characteristics of the volumes but also for organizational purposes / space management. So I can look at one number and see all of the volumes dedicated to development purposes are X in size, or set an aggregate growth warning on a collection of volumes. I think CPGs work very well for this purpose. The flip side is you can end up wasting a lot more space. Recently on my new 3PAR system I went through and manually set the allocation level of a few of my CPGs from 32GB down to 8GB because I know the growth of those CPGs will be minimal. At the time I had maybe 400-450GB of slack space in the CPGs, not as thin as they may want you to think (I have around 10 CPGs on this array). So I changed the allocation unit and compacted the CPGs which reclaimed a bunch of space.

Again, in the grand scheme of things that’s not that much data.

For me 3PAR has always been more about higher utilizations which are made possible by the chunklet design and the true wide striping, the true active-active clustered controllers, one of the only(perhaps one of if not the first?) storage designs in the industry that goes beyond two controllers, and the ASIC acceleration which is at the heart of the performance and scalability. Then there is the ease of use and stuff, but I won’t talk about that anymore I’ve already covered it many times. One of my favorite aspects of the platform is the fact that they use the same design on everything from the low end to the high end, the only difference really is scale. It’s also part of the reason why their entry level pricing can be quite a bit higher than entry level pricing from others since there is the extra sauce in there that the competition isn’t willing or able to put on their low end box(s).

Sacrificing for data availability

I was talking to Compellent recently learning about some of their stuff for a project over in Europe and they told me their best practice (not a requirement) is to have 1 hot spare of each drive type (I think drive type meaning SAS or SATA, I don’t think drive size matters but am not sure) per drive chassis/cage/shelf.

They, like many other array makers don’t seem to support the use of low parity RAID (like RAID 50 3+1, or 4+1), they (like others) lean towards higher data:parity ratios I think in part because they have dedicated parity disks(they either had a hard time explaining to me how data is distributed or I had a hard time understanding, or both..), and dedicating 25% of your spindles to parity is very excessive, but in the 3PAR world dedicating 25% of your capacity  to parity is not excessive(when compared to RAID 10 where there is a 50% overhead anyways).

There are no dedicated parity, or dedicated spares on a 3PAR system so you do not lose any I/O capacity, in fact you gain it.

The benefits to a RAID 50 3+1 configuration are a couple fold – you get pretty close to RAID 10 performance, and you can most likely (depending on the # of shelves) suffer a shelf failure w/o data loss or downtime(downtime may vary depending on your I/O requirements and I/O capacity after those disks are gone).

It’s a best practice (again, not a requirement) in the 3PAR world to provide this level of availability (losing an entire shelf), not because you lose shelves often but just because it’s so easy to configure and is self managing. With a 4, or 8-shelf configuration I do like RAID 50 3+1. In an 8-shelf configuration maybe I have some data volumes that don’t need as much performance so I could go with a 7+1 configuration and still retain shelf-level availability.

Or, with CPGs you could have some volumes retain shelf-level availability and other volumes not have it, up to you. I prefer to keep all volumes with shelf level availability. The added space you get with a higher data:parity ratio really has diminishing returns.

Here’s a graphic from 3PAR which illustrates the dimishing returns(at least on their platform, I think the application they used to measure was Oracle DB):

The impact of RAID on I/O and capacity

3PAR can take this to an even higher extreme on their lower end F-class series which uses daisy chaining in order to get to full capacity (max chain length is 2 shelves). There is a availability level called port level availability which I always knew was there but never really learned what it truly was until last week.

Port level availability applies only to systems that have daisy chained chassis and protects the system from the failure of an entire chain. So two drive shelves basically. Like the other forms of availability this is fully automated, though if you want to go out of your way to take advantage of it you need to use a RAID level that is compliant with your setup to leverage port level availability otherwise the system will automatically default to a lower level of availability (or will prevent you from creating the policy in the first place because it is not possible on your configuration).

Port level availability does not apply to the S/T/V series of systems as there is no daisy chaining done on those boxes (unless you have a ~10 year old S-series system which they did support chaining – up to 2,560 drives on that first generation S800 – back in the days of 9-18GB disks).

November 8, 2011

EMC and their quad core processors

Filed under: Storage — Tags: , , , — Nate @ 8:48 am

I first heard that Fujitsu had storage maybe one and a half years ago, someone told me that Fujitsu was one company that was seriously interested in buying Exanet at the time, which caused me to go look at their storage, I had no idea they had storage systems. Even today I really never see anyone mention them anywhere, my 3PAR reps say they never encounter Fujitsu in the field(at least in these territories they suspect over in Europe they go head to head more often).

Anyways, EMC folks seem to be trying to attack the high end Fujitsu system, saying it’s not “enterprise”, in the end the main leg that EMC has trying to hold on to what in their eyes is “enterprise” is mainframe connectivity, which Fujitsu rightly tries to debunk that myth since there are a lot of organizations that are consider themselves “enterprise” that don’t have any mainframes. It’s just stupid, but EMC doesn’t really have any other excuses.

What prompted me to write this, more than anything else was this

One can scale from one to eight engines (or even beyond in a short timeframe), from 16 to 128 four-core CPUs, from two to 16 backend- and front-end directors, all with up to 16 ports.

The four core CPUs is what gets me. What a waste! I have no doubt that in EMC’s  (short time frame)  they will be migrating to quad socket 10 core CPUs right? After all, unlike someone like 3PAR who can benefit from a purpose built ASIC to accelerate their storage, EMC has to rely entirely on software. After seeing SPC-1 results for HDS’s VSP, I suspect the numbers for VMAX wouldn’t be much more impressive.

My main point is, and this just drives me mad. These big manufacturers touting the Intel CPU drum and then not exploiting the platform to it’s fullest extent. Quad core CPUs came out in 2007. When EMC released the VMAX in 2009, apparently Intel’s latest and greatest was still quad core. But here we are, practically 2012 and they’re still not onto at LEAST hex core yet? This is Intel architecture, it’s not that complicated. I’m not sure what quad core CPUs specifically are in the VMAX, but the upgrade from Xeon 5500 to Xeon 5600 for the most part was

  1. Flash bios (if needed to support new CPU)
  2. Turn box off
  3. Pull out old CPU(s)
  4. Put in new CPU(s)
  5. Turn box on
  6. Get back to work

That’s the point of using general purpose CPUs!! You don’t need to pour 3 years of R&D into something to upgrade the processor.

What I’d like to see, something I mentioned in a comment recently is a quad socket design for these storage systems. Modern CPUs have had integrated memory controllers for a long time now (well only available on Intel since the Xeon 5500). So as you add more processors you add more memory too. (Side note: the documentation for VMAX seems to imply a quad socket design for a VMAX engine but I suspect it is two dual socket systems since the Intel CPUs EMC is likely using are not quad-socket capable). This page claims the VMAX uses the ancient Intel 5400 processors, which if I remember right was the first generation quad cores I had in my HP DL380 G5s many eons ago. If true, it’s even more obsolete than I thought!

Why not 8 socket? or more? Well cost mainly. The R&D involved in an 8-socket design I believe is quite a bit higher, and the amount of physical space required is high as well. With quad socket blades common place, and even some vendors having quad socket 1U systems, the price point and physical size related to quad socket designs is well within reach of storage systems.

So the point is on these high end storage systems you start out with a single socket populated on a quad socket board with associated memory. Want to go faster? add another CPU and associated memory? Go faster still? add two more CPUs and associated memory (though I think it’s technically possible to run 3 CPUs, well there have been 3 CPU systems in the past, it seems common/standard to add them in pairs). Your spending probably at LEAST a quarter million for this system initially, probably more than that, the incremental cost of R&D to go quad socket given this is Intel after all is minimal.

Currently VMAX goes to 8 engines, they say they will expand that to more. 3PAR took the opposite approach, saying while their system is not as clustered as a VMAX is (not their words), they feel such a tightly integrated system (theirs included) becomes more vulnerable to “something bad happening” that impacts the system as a whole, more controllers is more complexity. Which makes some sense. EMC’s design is even more vulnerable being that it’s so tightly integrated with the shared memory and stuff.

3PAR V-Class Cluster Architecture with low cost high speed passive backplane with point to point connections totalling 96 Gigabytes/second of throughput

3PAR goes even further in their design to isolate things – like completely separating control cache which is used for the operating system that powers the controllers and for the control data on top of it, with the data cache, which as you can see in the diagram below is only connected to the ASICs, not to the Intel CPUs. On top of that they separate the control data flow from the regular data flow as well.

One reason I have never been a fan of “stacking” or “virtual chassis” on switches is the very same reason, I’d rather have independent components that are not tightly integrated in the event “something bad” takes down the entire “stack”. Now if your running with two independent stacks, so that one full stack can fail without an issue then that works around that issue, but most people don’t seem to do that. The chances of such a failure happening are low, but they are higher than something causing all of the switches to fail if the switches were not stacked.

One exception might be some problems related to STP which some people may feel they need when operating multiple switches. I’ll answer that by saying I haven’t used STP in more than 8 years, so there have been ways to build a network with lots of devices without using STP for a very long time now. The networking industry recently has made it sound like this is something new.

Same with storage.

So back to 3PAR. 3PAR changed their approach in their V-series of arrays, for the first time in the company’s history they decided to include TWO ASICs in each controller, effectively doubling the I/O processing abilities of the controller. Fewer, more powerful controllers. A 4-node V400 will likely outperform an 8-node T800. Given the system’s age, I suspect a 2-node V400 would probably be on par with an 8-node S800 (released around 2003 if I remember right).

3PAR V-Series ASIC/CPU/PCI/Memory Architecture

EMC is not alone, and not the worst abuser here though. I can cut them maybe a LITTLE slack given the VMAX was released in 2009. I can’t cut any slack to NetApp though. They recently released some new SPEC SFS results, which among other things, disclosed that their high end 6240 storage system is using quad core Intel E5540 processors. So basically a dual proc quad core system. And their lower end system is — wait for it — dual proc dual core.

Oh I can’t describe how frustrated that makes me, these companies touting using general purpose CPUs and then going out of their way to cripple their systems. It would cost NetApp all of maybe what $1200 to upgrade their low end box to quad cores? Maybe $2500 for both controllers? But no they rather you spend an extra, what $50,000-$100,000  to get that functionality?

I have to knock NetApp more to some extent since these storage systems are significantly newer than the VMAX, but I knock them less because they don’t champion the Intel CPUs as much as EMC does, that I have seen at least.

3PAR is not a golden child either, their latest V800 storage system uses — wait for it — quad core processors as well. Which is just as disgraceful. I can cut 3PAR more slack because their ASIC is what provides the horsepower on their boxes, not the Intel processors, but still that is no excuse for not using at LEAST 6 core processors. While I cannot determine precisely which Intel CPUs 3PAR is using, I know they are not using Intel CPUs because they are ultra low power since the clock speeds are 2.8Ghz.

Storage companies aren’t alone here, load balancing companies like F5 Networks and Citrix do the same thing. Citrix is better than F5 in their offering software “upgrades” on their platform that unlock additional throughput. Without the upgrade you have full reign of all of the CPU cores on the box which allow you to run more expensive software features that would normally otherwise impact CPU performance. To do this on F5 you have to buy the next bigger box.

Back to Fujitsu storage for a moment, their high end box certainly seems like a very respectable system with regards to paper numbers anyways. I found it very interesting the comment on the original article that mentioned Fujitsu can run the system’s maximum capacity behind a single pair of controllers if the customer wanted to, of course the controllers couldn’t drive all the I/O but it is nice to see the capacity not so tightly integrated to the controllers like it is on the VMAX or even on the 3PAR platform. Especially when it comes to SATA drives which aren’t known for high amounts of I/O, higher end storage systems such as the recently mentioned HDS, 3PAR and even VMAX tap out in “maximum capacity” long before they tap out in I/O if your loading the system with tons of SATA disks. It looks like Fujitsu can get up to 4.2PB of space leaving, again HDS, 3PAR and EMC in the dust. (Capacity utilization is another story of course).

With Fujitsu’s ability to scale the DX8700 to 8 controllers, 128 fibre channel interfaces, 2,700 drives and 512GB of cache that is quite a force to be reckoned with. No sub-disk distributed RAID, no ASIC acceleration, but I can certainly see how someone would be willing to put the DX8700 up against a VMAX.

EMC was way late to the 2+ controller hybrid modular/purpose built game and is still playing catch up. As I said to Dell last year, put your money where your mouth is and publish SPC-1 results for your VMAX, EMC.

With EMC so in love with Intel I have to wonder how hard they had to fight off Intel from encouraging EMC to use the Itanium processor in their arrays instead of Xeons. Or has Intel given up completely on Itanium now (which, again we have to thank AMD for – without AMD’s x86-64 extensions the Xeon processor line would of been dead and buried many years ago).

For insight to what a 128-CPU core Intel-based storage system may perform in SPC-1, you can look to this system from China.

(I added a couple diagrams, I don’t have enough graphics on this site)

November 4, 2011

Hitachi VSP SPC-1 Results posted

Filed under: Storage — Tags: , , , , — Nate @ 7:41 pm

[(More) Minor updates since original post] I don’t think I’ll ever work for a company that is in a position to need to leverage something like a VSP, but that doesn’t stop me from being interested as to how it performs. After all it has been almost four years since Hitachi released results for their USP-V, which had, at the time very impressive numbers(record breaking even, only to be dethroned by the 3PAR T800 in 2008) from a performance perspective(not so much from a cost perspective not surprisingly).

So naturally, ever since Hitachi released their VSP (which is OEM’d by HP as their P9500)about a year ago I have been very curious as to how well it performs, it certainly sounded like a very impressive system on paper with regards to performance. After all it can scale to 2,000 disks and a full terabyte of cache. I read an interesting white paper (I guess you could call it that) recently on the VSP out of curiosity.

One of the things that sort of stood out is the claim of being purpose built, they’re still using a lot of custom components and monolithic architecture on the system, vs most of the rest of the industry which is trying to be more of a hybrid of modular commodity and purpose built, EMC VMAX is a good example of this trend. The white paper, which was released about a year ago, even notes

There are many storage subsystems on the market but only few can be considered as real hi-end or tier-1. EMC announced its VMAX clustered subsystem based on clustered servers in April 2009, but it is often still offering its predecessor, the DMX, as the first choice. It is rare in the industry that a company does not start ramping the new product approximately half a year after launching. Why EMC is not pushing its newer product more than a year after its announcement remains a mystery. The VMAX scales up by adding disks (if there are empty slots in a module) or adding modules, the latter of which are significantly more expensive. EMC does not participate in SPC benchmarks.

The other interesting aspect of the VSP is it’s dual chassis design(each chassis in a separate rack), which, the white paper says is not a cluster, unlike a VMAX or a 3PAR system(3PAR isn’t even a cluster the way VMAX is a cluster intentionally with the belief the more isolated design would lead to higher availability). I would assume this is in response to earlier criticisms of the USP-V design in which the virtualization layer was not redundant(when I learned that I was honestly pretty shocked) – Hitachi later rectified the situation on the USP-V by adding some magic sauce that allowed you to link a pair of them together.

Anyways with all this fancy stuff, obviously I was pretty interested when I noticed they had released SPC-1 numbers for their VSP recently. Surely, customers don’t go out and buy a VSP because of the raw performance, but because they may need to leverage the virtualization layers to abstract other arrays, or perhaps the mainframe connectivity, or maybe they get a comfortable feeling knowing that Hitachi has a guarantee on the array where they will compensate you in the event there is data loss that is found to be the fault of the array (I believe the guarantee only covers data loss and not downtime, but am not completely sure). After all, it’s the only storage system on the planet that has such a stamp of approval from the manufacturer (earlier high end HDS systems had the same guarantee).

Whatever the reason,  performance is still a factor given the amount of cache and the sheer number of drives the system supports.

One thing that is probably never a factor is ease of use – since the USP/VSP are complex beasts to manage, something your very likely need significant amounts of training. One story I remember being told is a local HDS rep in the Seattle area mentioned to a 3PAR customer “after a few short weeks of training you’ll feel as comfortable on our platform as you do on 3PAR”. Something like that, the customer said “you just made my point for me”. Something like that anyways.

Would the VSP dethrone the HP P10000 ? I found the timing of the release of the numbers an interesting coincidence after all, I mean the VSP is over a year old at this point, why release now ?

So I opened the PDF, and hesitated .. what would be the results? I do love 3PAR stuff and I still view them as  somewhat of an underdog even though they are under the HP umbrella.

Wow, was I surprised at the results.

Not. Even. Close.

The VSP’s performance was not much better than that of the USP-V which was released more than four years ago. The performance itself is not bad but it really puts the 3PAR V800 performance in some perspective:

  • VSP -   ~270,000 IOPS @ $8.18/IOP
  • V800 – ~450,000 IOPS @ $6.59/IOP

But it doesn’t stop there

  • VSP -  ~$45,600 per usable TB (39% unused storage ratio)
  • V800 – ~$13,200 per usable TB (14% unused storage ratio)

Hitachi managed to squeeze in just below the limit for unused storage – which is not allowed to be above 45%, sounds kind of familiar. The VSP had only 17TB additional usable capacity than the original USP-V as tested. This really shows how revolutionary sub disk distributed RAID is for getting higher capacity utilization out of your system, and why I was quite disappointed when the VSP came out without anything resembling it.

Hitachi managed to improve their cost per IOP on the VSP vs the USP but their cost per usable TB has skyrocketed, about triple the price of the original USP-V results. I wonder why this is ? (I mis counted the digits in my original calculations, in fact the VSP is significantly cheaper than the USP when it was tested!) One big change in the VSP is the move to 2.5″ disk drives. The decision to use 2.5″ drives did kind of hamper results I believe since this is not a SPC-1/E Energy Efficiency test so power usage was never touted. But the largest 2.5″ 15k RPM drive that is available for this platform is 146GB(which is what was used).

One of the customers(I presume) at the HP Storage presentation I was at recently was kind of dogging 3PAR saying the VSP needs less power per rack than the T or V class(which requires 4x208V 30A per fully loaded rack).

I presume it needs less power because of the 2.5″ drives, also overall the drive chassis on 3PAR boxes do draw quite a bit of power by themselves(200W empty on T/V, 150W empty on F – given the T/V hold 250% more drives/chassis it’s a pretty nice upgrade).

Though to me especially on a system like this, power usage is the last thing I would care about. The savings from disk space would pay for the next 10 years of power on the V800 vs the VSP.

My last big 3PAR array cut the number of spindles in half and the power usage in half vs the array it replaced, while giving us roughly 50% better performance at the same time and the same amount of usable storage. So knowing that, and the efficiency in general I’m much less concerned as to how much power a rack requires.

So Hitachi can get 384 x 2.5″ 15k RPM disks in a single rack, and draw less power than 3PAR can get with 320 disks in a single rack.

You could also think of it this way: Hitachi can get ~56TB RAW of 15k RPM space in a single rack, and 3PAR can get 192TB RAW of 15k RPM space in a single rack, nearly four times the RAW space for double the power(nearly five times the usable(4.72x precisely – V800 @ 80% utilization VSP @ 58%) space due to the architecture), for me that’s a good trade off, a no brainer really.

The things I am not clear on with regards to these results – is this the best the VSP can do?  This does appear to be a dual chassis system. The VSP supports 2,000 spindles, the system tested only had 1,152, which is not much more than the 1,024 tested on the original USP-V. Also the VSP supports up to 1TB of cache however the system tested only had 512GB (3PAR V800 had 512GB of data cache too).

Maybe it is one of those situations where this is the most optimal bang for the buck on the platform,  perhaps the controllers just don’t have enough horsepower to push 2,000 spindles at full tilt – I don’t know.

Hitachi may have purpose built hardware, but it doesn’t seem to do them a whole lot of good when it comes to raw performance. I don’t know about you but I’d honestly feel let down if I was an HDS customer. Where is the incentive to upgrade from USP-V  to VSP ? The cost for usable storage is far higher, the performance is not much better. Cost per IOP is less, but I suspect the USP-V at this stage of the game with more current pricing for disks and stuff would be even more competitive than VSP. (correction from above, due to my mis-reading of the digits ($135k/TB instead of $13.5k/TB VSP is in fact cheaper making it a good upgrade from USP from that perspective! Sorry for the error 🙂 )

Maybe it’s in the software – I’ve of course never used a USP/VSP system but perhaps the VSP has newer software that is somehow not backwards compatible with the USP, though I would not expect this to be the case.

Complexity is still high, the configuration of the array stretches from page 67 of the full disclosure documentation to page 100. 3PAR’s configuration by contrast is roughly a single page.

Why do you think someone would want to upgrade from a USP-V to a VSP?

I still look forward to the day when the likes of EMC make their VMAX architecture across their entire product line, and HDS also unifies their architecture. I don’t know if it’ll ever happen but it should be possible at least with VMAX given their sound thumping of the Intel CPU drum set.

The HDS AMS 2000 series of products was by no means compelling either, when you could get higher availability(by means of persistent cache with 4 controllers), three times the amount of usable storage, and about the same performance on a 3PAR F400 for about the same amount of money.

Come on EMC, show us some VMAX SPC-1 love, you are, after all a member of SPC now (though kind of odd your logo doesn’t show up, just the text of the company name).

One thing that has me wondering on this VSP configuration – with such little disk space available on the system I have to wonder why anyone would bother with such a configuration with spinning rust on this platform for performance and just go SSD instead. Well one reason may be a 400GB SSD would run $35-40,000 (after 50% discount, assuming that is list price), ouch. 220 of those (system maximum is 256) would net you roughly 88TB raw (maybe 40TB usable), but cost $7.7M for the SSDs alone.

On the V800 side a 4-pack of 200GB SSDs will cost you roughly $25,000 after discount(assuming that is list price). Fully loading a V800 with the maximum of 384 SSDs(77TB raw) would cost roughly $2.4M for the SSDs alone(and still consume 7 racks) I think I like the 3PAR SSD strategy more  — not that I would ever do such a configuration!

Space would be more of course if the SSD tests used RAID 5 instead of RAID 1, I just used RAID 1 for simplicity.

Goes to show that these big boxes have a long way to go before they can truly leverage the raw performance of SSD. with 200-300 SSDs in either of these boxes the controllers would be maxed out and the SSDs would probably for the most part be idle. I’ve been saying for a while how stupid it seems for such a big storage system to be overwhelmed so completely by SSDs and that the storage systems need to be an order of magnitude(or more) faster. 3PAR did a good start with doubling of performance, but I’m thinking the boxes need to do at least 1M IOPS to disk, if not 2M, for systems such as these — how long will it take to get there?

Maybe HDS could show better results with 900GB 10k RPM disks(and put something like 1700 disks instead of 1,152 instead of the 146GB 15k RPM disks, should hopefully show much lower per usable TB costs, though their IOPS cost would probably shoot up. Though from what I can see, 600GB is the max supported 2.5″ 10k RPM disk supported. 900GB @ 58% utilization would yield about 522GB, and 600GB on 3PAR @ 80% utilization would yield about 480GB.

I suspect they could get their costs down significantly more if they went old school and supported larger 3.5″ 15k RPM disks and used those instead(the VSP supports 3.5″ disks just nothing other than 2TB Nearline (7200 RPM) and 400GB Flash – a curious decision). If you were a customer isn’t that something you’d be interested in ? Though another compromise you make with 3.5″ disks on the VSP is your then limited to 1,280 spindles, rather than 2,048 with 2.5″. Though this could be a side effect of a maximum addressable capacity of 2.5PB which is reached with 1,280 2TB disks, they very well could probably support 2,048 3.5″ disks if they could fit in the 2.5PB limit.

Their documentation says actual maximum usable with Nearline 2TB disks is 1.256PB (with RAID 10). With 58% capacity utilization that 1.25PB drops dramatically to 742TB. With RAID 5 (7+1) they say 2.19PB usable, let’s take that 58% number again 1.2PB @ 58% capacity utilization.

V800 by contrast would top out at 640TB with RAID 10 @ 80% utilization (stupid raw capacity limits!! *shakes fist at sky*),  or somewhere in the 880TB-1.04PB (@80%) range with RAID 5 (depending on data/parity ratio).

Another strange decision on both HDS and 3PAR’s part was to only certify the 2TB disk on their platform, and not anything smaller. Since both systems become tapped out at roughly half the number of supported spindles when using 2TB disks due to capacity limits.

An even more creative solution(to work around the limitation!!), which I doubt is possible is somehow restrict the addressable capacity of each disk to 1/2 the size, so you could in effect get 1,600 2TB disks in each with 1TB of capacity, then when they release software updates to scale the box to even higher levels of capacity (32GB of control cache can EASILY handle more capacity) just unlock the drives and get that extra capacity free. That would be nice at least, probably won’t happen though. I bet it’s technically possible on the 3PAR platform due to their chunklets. 3PAR in general doesn’t seem as interested in creative solutions as I am 🙂 (I’m sure HDS is even more rigid)

Bottom Line

So HDS has managed to get their cost for usable space down quite a bit, from $135,000/TB to around $45,000/TB, and improve performance by about 25%

They still have a long ways to go in the areas of efficiency, performance, scalability, simplicity and unified architecture. I’m sure there will be plenty of customers out there that don’t care about those things(or are just not completely informed) and will continue to buy VSP for other reasons, it’s still a solid platform if your willing to make those trade offs.

October 22, 2011

IBM posts XIV SPC-2 results

Filed under: Storage — Tags: , , — Nate @ 8:35 pm

[UPDATED – as usual I re-read my posts probably 30 times after I post them and refine them a bit if needed, this one got quite a few changes. I don’t run a newspaper here so I don’t aim to have a completely thought out article when I hit post for the first time]

IBM finally came out and posted some SPC-2 results for their XIV platform, which is better than nothing but unfortunately they did not post SPC-1 results.

SPC-2 is a sequential throughput test, more geared towards things like streaming media and data warehousing instead of random I/O which represents a more typical workload.

The numbers are certainly very impressive though, coming in at 7.3 gigabytes/second, besting most other systems out there, 42 megabytes/second per disk, IBM’s earlier high end storage array was only able to inch out 12 megabytes/second per disk(with 4 times the number of disks) with disks that were twice as fast. So at least 8 times the I/O capacity, for only about 25% more performance vs XIV, that’s a stark contrast!

SATA/Nearline/7200RPM SAS disks are typically viewed as good at sequential operations, though I would expect 15k RPM disks to do at least as well, since the faster RPM should result in more data traveling under the head at a faster rate, perhaps a sign of a good architecture in XIV with it’s distributed mirrored RAID.

While the results are quite good – again it doesn’t represent the most common types of workloads out there which is random I/O.

The $1.1M discounted price of the system seems quite high for something that only has 180 disks on it(discounts on the system seem to for the most part be 70%), though there is more than 300 gigabytes of cache. I bought a 2-node 3PAR T400 with 200 SATA disks shortly after the T was released in 2008 for significantly less, of course it only had 24GB of data cache!

I hope the $300 modem IBM(after 70% discount) is using is a USR Courier! (Your Price: $264.99still leaves a good profit for IBM). Such fond memories of the Courier.

I can only assume at this point of time IBM has refrained from posting SPC-1 results is because with a SATA-only system the results would not be impressive. In a fantasy world with nearline disks and a massive 300GB cache maybe they could achieve 200-250 IOPS/disk which would put the $1.1M system with 180 disks 36,000 – 45,000 SPC-1 IOPS, or $24-30/IOP.

A more realistic number is probably going to be 25,000 or less($44/IOP), making it one of the most expensive systems out there for I/O (even if it could score 45,000 SPC-1). 3PAR would do 14,000 IOPS (not SPC-1 IOPS mind you, SPC-1 number would probably be lower) with 180 SATA disks and RAID 10 by contrast, based on their I/O calculator with 80% read/20% write workload for about 50% less cost(after discounts) for a 4-node F400.

One of the weak spots on 3PAR is the addressable capacity per controller pair, for I/O and disk connectivity purposes a 2-node F200 (much cheaper) could easily handle 180 2TB SATA disks, but from a software perspective that is not the case. I have been complaining about this for more than 3 years now, they’ve finally addressed it to some extent in the V-class but I am still disappointed to the extent it has been addressed per the supported limits(1.6PB, should be more than double that) that exist today, but at least with the V they have enough memory on the box to scale it up with software upgrades(time will tell if such upgrades come about however).

I would not even use a F400 for this if it was me opting instead for a T800 (800TB) or a V class(800-1600TB), because with 360TB raw on the system that is very close to the limit of the F400’s addressable capacity (384TB), or the T400(400TB). You could of course get a 4-node T800(or a 2-node V400 or V800)  to start, then add additional controllers to get beyond 400TB of capacity if/when the need arises. With the 4-controller design you also get the wonderful persistent cache feature built in (one of the rare software features that is not separately licensed).

But for this case, comparing a nearly maxed out F400 against a maxed out XIV is still fair – it is one of the main reasons I did not consider XIV during my last couple storage purchases.

So there is a strong use case of when to use XIV with these results – throughput oriented workloads! The XIV would absolutely destroy the F400 in throughput, which tops out at 2.6GB/sec (to disk).

With software such as Vertica out there which slashes the need for disk I/O on data warehouses given it’s advanced design, and systems such as Isilon being so geared towards things like scale out media serving (using NFS for media serving seems like a more ideal protocol anyways), I can’t help but wonder what XIV’s place is in the market, at this price point at least. It does seem like a very nice platform from a software perspective, and with their recent switch to Infiniband from 1 Gigabit ethernet a good part of their hardware has been improved as well, also it has SSD read cache coming.

I will say though that this XIV system will handily beat even a high end 3PAR T800 for throughput. While 3PAR has never released SPC-2 numbers the T800 tops out at a 6.4 gigabytes/second(from disk), and it’s quite likely it’s SPC-2 results would be lower than that.

With the 3PAR architecture being as optimized as it is for random I/O I do believe it would suffer vs other platforms with sequential I/O. Not that the 3PAR would run slow, but it would quite likely run slower due to how data is distributed on the system. That is just speculation though a result of not having real numbers to base it on. My own production random I/O workloads in the past have had 15k RPM disks running in the range of 3-4MB/second(numbers are extrapolated as I have only had SATA and 10k RPM disks in my 3PAR arrays to-date though my new one that is coming is 15k RPM), as such with a random I/O workload you can scale up pretty high before you run into any throughput limits on the system (in fact if you max out a T800 with 1,280 drives you could do as high as 5MB/second/disk before you would hit the limit). Though XIV is distributed RAID too so who knows..

Likewise I suspect 3PAR/HP have not released SPC-2 numbers because it would not reflect their system in the most positive light, unlike SPC-1.

Sorry for the tangents on 3PAR  🙂

« Newer PostsOlder Posts »

Powered by WordPress