TechOpsGuys.com Diggin' technology every day

August 8, 2013

Nth Symposium 2013: HP Bladesystem vs Cisco UCS

Filed under: General — Tags: , , , — Nate @ 11:00 pm

Travel to HP Storage Tech Day/Nth Generation Symposium was paid for by HP; however, no monetary compensation is expected nor received for the content that is written in this blog.

I can feel the flames I might get for this post but I’m going to write about it anyway because I found it interesting. I have written about Cisco UCS in the past(very limited topics), have never been impressed with it, and really at the end of the day I can’t buy Cisco on principle alone – doesn’t matter if it was $1, I can’t do it (in part because I know that $1 cost would come by screwing over many other customers to make that price possible for me).

Cisco has gained a lot of ground in the blade market since they came out with this system a few years ago and I think they are in 3rd place, maybe getting close to 2nd (last I saw 2nd was a very distant position behind HP).

So one of the keynotes (I guess you can call it that? it was on the main stage) was someone from HP who said they re-joined HP earlier this year (or perhaps last year) after spending a couple of years at Cisco, both selling UCS and training their partners on how to sell it to customers. So obviously that was interesting to me, hearing this person's perspective on the platform. There was a separate break-out session on this topic that went into more detail but it was NDA-only so I didn't attend.

I suppose what was most striking is HP going out of their way to compare themselves against UCS, that says a lot right there. They never mentioned Dell or IBM stuff, just Cisco. So Cisco obviously has gotten some good traction (as sick as that makes me feel).

Out of band management

HP claims that Cisco has no out of band management on UCS, there are primary and backup data paths but if those are down then you are SOL. HP obviously has (optionally) redundant out of band management on their blade system.

I love out of band management myself, especially full lights out. My own HP VMware servers have dedicated in-band(1GbE) as well as the typical iLO out of band management interfaces. This is on top of the 4x10GbE and 2x4Gbps FC for storage. Lots of connectivity. When I was having issues with our Qlogic 10GbE NICs last year this came in handy.

Fault domains

This can be a minor issue – mainly an implementation one. Cisco apparently allows UCS to have a fault domain of up to 160 servers, vs 16 (one chassis) for HP. You can, of course, lower your fault domain on UCS if you think about this aspect of things – but how many customers realize this and actually do something about it? I don't know.

HP Smart Update Manager

I found this segment quite interesting. HP touts their end to end updates mechanism which includes:

  • Patch sequencing
  • Driver + Firmware management
  • Unified service pack (1 per quarter)

HP claims Cisco has none of these, they cannot sequence patches, their management system does not manage drivers (it does manage firmware), and the service packs are not unified.

At this point the HP person pointed out a situation a customer faced recently where they used the UCS firmware update system to update the firmware on their platform. They then rebooted their ESX systems (I guess for the firmware to take effect), and the systems could no longer see the storage. It took the customer 20 hours on the line with Cisco, VMware, and the storage company to figure out that the problem – and the cause of the downtime – was drivers that were out of sync with the firmware.

I recall a few years ago another ~20 hour outage on a Cisco UCS platform at a sizable company in Seattle for similar reasons. I don't know why in both cases it took so long to resolve; in the Seattle case there was a known firmware bug that was causing link flapping, and as a result a massive outage, because I believe the storage was not very forgiving of that. Fortunately Cisco had a patch, but it took them ~20 hours of hard downtime to figure out the problem.

I'm sure there are similar stories for the HP end of things too… I have heard of some nasty issues with FlexFabric and Virtual Connect. There is one feature I do like about FlexFabric and Virtual Connect, and that is the chassis-based MAC/WWN assignments. Everything else they can keep. I don't care about converged Ethernet, and I don't care about reducing my cable count (having a few extra fibre cables for storage per chassis really is nothing)…

Myself, the only outages I have had that lasted that long have been because of application stack failures; I think the longest infrastructure related outage I've been involved with in the past 15 years was roughly six, maybe eight hours. I have had outages that took longer than 20 hours to recover from fully – but for the bulk of that time the system was running, we just had recovery steps to perform. I've never had a 20 hour outage where 15 hours into the thing nobody has any idea what the problem is or how to fix it.

Longest outage ever though was probably ~48-72 hours – and that was entirely application stack failure. That was the time we got all the senior software developers and architects in a room and asked them "How do we fix this?" and they gave us blank stares and said "We don't know, it's not supposed to do this." Not a good situation to be in!

Anyway, back on topic.

HP says that since December 2011 they have released 9 critical updates, while Cisco has released 38.

The case for intelligent compute

I learned quite a bit from this segment as well. Back in 2003 the company I was at was using HP and Compaq gear; it ran well though it was obviously pretty expensive. Everything was DL360s, some DL380s, some DL580s. When it came time to do a big data center refresh we wanted to use SATA disks to cut some costs, so we ended up going with a white box company instead of HP (this was before HP had the DL100 series). I learned a lot from that experience, and was very happy to return to HP as a customer at my next company (though I certainly realize given the right workload HP's premium may not be worth it – but for highly consolidated virtualized stuff I really don't want to use anything else). The biggest issue I had with white box stuff was bad RAM. It seemed to be everywhere. Not long after we started deployment I started using the Cerberus Test Suite to burn in our systems, which caught a lot of it. Cerberus is awesome if you haven't tried it. I even used it on our HP gear, mainly to drive CPU and memory to 100% usage to burn them in (no issues found).
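
Cerberus is the tool I actually used, but just to illustrate the burn-in idea, here is a minimal stand-in sketch in Python (not Cerberus; the duration and memory-per-worker knobs are made up for the example) that pegs every core and repeatedly writes and verifies a pattern in memory:

```python
import multiprocessing
import time

DURATION = 60 * 60   # hypothetical: burn in for one hour
CHUNK_MB = 256       # hypothetical: memory exercised per worker

def burn(worker_id):
    """Peg one core and repeatedly write/verify a pattern in memory."""
    pattern = bytes([worker_id % 256]) * (1024 * 1024)
    deadline = time.time() + DURATION
    while time.time() < deadline:
        # Memory test: fill a buffer with a known pattern, then verify it
        buf = bytearray()
        for _ in range(CHUNK_MB):
            buf += pattern
        for i in range(0, len(buf), len(pattern)):
            if buf[i:i + len(pattern)] != pattern:
                raise RuntimeError("memory verification failed on worker %d" % worker_id)
        # CPU burn: pointless math to keep the core at 100%
        x = 0
        for i in range(10**6):
            x += i * i

if __name__ == "__main__":
    procs = [multiprocessing.Process(target=burn, args=(i,))
             for i in range(multiprocessing.cpu_count())]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```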

HP Advanced ECC Outcomes

HP has a technology called Advanced ECC, which they’ve had since I believe 1996, and is standard on at least all 300-series servers and up. 10 years ago when our servers rarely had more than 2GB of memory in them(I don’t think we went 64-bit until at least 2005), Advanced ECC wasn’t a huge deal, 2GB of memory is not much. Today, with my servers having 384GB ..I really refuse to run any high memory configuration without something like that. IBM has ChipKill, which is similar. Dell has nothing in this space. Not sure about Cisco(betting they don’t, more on that in a moment).

HP Advanced ECC

HP talked about their massive numbers of sensors, with some systems (I imagine the big ones!) having up to 1,600 sensors in them. (Here is a neat video on the Sea of Sensors from one of the engineers who built them – one thing I learned is the C7000 chassis has 104 different fan speeds for maximum efficiency.) HP introduced pre failure alerting in 1995, and has had pre failure warranties for a long time (perhaps back to 1995 as well). They obviously have complete hypervisor integration – one thing I wasn't sure of myself until recently: while upgrading our servers one of the new memory sticks went bad, an alert popped up in vCenter, and I was able to evacuate the host and get the stick replaced without any impact. This failure wasn't caught by burn-in, just regular processing; I didn't have enough spare capacity to take out too many systems to dedicate to burn-in at that point.

What does Cisco have? According to HP not much. Cisco doesn’t treat the server with much respect apparently, they treat it as something that can fail and you just get it replaced or repaired at that point.

UCS: Post failure response

That model reminds me of what I call "built to fail", which is the model that public clouds like Amazon run on. It's pretty bad. Though at least in Cisco's case the storage is shared and the application can be restarted on another system easily enough; in a public cloud you have to build a new system and configure it from scratch.

The point here is obvious: HP works hard to prevent the outage in the first place; Cisco doesn't seem to care.

Simplicity Matters

I'll just put the full slide here, there's not a whole lot to cover. HP's point is that the Cisco way is more complicated and seems angled to drive more revenue for the network. HP is less network oriented, and they show you can directly connect the blade chassis to 3PAR storage system(s). I think HP's diagram is even a bit too complicated; for all but the largest setups you could easily eliminate the distribution layer.

BladeSystem vs UCS: Simplicity matters

The cost of the 17th server

I found this interesting as well. Cisco goes around telling folks that their systems are cheaper, but they don't do an apples-to-apples comparison; they use a Smart Play bundle, not a system that is built to scale.

HP put a couple of charts up showing the difference in cost between the two solutions.

BladeSystem vs UCS TCO: UCS Smart Play bundle

BladeSystem vs UCS TCO: UCS Built to scale

Portfolio Matters

Lastly HP went into some depth comparing the different product portfolios and showed how Cisco was lacking in pretty much every area, whether it was server coverage, storage coverage, blade networking options, software suites, or the integration between them.

They talked about how Cisco has one way to connect networking to UCS, while HP has many, whether it is converged Ethernet (similar to Cisco), regular Ethernet, native Fibre Channel, InfiniBand, or even SAS to external disk enclosures. The list goes on and on for the other topics but I'm sure you get the point. HP offers more options so you can build a more optimal configuration for your application.

BladeSystem vs UCS: Portfolio matters

Then they went into analyst stuff and I took a nap.

In reviewing the slide deck they do mention Dell once.. in the slide, not by the speaker –

HP vs Dell in drivers/firmware management

By attending this I didn't learn anything that would affect my purchasing in the future; as I mentioned, I won't buy Cisco for any reason anyway. But it was still interesting to hear about.

August 7, 2013

Nth Symposium 2013: HP Moonshot

Filed under: General — Tags: , , — Nate @ 10:05 pm

Travel to HP Storage Tech Day/Nth Generation Symposium was paid for by HP; however, no monetary compensation is expected nor received for the content that is written in this blog.

HP launched Moonshot a few months ago, I wrote at the time I wasn’t overly impressed. At the Nth Symposium there were several different HP speakers that touched on Moonshot.

HP has been blasting the TV airwaves with Moonshot ads – something that I think is a waste of money, just as much as it would be if HP were blasting the TV with 3PAR ads. Moonshot obviously is a special type of system, and those in its target market will obviously (to me anyway) know about it. Perhaps it's more of an ad to show HP is still innovating, in which case it's pretty decent (not as good as the IBM Linux commercials from years back though!).

Initial node for HP Moonshot for Intel Atom processors

Sure it is cute, the form factor certainly grabs your eye. Micro servers are nothing new though, HP is just the latest entrant into the market. I immediately got into tech mode and wanted to know how it measured up to AMD’s Seamicro platform. In my original post I detail several places where I feel Moonshot falls short of Seamicro which has been out for years.

Seamicro Node for Intel Atom processors – Note no storage! All of that is centralized in the chassis, virtualized so that it is very flexible.

HP touts this as a shift in the way of thinking – going from building apps around the servers to building servers around the apps (while they sort of forgot to mention we’ve been building servers around the apps in the form of VMs for many years now). I had not heard of the approach described like that until last week, it is an interesting description.

HP was great in being honest about who should use this system – they gave a few different use cases, but they were pretty adamant that Moonshot is not going to take over the world, it’s not going to invade every SMB and replace your x86 systems with ARM or whatever. It’s a purpose built system for specific applications. There is only one Moonshot node today, in the future there will be others, each targeted at a specific application.

One of them will even have DSPs on it I believe, which is somewhat unique. HP calls Moonshot out as:

  • 77% less costly
  • 89% less energy
  • 80% less space
  • 97% less complex

Certainly very impressive numbers. If I had an application that was suitable for Moonshot then I’d certainly check it out.

One of the days that I was there I managed to get myself over to the HP Moonshot booth and ask the technical person there some questions. I don’t know what his role was, but he certainly seemed well versed in the platform and qualified to answer my basic questions.

My questions were mainly around comparing Moonshot to Seamicro – specifically the storage virtualization layers and networking as well. His answers were about what I expected. They don’t support that kind of thing, and there’s no immediate plans to. Myself, I think the concept of being able to export read-only file system(s) from central SSD-storage to dozens to hundreds of nodes within the Seamicro platform a pretty cool idea. The storage virtualization sounds very flexible and optionally extends well beyond the Seamicro chassis up to ~1,400 drives.

Same for networking, Moonshot is pretty basic stuff. (At one point Seamicro advertised integrated load balancing but I don’t see that now).  The HP person said Moonshot is aimed squarely at web applications, scale out etc.. Future modules may be aimed at memcache nodes, or other things.. There will also be a storage module as well(I forget specifics but it was nothing too exciting).

I believe the HP rep also mentioned how they were going to offer units with multiple servers on a single board (Seamicro has done this for a while as well).

Not to say Moonshot is a bad system, I'm sure HP will do pretty well with it, but I find it hard to get overly excited about it when Seamicro seems to be years ahead of Moonshot already. Apparently Moonshot was in HP Labs for a decade; it wasn't until one of the recent CEOs (I think a couple of years ago) came around to HP Labs and asked something like "What do you have that I can sell?" and the masterminds responded "We have Moonshot!" that they set about productizing it, which took them a bit of time.

(I have no personal experience with either platform nor have I communicated with anyone who has told me personal experience with either platform so I can only go by what I have read/been told of either system at this point)

August 2, 2013

HP Storage Tech Day – bits and pieces

Filed under: Storage — Tags: , , — Nate @ 9:56 am

Travel to HP Storage Tech Day/Nth Generation Symposium was paid for by HP; however, no monetary compensation is expected nor received for the content that is written in this blog.

For my last post on HP Storage tech day, the remaining topics that were only briefly covered at the event.

HP Converged Storage Management

There wasn't much here other than a promise to build YASMT (Yet Another Storage Management Tool) – this time it will be really good though. HP sniped at EMC on several occasions for the vapor-ness of ViPR, though at least that is an announced product with a name. HP has a vision, no finalized name, no product (I'm sure they have something internally) and no dates.

I suppose if you're in the software defined storage camp, which is for the separation of the data and control planes, this may be HP's answer to that.

HP Converged Storage Management Strategy

The vision sounds good as always, time will tell if they can pull it off. The track record for products like this is not good. More often than not the tools lower the bar on what is supported to some really basic set of things, and are not able to exploit the more advanced features of the platform under management.

One question I did ask is whether or not they were going to re-write their tools to leverage these new common APIs, and the answer was sort of what I expected – they aren’t. At least short term the tools will use a combination of these new APIs as well as whatever methods they use today. So this implies that only a subset of functionality will be available via the APIs.

In contrast I recall reading something, perhaps a blog post, about how NetApp’s tools use all of their common APIs(I believe end to end API stuff for them is fairly recent). HP may take a couple of years to get to something like that.

HP Openstack Integration

HP is all about the Openstack. They seem to be living and breathing it. This is pretty neat, I think Openstack is a good movement, though the platform still needs some significant work to mature.

I have concerns, short term concerns about HP’s marketing around Openstack and how easy it is to integrate into customer environments. Specifically Openstack is a fast moving target, lacks maturity and at least as recently as earlier this year lacked a decent community of IT users (most of it was centered on developers – probably still is). HP’s response is they are participating deeply within the community (which is good long term), and are being open about everything (also good).

I specifically asked if HP was working with Red Hat to make sure the latest HP contributions (such as 3PAR support and Fibre Channel support) were included in the Red Hat Openstack distribution. They said no, they are working with the community, not with partners. This is of course good and bad. Good that they are being open, bad in that it may result in some users not getting things for 12-24 months because the distribution of Openstack they chose is too old to support it.

I just hope that Openstack matures enough that it gets a stable set of interfaces. Unlike say the Linux kernel driver interfaces which just annoy the hell out of me(have written about that before). Compatibility people!!!

Openstack Fibre Channel support based on 3PAR

HP wanted to point out that the Fibre Channel support in Openstack was based on 3PAR. It is a generic interface and there are plugins for a few different array types. 3PAR also has iSCSI support for Openstack as of a recent 3PAR software release as well.

StoreVirtual was first Openstack storage platform

Another interesting tidbit is that Storevirtual was the first(?) storage platform to support Openstack. Rackspace used it(maybe still does, not sure), and contributed some stuff to make it better. HP also uses it in their own public cloud(not sure if they mentioned this or not but I heard from a friend who used to work in that group).

HP Storage with Openstack

Today HP integrates with Openstack at the block level on both the StoreVirtual and 3PAR platforms. Work is in progress for StoreAll which will provide file and object storage. Fibre channel support is available on the 3PAR platform only as far as HP goes. StoreVirtual supports Fibre Channel but not with Openstack(yet anyway, I assume support is coming).

This contrasts with the competition, most of whom have no Openstack support and haven’t announced anything to be released anytime soon. HP certainly has a decent lead here, which is nice.

HP Openstack iSCSI/FC driver functionality

All of HP's storage work with Openstack is based on the Grizzly release, which came out around April 2013. The driver functionality today covers the following (a rough sketch of exercising these operations via the Cinder API follows the list):

  • Create / Delete / Attach / Detach volumes
  • Create / Delete Snapshots
  • Create volume from snapshot
  • Create cloned volumes
  • Copy image to volume / Copy volume to image (3PAR iSCSI only)
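
I'm not an Openstack user myself, but as far as I understand it these map onto the standard Cinder volume operations, so something along these lines (a hypothetical sketch using the python-cinderclient library; the credentials, names and sizes are all made up) would exercise the 3PAR driver underneath:

```python
from cinderclient.v1 import client

# Hypothetical credentials/endpoint; in a real deployment these come from
# your Openstack environment (the usual OS_* variables).
cinder = client.Client('myuser', 'mypassword', 'myproject',
                       'http://keystone.example.com:5000/v2.0')

# Create a volume -- the 3PAR driver carves this out of the configured CPG
vol = cinder.volumes.create(size=10, display_name='db01-data')

# Snapshot it, then build new volumes from the snapshot and as a clone
snap = cinder.volume_snapshots.create(vol.id, display_name='db01-snap')
from_snap = cinder.volumes.create(size=10, snapshot_id=snap.id,
                                  display_name='db01-from-snap')
clone = cinder.volumes.create(size=10, source_volid=vol.id,
                              display_name='db01-clone')

# Clean up (dependent volumes first, then the snapshot, then the base volume)
for v in (clone, from_snap):
    cinder.volumes.delete(v)
cinder.volume_snapshots.delete(snap)
cinder.volumes.delete(vol)
```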

New things coming in Havana release of Openstack from HP Storage

  • Better session management within the HP 3PAR StoreServ Block Storage Drivers
  • Re-use of existing HP 3PAR Host entries
  • Support multiple 3PAR Target ports in HP 3PAR StoreServ Block Storage iSCSI Driver
  • Support Copy Volume To Image & Copy Image To Volume with Fibre Channel Drivers (Brick)
  • Support Quality of Service (QoS) setting in the HP 3PAR StoreServ Block Storage Drivers
  • Support Volume Sets with predefined QoS settings
  • Update the hp3parclient Python library that the drivers use

Fibre channel enhancements for Havana and beyond

Fibre Channel enhancements for Openstack Havana and beyond

Openstack portability

This was not at the Storage Tech Day – but I was at a break-out session at the conference that talked about HP and Openstack, and one of the key points they hit on was the portable nature of the platform: run it in house, run it in a cloud system, run it at service providers, and move your workloads between them with the same APIs.

Setting aside for a moment the fact that the concept of cloud bursting is a fantasy for 99% of organizations out there (your applications have to be able to cope with it; you're not likely going to be able to scale your web farm by bursting into a public cloud when those web servers have to hit databases that reside over a WAN connection – the latency hit will make for a terrible experience).

Anyway, setting that concept aside, you still have a serious short-term problem of compatibility across different Openstack implementations, because different vendors are choosing different points to base their systems on. This is obviously due to the fast moving nature of the platform and when each vendor decides to launch their project.

This should stabilize over time, but I felt the marketing message on this was a nice vision, it just didn’t represent any reality I am aware of today.

HP contrasted this to being locked in to, say, the vCloud API. I think there are more clouds, public and private, using vCloud than Openstack at this point. But in any case I believe the use case of the common IT organization being able to transparently leverage these APIs to burst on any platform – VMware, Openstack, whatever – is still years away from reality.

If you use Openstack's API, you're locked into their API anyway. I don't leverage APIs myself (directly), I am not a developer, so I am not sure how easy it is to move between them. I think the APIs are likely much less of a barrier than the feature set of the underlying cloud in question. Who cares if the API can do X and Y if the provider's underlying infrastructure doesn't yet support that capability?

One use case that could be done today, that HP cited, is running development in a public cloud then pulling that back in house via the APIs. Still, that one is not useful either. The amount of work involved in rebuilding such an environment internally should be fairly trivial anyway (the bulk of the work should be in the system configuration area; if you're using cloud you should also be using some sort of system management tool, whether it is CFEngine, Puppet, Chef, or something similar). That and – this is important in my experience – development environments tend not to be resource intensive, which makes it great to consolidate them on internal resources (even ones that share with production – I have been doing this for six years already).

My view on Openstack

At least one person at HP I spoke with believes most stuff will be there by the end of this year, but I don't buy that for a second. I look at things like Red Hat's own Openstack distribution taking seemingly forever to come out (I believe it's a few months behind already and I have not seen recent updates on it), and Rackspace abandoning their promise to support 3rd party Openstack clouds earlier this year.

All of what I say is based on what I read – I have no personal experience with Openstack (nor do I plan to get immediate experience; the lack of maturity of the product is keeping me away for now). Based on what I have read, conferences (I was at a local Red Hat conference last December where they covered Openstack – that's when reality really hit me; I learned a good deal about it and honestly lost some enthusiasm for the project) and some folks I have chatted/emailed with, Openstack is still VERY much a work in progress, evolving quickly. There's really no formal support community in place for a stable product; developers want to stay at the leading edge and that's what they are willing to support. Red Hat is off in one corner trying to stabilize the Folsom release from last year to make a product out of it, HP is in another corner contributing code to the latest versions of Openstack that may or may not be backwards compatible with Red Hat or other implementations.

It's a mess.. it's a young project still so it's sort of to be expected, though there are a lot of folks making noise about it. The sense I get is that if you are serious about running an Openstack cloud today, as in right now, you'd best have some decent developers in house to help manage and maintain it. When Red Hat comes out with their product, it may solve a bunch of those issues, but still it'll be a "1.0", and there's always some not insignificant risk to investing in that without a very solid support structure inside your organization (Red Hat will of course provide support but I believe that won't be enough for most).

That being said it sounds like Openstack has a decent future ahead of it – with such large numbers of industry players adopting support for it, it’s really only a matter of time before it matures and becomes a solid platform for the common IT organization to be able to deploy.

How much time? I’m not sure. My best guesstimate is I hope it can reach that goal within five years. Red Hat, and others should be on versions 3 and perhaps 4 by then. I could see someone such as myself starting to seriously dabble in it in the next 12-16 months.

Understand that I’m setting the bar pretty high here.

Last thoughts on HP Storage Tech Day

I had a good time, and thought it was a great experience. They had very high caliber speakers, were well organized, and the venue was awesome as well. I was able to drill them pretty good; the other bloggers seemed to really appreciate that I was able to drive some of the technical conversations. I'm sure some of my questions they would have rather not answered since the answers weren't always "yes we've been doing that forever..!", but they were honest and up front about everything. When they could not be, they said so ("can't talk about that here, we need a Nate Disclosure Agreement").

I haven’t dealt much at all with the other internal groups at HP, but I can say the folks I have dealt with on the storage side have all been AWESOME. Regardless of what I think about whatever storage product they are involved with they are all wonderful people both personally and professionally.

These past few posts have been entirely about what happened on Monday. There are more bits that happened at the main conference on Tues-Thur, and once I get the slides for those decks I'll be writing more about that – there were some pretty cool speakers. I normally steer far clear of such events, but this one was pretty amazing. I'll save the details for the next posts.

I want to thank the team at HP, and Ivy Worldwide for organizing/sponsoring this event – it was a day of nothing but storage (and we literally ran out of time at the end, one or two topics had to be skipped). It was pretty cool. This is the first event I’ve ever traveled for, and the only event where there was some level of sponsorship (as mentioned HP covered travel, lodging and food costs).

July 30, 2013

HP Storage Tech Day – 3PAR

Filed under: Events,Storage — Tags: , , — Nate @ 2:04 am

Before I forget again..

Travel to HP Storage Tech Day/Nth Generation Symposium was paid for by HP; however, no monetary compensation is expected nor received for the content that is written in this blog.

So, HP hammered us with a good seven to eight hours of storage related stuff today, the bulk of the morning was devoted to 3PAR and the afternoon covered StoreVirtual, StoreOnce, StoreAll, converged management and some really technical bits from HP Labs.

This post is all about 3PAR. They covered other topics of course but this one took so long to write I had to call it a night, will touch on the other topics soon.

I won’t cover everything since I have covered a bunch of this in the past. I’ll try not to be too repetitive…

I tried to ask as many questions as I could, they answered most .. the rest I’ll likely get with another on site visit to 3PAR HQ after I sign another one of those Nate Disclosure Agreements (means I can’t tell you unless your name is Nate). I always feel guilty about asking questions directly to the big cheeses at 3PAR. I don’t want to take up any of their valuable time…

There wasn't anything new announced today of course, so none of this information is new, though some of it is new to this blog, anyway!

I suppose if there is one major take away for me for this SSD deep dive, is the continued insight into how complex storage really is, and how well 3PAR does at masking that complexity and extracting the most of everything out of the underlying hardware.

Back when I first started on 3PAR in late 2006, I really had no idea what real storage was. As far as I was concerned one dual controller system with 15K disks was the same as the next. Storage was never my focus in my early career (I did dabble in a tiny bit of EMC Clariion (CX6/700) operations work – though when I saw the spreadsheets and visios the main folks used to plan and manage I decided I didn’t want to get into storage), it was more servers, networking etc.

I learned a lot in the first few years of using 3PAR, and to a certain extent you could say I grew up on it. As far as I am concerned, being able to wide stripe, or have mesh active controllers, is all I've ever (really) known. Sure, since then I have used a few other sorts of systems. When I see the architectures and processes of doing things on other platforms I am often sort of dumbfounded why they do things that way. Having grown up on 3PAR, it's sometimes not obvious to me just how far in the dark ages storage really was many years ago.

Case in point below, there’s a lot more to (efficient, reliable, scalable, predictable) SSDs than just tossing a bunch of SSDs into a system and throwing a workload at them..

I’ve never tried to proclaim I am a storage expert here(or anywhere) though I do feel I am pretty adept at 3PAR stuff at least, which wasn’t a half bad platform to land on early on in the grand scheme of things. I had no idea where it would take me over the years since. Anyway, enough about the past….

New to 3PAR

Still the focus of the majority of HP storage related action these days, they had a lot to talk about. None of this initial stuff is there yet (everything up until the 7450 section below); it is just what they are planning for at some point in the future (no time frames on anything that I recall hearing).

Asynchronous Streaming Replication

Just a passing mention of this on a slide, nothing in depth to report, but I believe the basic concept is that instead of having asynchronous replication running on snapshots that kick off every few minutes (perhaps every five minutes), the replication process would run much more frequently (though still not synchronous), perhaps as frequently as every 30 seconds or so.

I've never used 3PAR replication myself. Never really needed array based replication. I have built my systems in ways that don't require it, in part because I believe it makes life easier (I don't build them specifically to avoid array replication, it's merely a side effect), and of course the license costs associated with 3PAR replication are not trivial in many circumstances (especially if you only need to replicate a small percentage of the data on the system). The main place where I could see leveraging array based replication is if I was replicating a large number of files; doing this at the block layer is often far more efficient (and much faster) than trying to determine changed bits from a file system perspective.

I wrote/built a distributed file transfer architecture/system for another company a few years ago that involved many off the shelf components (highly customized) and was responsible for replicating several TB of data a day between WAN sites. It was an interesting project and proved to be far more reliable and scalable than I could have hoped for initially.

Increasing Maximum Limits

I think this is probably out of date, but it's the most current info I could dig up on HP's site, and it dates back to 2010. These pending changes are all about massively increasing the supported maximum limits of various things. They didn't get into specifics. I think for most customers this won't really matter since they don't come close to the limits in any case (maybe someone from 3PAR will read this and send me more up to date info).

3PAR OS 2.3.1 supported limits (2010)

The PDF says updated May 2013, though the change log says last update is December. HP has put out a few revisions to the document(which is the Compatibility Matrix) which specifically address hardware/software compatibility, but the most recent Maximum Limits that I see are for what is now considered quite old – 2.3.1 release – this was before their migration to a 64-bit OS (3.1.1).

Compression / De-dupe

They didn’t talk about it, other than mention it on a slide, but this is the first time I’ve seen HP 3PAR publicly mention the terms. Specifically they mention in-line de-dupe for file and block, as well as compression support. Again, no details.

Personally I am far more interested in compression than I am de-dupe. De-dupe sounds great for very limited workloads like VDI(or backups, which StoreOnce has covered already). Compression sounds like a much more general benefit to improving utilization.

Myself, I already get some level of "de duplication" by using snapshots. My main 3PAR array runs roughly 30 MySQL databases entirely from read-write snapshots; part of the reason for this is to reduce duplicate data, another part is to reduce the time it takes to produce that duplicate data for a database (a fraction of a second as opposed to several hours to perform a full data copy).
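
For what it's worth, that sort of snapshot farm is easy to script against the array. Here is a minimal sketch, assuming the hp3parclient Python library and its createSnapshot call behave roughly as documented (the volume names and WSAPI endpoint here are made up for the example):

```python
from hp3parclient import client

# Hypothetical array WSAPI endpoint and credentials
cl = client.HP3ParClient("https://3par.example.com:8080/api/v1")
cl.login("3paradm", "password")
try:
    # Create a read-write snapshot of a "golden" MySQL volume for each
    # database instance; every instance shares the same base volume.
    for i in range(1, 31):
        cl.createSnapshot("mysql-db%02d" % i,   # snapshot volume name
                          "mysql-golden",        # base volume to snapshot
                          {"readOnly": False})   # read-write snapshot
finally:
    cl.logout()
```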

File + Object services directly on 3PAR controllers

No details here other than a mention of adding native file/object services on top of the existing block services. They did mention they believe this would fit well in the low end segment; they don't believe it would work well at the high end since things scale in different ways there. Obviously HP has file/object services in the IBRIX product (though HP did not get into specifics on what technology would be used other than taking tech from several areas inside HP), and a 3PAR controller runs Linux after all, so it's not too far fetched.

I recall several years ago, back when Exanet went bust, I was trying to encourage 3PAR to buy their assets as I thought it would have been a good fit. Exanet folks mentioned to me that 3PAR engineering was very protective of their stuff and were very paranoid about running anything other than the core services on the controllers – it is sensitive real estate after all. With more recent changes, such as supporting the ability to run their reporting software (System Reporter) directly on the controller nodes, I'm not sure if this is something engineering volunteered to do themselves or not. Both approaches have their strengths and weaknesses obviously.

Where are 3PAR’s SPC-2 results?

This is a question I asked them (again). 3PAR has never published SPC-2 results. They love to tout their SPC-1, but SPC-2 is nowhere to be found. I got a positive answer though: stay tuned. So I have to assume something is coming.. at some point. They aren't outright disregarding the validity of the test.

In the past 3PAR systems have been somewhat bandwidth constrained due to their use of PCI-X. Though the latest generation of stuff (7xxx/10xxx) all leverage PCIe.

The 7450 tops out at 5.2 Gigabytes/second of throughput, a number which they say takes into account the overhead of a distributed volume system (it might otherwise be advertised as 6.4 GB/sec, since a 2-node system does 3.2 GB/sec). Given they admit the overhead of a distributed system now, I wonder how, or if, that throws off the throughput metrics they previously published for their past arrays.

I have a slide here from a few years ago that shows a 8-controller T800 supporting up to 6.4GB/sec of throughput, and a T400 having 3.2GB/sec (both of these systems were released in Q3 of 2008). Obviously the newer 10400 and 10800 go higher(don’t recall off the top of my head how much higher).

This compares to published SPC-2 numbers from IBM XIV at more than 7GB/sec, as well as HP P9500/HDS VSP at just over 13GB/sec.

3PAR 7450

Announced more than a month ago now, the 7450 is of course the purpose built flash platform, which is at the moment all SSD.

Can it run with spinning rust?

One of the questions I had was about the 7450 currently being available only in an SSD-only configuration – no spinning rust is supported. I asked why this was and the answer was pretty much what I expected. Basically they were getting a lot of flak for not having something that was purpose built. So at least in the short term, the decision not to support spinning rust is purely a marketing one. The hardware is the same as the other 3PAR platforms (other than being more beefy in CPU and RAM) and the software is identical. They just didn't want to give people more excuses to label the 3PAR architecture as something that wasn't fully flash ready.

It is unfortunate that the market has compelled HP to do this, as other workloads would still stand to gain a lot especially with the doubling up of data cache on the platform.

Still CPU constrained

One of the questions asked was whether or not the ASIC is the bottleneck in the 7450 I/O results. The answer was a resounding NO – the CPU is still the bottleneck even at max throughput. So I followed up asking why HP chose to go with 8-core CPUs instead of the 10-core parts Intel has of course had for some time. You know how I like more cores! The answer was twofold. The primary reason was cooling (the enclosure as-is has two sockets, two ASICs, two PCIe slots, 24 SSDs, 64GB of cache and a pair of PSUs in 2U). The second was that the system is technically Ivy Bridge capable, but they didn't want to wait around for those chips to launch before releasing the system.

They covered a bit about the competition being CPU limited as well, especially with data services; the amount of I/O per CPU cycle is much lower on competing systems vs 3PAR and the ASIC. The argument is an interesting one, though at the end of the day the easy way to address that problem is to throw more CPUs at it – they are fairly cheap after all. The 7000-series is really dense so I can understand the lack of ability to support a pair of dual socket systems within a 2U enclosure along with everything else. The 10400/10800 are dual socket (though an older generation of processors).

TANGENT TIME

I really have not much cared for Intel’s code names for their recent generation of chips. I don’t follow CPU stuff all that closely these days(haven’t for a while), but I have to say it’s mighty easy to confuse code name A from B, which is newer? I have to look it up. every. single. time.

I believe in the AMD world (AMD seems to have given up on the high end, sadly), while they have code names, they have numbers as well. I know 6200 is newer than 6100 ..6300 is newer than 6200..it’s pretty clear and obvious. I believe this goes back to Intel and them not being able to trademark the 486.

On the same note, I hate Intel continuing to re-use the i7 branding in laptops. I have a Core i7 laptop from 3 years ago, and guess what the top end today still seems to be? I think it's still i7. Confusing. Again.
</ END TANGENT >

Effortless SSD management of each SSD with proactive alerts

I wanted to get this in before going deeper into the cache optimizations since that is a huge topic. The basic gist is that they have good monitoring of the wear of the SSDs in the platform (something I think was available on Lefthand a year or two ago); in addition, the service processor (a dedicated on site appliance that monitors the array) will alert the customer when an SSD is 90% worn out. When the SSD gets to 95% the system pro-actively fails the drive and migrates data off of it (I believe). They raised a statistic that was brought up at Discover, that something along the lines of 95% of all SSDs deployed in 3PAR systems are still in the field – very few have worn out. I don't recall anyone mentioning the number of SSDs that have been deployed on 3PAR but it's not an insignificant number.
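
Purely to illustrate the thresholds as I understood them (this is not HP's code, just the 90%/95% policy from the talk expressed as a tiny function):

```python
def ssd_wear_action(wear_pct):
    """Map SSD wear level to the action described in the talk:
    alert at 90% worn, proactively fail/evacuate the drive at 95%."""
    if wear_pct >= 95:
        return "fail drive and migrate data off of it"
    if wear_pct >= 90:
        return "alert the customer via the service processor"
    return "no action"

# e.g. ssd_wear_action(92) -> "alert the customer via the service processor"
```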

SSD Caching Improvements in 3PAR OS 3.1.2

There have been a number of non trivial caching optimizations in the 3PAR OS to maximize performance as well as the life span of SSDs. Some of these optimizations also benefit spinning rust configurations – I have personally seen a noticeable drop in back end disk response time since I upgraded to 3.1.2 back in May (it was originally released in December), along with, I believe, better response times under heavy load on the front end.

Bad version numbers

I really dislike 3PAR's version numbering; they have their reasons for doing what they do, but I still think it is a really bad customer experience. For example, going from 2.2.4 to 2.3.1 back in, what was it, 2009 or 2010: the version number implies a minor update, but this was a MASSIVE upgrade. Going from 2.3.x to 3.1.1 was a pretty major upgrade too (as the version implied). 3.1.1 to 3.1.2 was also a pretty major upgrade. On the same note, the 3.1.2 MU2 (patch level!) upgrade that was released last month was also a major upgrade.

I’m hoping they can fix this in the future, I don’t think enough effort is made to communicate major vs minor releases. The version numbers too often imply minor upgrades when in fact they are major releases. For something as critical as a storage system I think this point is really important.

Adaptive Read Caching

3PAR Adaptive Read Caching for SSD (the extra bits being read there from the back end are to support the T10 Data Integrity Feature – available standard on all Gen4 ASIC 3PAR systems, and a capability 3PAR believes is unique in the all flash space for them)

One of the things they covered with regards to caching with SSDs is that the read cache is really not as effective (vs with spinning rust); because the back end media is so fast, there is significantly less need for caching reads. So in general, significantly more cache is used for writes.

For spinning rust 3PAR reads a full 16kB of data from the back end disk regardless of the size of the read on the front end (e.g. 4kB). This is because the operation to go to disk is so expensive already and there is no added penalty to grab the other 12kB while you're grabbing the 4kB you need. The next I/O request might ask for part of that 12kB, and you save yourself a second trip to the disk.

With flash things are different. Because the media is so fast, you are much more likely to become bandwidth constrained rather than IOPS constrained. So if, for example, you have 500,000 4kB read IOPS on the front end and you're performing those same reads as 16kB IOPS on the back end, that is 4x more bandwidth required to perform those operations. Because the flash is so fast, there is significantly less penalty to go back to the SSD again and again to retrieve those smaller blocks. It also improves the latency of the system.

So in short, read more from disks because you can and there is no penalty, read only what you need from SSDs because you should and there is (almost) no penalty.
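
Just to make that concrete, here is a toy model of the read-path decision (my own illustration of the behavior described above, not 3PAR code; the 16kB page size is the one from the talk):

```python
PAGE_SIZE = 16 * 1024  # back-end page size discussed in the talk

def backend_read_size(request_bytes, media):
    """Toy model of the read-path decision described above.

    Spinning disk: the seek dominates, so pull the full 16kB page
    (the extra bytes are effectively free and may serve the next request).
    SSD: bandwidth is the scarce resource, so read only what was asked for.
    """
    if media == "disk":
        # round the request up to whole 16kB pages
        pages = (request_bytes + PAGE_SIZE - 1) // PAGE_SIZE
        return pages * PAGE_SIZE
    elif media == "ssd":
        return request_bytes
    raise ValueError("unknown media type: %s" % media)

# A 4kB front-end read turns into a 16kB back-end read from disk,
# but only a 4kB back-end read from SSD (4x less bandwidth).
assert backend_read_size(4 * 1024, "disk") == 16 * 1024
assert backend_read_size(4 * 1024, "ssd") == 4 * 1024
```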

Adaptive Write Caching

With writes the situation is similar to reads: to maximize SSD life span and minimize latency, you want to minimize the number of write operations to the SSD whenever possible.

With spinning rust, again 3PAR works with 16kB pages: if a 4kB write comes in then the full 16kB is written to disk, again because there is no additional penalty for writing 16kB vs writing 4kB. Unlike with SSDs, you're not likely to be bandwidth constrained when it comes to disks.

With SSDs, the optimizations they perform, again to maximize performance and reduce wear, is if a 4kB write comes in, a 16kB write occurs to the cache, but only the 4kB of changed data is committed to the back end.

If I recall right they mentioned this operation benefits RAID 1 (anything RAID 1 in 3PAR is RAID 10, same for RAID 5 – it’s RAID 50) significantly more than it benefits RAID 5/6, but it still benefits RAID 5/6.
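
As with the read side, here is a toy sketch of the write path as I understood it (again my own illustration, not 3PAR code): the cache still deals in 16kB pages, but on flush only the dirty bytes are committed when the back end is SSD.

```python
PAGE_SIZE = 16 * 1024

class CachedPage:
    """Toy model of a 16kB cache page that tracks which bytes are dirty."""
    def __init__(self):
        self.data = bytearray(PAGE_SIZE)
        self.dirty = set()            # offsets of dirty bytes

    def write(self, offset, payload):
        self.data[offset:offset + len(payload)] = payload
        self.dirty.update(range(offset, offset + len(payload)))

    def bytes_to_commit(self, media):
        """Disk: write the whole page. SSD: write only the changed bytes."""
        if not self.dirty:
            return 0
        return PAGE_SIZE if media == "disk" else len(self.dirty)

page = CachedPage()
page.write(0, b"x" * 4096)            # a 4kB front-end write lands in cache
assert page.bytes_to_commit("disk") == 16 * 1024   # full page to spinning rust
assert page.bytes_to_commit("ssd") == 4096         # only the 4kB to SSD
```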

Autonomic Cache Offload

Here the system changes the frequency at which it flushes cache to back end media based on utilization. I think this plays a lot into the next optimization.

Multi Tenant I/O Processing

3PAR has long been about multi tenancy in their systems. The architecture lends itself well to running in this mode, though it wasn't perfect; I believe the addition of Priority Optimization, which was announced late last year and finally released last month, fills the majority of the remainder of that hole. I have run "multi tenant" 3PAR systems since the beginning. Now to be totally honest the tenants were all me, just different competing workloads, whether it was disparate production workloads or a mixture of production and non production (and yes, in all cases they ran on the same spindles). It wasn't nearly as unpredictable as, say, a service provider with many clients running totally different things – that would sort of scare me on any platform. But there were still many times when rogue things (especially horrible SQL queries) overran the system (especially write cache). 3PAR handles it as well as, if not better than, anyone else, but every system has its limits.

Front end operations

The cache flushing process to back end media is now multi threaded. This benefits both SSD and existing spinning rust configurations. There is significantly less (no?) locking involved when flushing cache to disk.

Here is a graph from my main 3PAR array; you can see the obvious latency drop from the back end spindles once 3.1.2 was installed back in May (again, the point of this change was not so much to impact back end disk latency as it was to improve front end latency, but there is a significant positive behavior change post upgrade):

Latency Change on back end spinning rust with 3.1.2

There was a brief time when latency actually went UP on the back end disks. I was concerned at first but later determined this was the disk defragmentation processes running (again with improved algorithms); before the upgrade they took FAR too long, post upgrade they completed a big backlog in a few days and latency returned to low levels.

Multi Tenant Caching

Back end operations

On the topic of multi tenancy with SSDs, an interesting point was raised which I had never heard before. They even called it out as being a problem specific to SSDs, one that does not exist with spinning rust. Basically the issue is that if you have two workloads going to the same set of SSDs, one of them issuing large I/O requests (e.g. a sequential workload) and the other issuing small I/O requests (e.g. 4kB random reads), the smaller I/O requests will often get stuck behind the larger ones, causing increased latency for the app using the smaller I/O requests.

To address this, the 128kB I/Os are divided up into four 32kB I/O requests which are issued in parallel, so the other workload's smaller I/Os are not stuck behind a single large transfer. I suppose I can get clarification, but I assume for a sequential read operation with a 128kB I/O request there must not be any additional penalty for grabbing the 32kB chunks, vs splitting it up further into even smaller I/Os.
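
The chunking itself is simple enough to sketch (again my own illustration of the behavior described, with the 128kB and 32kB figures taken from the talk):

```python
CHUNK = 32 * 1024   # large I/Os are broken into 32kB pieces per the talk

def split_io(offset, length, chunk=CHUNK):
    """Break one large I/O into chunk-sized requests that can be issued in
    parallel, so small I/Os from another tenant can be interleaved between
    them instead of waiting behind one big transfer."""
    return [(offset + o, min(chunk, length - o))
            for o in range(0, length, chunk)]

# A 128kB sequential read becomes four 32kB requests
assert split_io(0, 128 * 1024) == [(0, 32768), (32768, 32768),
                                   (65536, 32768), (98304, 32768)]
```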

Maintaining performance during media failures

3PAR has always done wide striping and sub disk distributed RAID, so rebuild times are faster, latency is lower, and all around things run better (no idle hot spares) compared with the legacy designs of the competition. The system again takes additional steps now to maximize SSD life span by optimizing the data reads and writes under a failure condition.

HP points out that SSDs are poor at large sequential writes, so as mentioned above they divide the 128kB writes that would be issued during a rebuild operation (since that is a largely sequential operation) into 32kB I/Os, again to protect other smaller I/Os from getting stuck behind the big ones.

They also mentioned that during one of the SPC-1 tests (not sure if it was 7400 or 7450) one of the SSDs failed and the system rebuilt itself. They said there was no significant performance hit(as one might expect given experience with the system) as the test ran. I’m sure there was SOME kind of hit especially if you drive the system to 100% of capacity and suffer a failure. But they were pleased with the results regardless. The competition would be lucky to have something similar.

What 3PAR is not doing

When it comes to SSDs and caching something 3PAR is not doing, is leveraging SSDs to optimize back end I/Os to other media as sequential operations. Some storage startups are doing this to gain further performance out of spinning rust while retaining high random performance using SSD. 3PAR doesn’t do this and I haven’t heard of any plans to go this route.

Conclusion

I continue to be quite excited about the future of 3PAR, even more so than before the acquisition. HP has been able to execute wonderfully on the technology side of things. Sales, from all accounts, at least on the 7000 series, are still quite brisk. Time will tell if things hold up after EVA is completely off the map, but I think they are doing many of the right things. I know even more of course but can't talk about it here (yet)!!!

That's it for tonight; at ~4,000 words (that number keeps going up, I should go to bed) this took three hours or more to write and proof read, and it's also past 2AM. There is more to cover – the 3PAR stuff was obviously what I was most interested in. I have a few notes from the other sessions but they will pale in comparison to this.

Today I had a pretty good idea on how HP could improve its messaging around whether to choose 3PAR or StoreVirtual for a particular workload. The messaging to date has been very confusing and conflicting to me (HP tried to drive home a point about single platforms and reducing complexity, something this dual message seems to conflict with). I have been communicating with HP off and on for the past few months, and today out of the blue I came up with this idea which I think will help clear the air. I'll touch on this soon when I cover the other areas that were talked about today.

Tomorrow seems to be a busy day; apparently we have front row seats, and we are the only folks with power feeds. I won't be "live blogging" (as some folks tend to love to do), I'll leave that to others. I work better spending some time to gather my thoughts and writing something significantly longer.

If you are new to this site you may want to check out a couple of these other articles I have written about 3PAR(among the dozens…)

Thanks for reading!

July 29, 2013

HP Storage Tech Day Live

Filed under: Storage — Tags: , , — Nate @ 12:38 am

In about seven and a half hours the HP tech day event will be starting. I thought it was going to be a private event, but it looks like they will be broadcasting it – one of the folks here is adept at that sort of thing.

If you're interested, the info is here. It starts at 8AM Pacific. Fortunately it's literally downstairs from my hotel room.

Topics include

  • 3PAR deep dive (~3 hrs worth)
  • StoreVirtual (Lefthand VSA), StoreOnce VSA, and Openstack integration
  • StoreAll (IBRIX), Express Query (Autonomy)

No Vertica stuff.. Vertica doesn’t get any storage people excited since it is so fast and reduces I/O by so much.. so you don’t need really fancy storage stuff to make it fly.

HP asked that I put this notice on these posts so the FCC doesn’t come after us..

Travel to HP Storage Tech Day/Nth Generation Symposium was paid for by HP; however, no monetary compensation is expected nor received for the content that is written in this blog.

(Many folks have accused me of being compensated by 3PAR and/or HP in the past based on what I have written here, but I never have been – by no means do I love HP as a whole; there are really only a few product lines that have me interested, which are 3PAR, ProLiant, and Vertica – ironically enough each of those came to HP via acquisition). I have some interest in StoreOnce though have yet to use it. (Rust in Peace WebOS – I think I will be on Android before the end of the year – more on that later..)

I’m pretty excited about tomorrow (well today given that it’s almost 1AM), though getting up so early is going to be a challenge!

Apparently I’m the only person in the group here that is not on twitter. I don’t see that changing anytime soon. Twitter and Facebook are like the latest generation of Star Trek movies, they basically don’t exist to me.

The one thing that I am sort of curious about is what plans, if any, HP has for the P9500 range; they don't talk about it much.. I'm sure they won't come out and say they are retiring it any time soon, since it's still fairly popular with select customers. I just want to try to get them to say something about it, I am curious.

(this is my first trip to a vendor-supported event that included travel)

April 10, 2013

HP Project Moonshot micro servers

Filed under: Datacenter — Tags: , , , , — Nate @ 12:11 am

HP made a bit of headlines recently when they officially unveiled their first set of ultra dense micro servers, under the product name Moonshot. Originally speculated as likely being an ARM platform, it seems HP has surprised many by making this first round of products Intel Atom based.

Picture of HP Moonshot chassis with 45 servers

They are calling it the world’s first software defined server. Ugh. I can’t tell you how sick I feel whenever I hear the term software defined <insert anything here>.

In any case I think AMD might take issue with that, with their SeaMicro unit which they acquired a while back. I was talking with them as far back as 2009, I believe, when they had their high density 10U virtualized Intel Atom-based platform (I have never used Seamicro though I knew a couple folks that worked there), complete with integrated switching, load balancing and virtualized storage (the latter two HP is lacking).

Unlike legacy servers, in which a disk is unalterably bound to a CPU, the SeaMicro storage architecture is far more flexible, allowing for much more efficient disk use. Any disk can mount any CPU; in fact, SeaMicro allows disks to be carved into slices called virtual disks. A virtual disk can be as large as a physical disk or it can be a slice of a physical disk. A single physical disk can be partitioned into multiple virtual disks, and each virtual disk can be allocated to a different CPU. Conversely, a single virtual disk can be shared across multiple CPUs in read-only mode, providing a large shared data cache. Sharing of a virtual disk enables users to store or update common data, such as operating systems, application software, and data cache, once for an entire system

Really the technology that SeaMicro has puts the Moonshot Atom systems to shame. SeaMicro has the advantage that this is their 2nd or 3rd (or perhaps more) generation product. Moonshot is on its first gen.

Picture of Seamicro chassis with 256 servers

Moonshot provides 45 hot pluggable single socket dual core Atom processors, each with 8GB of memory and a single local disk in a 4.5U package.

SeaMicro provides up to 256 sockets of dual core Atom processors, each with 4GB of memory and virtualized storage. Or you can opt for up to 64 sockets of either quad core Intel Xeon or eight core AMD Opteron, with up to 64GB/system (32GB max for Xeon). All of this in a 10U package.

Let’s expand a bit more – Moonshot can get 450 servers (900 cores) and 3.6TB of memory in a 47U rack. SeaMicro can get 1,024 servers (2,048 cores) and 4TB of memory in a 47U rack. If that is not enough memory you could switch to Xeon or Opteron with a similar power profile – at the high end 2,048 Opteron cores (AMD uses a custom Opteron 4300 chip in the SeaMicro system – a chip not available for any other purpose) with 16TB of memory. Or maybe you mix/match.. There are also fewer systems to manage – HP having 10 chassis per rack, and SeaMicro having 4. I harped on HP’s SL-series a while back for similar reasons.
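If you want to check my rack math, the back-of-the-envelope version (using the chassis sizes and per-server specs listed above) looks like this:

    # Back-of-the-envelope density math for a 47U rack, using the specs above.
    RACK_U = 47

    # Moonshot: 4.5U chassis, 45 dual-core Atom servers with 8GB each
    moonshot_chassis = int(RACK_U // 4.5)             # 10 chassis
    moonshot_servers = moonshot_chassis * 45          # 450 servers
    moonshot_cores = moonshot_servers * 2             # 900 cores
    moonshot_mem_tb = moonshot_servers * 8 / 1000.0   # 3.6TB (decimal TB)

    # SeaMicro: 10U chassis, 256 dual-core Atom sockets with 4GB each
    seamicro_chassis = RACK_U // 10                   # 4 chassis
    seamicro_servers = seamicro_chassis * 256         # 1,024 servers
    seamicro_cores = seamicro_servers * 2             # 2,048 cores
    seamicro_mem_tb = seamicro_servers * 4 / 1000.0   # ~4.1TB (4TB in round numbers)

    print(moonshot_servers, moonshot_cores, moonshot_mem_tb)
    print(seamicro_servers, seamicro_cores, seamicro_mem_tb)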

SeaMicro also has dedicated external storage which I believe builds upon the virtualization layer within the chassis, though I am not certain.

All in all it appears SeaMicro was years ahead of Moonshot before Moonshot ever hit the market. Maybe HP should have scrapped Moonshot and taken out SeaMicro when they had the chance.

At the end of the day I don’t see anything to get excited about with Moonshot – unless perhaps it’s really cheap (relative to SeaMicro anyway). The micro server concept is somewhat risky in my opinion. I mean if you really have your workload nailed down to something specific and you can fit it into one of these designs then great. Obviously the flexibility of such micro servers is very limited. SeaMicro of course wins here too, given that an 8 core Opteron with 64GB of memory is quite flexible compared to the tiny Atom with tiny memory.

I have seen time and time again people get excited about this and say oh how they can get so many more servers per watt vs the higher end chips. Most of the time they fail to realize how few workloads are CPU bound, and simply slapping a hypervisor on top of a system with a boatload of memory can get you significantly more servers per watt than a micro server could hope to achieve. HOWEVER, if your workload can effectively exploit the micro servers, drive utilization up etc, then it can be a real good solution — in my experience those sorts of workloads are the exception rather than the rule, I’ll put it that way.

It seems that HP is still evaluating whether or not to deploy ARM processors in Moonshot – in the end I think they will – but won’t have a lot of success – the market is too niche. You really have to go to fairly extreme lengths to have a true need for something specialized like ARM. The complexities in software compatibility are not trivial.

I think HP will not have an easy time competing in this space. The hyper scale folks like Rackspace, Facebook, Google, Microsoft etc all seem to be doing their own thing, and are unlikely to purchase much from HP. At the same time there of course is Seamicro, amongst other competitors (Dell DCS etc) who are making similar systems. I really don’t see anything that makes Moonshot stand out, at least not at this point. Maybe I am missing something.

February 25, 2013

WebOS sold to LG

Filed under: News — Tags: , — Nate @ 12:04 pm

Hey folks – long time no see. Sorry it has been so long since I wrote anything, I don’t know why I haven’t. I have been a fair bit busier. I do have a ~5,000 word post that I had been writing back in December about a Red Hat conference I was at, hopefully I can finish that soon and post it.

Anyway for now I just came across news that WebOS has been sold to LG to use as an operating system for Smart TVs. Several months ago there were rumours about LG working with HP for WebOS-powered TVs.

This is hot on the heels of HP announcing a low end Android tablet (well, it has yet to actually be released).

So for me this is just another sad story in the slow death of WebOS. Unlike most of the other die hard fans my Palm Pre was the first (and only) Palm branded device I ever owned. I had owned a couple different Handspring Visors way back when, and there were a couple of different apps on them that I liked, so I looked to WebOS when a Palm emulator was made available for it (later this emulator was killed, and hasn’t been functional in years now). Of course I don’t trust Google, never been much of a fan of Microsoft, and RIM at the time (2009) still seemed very business oriented. The Pre was also my first smart phone. I did have a BlackBerry at one company though I really only used it for email (not even phone calls). It was awesome for email with a sort of monochrome-color display, and the battery lasted forever.

Where to now? I’m not sure. I certainly have enough WebOS devices to last for some time, and I have very meagre requirements: I just need to sync to Exchange email, would like a decent web browser, and being able to watch video is nice too for travel. I really have spent no time on any other ecosystem so I don’t know what all is out there (ignorance is bliss).

If there is one thing HP did get right with this new Android tablet it is conceding the high end market, which they really don’t have a chance to compete in. Most folks, myself included, don’t believe HP will be able to compete with this new Android tablet either, but who knows.

It’s been roughly a year since my HP Pre3 and Touchpad got any software updates. The Pre3 still works fine – I suppose the only 3rd party app on it that I ever use is a weather app. The Touchpad is working fine too, and there really aren’t any 3rd party apps on it that I have used in a long time either (the 3rd party apps I do use are WebOS/MobileNations news, AccuWeather, and KalemSoft video).

WebOS was never at a point where I was ready to go tell my friends and family they just had to go buy it. I saw the vision and believed they could do it — if only they had enough resources (they never did). Once HP bought them I expected significant investment, but that didn’t happen either – they continued on a rapid release cycle which led to over promising and massive under delivering.

I had hope that things could turn around more after HP somewhat recently announced they need to do something with mobile. But I guess not. Windows mobile continues to disappoint, and most Android manufacturers are in the toilet as well (even Google’s own Motorola).  BB10 is still new, so too early to tell if they can turn it around.

I found it interesting how seemingly neglected the Android market is compared to iOS. I bought my sister a TRENDnet HD IP camera to act as a “baby cam” of sorts – she just had a kid. This camera had both iOS and Android apps. Her boyfriend has an iPhone; she has some sort of Android device. While both apps could connect, only the iOS app could remotely control the camera – the Android app just saw the streaming video. I thought, how hard is it to add the remote control functionality? For some reason it wasn’t there. Of course for WebOS there is no app 🙂

I’ll keep using my WebOS devices for now, I don’t have any pressing need to make a change.

Oh how I wish HP would have kept that $10B they spent on Autonomy and invested it in mobile instead.

Also in recent news, Mozilla has announced many carriers have picked up their new OS – based on Firefox.

Mozilla is unlocking the Web as a mobile development platform with Firefox Marketplace and unwrapping mobile apps to enable more opportunity and control for developers and consumers

Where have I heard that one before…

HP did give Palm more time though, so I thank them(again) for that, just not enough time.

December 4, 2012

3PAR: The Next Generation

Filed under: Storage — Tags: , — Nate @ 12:40 am

(Cue Star Trek: The Next Generation theme music)

[Side note: I think this is one of my most popular posts ever with nearly 3,000 hits to it so far (excluding my own IPs). Thanks for reading!]

[I get the feeling I will get lots of people linking to this since I suspect what is below will be the most complete guide as to what was released – for those of you that haven’t been here before I am in no way associated with HP or 3PAR – or compensated by them in any way of course! Just been using it for a long time and it’s one of the very few technologies that I am passionate about – I have written a ton about 3PAR over the past three years]

HP felt their new storage announcements were so ground breaking that they decided to have a special event a day before HP Discover is supposed to start. They say it’s the biggest announcement for storage from HP in more than a decade.

I first got wind of what was coming last Fall, though there wasn’t much information available at the time other than a picture and some thoughts as to what might happen. Stuff wasn’t nailed down yet. I was fortunate enough to finally visit 3PAR HQ a couple of months ago and get a much more in depth briefing as to what was coming, and I’ll tell you what, it’s been damn hard to contain my excitement.

HP announced a 75% year over year increase in 3PAR sales, along with more than 1,200 new customers in 2012 alone. Along with that HP said that their StoreOnce growth is 45% year over year.

By contrast HP did not reveal any growth numbers for either their Lefthand StoreVirtual platform or their IBRIX StoreAll platform.

David Scott, former CEO of 3PAR, tried to set the tone as a general storage product launch: they have enhancements to primary storage, to file/object scale-out storage, as well as to backup/archive storage.

You know I’m biased, I don’t try to hide that. But it was obvious to me at the end of the presentation this announcement was all about one thing: David’s baby – 3PAR.

Based on the web site, I believe the T-class of 3PAR systems is finally retired now, replaced last year by the V-class (aka P10000, i.e. the 10400 and 10800).

Biggest changes to 3PAR in at least six years

The products that are coming out today are, in my opinion, the largest set of product (AND policy) enhancements/changes/etc from 3PAR in at least the past six years that I’ve been a customer.

First – a blast from the past.

The first mid range 3PAR system – the E200

Hello 2006!

There is some re-hashing of old concepts, specifically the concept of mid range. 3PAR introduced their first mid range system back in 2006, which was the system I was able to deploy – the E200. The E200 was a dual node system that went up to 4GB data cache per controller and up to 128 drives or 96TB of usable capacity whichever came first. It was powered by the same software and same second generation ASIC (code named Eagle if I remember right) that was in the high end S-class at the time.

The E200 was replaced by the F200, and the product line extended to include the first quad controller mid range system the F400 in 2009. The F-class, along with the T-class (which replaced the S-class) had the third generation ASIC in it (code named Osprey if I remember right?? maybe I have those reversed). The V-class which was released last year, along with what came out today has the 4th generation ASIC (code named Harrier).

To-date – as far as I know the F400 is still the most efficient SPC-1 result out there, with greater than 99% storage utilization – no other platforms (3PAR included) before or since have come close.

These systems, while coined mid range in the 3PAR world were still fairly costly. The main reason behind this was the 3PAR architecture itself. It is a high end architecture. Where other vendors like EMC and HDS chose radically different designs for their high end vs. their mid range, 3PAR aimed a shrink ray at their system and kept the design the same. NetApp on the other hand was an exception – they too have a single architecture that scales from the bottom on up. Though as you might expect – NetApp and 3PAR architectures aren’t remotely comparable.

Here is a diagram of the V-series controller architecture, which is very similar to the 7200 and 7400, just at a much larger scale:

3PAR V-Series ASIC/CPU/PCI/Memory Architecture

Here is a diagram of the inter-node communications on an 8-node P10800, or T800 before it, again similar to the new 7000-series just larger scale:

3PAR Cluster Architecture with low cost high speed passive backplane with point to point connections totalling 96 Gigabytes/second of throughput

Another reason for the higher costs was the capacity based licensing (& associated support). Some things were licensed per controller pair, some things based on raw capacity, some things licensed per system, etc. 3PAR licensing was not very friendly to the newbie.

Renamed Products

There was some basic name changes for 3PAR product lines:

  • The HP 3PAR InServ is now the HP 3PAR StorServ
  • The HP 3PAR V800 is now the HP 3PAR 10800
  • The HP 3PAR V400 is now the HP 3PAR 10400

The 3PAR 7000-series – mid range done right

The 3PAR 7000-series leverages all of the same tier one technology that is in the high end platform and puts it in a very affordable package, starting at roughly $25,000 for a two-node 7200 system, and $32,000 for an entry level two-node 7400 system(which can later be expanded to four nodes, non disruptively).

I’ve seen the base 7200 model (2 controllers, no disks, 3 year 24×7 4-hour on site support “parts only”) online for as low as $10,000.

HP says this puts 3PAR in a new $11 Billion market that it was previously unable to compete in.

This represents roughly a 55-65% discount over the previous F-class mid range 3PAR solution. More on this later.

Note that it is not possible to upgrade in place a 7200 to a 7400. So you still have to be sure if you want a 4-node capable system to choose the 7400 up front (you can, of course purchase a two-node 7400 and add the other two nodes later).

Dual vs quad controller

The controller configurations are different between the two and the 7400 has extra cluster cross connects to unify the cluster across enclosures. The 7400 is the first 3PAR system that is not leveraging a passive backplane for all inter-node communications. I don’t know what technology 3PAR is using to provide this interconnect over a physical cable – it may be entirely proprietary. They use their own custom light weight protocols on the connection, so from a software standpoint it is their own stuff. Hardware – I don’t have that information yet.

A unique and key selling point for having a 4-node 3PAR system is persistent cache, which keeps the cache in write back mode during planned or unplanned controller maintenance.

3PAR Persistent Cache mirrors cache from a degraded controller pair to another pair in the cluster automatically.

The 3PAR 7000 series is based on what I believe is the Xyratex OneStor SP-2224 enclosure, the same one IBM uses for their V7000 StorWize system (again, speculation). Speaking of the V7000 I learned tonight that this IBM system implemented RAID 5 in software resulting in terrible performance. 3PAR RAID 5 is well – you really can’t get any faster than 3PAR RAID, that’s another topic though.

3PAR 7000 Series StorServs

3PAR has managed to keep its yellow color, and not go to the HP beige/grey. Somewhat surprising, though I’m told it’s because it helps the systems stand out in the data center.

The 7000 series comes in two flavors – a two node 7200, and a two or four node 7400. Both will be available starting December 14.

2.5″ or 3.5″ (or both)

There is also a 3.5″ drive enclosure for large capacity SAS (up to 3TB today). There are also 3.5″ SSDs but their capacities are unchanged from the 2.5″ variety – I suspect they are just 2.5″ drives in a caddy. This is based, I believe, on the Xyratex OneStor SP-2424.

Xyratex OneStor SP-2424

This is a 4U, 24-drive enclosure for disks only (controllers go in the 2U chassis). 3PAR kept their system flexible by continuing to allow customers to use large capacity disks, however do keep in mind that for the best availability you do need to maintain at least two (RAID 10), three (RAID 5), or six (RAID 6) drive enclosures. You can forgo cage level availability if you want, but I wouldn’t recommend it – it provides an extra layer of protection from hardware faults, at basically no cost of complexity on the software side (no manual layouts of volumes etc).
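A minimal sketch of why those enclosure counts are what they are, assuming the smallest supported RAID set sizes (RAID 10 as 1+1, RAID 5 as 2+1, RAID 6 as 4+2 – my assumption): cage level availability just means every member of a RAID set lands in a different enclosure, so you need at least as many enclosures as there are members in a set.

    # Minimum enclosures (cages) for cage level availability: one cage per RAID
    # set member. Set sizes below are my assumption of the smallest supported
    # layouts (RAID 10 = 1+1, RAID 5 = 2+1, RAID 6 = 4+2).
    SMALLEST_SET_SIZE = {"RAID 10": 2, "RAID 5": 3, "RAID 6": 6}

    for level, members in sorted(SMALLEST_SET_SIZE.items()):
        print("%s needs at least %d drive enclosures" % (level, members))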

HP has never supported the high density 3.5″ disk chassis on the mid range systems, I believe primarily for cost reasons, as they are custom designed. By contrast the high end systems only support the high density enclosures at this time.

3PAR High Density 3.5" Disk Chassis - not available on mid range systems

The high end chassis is designed for high availability. The disks are not directly accessible with this design. In order to replace disks the typical process is to run a software task on the array which migrates all of the data from the disks in that particular drive sled (a pack of four drives) to other disks on the system (any disks of the same RPM); once the drive sled is evacuated it can be safely removed. Another method is you can just pull the sled – the system will go into logging mode for writes on those disks (sending the writes elsewhere), and you have roughly seven minutes to do what you need to do and re-insert the sled before the system marks those drives as failed and begins the rebuild process.

The one thing that HP does not allow on SP-2424-based 3.5″ drive chassis is high performance (10 or 15K RPM) drives. So you will not be able to build a 7000-series with the same 650GB 15k RPM drives that are available on the high end 10000-series. However they do have a nice 900GB 10k RPM option in a 2.5″ form factor which I think is a good compromise.  Or you could go with a 300GB 15k RPM 2.5″. I don’t think there is a technical reason behind this, so I imagine if enough customers really want this sort of setup and yell about it, then HP will cave and start supporting it. Probably won’t be enough demand though.

Basic array specifications

  • 7200: up to 2 controller nodes, 250TB max raw capacity, 144 drives max, up to 12x8Gbps FC ports (or 4x8Gbps FC plus 4x10Gbps iSCSI), 24GB max data cache
  • 7400: up to 4 controller nodes, 864TB max raw capacity, 480 drives max, up to 24x8Gbps FC ports (or 8x8Gbps FC plus 8x10Gbps iSCSI), 64GB max data cache
  • 10400: up to 4 controller nodes, 800TB max raw capacity, 960 drives max, up to 96x8Gbps FC ports plus up to 16x10Gbps iSCSI, 128GB max data cache
  • 10800: up to 8 controller nodes, 1600TB max raw capacity, 1920 drives max, up to 192x8Gbps FC ports plus up to 32x10Gbps iSCSI, 512GB max data cache

(Note: All current 3PAR arrays have dedicated gigabit network ports on each controller for IP-based replication)

In a nutshell, vs the F-class mid range systems, the new 7000-series:

  • Doubles the data cache per controller to 12GB compared to the F200 (almost triple if you compare the 7400 to the F200/F400)
  • Doubles the control cache per controller to 8GB. The control cache is dedicated memory for the operating system, completely isolated from the data cache.
  • Brings PCI-Express support to the 3PAR mid range allowing for 8Gbps Fibre Channel and 10Gbps iSCSI
  • Brings the mid range up to spec with the latest 4th generation ASIC, and latest Intel processor technology.
  • Nearly triples the raw capacity
  • Moves from an entirely Fibre Channel based system to a SAS back end with a Fibre Channel front end
  • Moves from exclusively 3.5″ drives to primarily 2.5″ drives with a couple of 3.5″ drive options
  • Brings FCoE support to the 3PAR mid range (in 2013) for the four customers who use FCoE.
  • Cuts the size of the controllers by more than half
  • Obviously dramatically increases the I/O and throughput of the system with the new ASIC, PCIe, faster CPU cores, more CPU cores (in the 7400) and the extra cache.

Where’s the Control Cache?

Control cache is basically dedicated memory associated with the Intel processors to run the Debian Linux operating system which is the base for 3PAR’s own software layer.

HP apparently has removed all references to the control cache in the specifications, I don’t understand why. I verified with 3PAR last night that there was no re-design in that department, the separated control cache still exists, and as previously mentioned is 8GB on the 7000-series. It’s important to note that some other storage platforms share the same memory for both data and control cache and they give you a single number for how much cache there is – when in reality the data cache can be quite a bit less.

Differences between the 7200 and 7400 series controllers

Unlike previous generations of 3PAR systems, where all controllers for a given class of system were identical, the new controllers for the 10400 vs 10800, as well as the 7200 vs 7400, are fairly different.

  • 7200 has quad core 1.8Ghz CPUs, 7400 has hex core 1.8Ghz CPUs.
  • 7200 has 12GB cache/controller, 7400 has 16GB/controller.
  • 7200 supports 144 disks/controller pair, 7400 is 240 disks.
  • Along that same note 7200 supports 5 disk enclosures/pair, 7400 supports nine.
  • 7400 has extra cluster interconnects to link two enclosures together forming a mesh active cluster.

iSCSI No longer a second class citizen

3PAR has really only sort of half-heartedly embraced iSCSI over the years; their customer base was solidly Fibre Channel. When you talk to them of course they’ll say yes, they do iSCSI as well as anyone else, but the truth is they didn’t. They didn’t because the iSCSI HBA that they used was the 4000 series from Qlogic. The most critical failing of this part is its pathetic throughput. Even though it has 2x1Gbps ports, the card itself is only capable of 1Gbps of throughput. So you look at your 3PAR array and make a decision:

  • I can install a 4x4Gbps Fibre channel card and push the PCI-X bus to the limit
  • I can install a 2x1Gbps iSCSI card and hobble along with less capacity than a single fibre channel connection

I really don’t understand why they did not go back and re-visit alternative iSCSI HBA suppliers, since they kept the same HBA for a whole six years. I would have liked to have seen at least a quad port 1Gbps card that could do 4Gbps of throughput. I hammered on them about it for years – it just wasn’t a priority.

But no more! I don’t know what card they are using now, but it is PCIe and it is 10Gbps! Of course the same applies to the 10000-series – I’d assume they are using the same HBA in both but I am not certain.

Lower cost across the board for the SME

For me these details are just as much, if not more exciting than the new hardware itself. These are the sorts of details people don’t learn about until you actually get into the process of evaluating or purchasing a system.

Traditionally 3PAR has all been about margin – at one point I believe they were known to have the highest margins in the industry (pre acquisition). I don’t know where that point stands today, but from an up front standpoint they were not a cheap platform to use. I’ve always gotten a ton of value out of the platform, making the cost from my standpoint trivial to justify. But less experienced management out there often see cost per TB, or cost per drive, or support costs, or whatever, and compared to other platforms at a high level 3PAR often costs more. How much value you derive from those costs can vary greatly.

Now it’s obvious that HP is shifting 3PAR’s strategy from something that is entirely margin focused to most likely lower margins but orders of magnitude more volume to make up for it.

I do not know if any of these apply to anything other than the 7000-series, for now assume they do not.

Thin licensing included in base software

Winning the no brainer of the year award in the storage category, HP is throwing in all thin licensing with the base license of the array. Prior to this there were separate charges to license thin functionality, based on how much written storage was used for thin provisioning. You could license only 10TB on a 100TB array if you wanted, but you lose the ability to provision new thin provisioned volumes if you exceed that license (I believe there is no impact on existing volumes, but the system will pester you on a daily basis that you are in violation of the license). This approach often caught customers off guard during upgrades – they sometimes thought they only needed to buy disks – but they needed software licenses for those disks, as well as support for those software licenses.
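Roughly how I understand the old capacity-based thin license behaved (a sketch of the policy as described above, not HP’s actual enforcement code):

    # Sketch of the old capacity-based thin licensing behavior as I understand it
    # -- not HP's actual enforcement logic.
    class ThinLicense(object):
        def __init__(self, licensed_tb):
            self.licensed_tb = licensed_tb

        def can_create_new_thin_volume(self, written_tb):
            # Existing thin volumes keep working past the cap; the array just nags
            # daily and refuses to provision *new* thin volumes.
            return written_tb < self.licensed_tb

    lic = ThinLicense(10)                        # 10TB thin license on a 100TB array
    print(lic.can_create_new_thin_volume(8))     # True  - still under the license
    print(lic.can_create_new_thin_volume(12))    # False - time to buy more licensing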

HP finally realized that thin provisioning is the norm rather than the exception. HP is borrowing a page from the Dell Compellent handbook here.

Software License costs capped

Traditionally, most of 3PAR’s software features have been licensed based upon some measure of the capacity of the system – in most cases raw capacity; for thin provisioning it is a more arbitrary value.

HP is once again following the Dell Compellent handbook, which caps license costs at a set value (in Dell’s case I believe it is 96 spindles). For the 3PAR 7000-series the software license caps are (quick math after the list):

  • 7200: 48 drives (33% of array capacity)
  • 7400: 168 drives (35% of array capacity)
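And the quick math, using the maximum drive counts from the spec list earlier:

    # Where the cap percentages come from (max drives per model from the specs above)
    print("7200: %.0f%%" % (48 / 144.0 * 100))    # ~33% of the 144 drive maximum
    print("7400: %.0f%%" % (168 / 480.0 * 100))   # 35% of the 480 drive maximum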

Easy setup with Smart Start

Leveraging technology from the EVA line of arrays, HP has radically simplified the installation process of a 7000-series array, so much so that the customer can now perform the installation on their own without professional services. This is huge for this market segment. The up front professional services to install a mid range F200 storage system had a list price of $10,000 (as of last year anyway).

User serviceable components

Again for the first time in 3PAR’s history a customer will be allowed to replace their own components (disks at least, I assume controllers as well though). This again is huge – it will slash the entry level pricing for support for organizations that have local support staff available.

The 7000-series comes by default with a 24x7x365 4-hour on site support (parts only). I believe software support and higher end on site services are available for an additional charge.

All SSD 7000 series

Like the 10000-series, the 7000-series can run on 100% SSDs, a configuration that for some reason was not possible on the previous F-series of midrange systems (I think the T-class could not either).

HP claims that a maximum configuration – a 4-node 7400 maxed out with 240 x 100GB or 200GB SSDs – can achieve 320,000 IOPS, a number which HP claims is a 2.4x performance advantage over their closest priced competitor. This number is based on a 100% random read test with 8kB block sizes @ 1.6 milliseconds of latency. SPC-1 numbers are coming – I’d guesstimate that SPC-1 for the 7400 will be in the ~110,000 IOPS range since it’s roughly 1/4th the power of a 10800 (half the nodes, and each node has half the ASICs & CPUs and far less data cache).
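For the curious, that guesstimate is nothing more scientific than scaling the V800’s published number down:

    # Where my ~110K guesstimate comes from: the V800 does roughly 450K SPC-1 IOPS,
    # and a 4-node 7400 is roughly a quarter of a 10800/V800 (half the nodes, half
    # the ASICs/CPUs per node, far less data cache).
    v800_spc1_iops = 450000
    print(v800_spc1_iops / 4)    # 112500 -- call it ~110K once the smaller cache bites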

HP is also announcing their intention to develop a purpose built all-SSD solution based on 3PAR technology.

Other software announcements

Most of them from here.

Priority Optimization

For a long time 3PAR has touted its ability to handle many workloads of different types simultaneously, providing multiple levels of QoS on a single array. This was true, to a point.

3PAR: Mixed quality of service in the same array

While it is true that you can provide different levels of QoS on the same system, 3PAR customers such as myself realized years ago that it could be better. A workload has the potential to blow out the caches on the controllers (my biggest performance headache with 3PAR – it doesn’t happen often, all things considered I’d say it’s probably a minor issue compared to competing platforms but for me it’s a pain!). This is even more risky in a larger service provider environment where the operator has no idea what kind of workloads the customers will be running. Sure you can do funky things like carve the system up so less of it is impacted when that sort of event happens but there are trade offs there as well.

Priority Optimization

The 3PAR world is changing – Priority Optimization, a feature that is essentially beta at this point, allows the operator to set thresholds from both an IOPS as well as a bandwidth perspective. The system reacts basically in real time. Now on a 3PAR platform you can guarantee a certain level of performance to a workload, whereas in the past there was a lot more hope involved. Correct me if I’m wrong but I thought this sort of QoS was exactly the sort of thing that Oracle Pillar used to tout. I’m not sure if they had knobs like this, but I do recall them touting QoS a lot.
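To illustrate the general idea – and this is just a conceptual sketch of threshold-based QoS, not how 3PAR actually implements Priority Optimization – a rule that caps a workload on both IOPS and bandwidth might look something like this:

    # Conceptual sketch of threshold-based QoS on both IOPS and bandwidth.
    # NOT how 3PAR implements Priority Optimization -- just the general idea.
    class QosRule(object):
        def __init__(self, name, max_iops, max_mbps):
            self.name = name
            self.max_iops = max_iops
            self.max_mbps = max_mbps

        def violations(self, observed_iops, observed_mbps):
            """Return which thresholds the workload is currently blowing past."""
            over = []
            if observed_iops > self.max_iops:
                over.append("iops")
            if observed_mbps > self.max_mbps:
                over.append("bandwidth")
            return over   # the array would throttle the offender accordingly

    noisy_tenant = QosRule("customer-42", max_iops=5000, max_mbps=200)
    print(noisy_tenant.violations(observed_iops=9000, observed_mbps=150))   # ['iops']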

Priority Optimization will be available sometime in 2013 – I’d imagine it’d be early 2013 but not sure.

Autonomic Replication

As I’ve said before – I’ve never used 3PAR replication – never needed it. I’ve tended to build things so that data is replicated via other means, and low level volume-based replication is just overkill – not to mention the software licensing costs.

3PAR Synchronous long distance replication: unique in the mid range

But many others I’m sure do use it, and this industry first as HP called it is pretty neat. Once you have your arrays connected, and your replication policies defined, when you create a new volume on the source array, all details revolving around replication are automatically configured to protect that volume according to the policy that is defined. 3PAR replication was already a breeze to configure, this just made it that much easier.

Autonomic Rebalance

3PAR has long had the ability to re-stripe data across all spindles when new disks were added, however this was always somewhat of a manual process, and it could take a not insignificant amount of time because you’re basically reading and re-writing every bit of data on the system. It was a very brute force approach. On top of that you had to have a software license for Dynamic Optimization in order to use it.

Autonomic rebalance is now included in the base software license and will automatically re-balance the system when resources change, new disks, new controllers etc. It will try, whenever possible, to move the least amount of data – so the brute force approach is gone, the system has the ability to be more intelligent about re-laying out data.

I believe this approach also came from the EVA storage platform.

Persistent Ports

This is a really cool feature as well – it gives the ability to provide redundant connectivity to multiple controllers on a 3PAR array without relying on host-based multipathing software to handle the fail over. How is this possible? Basically it is NPIV for the array. Peer controllers can assume the world wide names for the ports on their partner controller. If a controller goes down, its peer assumes the identities of that controller’s ports, instantaneously providing connectivity for hosts that were (not directly) connected to the ports on the downed controller. This eliminates pauses for MPIO software to detect faults and fail over, and generally makes life a better place.

HP claims that some other tier 1 vendors can provide this functionality for software changes, but they do not today provide it for hardware changes. 3PAR provides this technology for both hardware and software changes – on all of their currently shipping systems!
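The gist of it as a sketch – the WWNs and port names below are made up purely for illustration; the point is that the host keeps seeing the same target WWN, so nothing on the host side has to fail over:

    # The gist of Persistent Ports: via NPIV, a surviving controller assumes the
    # WWNs of its downed peer's ports so hosts keep seeing the same targets.
    # The WWNs and node/port names below are made up for illustration.
    ports = {
        "0:1:1": {"presented_by": "node0", "wwn": "20:01:00:02:ac:00:12:34"},
        "1:1:1": {"presented_by": "node1", "wwn": "20:11:00:02:ac:00:12:34"},
    }

    def controller_failure(failed_node, surviving_node):
        for port in ports.values():
            if port["presented_by"] == failed_node:
                # The peer logs into the fabric with the failed port's WWN (NPIV),
                # so host paths to that WWN stay up -- no MPIO failover pause.
                port["presented_by"] = surviving_node

    controller_failure("node0", "node1")
    print(ports)   # both WWNs are now presented by node1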

Peer Persistence

This is basically a pair of 3PAR arrays acting as a transparent fail over cluster for local or metro distances. From the PDF

The Peer Persistence software achieves this key enhancement by taking advantage of the Asymmetric Logical Unit Access (ALUA) capability that allows paths to a SCSI device to be marked as having different characteristics.

Peer persistence also allows for active-active to maximize available storage I/O under normal conditions.

Initially Peer Persistence is available for VMware, other platforms to follow.

3PAR Peer Persistence

Virtualized Service Processor

All 3PAR systems have come with a dedicated server known as the Service Processor, this acts as a proxy of sorts between the array and 3PAR support. It is used for alerting as well as remote administration. The hardware configuration of this server was quite inflexible and it made it needlessly complex to deploy in some scenarios (mainly due to having only a single network port).

The service processor was also rated to consume a mind boggling 300W of power (it may have been a legacy typo but that’s the number that was given in the specs).

The Service processor can now be deployed as a virtual machine!

Web Services API

3PAR has long had a CIM API (never really knew what that was to be honest), and it had a very easy-to-use CLI as well (used that tons!), but now they’ll have a RESTful Web Services API that uses JSON (ugh, I hate JSON as you might recall! If it’s not friends with grep or sed it’s not friends with me!). Fortunately for people like me we can keep using the CLI.

This API is, of course, designed to be integrated with other provisioning systems, whether it’s something off the shelf like OpenStack, or custom stuff organizations write on their own.
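I haven’t actually touched the new API, so treat this as a rough sketch of what talking to a JSON-over-HTTP API like this tends to look like from Python – the endpoint paths, field names and session-key header here are my assumptions, not something I have verified against HP’s WSAPI documentation:

    # Rough sketch of poking a REST/JSON API like this from Python. The endpoint
    # paths, field names and session-key header are my assumptions -- check the
    # real WSAPI documentation before using any of this.
    import requests

    base = "https://my3par.example.com:8080/api/v1"    # hypothetical URL

    # Grab a session key (hypothetical endpoint and field names)
    resp = requests.post(base + "/credentials",
                         json={"user": "3paradm", "password": "secret"},
                         verify=False)
    key = resp.json()["key"]

    # List volumes and print name + size -- no grep or sed required
    vols = requests.get(base + "/volumes",
                        headers={"X-HP3PAR-WSAPI-SessionKey": key},
                        verify=False).json()
    for vv in vols.get("members", []):
        print(vv.get("name"), vv.get("sizeMiB"))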

Additional levels of RAID 6

3PAR first introduced RAID 6 (aka RAID DP) with the aforementioned last major software release three years ago, with that version there were two options for RAID 6:

  • 6+2
  • 14+2

The new software adds several more options (parity overhead math for all of these follows the list):

  • 4+2
  • 8+2
  • 10+2
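Here is the parity overhead math for all of these layouts – wider sets waste less space on parity, at the cost of longer rebuilds and bigger fault domains:

    # Usable fraction for each RAID 6 (data+parity) layout
    for data, parity in [(4, 2), (6, 2), (8, 2), (10, 2), (14, 2)]:
        usable = data / float(data + parity)
        print("%d+%d: %.0f%% usable" % (data, parity, usable * 100))
    # 4+2: 67%, 6+2: 75%, 8+2: 80%, 10+2: 83%, 14+2: 88%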

Thick Conversion

I’m sure many customers have wanted this over the years as well. The new software will allow you to convert a thin volume to a thick (fat) volume. The main purpose of this of course is to save on licensing for thin provisioning when you have a volume that is fully provisioned (along with the likelihood of space reclamation on that volume being low as well). I know I could have used this years ago.. I always shook my fist at 3PAR when they made it easy to convert to thin, but really impossible to convert back to thick (without service disruption anyway). Basically all that is needed is to flip a bit in the OS (I’m sure the nitty gritty is more complicated).

Online Import

This basically allows EVA customers to migrate to 3PAR storage without disruption (in most cases).

System Tuner now included by default

The System Tuner package is now included in the base operating system (at least on 7000-series). System Tuner is a pretty neat little tool written many years ago that can look at a 3PAR system in real time, and based on thresholds that you define recommend dynamic movement of data around the system to optimize the data layout. From what I recall it was written in response to a particular big customer request to prove that they could do such data movement.

3PAR System Tuner moves chunklets around in real time

It is important to note that this tool is an on demand tool – when running it gathers tens of thousands of additional performance statistics from the chunklets on the system. It’s not something that can (or should) be run all the time. You need to run it when the workload you want to analyse is running in order to see if further chunklet optimization would benefit you.

System Tuner will maintain all existing availability policies automatically.

In the vast majority of cases the use of this tool is not required. In fact in my experience going back six years I’ve used it on a few different occasions, and in all cases it didn’t provide any benefit. The system generally does a very good job of distributing resources. But if your data access patterns change significantly, System Tuner may be for you – and now it’s included!

3PAR File Services

This announcement was terribly confusing to me at first. But I got some clarification. The file services module is based on the HP StoreEasy 3830 storage gateway.

  • Hardware platform is a DL380p Gen8 rack server attached to the 3PAR via Fibre Channel
  • Software platform is Microsoft Windows Storage Server 2012 Standard Edition
  • Provides NFS, CIFS for files and iSCSI for block
  • SMB 3.0 supported (I guess that is new, I don’t use CIFS much)
  • NFS 4.1 supported (I’ll stick to NFSv3, thanks – I assume that is supported as well)
  • Volumes up to 16TB in size
  • Integrated de-duplication (2:1 – 20:1)
  • VSS Integration – I believe that means no file system-based snapshots (e.g. transparent access of the snapshot from within the same volume) ?
  • Uses Microsoft clustering for optional HA
  • Other “Windowsey” things

The confusion comes from them putting this device under the 3PAR brand. It doesn’t take a rocket scientist to look at the spec sheets and see there are no Ethernet ports on the arrays for file serving. I’d be curious to find out the cost of this file services add-on myself, and what its user interface is like. I don’t believe there is any special integration between this file services module and 3PAR – it’s just a generic gateway appliance.

For someone with primarily a Linux background I have to admit I wouldn’t feel comfortable relying on a Microsoft implementation of NFS for my Linux boxes (by the same token I feel the same way about using Samba for serious Windows work – these days I wouldn’t consider it – I’d only use it for light duty simple stuff).

Oh, while you’re at it HP – gimme a VSA of this thing too.

Good-bye EVA and VSP, I never knew thee

Today was, I think, one of the last nails in the coffin for EVA. Nowhere was EVA present in the presentation other than in the tools to seamlessly migrate off of EVA onto 3PAR. Well, that and they have pulled some of the ease of use from EVA into 3PAR.

Literally nowhere was the Hitachi VSP (aka HP P9500) mentioned. Since HP acquired 3PAR the OEM’d Hitachi equipment has been somewhat of a fifth wheel in the HP storage portfolio. Like the EVA, HP had customers who wanted the VSP for things that 3PAR simply could not or would not do at the time – whether it was mainframe connectivity, or perhaps ultra high speed data warehousing. When HP acquired 3PAR, the high end was still PCI-X based and there wasn’t a prayer it was going to be able to dish out 10+ GB/second. The V800 changed that though. HP is finally making inroads into P9500 customers with the new 3PAR gear. I personally know of two shops that have massive deployments of HP P9500 that will soon have their first 3PAR in their respective data centers. I’m sure many more will follow.

Time will tell how long P9500 sticks around, but I’d be shocked – really shocked if HP decided to OEM whatever came next out of Hitachi.

What’s Missing

This is a massive set of announcements, the result of the blood, sweat and tears of many engineers – assuming it all works as advertised, they did an awesome job!

BUT.

There’s always a BUT isn’t there.

There are really two areas: one that I have hammered on 3PAR about for what feels like three years now and haven’t gotten anywhere with, and a second that is more of a question/clarification.

SSD-Accelerated write caching

Repeat after me – AO (Adaptive Optimization) is not enough. Sub LUN auto tiering is not enough. I brought this up with David Scott himself last year, and I bring it up every time I talk to 3PAR. Please, I beg you please, come out with SSD-accelerated write caching technology. The last time I saw 3PAR in person I gave them two examples – EMC FastCache which is both a read and a write back cache. The second is Dell Compellent’s Data Progression technology. I’ve known about Compellent’s storage technology for years but there was one bit of information that I was not made aware of until earlier this year. That is their Data Progression technology by default automatically sends ALL writes (regardless of what tier the blocks live on), to the highest tier. On top of that, this feature is included in the base software license, it is not part of the add-on automatic tiering software.

The key is accelerating writes. Not reads, though reads are nice too. Reads are easy to accelerate compared to writes. The workload on my 3PAR here at my small company is roughly 92% write (yes you read that right). Accelerating reads on the 3PAR end of things won’t do anything for me!

If they can manage to pull themselves together and create a stable product, the Mt. Rainier technology from Qlogic could be a stop gap. I believe NetApp is partnered with them already for those products. Mt. Rainier, other than being a mountain near Seattle, is a host-based read and write acceleration technology for fibre channel storage systems.

Automated Peer Motion

HP released this more than a year ago – however to-date I have not noticed anything revolving around automatic movement of volumes. Call it what you want, load balancing, tiering, or something, as far as I know at this point any actions involving peer motion are entirely manual. Another point is I’m not sure how many peers an array can have. HP tries to say it’s near limitless – could you have 10 ? 20 ? 30 ? 100 ?  I don’t know the answer to that.

Again going back to Dell Compellent (sorry) their Live Volume software has automatic workload distribution. I asked HP about this last year and they said it was not in place then – I don’t see it in place yet.

That said – especially with the announcements here I’m doubling down on my 3PAR passion. I was seriously pushing Compellent earlier in the year(one of the main drivers was cost – one reseller I know calls them the Poor Man’s 3PAR) but where things stand now, their platform isn’t competitive enough at this point, from either a cost or architecture standpoint. I’d love to have my writes going to SSD as Compellent’s Data Progression does things, but now that the cost situation is reversed, it’s a no brainer to stick with 3PAR.

More Explosions

HP needs to take an excursion and blow up some 3PAR storage to see how fast and well it handles disaster recovery, take that new Peer Persistence technology and use it in the test.

Other storage announcements

As is obvious by now, the rest of the announcements pale in comparison to what came out of 3PAR. This really is the first major feature release of 3PAR software in three years (the last one being 2.3.1 – my company at the time participated in the press event, and I was lucky enough to be the first production customer to run it in early January 2010; I had to for Exanet support – Exanet was going bust and I wanted to get on their latest code before they went *poof*).

StoreOnce Improvements

The StoreOnce product line was refreshed earlier in the year and HP made some controversial performance claims. From what I see the only improvement here is they brought down some performance enhancements from the high end to all other levels of the StoreOnce portfolio.

I would really like to see HP release a VMware VSA with StoreOnce, really sounds like a no brainer, I’ll keep waiting..

StoreAll Improvements

StoreAll is the new name for the IBRIX product line, HP’s file and object storage offering. The main improvement here is something called Express Query, which I think is basically a metadata search engine that is 1000s of times faster than using regular search functions for unstructured data. For me I’d rather just structure the data a bit more – the example given is tagging all files for a particular movie to make it easier to retrieve later. I’d just have a directory tree and put all the files in the tree – I like to be organized. I think this new query tool depends on some level of structure anyway – the structure being the tags you can put on files/objects in the system.

HP Converged storage growth - 38% YoY - notice no mention of StoreAll/IBRIX! Also no mention of growth for Lefthand either

HP has never really talked a whole lot about IBRIX – and as time goes on I’m understanding why. Honestly it’s not in the same league (or sport for that matter) as 3PAR for quality and reliability, not even close. It lacks features, and according to someone I know who has more than a PB on HP IBRIX storage (wasn’t his idea – it’s a big company) it’s really not pleasant to use. I could say more but I’ll end by saying it’s too bad that HP does not have a stronger NAS offering. IBRIX may scale well on paper, but there’s a lot more to it than the paper specs of course. I went over the IBRIX+3PAR implementation guide, for using 3PAR back end storage on an IBRIX system, and wasn’t impressed with some of the limitations.

Like everything else, I would like to see a full IBRIX cluster product deployable as a VMware VSA. It would be especially handy for small deployments(e.g. sub 1TB). The key here is the high availability.

HP also announced integration between StoreAll ExpressQuery and Autonomy software. When the Autonomy guy came on the stage I really just had one word to describe it: AWKWARD – given what happened recently obviously!

StoreVirtual

This was known as the P4000, or Lefthand before that. It was also refreshed earlier in the year. Nothing new announced today. HP is trying to claim the P4000 VSA as Software Defined Storage (ugh).

Conclusion

Make no mistake people – this storage announcement was all about 3PAR. David Scott tried his best to share the love, but there just wasn’t much exciting to talk about outside of 3PAR.

6,000+ words ! Woohoo. That took a lot of time to write, hopefully it’s the most in depth review of what is coming out.

November 20, 2012

NEWSFLASH: Autonomy sucked

Filed under: General — Tags: — Nate @ 8:34 am

HP came out this morning and told the world what most of the world knew already: the acquisition of Autonomy was a mistake, resulting in an $8.8B write down. This is basically hot on the heels of another $8B write down on their EDS acquisition.

$17 Billion in losses

That number is just staggering. People say government is bad at waste, HP (among others) shows that the private sector is just as good at waste.

I suspected something with Autonomy was up when I saw a post by someone I know on LinkedIn saying he was laid off along with 200 other people – he was at Autonomy, and I was not sure if the rest were part of Autonomy or not (I assume most were).

Imagine if they skipped out on Autonomy – and instead invested that $10B in mobile, a market that dwarfs that of Autonomy. Instead they prematurely killed Palm – who was widely regarded as having the best user experience at the time.  Most people knew it was going to take billions of further investment in Palm to see if they could make it work. Palm was broke when they were acquired, so it obviously lacked the resources to do it themselves. The best example of this is Microsoft’s mobile platform. Floundering for more than a decade at this point. But they don’t give up! Because they can’t give up.

HP’s cash reserves are low – which caused a credit reporting agency to cut their outlook.

I guess the good thing that is coming out of this is HP is admitting their mistakes. Hopefully they can run things better going forward.

It is fascinating to me how many large companies are doing so poorly in recent years. Part of the reason for these companies to be as big as they are is they are diversified much more than their smaller counterparts. Whether it is HP or Dell in the PC space, or the once high flyer consumer electronics makers Sony, Sharp,  Panasonic and NEC. IBM by contrast of course is doing quite well, they are doing stuff HP and Dell are constantly struggling to try to copy. Apparently Korean (I believe) makers Samsung and LG are eating the Japanese lunch. I suppose at some point this will shift further west to China decimating the Koreans.

People were convinced that Japan was going to take over the world in the late 80s and early 90s – and they never did, nor are there signs they ever will. For some reason I’ll probably never forget this: I was a freshman in high school in Palo Alto and one of my friends was the grandson of one of the founders of HP, of all places. As a second language course I took Spanish – naturally, because it was supposed to be easier. He took Japanese. I asked him why and he said because his father thought it was the business language of the future. Now it seems the popular language to teach is Chinese. Anyway, getting off topic.

I haven’t seen recent mention of whether Hitachi – one of the other Japanese conglomerates – is having issues, other than these lay-offs a few years ago. For some reason or another I get the sense Hitachi is more industrial than the others, so isn’t as impacted by the shift to Korea for consumer electronics.

Next I’m waiting for Ballmer to get the boot for the abortion that is Windows 8.

November 4, 2011

Hitachi VSP SPC-1 Results posted

Filed under: Storage — Tags: , , , , — Nate @ 7:41 pm

[(More) Minor updates since original post] I don’t think I’ll ever work for a company that is in a position to need to leverage something like a VSP, but that doesn’t stop me from being interested in how it performs. After all it has been almost four years since Hitachi released results for their USP-V, which had, at the time, very impressive numbers (record breaking even, only to be dethroned by the 3PAR T800 in 2008) from a performance perspective (not so much from a cost perspective, not surprisingly).

So naturally, ever since Hitachi released their VSP (which is OEM’d by HP as their P9500) about a year ago, I have been very curious as to how well it performs – it certainly sounded like a very impressive system on paper with regards to performance. After all it can scale to 2,000 disks and a full terabyte of cache. I read an interesting white paper (I guess you could call it that) recently on the VSP out of curiosity.

One of the things that sort of stood out is the claim of being purpose built – they’re still using a lot of custom components and a monolithic architecture on the system, vs most of the rest of the industry which is trying to be more of a hybrid of modular commodity and purpose built; the EMC VMAX is a good example of this trend. The white paper, which was released about a year ago, even notes:

There are many storage subsystems on the market but only few can be considered as real hi-end or tier-1. EMC announced its VMAX clustered subsystem based on clustered servers in April 2009, but it is often still offering its predecessor, the DMX, as the first choice. It is rare in the industry that a company does not start ramping the new product approximately half a year after launching. Why EMC is not pushing its newer product more than a year after its announcement remains a mystery. The VMAX scales up by adding disks (if there are empty slots in a module) or adding modules, the latter of which are significantly more expensive. EMC does not participate in SPC benchmarks.

The other interesting aspect of the VSP is its dual chassis design (each chassis in a separate rack), which, the white paper says, is not a cluster – unlike a VMAX or a 3PAR system (3PAR isn’t even a cluster the way VMAX is a cluster, intentionally, with the belief that the more isolated design would lead to higher availability). I would assume this is in response to earlier criticisms of the USP-V design, in which the virtualization layer was not redundant (when I learned that I was honestly pretty shocked) – Hitachi later rectified the situation on the USP-V by adding some magic sauce that allowed you to link a pair of them together.

Anyways with all this fancy stuff, obviously I was pretty interested when I noticed they had released SPC-1 numbers for their VSP recently. Surely, customers don’t go out and buy a VSP because of the raw performance, but because they may need to leverage the virtualization layers to abstract other arrays, or perhaps the mainframe connectivity, or maybe they get a comfortable feeling knowing that Hitachi has a guarantee on the array where they will compensate you in the event there is data loss that is found to be the fault of the array (I believe the guarantee only covers data loss and not downtime, but am not completely sure). After all, it’s the only storage system on the planet that has such a stamp of approval from the manufacturer (earlier high end HDS systems had the same guarantee).

Whatever the reason,  performance is still a factor given the amount of cache and the sheer number of drives the system supports.

One thing that is probably never a factor is ease of use – the USP/VSP are complex beasts to manage, something you very likely need significant amounts of training for. One story I remember being told: a local HDS rep in the Seattle area mentioned to a 3PAR customer “after a few short weeks of training you’ll feel as comfortable on our platform as you do on 3PAR”, and the customer said “you just made my point for me”. Something like that anyways.

Would the VSP dethrone the HP P10000? I found the timing of the release of the numbers an interesting coincidence – after all, the VSP is over a year old at this point, why release now?

So I opened the PDF, and hesitated .. what would be the results? I do love 3PAR stuff and I still view them as  somewhat of an underdog even though they are under the HP umbrella.

Wow, was I surprised at the results.

Not. Even. Close.

The VSP’s performance was not much better than that of the USP-V which was released more than four years ago. The performance itself is not bad but it really puts the 3PAR V800 performance in some perspective:

  • VSP -   ~270,000 IOPS @ $8.18/IOP
  • V800 – ~450,000 IOPS @ $6.59/IOP

But it doesn’t stop there

  • VSP -  ~$45,600 per usable TB (39% unused storage ratio)
  • V800 – ~$13,200 per usable TB (14% unused storage ratio)
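Multiplying those ratios back out gives a rough feel for the two tested configurations (very rough numbers derived purely from the figures above – the full disclosure reports have the exact values):

    # Rough totals derived purely from the published ratios above
    vsp_cost = 270000 * 8.18        # ~$2.2M tested configuration
    v800_cost = 450000 * 6.59       # ~$3.0M tested configuration
    vsp_usable_tb = vsp_cost / 45600.0      # ~48TB usable
    v800_usable_tb = v800_cost / 13200.0    # ~225TB usable
    print(round(vsp_cost / 1e6, 2), round(vsp_usable_tb, 1))
    print(round(v800_cost / 1e6, 2), round(v800_usable_tb, 1))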

Hitachi managed to squeeze in just below the limit for unused storage – which is not allowed to be above 45%, sounds kind of familiar. The VSP had only 17TB additional usable capacity than the original USP-V as tested. This really shows how revolutionary sub disk distributed RAID is for getting higher capacity utilization out of your system, and why I was quite disappointed when the VSP came out without anything resembling it.

Hitachi managed to improve their cost per IOP on the VSP vs the USP, and at first glance I thought their cost per usable TB had skyrocketed to about triple the original USP-V result – but it turns out I miscounted the digits in my original calculations, and in fact the VSP is significantly cheaper per usable TB than the USP was when it was tested! One big change in the VSP is the move to 2.5″ disk drives. The decision to use 2.5″ drives did kind of hamper results I believe, since this is not an SPC-1/E Energy Efficiency test so power usage was never touted. But the largest 2.5″ 15k RPM drive that is available for this platform is 146GB (which is what was used).

One of the customers (I presume) at the HP Storage presentation I was at recently was kind of dogging 3PAR, saying the VSP needs less power per rack than the T or V class (which requires 4x208V 30A per fully loaded rack).

I presume it needs less power because of the 2.5″ drives; also, overall the drive chassis on 3PAR boxes do draw quite a bit of power by themselves (200W empty on T/V, 150W empty on F – given the T/V hold 250% more drives/chassis it’s a pretty nice upgrade).

Though to me especially on a system like this, power usage is the last thing I would care about. The savings from disk space would pay for the next 10 years of power on the V800 vs the VSP.

My last big 3PAR array cut the number of spindles in half and the power usage in half vs the array it replaced, while giving us roughly 50% better performance at the same time and the same amount of usable storage. So knowing that, and the efficiency in general I’m much less concerned as to how much power a rack requires.

So Hitachi can get 384 x 2.5″ 15k RPM disks in a single rack, and draw less power than 3PAR can get with 320 disks in a single rack.

You could also think of it this way: Hitachi can get ~56TB RAW of 15k RPM space in a single rack, and 3PAR can get 192TB RAW of 15k RPM space in a single rack – nearly four times the RAW space for double the power (and nearly five times the usable space – 4.72x precisely, V800 @ 80% utilization vs VSP @ 58% – due to the architecture). For me that’s a good trade off, a no brainer really.
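The per-rack numbers fall straight out of the drive counts, drive sizes and the SPC-1 utilization ratios:

    # Per-rack math: drive count x drive size, then the SPC-1 utilization applied
    vsp_raw_tb = 384 * 0.146        # ~56TB raw (146GB 15K 2.5" drives)
    v800_raw_tb = 320 * 0.600       # 192TB raw (600GB 15K drives)

    vsp_usable = vsp_raw_tb * 0.58      # 58% utilization in the tested config
    v800_usable = v800_raw_tb * 0.80    # 80% utilization

    print(round(vsp_raw_tb, 1), round(v800_raw_tb, 1))    # 56.1 vs 192.0
    print(round(v800_usable / vsp_usable, 2))             # ~4.72x usable in 3PAR's favor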

The things I am not clear on with regards to these results – is this the best the VSP can do?  This does appear to be a dual chassis system. The VSP supports 2,000 spindles, the system tested only had 1,152, which is not much more than the 1,024 tested on the original USP-V. Also the VSP supports up to 1TB of cache however the system tested only had 512GB (3PAR V800 had 512GB of data cache too).

Maybe it is one of those situations where this is the optimal bang for the buck on the platform – perhaps the controllers just don’t have enough horsepower to push 2,000 spindles at full tilt – I don’t know.

Hitachi may have purpose built hardware, but it doesn’t seem to do them a whole lot of good when it comes to raw performance. I don’t know about you but I’d honestly feel let down if I was an HDS customer. Where is the incentive to upgrade from USP-V to VSP? The performance is not much better. Cost per IOP is less, but I suspect the USP-V at this stage of the game, with more current pricing for disks and such, would be even more competitive than the VSP. (Correction from above, due to my mis-reading of the digits: the original USP-V result was roughly $135k per usable TB, not $13.5k, so the VSP is in fact far cheaper per usable TB, making it a good upgrade from the USP from that perspective! Sorry for the error 🙂 )

Maybe it’s in the software – I’ve of course never used a USP/VSP system but perhaps the VSP has newer software that is somehow not backwards compatible with the USP, though I would not expect this to be the case.

Complexity is still high: the configuration of the array stretches from page 67 to page 100 of the full disclosure documentation. 3PAR's configuration, by contrast, is roughly a single page.

Why do you think someone would want to upgrade from a USP-V to a VSP?

I still look forward to the day when the likes of EMC extend their VMAX architecture across their entire product line, and HDS unifies their architecture as well. I don't know if it'll ever happen, but it should be possible at least with VMAX, given how loudly they thump the Intel CPU drum.

The HDS AMS 2000 series of products was by no means compelling either, when you could get higher availability (by means of persistent cache with 4 controllers), three times the amount of usable storage, and about the same performance on a 3PAR F400 for about the same amount of money.

Come on EMC, show us some VMAX SPC-1 love; you are, after all, a member of the SPC now (though it's kind of odd your logo doesn't show up, just the text of the company name).

One thing that has me wondering about this VSP configuration: with so little usable disk space available on the system, why would anyone bother with spinning rust on this platform for performance rather than just going SSD instead? Well, one reason may be that a 400GB SSD would run $35-40,000 (after 50% discount, assuming that is list price), ouch. 220 of those (the system maximum is 256) would net you roughly 88TB raw (maybe 40TB usable), but cost $7.7M for the SSDs alone.

On the V800 side a 4-pack of 200GB SSDs will cost you roughly $25,000 after discount (assuming that is list price). Fully loading a V800 with the maximum of 384 SSDs (77TB raw) would cost roughly $2.4M for the SSDs alone (and still consume 7 racks). I think I like the 3PAR SSD strategy more – not that I would ever do such a configuration!

Usable space would be more of course if these SSD configurations used RAID 5 instead of RAID 1; I just used RAID 1 for simplicity.
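Roughing out that SSD math (with the discounted prices I assumed above, and RAID 1 mirroring for simplicity):

```python
# Rough SSD cost math from the paragraphs above, assuming the ~50%-discounted
# prices I quoted are accurate; RAID 1 mirroring simply halves the raw capacity

# VSP: 220 x 400GB SSDs at roughly $35k each
vsp_cost     = 220 * 35_000         # ~$7.7M for the SSDs alone
vsp_raw_tb   = 220 * 400 / 1000.0   # 88 TB raw
vsp_mirrored = vsp_raw_tb / 2       # 44 TB after RAID 1 ("maybe 40TB usable")

# V800: 384 x 200GB SSDs, sold in 4-packs at roughly $25k per pack
v800_cost     = (384 // 4) * 25_000   # 96 packs -> ~$2.4M for the SSDs alone
v800_raw_tb   = 384 * 200 / 1000.0    # ~77 TB raw
v800_mirrored = v800_raw_tb / 2       # ~38 TB after RAID 1

print(f"VSP : ${vsp_cost:,}  {vsp_raw_tb:.0f}TB raw, {vsp_mirrored:.0f}TB mirrored")
print(f"V800: ${v800_cost:,}  {v800_raw_tb:.0f}TB raw, {v800_mirrored:.0f}TB mirrored")
```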

Goes to show that these big boxes have a long way to go before they can truly leverage the raw performance of SSD. With 200-300 SSDs in either of these boxes the controllers would be maxed out and the SSDs would, for the most part, be idle. I've been saying for a while how stupid it seems for such a big storage system to be overwhelmed so completely by SSDs, and that the storage systems need to be an order of magnitude (or more) faster. 3PAR made a good start with a doubling of performance, but I'm thinking boxes like these need to do at least 1M IOPS to disk, if not 2M – how long will it take to get there?

Maybe HDS could show better results with 900GB 10k RPM disks (putting in something like 1,700 of them instead of the 1,152 x 146GB 15k RPM disks), which should show a much lower cost per usable TB, though their cost per IOP would probably shoot up. From what I can see, though, 600GB is the largest 2.5″ 10k RPM disk supported. 900GB @ 58% utilization would yield about 522GB per disk, and 600GB on 3PAR @ 80% utilization would yield about 480GB.
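The per-spindle yield math, using the utilization rates each vendor demonstrated in these results (again, just my arithmetic):

```python
# Usable GB per spindle at each vendor's demonstrated capacity utilization
# (58% for the VSP as tested, 80% for the V800 as tested)

def usable_per_disk(raw_gb, utilization):
    """Usable capacity a single disk contributes at a given utilization rate."""
    return raw_gb * utilization

print(usable_per_disk(900, 0.58))  # ~522 GB - hypothetical 900GB 10k on the VSP
print(usable_per_disk(600, 0.58))  # ~348 GB - the largest supported 600GB 10k 2.5"
print(usable_per_disk(600, 0.80))  # ~480 GB - a 600GB disk on the 3PAR V800
```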

I suspect they could get their costs down significantly more if they went old school and supported larger 3.5″ 15k RPM disks instead (the VSP supports 3.5″ disks, just nothing other than the 2TB Nearline (7200 RPM) and 400GB Flash – a curious decision). If you were a customer, isn't that something you'd be interested in? Another compromise you make with 3.5″ disks on the VSP is that you're then limited to 1,280 spindles, rather than 2,048 with 2.5″. This could be a side effect of the maximum addressable capacity of 2.5PB, which is reached with 1,280 2TB disks – they could very well support 2,048 3.5″ disks if they could fit within the 2.5PB limit.

Their documentation says the actual maximum usable with Nearline 2TB disks is 1.256PB (with RAID 10). With 58% capacity utilization that 1.256PB drops dramatically to roughly 730TB. With RAID 5 (7+1) they say 2.19PB usable; take that 58% number again and you're at roughly 1.27PB.

The V800 by contrast would top out at 640TB with RAID 10 @ 80% utilization (stupid raw capacity limits!! *shakes fist at sky*), or somewhere in the 880TB-1.04PB (@80%) range with RAID 5 (depending on the data/parity ratio).
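Working those maximums through (the HDS usable figures come from their documentation as quoted above; the 1.6PB raw maximum for the V800 is my assumption, not a quoted figure):

```python
# Maximum usable capacity with 2TB Nearline disks, starting from the HDS-documented
# usable figures quoted above, then applying the demonstrated utilization rates.
# The 1.6PB raw maximum for the V800 is an assumption on my part.

vsp_raid10_usable = 1256.0   # TB usable per HDS docs (RAID 10)
vsp_raid5_usable  = 2190.0   # TB usable per HDS docs (RAID 5, 7+1)
print(vsp_raid10_usable * 0.58)   # ~728 TB at 58% utilization
print(vsp_raid5_usable  * 0.58)   # ~1270 TB (~1.27PB) at 58% utilization

v800_raw = 1600.0                 # TB raw (assumed maximum)
print(v800_raw / 2 * 0.80)        # 640 TB with RAID 10 at 80% utilization
```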

Another strange decision on both HDS and 3PAR's part was to only certify the 2TB disk on their platforms and nothing smaller, since both systems tap out at roughly half the number of supported spindles when using 2TB disks due to the capacity limits.

An even more creative solution (to work around the limitation!!), which I doubt is possible, would be to somehow restrict the addressable capacity of each disk to half its size, so you could in effect get 1,600 2TB disks in each system with 1TB of capacity apiece; then when software updates come out to scale the box to even higher levels of capacity (32GB of control cache can EASILY handle more), just unlock the drives and get that extra capacity for free. That would be nice at least, though it probably won't happen. I bet it's technically possible on the 3PAR platform due to their chunklets. 3PAR in general doesn't seem as interested in creative solutions as I am 🙂 (I'm sure HDS is even more rigid.)

Bottom Line

So HDS has managed to get their cost for usable space down quite a bit, from $135,000/TB to around $45,000/TB, and improve performance by about 25%.

They still have a long way to go in the areas of efficiency, performance, scalability, simplicity and unified architecture. I'm sure there will be plenty of customers out there that don't care about those things (or are just not completely informed) and will continue to buy the VSP for other reasons; it's still a solid platform if you're willing to make those trade offs.
