TechOpsGuys.com Diggin' technology every day

August 27, 2010

HP Now offering $2 billion

Filed under: News,Storage — Tags: — Nate @ 8:46 am

Dell apparently is being a little bitch again and matched HP’s $27 offer for 3PAR, so HP came right back and offered $30 a share, or $2 billion, up from Dell’s original $1.1 billion offer ($18/share).

PALO ALTO, Calif., Aug 27, 2010 (BUSINESS WIRE) -- HP (HPQ) today announced that it has increased its proposal to acquire all of the outstanding shares of 3PAR Inc. (PAR) to $30 per share in cash, or an enterprise value of $2.0 billion. The proposal represents an 11 percent premium above the most recent price offered by Dell Inc. of $27 per share. HP’s proposal is not subject to any financing contingency and has been approved by HP’s board of directors. Once approved by 3PAR’s board, HP expects the transaction to close by the end of the calendar year.

Cut your losses and run, Dell. Go buy Compellent.

What should HP do with 3PAR

Filed under: Storage — Tags: — Nate @ 7:36 am

Assuming HP gets them, which I am optimistic will occur, this is what I think HP should do.

* Phase out current USP-based XP line with the 800-series of 3PAR systems, currently the T800
* Phase out the EVA Cluster with the enterprise 400-series of 3PAR systems, currently the T400
* Phase out the EVA 6400 and 8400 with the mid range 400-series of 3PAR systems, currently the F400
* Phase out the 3PAR F200, replace it with the EVA 4400-series

I’m sure this is all pretty obvious but it gives me something to write about 🙂

Why the changes to the EVA offerings and dropping of the F200 from 3PAR? To me, it all comes down to the 4 node architecture that 3PAR has, and the ability to offer persistent cache.

3PAR Persistent Cache is a resiliency feature designed to gracefully handle component failures by eliminating the substantial performance penalties associated with “write-through” mode. Supported on all quad-node and larger InServ arrays, Persistent Cache leverages the InServ’s unique Mesh-Active design to preserve write-caching by rapidly re-mirroring cache to the other nodes in the cluster in the event of a controller node failure.

Persistent cache allows service providers to operate at higher levels of utilization because they know they can maintain high performance even when a controller fails (or if two/three controllers fail in a 6/8 node T800, as long as they are the right nodes!). One of my former employers has a bunch of NetApp stuff, and I’m told they run them pretty much entirely active/passive, so as to protect performance in the event a controller fails. I’m sure that is a fairly common setup.

This is also useful during software upgrades, where the controllers have to be rebooted, or hardware upgrades (adding more FC ports or whatever).

Another reason is the ease of use around configuring multi site replication, and the ability to do synchronous long distance replication on the mid range systems.

3PAR® is the first storage vendor to offer autonomic disaster recovery (DR) configuration that enables you to set up and test your entire DR environment (including multi-site, multi-mode replication using both mid-range and high-end arrays) in just minutes.

[..]

Synchronous Long Distance replication combines the best of both worlds by offering the data integrity of synchronous mode disaster recovery and the extended distances (including cross-continental reach) possible with asynchronous replication. Remote Copy makes all of this possible without the complexity or professional services required by the monolithic vendors that offer multi-target disaster recovery products, and at half the cost or less.

I can understand why 3PAR came up with the F200: it is a bit cheaper. The only difference is the chassis the nodes go in; the nodes are the same, and everything else is the same. So to me it’s a no brainer to spend the extra 10-15% or so up front and get the capability to go to four controllers, even if you don’t need that up front. It takes an extra 4U of rack space. If you really want to be cheap, go with the small 2-node EVA.

I find it kind of funny that on the main page for EVA, the EVA-4000’s blurb for what it is “Ideal for” is blank.

August 26, 2010

Thank you HP

Filed under: News,Storage — Tags: — Nate @ 4:05 pm

That’s more like it, HP knows what they are doing, they just boosted their offer to $27/share for 3PAR.

SEATTLE (AP) — Hewlett-Packard Co. has again raised its bid for 3Par Inc. above an offer from rival Dell Inc., suggesting that the little-known data-storage maker could be worth more with one of the PC companies’ marketing muscle behind it.

The latest offer from HP for $27 per share in cash, or about $1.69 billion, is nearly three times what 3Par had been trading at before Dell made the first bid last week.

Bring it on. Did I ever mention 3PAR went IPO on my birthday? Coincidence yeah I know but maybe it was a sign.. if I recall right they were supposed to IPO one day earlier but something delayed it by one day. I never did buy any stock(as I mentioned before I don’t buy stocks or bonds).

This is a joke, right?

Filed under: News,Storage — Tags: — Nate @ 7:55 am

This is a joke, right?

So today, right after the jobless claims came out, Dell came out and increased their bid for 3PAR to $24.30, thirty cents above HP’s offer, which was $6.00 above Dell’s original offer.

Even now, hours later, I can’t help but laugh. I mean this is a good example showing what kind of company Dell is. Why are they wasting everyone’s time by topping HP’s bid by a mere 1%?

A survey recently done by Reuters came up with an estimated $29 final price for 3PAR.

[..] That’s why some analysts say traditional metrics aren’t sufficient in assessing the value of 3PAR — a small company with unique technology that could grow exponentially with the massive salesforces of either Dell or HP.

This morning on Squawk Box folks were saying the next step is for HP to bid up again and get 3PAR to eliminate the price matching clause with Dell to level the playing field.

I keep seeing people ask who needs 3PAR more. I think it’s clear Dell needs them more, Dell has nothing right now. But I’m sure Dell will do a lot to screw up the 3PAR technology over time, so HP is the better fit, more innovative company with more market leadership and of course a lot more resources from pretty much every angle.

August 24, 2010

EMC and IBM’s Thick chunks for automagic storage tiering

Filed under: Storage,Virtualization — Tags: , , , , — Nate @ 12:59 pm

If you recall not long ago IBM released some SPC-1 numbers with their automagic storage tiering technology Easy Tier. It was noted that they are using 1GB blocks of data to move between the tiers. To me that seemed like a lot.

Well EMC announced the availability of FAST v2 (aka sub volume automagic storage tiering) and they too are using 1GB blocks of data to move between tiers according to our friends at The Register.

Still seems like a lot. I was pretty happy when 3PAR said they use 128MB blocks, which is half the size of their chunklets. I thought to myself when I first heard of this sub LUN tiering that you may want a block size as small as, I don’t know 8-16MB. At the time 128MB still seemed kind of big(before I had learned of IBM’s 1GB size).

Just think of how much time it takes to read 1GB of data off a SATA disk (since the big target for automagic storage tiering seems to be SATA + SSD).
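A back-of-the-envelope sketch puts rough numbers on that. The throughput figures are assumptions for illustration, not measurements from any of these arrays: about 100 MB/s for a SATA disk streaming sequentially with nothing else to do, and about 10 MB/s when the same disk is also busy serving random I/O. The `move_seconds` helper is a made-up name.

```shell
#!/bin/sh
# Seconds needed to read one tiering block off a single disk.
# usage: move_seconds <block size in MB> <disk throughput in MB/s>
move_seconds() {
    awk -v size="$1" -v rate="$2" 'BEGIN { printf "%.1f\n", size / rate }'
}

# Assumed throughputs (illustrative only): 100 MB/s idle, 10 MB/s busy.
for size in 1024 128 16; do
    echo "${size}MB block: $(move_seconds "$size" 100)s idle, $(move_seconds "$size" 10)s busy"
done
```

Swap in whatever throughput you actually see on your own SATA tier; the ratio between the block sizes stays the same, which is really the point.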

Anyone know what size Compellent uses for automagic storage tiering?

August 23, 2010

HP to the rescue

Filed under: Datacenter,Events,News,Storage — Tags: , , , , — Nate @ 6:03 am

Knock knock.. HP is kicking down your back door 3PAR..

Well that’s more like it, HP offered $1.6 Billion to acquire 3PAR this morning topping Dell’s offer by 33%. Perhaps the 3cV solution can finally be fully backed by HP. More info from The Register here. And more info on what this could mean to HP and 3PAR products from the same source here.

3PAR’s website is having serious issues; this has obviously spawned a ton of interest in the company. I get intermittent blank pages and connection refused messages.

I didn’t wake my rep up for this one.

The 3cV solution was announced about three years ago –

Elements of the 3cV solution include:

  • 3PAR InServ® Storage Servers—highly virtualized, tiered-storage arrays built for utility computing. Organizations creating virtualized IT infrastructures for workload consolidation use InServ arrays to reduce the cost of allocated storage capacity, storage administration, and SAN infrastructure.
  • HP BladeSystem c-Class Server Blades—the leading blade server infrastructure on the market for datacenters of all sizes. HP BladeSystem c-Class server blades minimize energy and space requirements and increase administrative productivity through advantages in I/O virtualization, powering and cooling, and manageability.
  • VMware vSphere—the leading virtualization platform for industry-standard servers. VMware vSphere helps customers reduce capital and operating expenses, improve agility, ensure business continuity, strengthen security, and go green.

While I could not find the image that depicts the 3cV solution(not sure how long it’s been gone for), here is more info on it for posterity.

The Advantages of 3cV
3cV offers combined benefits that enable customers to manage and scale their server and storage environments simply, allowing them to halve server, storage and operational costs while lowering the environmental impact of the datacenter.

  • Reduces storage and server costs by 50%—The inherently modular architectures of the HP BladeSystem c-Class and the 3PAR InServ Storage Server—coupled with the increased utilization provided by VMware Infrastructure and 3PAR Thin Provisioning—allow 3cV customers to do more with less capital expenditure. As a result, customers are able to reduce overall storage and server costs by 50% or more. High levels of availability and disaster recovery can also be affordably extended to more applications through VMware Infrastructure and 3PAR thin copy technologies.
  • Cuts operational costs by 50% and increases business agility—With 3cV, customers are able to provision and change server and storage resources on demand. By using VMware Infrastructure’s capabilities for rapid server provisioning and the dynamic optimization provided by VMware VMotion and Distributed Resource Scheduler (DRS), HP Virtual Connect and Insight Control management software, and 3PAR Rapid Provisioning and Dynamic Optimization, customers are able to provision and re-provision physical servers, virtual hosts, and virtual arrays with tailored storage services in a matter of minutes, not days. These same technologies also improve operational simplicity, allowing overall server and storage administrative efficiency to increase by 3x or more.
  • Lowers environmental impact—With 3cV, customers are able to cut floor space and power requirements dramatically. Server floor space is minimized through server consolidation enabled by VMware Infrastructure (up to 70% savings) and HP BladeSystem density (up to 50% savings). Additional server power requirements are cut by 30% or more through the unique virtual power management capabilities of HP Thermal Logic technology. Storage floor space is reduced by the 3PAR InServ Storage Server, which delivers twice the capacity per floor tile as compared to alternatives. In addition, 3PAR thin technologies, Fast RAID 5, and wide striping allow customers to power and cool as much as 75% less disk capacity for a given project without sacrificing performance.
  • Delivers security through virtualization, not dedicated hardware silos—Whereas traditional datacenter architectures force tradeoffs between high resource utilization and the need for secure segregation of application resources for disparate user groups, 3cV resolves these competing needs through advanced virtualization. For instance, just as VMware Infrastructure securely isolates virtual machines on shared severs, 3PAR Virtual Domains provides secure “virtual arrays” for private, autonomous storage provisioning from a single, massively-parallel InServ Storage Server.

Though due to the recent stack wars it’s been hard for 3PAR to partner with HP to promote this solution since I’m sure HP would rather push their own full stack. Well hopefully now they can. The best of both worlds technology wise can come together.

More details from 3PAR’s VMware products site.

From HP’s offer letter

We propose to increase our offer to acquire all of 3PAR outstanding common stock to $24.00 per share in cash. This offer represents a 33.3% premium to Dell’s offer price and is a “Superior Proposal” as defined in your merger agreement with Dell. HP’s proposal is not subject to any financing contingency. HP’s Board of Directors has approved this proposal, which is not subject to any additional internal approvals. If approved by your Board of Directors, we expect the transaction would close by the end of the calendar year.

In addition to the compelling value offered by our proposal, there are unparalleled strategic benefits to be gained by combining these two organizations. HP is uniquely positioned to capitalize on 3PAR’s next-generation storage technology by utilizing our global reach and superior routes to market to deliver 3PAR’s products to customers around the world. Together, we will accelerate our ability to offer unmatched levels of performance, efficiency and scalability to customers deploying cloud or scale-out environments, helping drive new growth for both companies.
As a Silicon Valley-based company, we share 3PAR’s passion for innovation.
[..]

We understand that you will first need to communicate this proposal and your Board’s determinations to Dell, but we are prepared to execute the merger agreement immediately following your termination of the Dell merger agreement.

Music to my ears.

[tangent — begin]

My father worked for HP in the early days back when they were even more innovative than they are today, he recalled their first $50M revenue year. He retired from HP in the early 90s after something like 25-30 years.

I attended my freshman year at Palo Alto Senior High school, and one of my classmates/friends (actually I don’t think I shared any classes with him now that I think about it) was Ben Hewlett, grandson of one of the founders of HP. Along with a couple other friends, Ryan and Jon, we played a bunch of RPGs (I think the main one was Twilight 2000, something one of my other friends Brian introduced me to in 8th grade).

I remember asking Ben one day why he took Japanese as his second language course when it was significantly more difficult than Spanish(which was the easy route, probably still is?) I don’t think I’ll ever forget his answer. He said “because my father says it’s the business language of the future..”

How times have changed.. Now it seems everyone is busy teaching their children Chinese. I’m happy knowing English, and a touch of bash and perl.

I never managed to keep in touch with my friends from Palo Alto, after one short year there I moved back to Thailand for two more years of high school there.

[tangent — end]

HP could do some cool stuff with 3PAR, they have much better technology overall. I have no doubt HP has their eyes on their HDS partnership, and the possibility of replacing their XP line with 3PAR technology in the future has got to be pretty enticing. HDS hasn’t done a whole lot recently, and I read not long ago that regardless of what HP says, they don’t have much (if any) input into the HDS product line.

The HP USP-V OEM relationship is with Hitachi SSG. The Sun USP-V reseller deal was struck with HDS. Mikkelsen said: “HP became a USP-V OEM in 2004 when the USP-V was already done. HP had no input to the design and, despite what they say, very little input since.” HP has been a Hitachi OEM since 1999.

Another interesting tidbit of information from the same article:

It [HDS] cannot explain why it created the USP-V – because it didn’t, Hitachi SSG did, in Japan, and its deepest thinking and reasons for doing so are literally lost in translation.

The loss of HP as an OEM customer of HDS, so soon after losing Sun as an OEM customer, would be a really serious blow to HDS (one person I know claimed it accounts for ~50% of their business), which seems to have a difficult time selling stuff in western countries; I’ve read it’s mostly because of their culture. Similarly it seems Fujitsu has issues selling stuff in the U.S. at least, they seem to have some good storage products but not much attention is paid to them outside of Asia (and maybe Europe). Will HDS end up like Fujitsu as a result of HP buying 3PAR? Not right away for sure, but longer term they stand to lose a ton of market share in my opinion.

And with the USP getting a little stale (rumor has it they are near to announcing a technology refresh for it), it would be good timing for HP to get 3PAR, to cash in on the upgrade cycle by getting customers to go with the T class arrays instead of the updated USP whenever possible.

I read on an HP blog earlier in the year an interesting comment –

The 3PAR is drastically less expensive than an XP, but is an active/active concurrent design, can scale up to 8 clustered controllers, highly virtualized, customers can self-install, self-maintain, and requires no professional services. Its on par with the XP in terms of raw performance, but has the ease of use of the EVA. Like the XP, the 3PAR can be carved up into virtual domains so that service providers or multi-tenant arrays can have delegated administration.

I still think 3PAR is worth more, and should stay independent, but given the current situation would much rather have them in the arms of HP than Dell.

Obviously those analysts that said Dell paid too much for 3PAR were wrong, and didn’t understand the value of the 3PAR technology. HP does, otherwise they wouldn’t be offering 33% more cash.

After the collapse of so many of 3PAR’s NAS partners over the past couple of years, the possibility of having Ibrix available again for a longer term solution is pretty good. Dell bought Exanet’s IP earlier in the year. LSI owns Onstor, HP bought Polyserve and Ibrix. Really just about no “open” NAS players left. Isilon seems to be among the biggest NAS players left but of course their technology is tightly integrated into their disk drive systems, same with Panasas.

Maybe that recent legal investigation into the board at 3PAR had some merit after all.

Dell should take their $billion and shove it in Pillar’s(or was it Compellent ? I forgot) face, so the CEO there can make his dream of being a billion dollar storage company come true, if only for a short time.

I’m not a stock holder or anything, I don’t buy stocks(or bonds).

August 16, 2010

Trying not to think about it

Filed under: News,Storage — Tags: , — Nate @ 6:54 am

Hell just got a little colder. It seems 3PAR was bought by Dell for ~$1.15 billion this morning(news is so fresh as of this posting the official 3PAR press release isn’t posted yet, just a blank page).. I woke my rep up and asked him what happened and he wasn’t aware that it had gone down, they did a good job at keeping it quiet.

It’s not like 3PAR was in any trouble, they had no debt, and the highest margins in the industry along with good sales. They haven’t been making much profit, mainly because they are hiring so many new people to grow the company. In my area since I started using 3PAR they’ve gone from 1 Sales and 1 SE to 3 Sales and 2 SEs, and they’ve really expanded overseas and stuff. I would have expected them to hold out for a few more billion, $1 billion seems far too cheap.

I have read several complaints about how Equallogic has gone downhill since Dell bought them (from original Equallogic users, not that I’ve ever used that stuff so don’t know whether or not they are accurate), I fear the same may happen to 3PAR. But it will take a little while for it to start.

I think the only hope 3PAR has at this point is if Dell keeps them independent for as long as possible. Outside of their DCS division Dell really shows they have no ability to innovate.

I wonder what Marc Farley thinks, as a former Equallogic/Dell employee now he’s at 3PAR, and Dell came and found him again..

Maybe I’ll get lucky and this will just turn out to be a bad dream, some evil hacker out there manipulating the stock price by planting news.

Do me one favor Dell, stay the hell away from Extreme Networks! Brocade has bought Foundry, and HP has bought 3COM. I was told by a Citrix guy that Juniper tried to buy Extreme shortly after they bought Netscreen instead of making their own switches, from what I recall he said Juniper bought Netscreen for $500M which was way over inflated, and Extreme demanded $1 billion at the time. There aren’t many other independent Enterprise/Service provider Ethernet companies still around. There is Force10; Dell can go buy them, and they’d be a lot cheaper too.

I suppose more than anything else, Dell buying 3PAR is Dell admitting the Equallogic technology doesn’t hold a candle to 3PAR technology, ok maybe a candle but not much more than that!

It may be 6:51AM but I think I need a drink.

August 13, 2010

Do you really need RAID 6

Filed under: Storage — Tags: , , , , , — Nate @ 11:34 pm

I’ve touched on this topic before but I don’t think I’ve ever done a dedicated entry on the topic. I came across a blog post from Marc Farley, which got my mind thinking on the topic again. He talks about a leaked document from EMC trying to educate their sales force to fight 3PAR in the field. One of the issues raised is 3PAR’s lack of RAID 6 (nevermind the fact that this is no longer true, 2.3.1 introduced RAID 6(aka RAID DP) in early January 2010).

For the most part, RAID 6 from 3PAR’s perspective was just a check box, because there are customers out there with hard requirements who disregard the underlying technology and won’t even entertain a product unless it meets their own criteria.

What 3PAR did in their early days was really pretty cool, the way they virtualize the disks in the system which in turn distributes the RAID across many many disks. On larger arrays you can have well over 100,000 RAID arrays on the system. This provides a few advantages:

  • Evenly distributes I/O across every available spindle
  • Parity is distributed across every available spindle – no dedicated parity disks
  • No dedicated hot spare spindles
  • Provides a many:many relationship for RAID rebuilds
    • Which gives the benefit of near zero impact to system performance while the RAID is rebuilt
    • Also increases rebuild performance by orders of magnitude (depending on # of disks)
  • Only data that has been written to the disk is rebuilt
  • Since there are no spare spindles, only spare “space” on each disk, in the event you suffer multiple disk failures before having the failed disks swapped(say you have 10 disks fail over a period of a month and for whatever reason you did not have the drives replaced right away) the system will automatically allocate more “spare” space as long as there is available space to write to on the system. Unlike traditional arrays where you may find yourself low or even out of hot spares after multiple disks fail which will make you much more nervous and anxious to replace those disks than if it were a 3PAR system(or similarly designed system)

So do you need RAID 6?

To my knowledge the first person to raise this question was Robin from Storage Mojo, who a bit over three years ago wrote a blog post talking about how RAID 5 will have lost its usefulness in 2009. I have been following Robin for a few years (online anyways); he seems like a real smart guy, and I won’t try to dispute the math. And I can certainly see how traditional RAID arrays with large SATA disks running RAID 5 are in quite a pickle, especially if there is a large data:parity ratio.

In the same article he speculates on when RAID 6 will become as “useless” as RAID 5.

I think what it all really comes down to is a couple of things:

  • How fast can your storage system rebuild from a failed disk
    • For distributed RAID this is determined by the number of disks participating in the RAID arrays and the amount of load on the system, because when a disk fails one RAID array doesn’t go into degraded mode, potentially hundreds of them do, which then triggers all of the remaining disks to help in the rebuild.
    • For 3PAR systems at least this is determined by how much data has actually been written to the disk.
  • What is the likelihood that a 2nd disk will fail(in the case of RAID 5) or two more disks(RAID 6) fail during this time?
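To get a feel for the first point, here is a deliberately simplistic sketch of rebuild time: the amount of data to rebuild divided by the combined rate of the disks doing the work. Everything in it is an assumption for illustration (the `rebuild_hours` name, a 50 MB/s rate for a dedicated hot spare, a throttled 5 MB/s per disk when 199 disks share the job while serving production I/O, a half-full 2TB disk); it ignores parity reads, controller limits and load, so treat the output as a shape, not a prediction.

```shell
#!/bin/sh
# Hours to rebuild: data / (participating disks * per-disk rebuild rate).
# usage: rebuild_hours <GB to rebuild> <participating disks> <MB/s per disk>
rebuild_hours() {
    awk -v gb="$1" -v disks="$2" -v rate="$3" \
        'BEGIN { printf "%.1f\n", gb * 1024 / (disks * rate) / 3600 }'
}

# Traditional array: the whole 2TB disk is rebuilt onto one hot spare.
echo "one spare, full disk:   $(rebuild_hours 2048 1 50) hours"
# Distributed RAID: only written data (assumed 1TB) is rebuilt, by 199 disks.
echo "199 disks, written data: $(rebuild_hours 1024 199 5) hours"
```

The two levers from the bullets show up directly as arguments: less data to rebuild (only what was written) and more disks sharing the work.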

3PAR is not alone with the distributed RAID. As I have mentioned before, others that I know of that have similar technology are at least : Compellent, Xiotech and IBM XIV. I bet there are others as well.

From what I understand of Xiotech’s technology I don’t *think* that RAID arrays can span their ISE enclosures, I think they are limited to a single enclosure(by contrast I believe a LUN can span enclosures), so for example if there are 30 disks in the enclosure and a disk fails the maximum number of disks that can participate in the rebuild is 30. Though in reality I think the number is less given how they RAID based on disk heads, the number of individual RAID arrays is far fewer vs 3PAR’s chunklet-based RAID.

I’ve never managed to get in depth info on Compellent’s or IBM XIV’s design with regards to specifics around how RAID arrays are constructed. Though I haven’t tried any harder than looking at what is publicly available on their web sites.

Distributed RAID really changes the game in my opinion as far as RAID 5’s effective life span (same goes for RAID 6 of course).

Robin posted a more recent entry several months ago about the effectiveness of RAID 6, and besides one of the responders being me, there was another person that replied with a story that made me both laugh and feel sorry for the guy, a horrific experience with RAID 6 on Solaris ZFS with Sun hardware –

Depending on your Recovery Time Objectives, RAID6 and other dual-parity schemes (e.g. ZFS RAIDZ2) are dead today. We know from hard experience.

Try 3 weeks to recover from a dual-drive failure on 8x 500GB ZFS RAIDZ2 array.

It goes like this:
– 2 drives fail
– Swap 2 drives (no hot spares on this array), start rebuild
– Rebuild-while-operating took over one week. How much longer, we don’t know because …
– 2 more drives failed 1 week into the rebuild.
– Start restore from several week old LTO-4 backup tapes. The tapes recorded during rebuild were all corrupted.
– One week later, tape restore is finished.
– Total downtime, including weekends and holidays – about 3 weeks (we’re not a 24xforever shop).

Shipped chassis and drives back to vendor – No Trouble Found!

For any system that takes longer than, say, 48 hours to rebuild, you probably do want that extra level of protection in there, whether it is dual parity or maybe even triple parity (something I believe ZFS offers now?).

Add to that disk enclosure/chassis/cage (3PAR term) availability, which means you can lose an entire shelf of disks without disruption. In their S/T class systems, 40 disks can go down and you’re still OK (protection against a shelf failing is the default configuration and is handled automatically; this can be disabled upon request of the user since it does limit your RAID options based on the number of shelves you have). So not only do you need to suffer a double disk failure, but that 2nd disk has to fail:

  • In a DIFFERENT drive chassis than the original disk failure
  • Happens to be a disk that has portions of RAID set(s) that were also located on the original disk that failed

But if you can recover from a disk failure in say 4 hours even on a 2TB disk with RAID 5, do you really need RAID 6? I don’t know what the math might look like but would be willing to bet that a system that takes 3 days to rebuild a RAID 6 volume has about as much of a chance of suffering a triple disk failure as a system that takes 4 hours (or less) to rebuild a RAID 5 array suffering a double disk failure.
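For anyone who wants to play with that bet, here is a rough sketch of the math under loudly stated assumptions: disk failures are independent and Poisson-distributed, the annual failure rate is 3%, and both cases are treated as a 200-disk pool where any of the 199 surviving disks counts as a fatal hit. That last assumption overstates the risk, since it ignores the chassis and shared-RAID-set conditions listed above. The `p_more_failures` name is made up.

```shell
#!/bin/sh
# Chance (in percent) that at least <needed> more disks fail during a rebuild.
# usage: p_more_failures <disks at risk> <window in hours> <needed: 1 or 2> [annual failure rate]
p_more_failures() {
    awk -v n="$1" -v hours="$2" -v needed="$3" -v afr="${4:-0.03}" 'BEGIN {
        mu = n * hours * afr / 8760            # expected failures in the window
        p = 1 - exp(-mu)                       # P(at least one failure)
        if (needed == 2) p -= mu * exp(-mu)    # drop the exactly-one case
        printf "%.2f\n", p * 100
    }'
}

echo "RAID 5, 4 hour rebuild, 1 more failure:   $(p_more_failures 199 4 1)%"
echo "RAID 6, 72 hour rebuild, 2 more failures: $(p_more_failures 199 72 2)%"
```

Change the disk count, rebuild window or failure rate to match your own environment; the comparison is only as good as those inputs.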

Think about the probability of the two above bullet points on how a 2nd drive must fail in order to cause data loss, combine that with the fast rebuild of distributed RAID, and consider whether or not you really need RAID 6. Do you want to take the I/O hit? Sure it is an easy extra layer of protection, but you might be protecting yourself against something that is about as likely to happen as a natural disaster taking out your data center.

A couple of weeks ago I mentioned to my 3PAR rep the theory that RAID 6 with “cage level availability” has the potential to protect against two shelves of disks failing (so you can lose up to 80 disks on the big arrays) without impact. I don’t know if 3PAR went this far in engineering their RAID 6. I’ve never seen it mentioned so I suspect not, but I don’t think there is anything that would stop them from being able to offer this level of protection, at least with RAID 6 6+2.

Myself, I speculate that on a decently sized 3PAR system (say 200-300 disks), SATA disks probably have to get to 5-8TB in size before I would really think hard about RAID 6. That won’t stop their reps from officially recommending RAID 6 with 2TB disks though.

I can certainly understand the population at large coming to the conclusion that RAID 5 is no longer useful, because probably 99.999% of the RAID arrays out there (stand alone arrays as well as arrays in servers) are not running on distributed RAID technology. So they don’t realize that another way to better protect your data is to make sure the degraded RAID arrays are rebuilt (much) faster, lowering the chance of additional disk failures occurring at the worst possible time.

It’s nice that they offer the option and let the end user decide whether or not to take advantage of it.

November 24, 2009

81,000 RAID arrays

Filed under: Storage,Virtualization — Tags: , , — Nate @ 2:56 pm

I keep forgetting to post about this, I find this number interesting myself. It is the number of mini RAID arrays on my storage system, which has 200 spindles, which comes out to about 400 RAID arrays per disk! Why so many? Well it allows for maximum distribution of storage space and I/O across the system as well as massively parallel RAID rebuilds as every disk in the system will participate when a disk fails, which leads to faster rebuild times and much better service times during rebuild.

While 3PAR talks a lot about their mini RAID arrays(composed of virtual 256MB disks) it turns out there really isn’t an easy way to query how many there are, I suppose because they expect it to be so abstracted that you should not care. But I like to poke around if you haven’t noticed already!

The little script to determine this number is:

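#!/bin/bash
# Count the 256MB-chunklet "micro" RAID arrays behind every logical disk (LD).

ARRAY="mrt"        # hostname of the 3PAR array
ARRAYS_TOTAL=0     # running total across all LDs

# The loop runs in a pipeline subshell, so the running total is printed on every line.
echo "showld -d" | ssh "$ARRAY" | grep cage | while read -r line
do
        # Fields 2, 7 and 10 of the showld -d output: LD name, raw size, set size.
        LD_NAME=$(echo "$line" | awk '{print $2}')
        RAWSIZE=$(echo "$line" | awk '{print $7}')
        SETSIZE=$(echo "$line" | awk '{print $10}')
        # Raw size / 256MB chunklet size / chunklets per RAID set (bc truncates).
        ARRAYS=$(echo "${RAWSIZE}/256/${SETSIZE}" | bc)
        ARRAYS_TOTAL=$(echo "${ARRAYS_TOTAL}+${ARRAYS}" | bc)
        echo "NAME:${LD_NAME} Raw Size:${RAWSIZE}  Set Size:${SETSIZE} Micro arrays in LD:${ARRAYS}  Total Micro arrays so far:${ARRAYS_TOTAL}"
done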

Hopefully my math is right..

Output looks like:

NAME:log2.0 Raw Size:40960  Set Size:2 Micro arrays in LD:80  Total Micro arrays so far:80
NAME:log3.0 Raw Size:40960  Set Size:2 Micro arrays in LD:80  Total Micro arrays so far:160
NAME:pdsld0.0 Raw Size:49152  Set Size:2 Micro arrays in LD:96  Total Micro arrays so far:256
[..]
NAME:tp-7-sd-0.242 Raw Size:19968  Set Size:6 Micro arrays in LD:13  Total Micro arrays so far:81351
NAME:tp-7-sd-0.243 Raw Size:19968  Set Size:6 Micro arrays in LD:13  Total Micro arrays so far:81364

Like the mini RAID arrays the logical disks (the command above is showld or show logical disks) are created/maintained/deleted automatically by the system, another layer of abstraction that you really never need to concern yourself with.

The exact number is 81,364 which is up from about 79,000 in June of this year. To me at least it’s a pretty amazing number when I think about it, 80,000+ little arrays working in parallel, how does the system keep track of it all?

3PAR isn’t unique in this technology, though I think maybe they were first. I believe Compellent has something similar, and Xiotech constructs RAID arrays at the disk head level, which I find very interesting; I didn’t know it was possible to “target” areas of the disk as specifically as the head. Of these three implementations, though, I think the 3PAR one is the most advanced because it’s implemented in hardware (Compellent is software), and it’s much more granular (400 per disk in this example, Xiotech would have up to 8 per disk I think).

The disks are not full yet either, they are running at about 87% of capacity, so maybe room for another 10,000 RAID arrays on them or something..
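For anyone who wants to check the arithmetic on the figures above, here is a small sketch. The function names are mine, and the headroom estimate assumes the number of arrays grows linearly with used capacity, which is a guess rather than anything 3PAR documents.

```shell
#!/bin/sh
# Integer arithmetic on the figures quoted in the post.

# arrays_per_disk TOTAL_ARRAYS DISKS
# Total array count divided evenly across the spindles.
arrays_per_disk() {
    echo $(( $1 / $2 ))
}

# headroom_arrays TOTAL_ARRAYS PERCENT_FULL
# Assumes the array count grows linearly with used capacity.
headroom_arrays() {
    echo $(( $1 * 100 / $2 - $1 ))
}

echo "Arrays per disk: $(arrays_per_disk 81364 200)"
echo "Room for roughly $(headroom_arrays 81364 87) more arrays"
```

That works out to 406 arrays per disk, and about 12,157 more arrays before the disks are full under the linear assumption, which is in the same ballpark as the 10,000 guess.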

I learned pretty quick there’s a lot more to storage than just the number/type of disks..

(Filed under virtualization as well since this is a virtualized storage post)

November 20, 2009

Enterprise SATA disk reliability

Filed under: Storage — Tags: , — Nate @ 7:55 am

..even I was skeptical, though I knew with support it probably wouldn’t be a big deal, a disk fails and it gets replaced in a few hours. When we were looking to do a storage refresh last year I was proposing going entirely SATA for our main storage array because we had a large amount of inactive data, stuff we write to disk and then read back that same day then keep it around for a month or so before deleting it. So in theory it sounded like a good option, we get lots of disks to give us the capacity and those same lots of disks give us enough I/O to do real work too.

I don’t think you can do this with most storage systems, the architectures don’t support it nearly as well. To this point the competition was trying to call me out on my SATA solution last year, citing reliability and performance reasons. They later backtracked on their statements after I pointed them to some documentation their own storage architects wrote which said the exact opposite.

It’s been just over a year since we had our 3PAR T400 installed with 200x750GB SATA disks, which are Seagate ST3750640NS if you are curious.

Our disks are hit hard, very hard. Almost daily we exceed 90 IOPS/disk at some point during the day, and with large I/O sizes this drives the disk’s response time way up; I have another blog entry on that. Fortunately the controller cache is able to absorb that hit. But the point is our disks are not idle, they get slammed 24/7.

Average Service time across all spindles on my T400

How many disk failures have we had in the past year? One? two? three?

Zero.

For SATA drives, even enterprise SATA drives, this is to me a shocking number given the load these disks are put under on a daily basis. Why is it zero? I think a good part of it has to do with the advanced design of the 3PAR disk chassis, something they don’t really talk about outside of their architecture documentation. I think it is quite a unique design in their enterprise S and T-class systems (not available in their E or F-class systems). The biggest advantages these chassis have, I believe, are twofold:

  • Vibration absorbing drive sleds – I’ve read in several places that vibration is the #1 cause of disk failure
  • Switched design – no loops, each drive chassis is directly connected to the controllers, and each disk has two independent switched connections to a midplane in the drive chassis. Last year we had two separate incidents on our previous storage array where, due to the loop design, a single disk failure took down the entire loop, causing the array to go partially offline (an outage), despite there being redundant loops on the system. I have heard stories more recently of other similar arrays doing the same thing.

There are other cool things but my thought is those are the two main ones that drive an improvement in reliability. They have further cool things like fast RAID rebuild which was a big factor in deciding to go with SATA on their system, but even if the RAID rebuilds in 5 seconds that doesn’t make the physical disks more reliable, and this post is specifically about physical disk reliability rather than recovering from failure. But as a note I did measure rebuild rate, and for a fully loaded 750GB disk we can rebuild a degraded RAID array in about three hours, with no impact to array system performance.
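As a rough illustration of why a rebuild like that barely registers, here is the rebuild rate worked out. This is a sketch under assumptions of my own: "about three hours" is taken as exactly 3, 1 GB is treated as 1000 MB, all 199 remaining disks are assumed to share the work evenly, and only the rebuilt data is counted, not the extra reads needed to reconstruct it. The function names are hypothetical.

```shell
#!/bin/sh
# rebuild_mb_per_sec DISK_GB HOURS
# Average rebuild throughput, treating 1 GB as 1000 MB.
rebuild_mb_per_sec() {
    awk -v gb="$1" -v h="$2" 'BEGIN { printf "%.1f\n", gb * 1000 / (h * 3600) }'
}

# per_disk_mb_per_sec DISK_GB HOURS HELPER_DISKS
# The same load spread evenly over the disks that take part in the rebuild.
per_disk_mb_per_sec() {
    awk -v gb="$1" -v h="$2" -v n="$3" 'BEGIN { printf "%.2f\n", gb * 1000 / (h * 3600) / n }'
}

echo "Aggregate: $(rebuild_mb_per_sec 750 3) MB/s"
echo "Per disk:  $(per_disk_mb_per_sec 750 3 199) MB/s"
```

That comes to roughly 69.4 MB/s in aggregate, or about 0.35 MB/s of rebuilt data per participating disk, which is a small slice of what each spindle can do.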

My biggest complaint about 3PAR at this point is their stupid naming convention for their PDFs. STUPID! FIX IT! I’ve been complaining off and on for years. But in the grand scheme of things…

Not shocked? Well I don’t know what to say. Even my co-worker who managed our previous storage system is continually amazed that we haven’t had a disk die.

Now I’ve jinxed it I’m sure and I’ll get an alert saying a disk has died.
