TechOpsGuys.com Diggin' technology every day

26 Mar 2010

Enterprise EqualLogic

So, I attended that Dell/Denali event I mentioned recently. They covered some interesting internals of the Exchange 2010 architecture: technical topics like migrating to it, how it protects data, and so on. It was interesting from that standpoint; they didn't just come out and say "Hey, we are the big market leader, you will use us, resistance is futile." So I certainly appreciated that, although honestly I don't really deal with MS stuff in my line of work. I was just there for the food, and mainly because it was walking distance (and an excuse to get out of the office).

The other topic that was heavily covered was Dell EqualLogic storage. This I was more interested in. I have known about EqualLogic for years, and never really liked their iSCSI-only approach (I like iSCSI, but I don't like single-protocol arrays, and iSCSI is especially limiting as far as extending array functionality with other appliances goes; e.g. you can optionally extend a Fibre Channel-only array with iSCSI, but not vice versa - please correct me if I'm wrong).

I came across another blog entry last year which I found extremely informative - "Three Years of EqualLogic" - which listed some great pros and some serious and legitimate cons to the system after nearly three years of using it.

Anyways, to be brutally honest, if there is anything I really did "take away" from the conference with regards to EqualLogic storage, it is this: I'm glad I chose 3PAR for my storage needs (and thanks to my original 3PAR sales rep for making the cold call to me many years ago; I knew him from an earlier company).

So, where to begin? I've had a night to sleep on this information and absorb it in a more logical way. I'll start with what I think are the pros of the EqualLogic platform:

  • Low cost - I haven't priced it personally, but people say over and over that it's low cost, which is important.
  • Easy to use - it certainly looks very easy to use and very easy to set up. I'm sure they could get 20TB of EqualLogic storage up and running in less time than 3PAR could, no doubt.
  • Virtualized storage makes it flexible. It pales in comparison to 3PAR's virtualization, but it's much better than legacy storage in any case.
  • All software is included - this is great too, no wild cards with licensing. 3PAR, by contrast, heavily licenses their software, and at times it can get complicated (their decision to license the zero-detection abilities of their new F/T-class arrays was a surprise to me).

So it certainly looks fine for low(ish)-cost workgroup storage. One of the things the Dell presenter tried to hammer on is how it is "Enterprise ready". And yes, I agree it is ready - lots of enterprises use workgroup storage, I'm sure, for some situations (probably because their real legacy enterprise storage is too expensive to add more applications to, or doesn't scale to meet mixed workloads simultaneously).

Here's where I get down & dirty.

As far as being really ready for enterprise storage - no way, it's not ready. Not in 2010; maybe if it were 1999.

EqualLogic has several critical architectural deficiencies that would prevent me from wanting to use it or advising others to use it:

  • Active/passive controller design - I mean, come on, in 2010 you're still doing active/passive? They tried to argue the point that you don't need to "worry" about balancing the load between controllers and then losing that performance when a controller fails. Thanks, but I'll take the extra performance from the other active controller(s) [with automagic load balancing, no worrying required], and keep performance high with 3PAR Persistent Cache in the event of a controller failure (or software/hardware upgrade/change).
  • Need to reserve space for volumes/snapshots. Hello, 21st century here - we have the technology for reservationless systems. Ditching reservations is especially critical when dealing with thin provisioning.
  • Lack of storage pools. This compounds the effects of a reservation-based storage system. Maybe EqualLogic has storage pools; I just did not hear it mentioned at the conference, nor anywhere else. Having to reserve space for each and every volume is just stupidly inefficient. At the very least you should be able to reserve a common pool of space and point multiple volumes at it to share. Again, this hints at their lack of a completely virtualized design. You get a sense that a lot of these concepts were bolted on after the fact, rather than designed into the system, when you run into limitations like this.
  • No global hot spares - so the more shelves you have, the more spindles are sitting there idle, doing nothing. 3PAR by contrast does not use dedicated spares; each and every disk in the system has spare capacity on it. When a RAID failure occurs, the rebuild is many:many instead of many:one. This improves rebuild times by 10x+. Also, due to this design, 3PAR can take advantage of the I/O available on every disk in the array. There aren't even dedicated parity disks; parity is distributed evenly across all drives in the system.
  • Narrow striping. They were talking about how the system distributes volumes over all of the disks in the system. So I asked them how far you can stripe, say, a 2TB volume. They said over all of the shelves if you wanted to, but there is overhead from iSCSI, because apparently you need an iSCSI session to each system that is hosting data for the volume; due to this overhead they don't see people "wide striping" a single volume over more than a few shelves. 3PAR by contrast stripes across every drive in the system by default, and the volume is accessible from any controller (up to 8 in their high end) transparently. Data moves over an extremely high-speed backplane to the controller that is responsible for those blocks. In fact the system is so distributed that it is impossible to know where your data actually is (e.g. "data resides on controller 1, so I'll send my request to controller 1"), and the system is so fast that you don't need to worry about such things anyways.
  • Cannot easily sustain the failure of a whole shelf of storage. I asked the Dell rep sitting next to me if it was possible; he said it was, but you had to have a special sort of setup. It didn't sound like it was going to be transparent to the host - perhaps involving synchronous replication from one array to another, where in the event of a failure you probably have to re-point your systems at the backup. I don't know, but my point is that I have been spoiled by 3PAR: by default their system uses what they call cage-level availability, which means data is automatically spread out over the system to ensure the failure of a shelf does not impact system availability. This requires no planning in advance, unlike other storage systems - it is automatic. You can turn it off if you want, since there are limitations on what RAID levels you can use depending on the number of shelves you have (e.g. you cannot run RAID 5 with cage-level availability with only 2 shelves, because you need at least 3); the system will prevent you from making mistakes.
  • One RAID level per array (enclosure), from what the Dell rep sitting next to me said. Apparently even on their high-end 48-drive arrays you can only run a single level of RAID on all of the disks? That seems very limiting for an array with such grand virtualization claims. 3PAR of course doesn't limit you in this manner: you can run multiple RAID levels on the same enclosure, you can even run multiple RAID levels on the same DISK, it is that virtualized.
  • Inefficient scale-out - while scale-out is probably linear, the overhead involved with so many iSCSI sessions across so many arrays has to carry some penalty. Ideally what I'd like to see is at least some sort of optional InfiniBand connectivity between the controllers, to give them higher bandwidth and lower latency, and then do what 3PAR does: traffic can come in on any port and be routed to the appropriate active controller automatically. But their tiny controllers probably don't have the horsepower to do that anyways.
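To put the many:many rebuild point in perspective, here is a back-of-envelope sketch. Every figure below (disk size, disk count, per-disk rebuild rate) is an illustrative assumption of mine, not a measured vendor number:

```python
# Back-of-envelope comparison of rebuild time with a dedicated hot spare
# (many:one) vs distributed spare space (many:many). All figures are
# illustrative assumptions, not measured vendor numbers.
def rebuild_hours(data_gb, writers, mb_per_sec_each):
    """Hours to re-create data_gb when `writers` disks absorb the rebuild."""
    return data_gb * 1024 / (writers * mb_per_sec_each) / 3600

DISK_GB = 600        # capacity of the failed disk (assumed)
REBUILD_MBPS = 30    # sustained rebuild rate per target disk (assumed)

# many:one - everything funnels into a single dedicated hot spare
dedicated = rebuild_hours(DISK_GB, writers=1, mb_per_sec_each=REBUILD_MBPS)
# many:many - spare space spread over, say, 40 surviving disks
distributed = rebuild_hours(DISK_GB, writers=40, mb_per_sec_each=REBUILD_MBPS)

print(f"dedicated spare  : {dedicated:5.1f} h")
print(f"distributed spare: {distributed:5.1f} h  ({dedicated / distributed:.0f}x faster)")
```

The speedup scales with however many disks share the rebuild work, which is why the "10x+" figure is plausible on even a modest array.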
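The wide-striping idea can also be illustrated with a toy model of chunklet placement. The round-robin layout and disk count here are my simplification for illustration, not 3PAR's actual placement algorithm:

```python
# Toy model of chunklet-style wide striping: logical offsets of a volume
# map round-robin onto fixed-size chunklets spread over every disk, so no
# single disk or shelf "owns" the volume. The round-robin placement and
# disk count are a simplification, not 3PAR's real algorithm.
CHUNKLET_MB = 256      # chunklet size (256MB or 1GB depending on model)
N_DISKS = 40           # disks in a hypothetical array

def disk_for_offset(offset_mb):
    """Physical disk holding the given logical offset of the volume."""
    return (offset_mb // CHUNKLET_MB) % N_DISKS

# A 2TB volume ends up with chunklets on every disk in the system:
touched = {disk_for_offset(mb) for mb in range(0, 2 * 1024 * 1024, CHUNKLET_MB)}
print(f"disks holding part of the 2TB volume: {len(touched)} of {N_DISKS}")
```

The point of the model: because placement is per-chunklet rather than per-volume, every disk contributes spindle I/O to every volume, with no per-shelf session overhead visible to the host.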

There might be more, but those are the top offenders on my list. One part of the presentation I didn't think was very good was when the presenter streamed a video from the array and tested various failure scenarios. The tiny amount of performance needed to stream a video under failure conditions is a very weak illustration of how seamless a failure can be; pulling a hard disk, a disk controller, or a power supply really is trivial. To the uninformed, I suppose, it shows the desired effect (or lack of one), which is why it's done. A better test, I think, would be running something like IOzone on the array and showing real-time monitoring of IOPS and latency during failure testing (preferably with at least 45-50% of the system loaded).
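A crude version of that kind of failure-test monitor can be sketched in a few lines of Python. This is not IOzone, just a stand-in that issues synchronous 4KiB writes and reports per-second IOPS and worst-case latency, so a pulled disk or controller shows up as a spike; the target path, block size, and run time are arbitrary choices, and you would point the path at a volume on the array under test:

```python
# Crude stand-in for an IOzone-style failure test: issue synchronous
# 4KiB writes and report per-interval IOPS and worst-case latency, so a
# pulled disk/controller shows up as a latency spike in the output.
import os
import tempfile
import time

BLOCK = b"x" * 4096        # 4KiB per write (arbitrary)
INTERVAL = 1.0             # seconds per report line
DURATION = 5.0             # total run time in seconds (arbitrary)

def probe(path):
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        deadline = time.time() + DURATION
        while time.time() < deadline:
            ops, worst, t0 = 0, 0.0, time.time()
            while time.time() - t0 < INTERVAL:
                start = time.time()
                os.write(fd, BLOCK)
                os.fsync(fd)   # push to the device, not just the page cache
                worst = max(worst, time.time() - start)
                ops += 1
            print(f"{ops / INTERVAL:8.0f} IOPS   worst latency {worst * 1000:7.2f} ms")
    finally:
        os.close(fd)

probe(os.path.join(tempfile.gettempdir(), "failover-probe.dat"))
```

Run it against a volume on the array, then pull a controller mid-run: a seamless failover shows a brief latency blip, while a hard pause or I/O error is immediately obvious in the output.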

You never know what you're missing until you don't have it anymore. You can become complacent with what you have, accepting it as "good enough" because you don't know any better. I remember feeling this especially strongly when I changed jobs a few years ago and went from managing systems in a good Tier 4 facility to another "Tier 4" facility which had significant power issues (it seemed like at least one major outage a year). I took power for granted at the first facility because we had gone so many years without so much as a hiccup. It's at times like this that I realize (again) the value 3PAR storage brings to the market, and I'm very thankful that I can take advantage of it.

What I'd like to see, though, is some SPC-1 numbers posted for a rack of EqualLogic arrays. They say it is enterprise ready, and they talk about the clouds surrounding iSCSI. Well, put your money where your mouth is and show the world what you can do with SPC-1.

TechOps Guy: Nate

Comments (9) Trackbacks (2)
  1. Nate –

    GREAT POST! You’re right, there is nothing on the storage market today quite like 3par! If I ever switch jobs, I may make having 3par on the floor a requirement in my job search!

  2. Nate, do you know what the parity frequency is on the Equallogic? I looked into it a few years ago and they were using RAID 5 parity 3, meaning every 3rd write is a parity block. I’m wondering if this is still true, or have they moved to RAID 5 parity 5 / RAID 5 parity 9? The impact on usable capacity put me off of them for a period.
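The capacity impact that comment describes is easy to quantify, reading "parity N" as an N-disk stripe with one parity block per stripe. A quick sketch; the stripe widths are illustrative guesses, not confirmed EqualLogic internals:

```python
# Usable-capacity fraction for RAID 5 at different parity frequencies,
# reading "parity N" as an N-disk stripe with one parity block per stripe.
# The widths below are illustrative, not confirmed EqualLogic internals.
def usable_fraction(stripe_width):
    return (stripe_width - 1) / stripe_width

for width in (3, 5, 9):
    print(f"RAID 5, {width}-wide stripe: {usable_fraction(width):.0%} usable")
```

A 3-wide stripe gives up a third of raw capacity to parity, versus a ninth for a 9-wide stripe, which is the capacity hit the commenter is referring to.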

  3. I don’t know, I assumed it was adjustable, but if not, that wouldn’t be very good either.

  4. My exposure to storage systems is somewhat limited and I haven’t had the pleasure to use the 3PAR ones, however I have got a Dell/EMC Clariion FC setup and also a second tier of Equallogic SANs for a vSphere environment.

    There is much conflicting information around the Active/Passive nature of the Equallogic SAN, and it’s not that cut and dried. While the controllers are Active/Passive, each controller has two active Ethernet ports so multipathing is still possible, and as each tray has its own set of controllers you’re unlikely to ever overload the network with 16 spindles, especially not with the newer 10-gig iSCSI PS6010 series.

    Storage Pools: most certainly available; surprised they never mentioned it.

    The hot spares could be managed better - I mean, even HP can do it right. The only saving grace is that when you add two or more of these together, they only require 1 spare disk per tray as opposed to two if you only have one tray.

    I’m also pretty sure you can choose your RAID level per Volume/LUN; at least the option is there, but I’ve never tried it. I have been meaning to use this to create additional tiers of storage.

    I’m just about to take receipt of some of the new hybrid SSD/SAS PS6010XVS ones; pity I had not looked further, as the 3PAR as you’ve described it sounds neat.

    Very nice post by the way, thanks!

  5. a fair few biased comments towards HP :-) the 3PAR can certainly scale larger, but it depends on what size business you’re spec’ing this for - for up to around 1PB an equallogic is more than sufficient, and can perform well IF, like any san, it is spec’d according to requirements (to achieve space, iops etc).

    eql can and does do storage pooling, unsure how you heard otherwise. eql controllers are more like active/standby - dual ports on each controller, so active/active from the host’s perspective, but still with the redundancy of another controller.

    your narrow striping comment is irrelevant, as the 3par splits into either 256MB or 1GB “chunklets” (depending on model), the eql into 15MB page sizes - and if you have a nice mix of ssd/sas/sata then auto-tiering will give you the best performance by moving those pages around just like the 3par - but with no additional license needed, as 3par is licensed per TB for this - which is plain stupid.

    your comment about raid only being able to be done on a whole shelf for eql - correct, at a hardware level, for data protection; the luns themselves can do virtual raid, or you can just leave it automated and let it work out the best for you. just like the 3par, so another irrelevant comment.

    equallogic are also frameless, in that with every member you add, you double the bandwidth to it. 3par by comparison still works on several controllers and daisy-chains the storage off them. eql has a group (virtual) ip whereby the host (with appropriate mpio s/w installed) can effectively “talk” with the equallogic and know which physical interface/ip it needs to connect to in order to get the block of storage. it by no means implies that there is an iscsi session established to every san ip at all times; they are dynamically established as needed.

    in saying all this though, the new 3par range supposedly to be announced around early december 2012 (a cut-down, more affordable model for the mid market) is going to be interesting - i can’t wait to see how it compares price-wise. if it comes in at the same/similar price as equallogic for equivalent storage capacity and iops, we’ll be going 3par; if not, then the equallogic will get the tick of approval.

  6. thanks for the comment! Yes, I agree they are different solutions for different purposes. I believe most people think Dell acquired Compellent to make up for the serious shortfalls of Equallogic; I believe Compellent is a nice solution by contrast. From what I have heard from at least former Equallogic employees (as well as that one guy at Dell during the presentation - note this was ~2 years ago), it is very uncommon to stripe across more than four shelves.

    I’ve never bought into auto-tiering myself (even on 3PAR - I rag on them all the time about auto-tiering; I don’t believe it is an adequate solution). I want to see fully integrated write-back SLC SSD caching, as well as of course read caching. It needs to be cache, not tiering; the key point is real time, with heavy emphasis on write support. I confronted HP/3PAR’s own David Scott last year on that topic in a conference hall - but of course did not get an adequate response. Compellent handles the write portion in an acceptable manner (something I wasn’t aware of until earlier this year), though it is technically tiering. The concept of sending all new writes to the highest tier by default - it might as well be called a write cache. The other example I hammer on HP/3PAR about is EMC FAST Cache.

    Other things like the Fusion-io cache system, NetApp’s PAM cache cards, and HP’s announced host-based caching for Gen8 servers to 3PAR aren’t adequate either, as they are all read caches only. Same goes for GridIron’s SAN SSD/RAM cache system - it caches reads only. My current workload by contrast is roughly 92% write. Though my biggest problem with storage performance has been a random workload coming in from nowhere and blowing out the caches.

    Can you describe what this virtual RAID is, and the automatic vs non-automatic behavior? Or if there is a nice white paper on the topic, I’d be interested to see it as well.

    If a volume is striped across multiple enclosures, would you not need an iSCSI session at all times? Unless you have no I/O activity - but even without any activity, I’d imagine the benefit of keeping the session up (to avoid the latency hit of establishing it the instant it is needed) would mean it’d be a good idea to keep that session established.

    Yeah, I’m pretty excited about what is coming out of HP/3PAR as well. I was at some in-depth briefings on the new stuff a while back, and they recently briefed their close partners on what is coming. Can’t talk about it yet of course, even though I want to so much!

    Nimble Storage seems aimed squarely at EQL, which seems to be an easy target. EQL had its place in the low end of the market, Compellent replaced them in Dell’s portfolio for anything of reasonable scale (can’t imagine anyone running 1PB of EQL), and Nimble is quite a competitor to EQL at the low end of the market.

    thanks again

  7. hey again, the thing i don’t like about equallogic is its sparing - allocating 2 x hotspares typically per member, as opposed to reserving that space across all existing disks and therefore using all spindles at all times - but, eh, it’s a minor setback for a pretty good san, at least for the price - and all features included.

    the raid policy is set per member, and is fixed per member except on the hybrid arrays (eg: where you can mix ssd and sas - unsure whether you can vary the raid types between them though, the recommendation is raid6 accelerated) - however the raid is not set across all disks on that member. for example, a 14-disk array configures 2 x RAID 5 sets, with six disks in each set, and two spare disks, and stripes across the sets. it varies depending on RAID type and model (i assume benchmarked at dell labs for the optimal setup).

    when i said virtual raid i meant volume raid - sorry for the confusion - it simply means that when you have a multi-member group, say with different raid sets across each member (based on the performance you have tier’d for), then when you create a lun (volume) you can leave the volume raid as automatic and have the eql choose which members to put that volume on for optimal performance, or select say raid10, and it will try to balance that volume across members with raid10 configured on them. generally leaving it at automatic and letting the eql algorithms work out the smarts is the easiest i’ve found.

    in regards to multiple iscsi sessions, that’s controlled by the host (initiator), not the san (target) - although, with say vmware running mem (multipath extension module), which provides psa (pluggable storage architecture) in the vmkernel - in layman’s terms, it makes the vmware hosts aware of which member to talk to (and therefore initiate an iscsi session with) in order to get the data they require.

    interesting in terms of your r/w ratio - you are far from the norm with 92% writes. generally it’s around 70% reads / 30% writes, so this is perhaps where the 3par will benefit due to its powerful ASICs. i guess you need to be aware that 3par and equallogic are very different architectures - the 3par comes with 2 or 4 controllers (or more?) and then each “disk tray” obviously scales up in terms of storage. therefore, the ASICs on each controller (or the aggregate of all controllers) need to be powerful enough to cope with the worst-case scenario - that is, a fully loaded SAN at maximum capacity. On the contrary, each equallogic member (or “disk tray”) has dual controllers built into it, therefore each set of controllers only needs to be sufficiently spec’d to provide the throughput needed for the disks in that tray. This is perhaps why the 3PAR is more expensive - you’re paying for a really grunty controller up front. I hope that makes sense?

    so yeah, both vendors have their benefits and their drawbacks - one solution won’t fit all, so i guess you just gotta weigh up the cost vs benefits.
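The sparing and parity overhead of the 14-disk member layout described in that comment is easy to put numbers on. A quick sketch; the disk counts come from the comment, while the per-disk capacity is an assumed figure:

```python
# Usable capacity of the 14-drive member layout described in the comment:
# two 6-disk RAID 5 sets (5 data + 1 parity each) plus two hot spares.
# Disk counts come from the comment; the per-disk size is an assumption.
DISK_TB = 0.6                                  # e.g. 600GB drives (assumed)
raid_sets, disks_per_set, spares = 2, 6, 2
total_disks = raid_sets * disks_per_set + spares
data_disks = raid_sets * (disks_per_set - 1)   # RAID 5 loses one disk per set
print(f"{data_disks} of {total_disks} disks hold data "
      f"({data_disks / total_disks:.0%}, {data_disks * DISK_TB:.1f}TB usable)")
```

So roughly 4 of every 14 disks in this layout go to parity and spares before any snapshot or replication reserves are taken out.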

  8. just a little update: if you are installing an equallogic san, this article is a must-read… http://www.equallogic.com/WorkArea/DownloadAsset.aspx?id=10799