So, I attended that Dell/Denali event I mentioned recently. They covered some interesting internals of the Exchange 2010 architecture, including technical topics like migrating to it, how it protects data, etc. It was interesting from that standpoint: they didn’t just come out and say “Hey we are big market leader you will use us, resistance is futile”. I certainly appreciated that, although honestly I don’t really deal with MS stuff in my line of work. I was just there for the food, and mainly because it was within walking distance (and an excuse to get out of the office).
The other topic that was heavily covered was Dell EqualLogic storage. This I was more interested in. I have known about EqualLogic for years, and never really liked their iSCSI-only approach (I like iSCSI but I don’t like single-protocol arrays, and iSCSI is especially limiting as far as extending array functionality with other appliances, e.g. you can optionally extend a Fibre Channel-only array with iSCSI but not vice versa – please correct me if I’m wrong).
I came across another blog entry last year which I found extremely informative – “Three Years of EqualLogic”, which listed some great pros and some serious and legitimate cons to the system after nearly three years of using it.
Anyways, being brutally honest, if there is anything I really did “take away” from the conference with regards to EqualLogic storage it is this – I’m glad I chose 3PAR for my storage needs (and thanks to my original 3PAR sales rep for making the cold call to me many years ago; I knew him from an earlier company).
So where to begin? I’ve had a night to sleep on this information and absorb it in a more logical way. I’ll start out with what I think are the pros of the EqualLogic platform:
- Low cost – I haven’t priced it personally but people say over and over it’s low cost, which is important
- Easy to use – It certainly looks very easy to use and very easy to set up. I’m sure they could get 20TB of EqualLogic storage up and running in less time than 3PAR could, no doubt.
- Virtualized storage makes it flexible. It pales in comparison to 3PAR virtualization but it’s much better than legacy storage in any case.
- All software is included – this is great too, no wild cards with licensing. 3PAR by contrast heavily licenses their software and at times it can get complicated (their decision to license the zero detection abilities of their new F/T class arrays was a surprise to me).
So it certainly looks fine for low(ish) cost workgroup storage. One of the things the Dell presenter tried to hammer on is how it is “Enterprise ready”. And yes, I agree it is ready in that sense: I’m sure lots of enterprises use workgroup storage for some situations (probably because their real legacy enterprise storage is too expensive to add more applications to, or doesn’t scale to meet mixed workloads simultaneously).
Here’s where I get down & dirty.
As far as being really ready for enterprise storage – no way, it’s not ready. Not in 2010; maybe if it was 1999.
EqualLogic has several critical architectural deficiencies that would prevent me from wanting to use it or advising others to use it:
- Active/passive controller design – I mean come on, in 2010 you’re still doing active/passive? They tried to argue that this way you don’t need to “worry” about balancing the load between controllers and then losing that performance when a controller fails. Thanks, but I’ll take the extra performance from the other active controller(s) [with automagic load balancing, no worrying required], and keep performance high with 3PAR Persistent Cache in the event of a controller failure (or software/hardware upgrade/change).
- Need to reserve space for volumes/snapshots. Hello, 21st century here, we have the technology for reservationless systems, ditching reservations is especially critical when dealing with thin provisioning.
- Lack of storage pools. This compounds the effects of a reservation-based storage system. Maybe EqualLogic has storage pools; I just did not hear it mentioned at the conference or anywhere else. Having to reserve space for each and every volume is just stupidly inefficient. At the very least you should be able to reserve a common pool of space and point multiple volumes at it to share. Again, this hints at their lack of a completely virtualized design. When you run into system limitations like this, you get a sense that a lot of these concepts were bolted on after the fact and not designed into the system.
- No global hot spares – so the more shelves you have, the more spindles are sitting there idle, doing nothing. 3PAR by contrast does not use dedicated spares; each and every disk in the system has spare capacity on it. When a disk fails the rebuild is many:many instead of many:one. This improves rebuild times by 10x+. Also due to this design, 3PAR can take advantage of the I/O available on every disk in the array. There aren’t even dedicated parity disks; parity is distributed evenly across all drives in the system.
- Narrow striping. They were talking about how the system distributes volumes over all of the disks in the system, so I asked them how far you can stripe, say, a 2TB volume. They said over all of the shelves if you wanted to, but there is overhead from iSCSI because apparently you need an iSCSI session to each system that is hosting data for the volume. Due to this overhead they don’t see people “wide striping” a single volume over more than a few shelves. 3PAR by contrast stripes across every drive in the system by default, and the volume is accessible from any controller (up to 8 in their high end) transparently. Data moves over an extremely high speed backplane to the controller that is responsible for those blocks. In fact the system is so distributed that it is impossible to know where your data actually is (so you can’t reason “data resides on controller 1, so I’ll send my request to controller 1”), and the system is so fast that you don’t need to worry about such things anyways.
- Cannot easily sustain the failure of a whole shelf of storage. I asked the Dell rep sitting next to me if it was possible. He said it was, but you had to have a special sort of setup, and it didn’t sound like it would be transparent to the host; perhaps it involves synchronous replication from one array to another, and in the event of failure you probably have to re-point your systems to the backup. I don’t know, but my point is I have been spoiled by 3PAR: by default their system uses what they call cage level availability, which means data is automatically spread out over the system to ensure the failure of a shelf does not impact system availability. This requires no planning in advance, unlike other storage systems; it is automatic. You can turn it off if you want, as there are limitations on what RAID levels you can use depending on the number of shelves you have (e.g. you cannot run RAID 5 with cage level availability with only 2 shelves because you need at least 3). The system will prevent you from making mistakes.
- One RAID level per array (enclosure), from what the Dell rep sitting next to me said. Apparently even on their high end 48-drive arrays you can only run a single level of RAID on all of the disks? Seems very limiting for an array that has such grand virtualization claims. 3PAR of course doesn’t limit you in this manner: you can run multiple RAID levels on the same enclosure, you can even run multiple RAID levels on the same DISK, it is that virtualized.
- Inefficient scale out – while scale out is probably linear, the overhead involved with so many iSCSI sessions to so many arrays has to carry some penalty. Ideally what I’d like to see is at least some sort of optional InfiniBand connectivity between the controllers to give them higher bandwidth and lower latency, and then do what 3PAR does – traffic can come in on any port and be routed to the appropriate active controller automatically. But their tiny controllers probably don’t have the horsepower to do that anyways.
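To put the many:many vs. many:one rebuild point from the hot spare item above into rough numbers, here is a back-of-the-envelope sketch in Python. Everything in it is an assumption for illustration: the disk size, the per-disk rebuild rate, the drive count, and above all the idea that rebuild time is bounded only by how fast the replacement data can be written. It is not how either vendor actually schedules rebuilds.

```python
def rebuild_hours(lost_gb, write_mb_s, target_disks):
    """Hours needed to rewrite lost_gb of data when the rebuild writes
    are spread evenly over target_disks drives (idealized model)."""
    if target_disks < 1:
        raise ValueError("need at least one target disk")
    seconds = (lost_gb * 1024) / (write_mb_s * target_disks)
    return seconds / 3600


# Hypothetical numbers, not taken from any spec sheet.
FAILED_DISK_GB = 300   # data that has to be reconstructed
REBUILD_MB_S = 30      # assumed per-disk rebuild write rate under load
DISKS_IN_SYSTEM = 80   # total drives before the failure

# many:one - all reconstructed data lands on a single dedicated spare
dedicated = rebuild_hours(FAILED_DISK_GB, REBUILD_MB_S, 1)

# many:many - reconstructed data lands on spare space on every survivor
distributed = rebuild_hours(FAILED_DISK_GB, REBUILD_MB_S, DISKS_IN_SYSTEM - 1)

speedup = dedicated / distributed
print(f"dedicated spare:   {dedicated:.2f} h")
print(f"distributed spare: {distributed:.2f} h")
print(f"ideal speedup:     {speedup:.0f}x")
```

In this idealized model the speedup is simply the number of surviving drives. Real arrays throttle rebuilds to protect host I/O and are also limited by reads and controller bandwidth, so the practical gain is smaller than the ideal figure.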
There might be more, but those are the top offenders on my list. One part of the presentation which I didn’t think was very good was when the presenter streamed a video from the array and tested various failure scenarios. Streaming a video needs so little performance capacity that it is a very weak illustration of how seamless a failure can be. Pulling out a hard disk, a disk controller or a power supply really is trivial. To the uninformed I suppose it shows the desired effect (or lack thereof), which is why it’s done. A better test, I think, would be running something like IOzone against the array and showing real-time monitoring of IOPS and latency during the failure testing (preferably with the system at least 45-50% loaded).
You never know what you’re missing until you don’t have it anymore. You can become complacent in what you have as being “good enough” because you don’t know any better. I remember feeling this especially strongly when I changed jobs a few years ago, and I went from managing systems in a good tier 4 facility to another “tier 4” facility which had significant power issues (at least one major outage a year, it seemed). I took power for granted at the first facility because we had gone so many years without so much as a hiccup. It’s times like this I realize (again) the value that 3PAR storage brings to the market and am very thankful that I can take advantage of it.
What I’d like to see though are some SPC-1 numbers posted for a rack of EqualLogic arrays. They say it is enterprise ready, and they talk about the clouds surrounding iSCSI. Well, put your money where your mouth is and show the world what you can do with SPC-1.
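As a rough idea of what I mean by showing IOPS and latency around a failure, here is a minimal Python sketch. It assumes you have already collected per-second samples from whatever load generator and monitoring you use; the `(second, iops, latency_ms)` tuple format, the `degradation` helper and the sample values are all made up for illustration and are not measurements from any array.

```python
from statistics import mean


def degradation(samples, failure_at):
    """Compare average IOPS and latency before and after a failure.

    samples    -- list of (second, iops, latency_ms) tuples (assumed format)
    failure_at -- second at which the component was pulled
    Returns (percent IOPS drop, percent latency increase).
    """
    before = [s for s in samples if s[0] < failure_at]
    after = [s for s in samples if s[0] >= failure_at]
    if not before or not after:
        raise ValueError("need samples on both sides of the failure")

    iops_before = mean(s[1] for s in before)
    iops_after = mean(s[1] for s in after)
    lat_before = mean(s[2] for s in before)
    lat_after = mean(s[2] for s in after)

    iops_drop = 100 * (iops_before - iops_after) / iops_before
    latency_rise = 100 * (lat_after - lat_before) / lat_before
    return iops_drop, latency_rise


# Placeholder samples only: a component is "pulled" at second 2.
demo_samples = [
    (0, 10000, 5.0),
    (1, 10000, 5.0),
    (2, 8000, 7.5),
    (3, 8000, 7.5),
]
iops_drop, latency_rise = degradation(demo_samples, failure_at=2)
print(f"IOPS drop: {iops_drop:.1f}%  latency increase: {latency_rise:.1f}%")
```

Two numbers like these, taken while the array is under real load, say far more about how seamless a failure is than a video that keeps playing.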