TechOpsGuys.com – Diggin' technology every day

June 15, 2010

Next generation SSD

Filed under: News,Storage — Nate @ 4:04 am

Another interesting article from our friends at The Register. This one covers a new startup that is promising SLC-like performance and reliability at MLC-like prices.

[..]

says the 200GB product has a five-year endurance at 2TB/day write data and the 400GB model a five-year endurance at 4TB/day. This is with random, non-compressible data.

[..]

Genesis has a 3Gbit/s SATA interface and has a 30,000 random read IOPS rating (4KB blocks), and a 20,000 random write IOPS rating. It provides 180MB/s sustained write and 220MB/s sustained read bandwidth.
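
To put those endurance numbers in perspective, here is a quick back-of-the-envelope conversion into full drive writes per day (my arithmetic based on the quoted figures, not anything the vendor published):

```python
# Convert the quoted endurance figures into full drive writes per day (DWPD).
# Straight division of quoted write rate by drive capacity -- my math, not the vendor's.
models = {
    "200GB": {"capacity_gb": 200, "writes_tb_per_day": 2},
    "400GB": {"capacity_gb": 400, "writes_tb_per_day": 4},
}
for name, m in models.items():
    dwpd = m["writes_tb_per_day"] * 1000 / m["capacity_gb"]
    total_pb = m["writes_tb_per_day"] * 365 * 5 / 1000
    print(f"{name}: {dwpd:.0f} drive writes/day for 5 years (~{total_pb:.1f}PB written in total)")
```

That works out to roughly 10 full drive writes per day on both models, which is a very aggressive claim for MLC-class flash.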

Certainly looks interesting. It’s not nearly as fast (or reliable) as, say, Fusion IO SLC, or even their MLC for that matter, but it’s probably a bit cheaper too.

May 28, 2010

That’s not a knife…

Filed under: Datacenter,Storage — Tags: , , — Nate @ 9:10 pm

There’s been a lot of talk (no thanks to Cisco/EMC) about infrastructure blocks recently. I never liked the concept (and still don’t). I think it makes sense in the SMB world, where you have very limited IT staff and they need a canned, integrated solution. Companies like HP and IBM have been selling these sorts of mini stacks for years. As for Microsoft, I think they have a “Small Business” version of their server platform which includes a bunch of things integrated together as well.

I think the concept falls apart at scale, though. I’m a strong believer in best-of-breed technologies, and what counts as best of breed really depends on the requirements of the organization. I have my own favorites, of course, for the industries I’ve been working with/in for the past several years, but I know they don’t apply to everyone.

I was reading up yesterday on some new containerized data centers that SGI released in their Ice Cube series. The numbers are just staggering.

In their most dense configuration, in 320 square feet of space consuming approximately 1 megawatt of power you can have either:

  • More than 45,000 CPU cores
  • More than 29 Petabytes of storage

In both cases you can get roughly 45kW per rack, while today most legacy data centers top out at between 2 and 5kW per rack.
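
Just for fun, here is a rough sketch of what that density means in floor space for the same 1MW of load (the 4kW legacy figure is an assumption picked from the range above):

```python
# Rough floor-space comparison: how many racks does 1MW of IT load take
# at container density vs. a legacy data center? (Legacy kW/rack is my assumption.)
it_load_kw = 1000
container_kw_per_rack = 45
legacy_kw_per_rack = 4            # assumed, from the 2-5kW range above

container_racks = it_load_kw / container_kw_per_rack   # ~22 racks in 320 sq ft
legacy_racks = it_load_kw / legacy_kw_per_rack          # 250 racks
print(f"Container: ~{container_racks:.0f} racks; legacy: ~{legacy_racks:.0f} racks")
print(f"Roughly {legacy_racks / container_racks:.0f}x the rack count (plus aisles) for the same load")
```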

Stop and think about that for a minute, think about the space, think about the density. 320 square feet is smaller than even a studio apartment, though in Japan it may be big enough to house a family of 10-12 (I hear space is tight over there).

How’s that for an infrastructure block? And yes, you can stack one on top of another:

ICE Cube utilizes an ISO standard commercially available 9.5′ x 8′ x 40′ container. SGI intentionally designed the offering such that the roof of the container is clear of obstruction and fully capable of utilizing its stacking container feature. Because of this, SGI is positioned to supply a compelling density multiplier for future expansion of the data center. If installed in a location without overhead height restriction the 9.5′ x 8′ x 40′ containers in our primary product offering can be stacked up to three-high, thus allowing customers to double or triple the per square foot density of the facility over the already industry-leading density of a single ICE Cube.

All of this made me think of a particular scene from an ’80s movie.

Really makes those other blocks some vendors are talking about sound like toys by comparison, doesn’t it?

April 14, 2010

First SPC-1 Numbers with automagic storage tiering

Filed under: News,Storage — Tags: , , , — Nate @ 8:38 am

IBM recently announced that they are adding an “Easy Tier” of storage to some of their storage systems. This seems to be their form of what I have been calling automagic storage tiering. They are doing it at the sub-LUN level in 1GB increments. And they recently posted SPC-1 numbers for this new system; finally, someone posted numbers.

Configuration of the system included:

  • 1 IBM DS8700
  • 96 1TB SATA drives
  • 16 146GB SSDs
  • Total ~100TB raw space
  • 256GB Cache

Performance of the system:

  • 32,998 IOPS
  • 34.1 TB Usable space

Cost of the system:

  • $1.58 Million for the system
  • $47.92 per SPC-1 IOP
  • $46,545 per usable TB

Now I’m sure the system is fairly power efficient given that it only has 96 spindles on it, but I don’t think that justifies the price tag. Just take a look at this 3PAR F400 which posted results almost a year ago:

  • 384 disks, 4 controllers, 24GB data cache
  • 93,050 SPC-1 IOPS
  • 26.4 TB Usable space (~56TB raw)
  • $548k for the system (I’m sure prices have come down since)
  • $5.89 per SPC-1 IOP
  • $20,757 per usable TB

The F400 system above was tested with 146GB disks; today the 450GB disks seem priced very reasonably, so I would opt for those instead and get the extra space for not much of a premium.

Take a 3PAR F400 with 130 450GB 15k RPM disks; that would be about 26TB of usable space with RAID 1+0 (the tested configuration above is 1+0). That would give about 33.8% of the performance of the 384-disk system above, so say 31,487 SPC-1 IOPS, very close to the IBM system, and I bet the price of that 3PAR would be close to half of the $548k above (taking into account that the controllers in any system are a good chunk of the cost). 3PAR has near-linear scalability, making extrapolations like this possible and reasonably accurate. And you can sleep well at night knowing you can triple your space/performance online without service disruption.
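
Here is the back-of-the-envelope version of that extrapolation; the linear-scaling assumption and the half-price guess are mine, not tested results:

```python
# Back-of-the-envelope extrapolation for a smaller F400 config, assuming
# near-linear scaling with spindle count (my assumption, not a tested result).
f400_disks, f400_iops, f400_price = 384, 93_050, 548_000
ibm_iops = 32_998

small_disks = 130
frac = small_disks / f400_disks            # ~33.9% of the spindles
est_iops = f400_iops * frac                # close to the ~31,500 figure in the text
est_price = f400_price * 0.5               # rough guess: controllers are a big fixed cost

print(f"{small_disks}-disk F400 estimate: {est_iops:,.0f} SPC-1 IOPS "
      f"(vs {ibm_iops:,} for the IBM DS8700 + Easy Tier)")
print(f"Estimated cost/IOP: ${est_price / est_iops:.2f} vs the DS8700's published $47.92")
```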

Note that you can of course equip a 3PAR system with SSDs and use automagic storage tiering as well (they call it Adaptive Optimization), if you really wanted to. The 3PAR system moves data around in 128MB increments, by contrast.

It seems the cost of the SSDs and the massive amount of cache IBM dedicated to the system more than offset the benefit of using lower-cost nearline SATA disks. If that is the outcome, what’s the point?

So consider me not impressed with the first results of automagic storage tiering. I expected significantly more out of it. Maybe it’s IBM specific, maybe not, time will tell.

April 2, 2010

Grid Iron decloaks

Filed under: News,Storage — Tags: , , — Nate @ 10:30 am

Grid Iron Systems seems to have left stealth mode somewhat recently. They are another startup that makes an accelerator appliance that sits in between your storage and your server(s). Roughly what Avere does on the NAS side, Grid Iron does on the SAN side with their “TurboCharger”.

Certainly looks like an interesting product, but it appears they make it “safe” by caching only reads. I want an SSD system that can cache writes too! (Yes, I’m sure that wears the SSDs out faster, but just do warranty replacement.) I look forward to seeing some SPC-1 numbers on how Grid Iron can accelerate systems, and at the same time I look forward to SPC-1 numbers on how automatic storage tiering can accelerate systems as well.

I’d also be interested in seeing how Grid Iron can accelerate NetApp systems vs using NetApp’s own read-only PAM (since Grid Iron specifically mentions NetApp in their NAS accelerator, although yes I’m sure they just used NetApp as an example).

March 26, 2010

Enterprise EqualLogic

Filed under: Storage — Tags: , , — Nate @ 6:33 am

So, I attended that Dell/Denali event I mentioned recently. They covered some interesting internals on the architecture of Exchange 2010, including technical topics like migrating to it, how it protects data, etc. It was interesting from that standpoint; they didn’t just come out and say “Hey, we are the big market leader, you will use us, resistance is futile”. So I certainly appreciated that, although honestly I don’t really deal with MS stuff in my line of work. I was just there for the food and mainly because it was walking distance (and an excuse to get out of the office).

The other topic that was heavily covered was Dell EqualLogic storage. This I was more interested in. I have known about EqualLogic for years, and never really liked their iSCSI-only approach (I like iSCSI but I don’t like single-protocol arrays, and iSCSI is especially limiting as far as extending array functionality with other appliances goes, e.g. you can optionally extend a Fiber Channel-only array with iSCSI but not vice versa – please correct me if I’m wrong).

I came across another blog entry last year which I found extremely informative – “Three Years of EqualLogic” – which listed some great pros and some serious and legitimate cons after nearly three years of using the system.

Anyways, being brutally honest, if there is anything I really did “take away” from the conference with regards to EqualLogic storage it is this: I’m glad I chose 3PAR for my storage needs (and thanks to my original 3PAR sales rep for making the cold call to me many years ago; I knew him from an earlier company).

So where to begin? I’ve had a night to sleep on this information and absorb it in a more logical way, so I’ll start with what I think are the pros of the EqualLogic platform:

  • Low cost – I haven’t priced it personally, but people say over and over that it’s low cost, which is important.
  • Easy to use – It certainly looks very easy to use and very easy to set up; I’m sure they could get 20TB of EqualLogic storage up and running in less time than 3PAR could, no doubt.
  • Virtualized storage makes it flexible. It pales in comparison to 3PAR virtualization, but it’s much better than legacy storage in any case.
  • All software is included – this is great too, no wild cards with licensing. 3PAR by contrast heavily licenses their software, and at times it can get complicated in some situations (their decision to license the zero-detection abilities of their new F/T class arrays was a surprise to me).

So it certainly looks fine for low(ish) cost workgroup storage. One of the things the Dell presenter tried to hammer on is how it is “Enterprise ready”. And yes, I agree it is ready in the sense that lots of enterprises use workgroup storage for some situations (probably because their real legacy enterprise storage is too expensive to add more applications to, or doesn’t scale to meet mixed workloads simultaneously).

Here’s where I get down & dirty.

As far as being really ready for enterprise storage: no way, it’s not ready. Not in 2010; maybe if it were 1999.

EqualLogic has several critical architectural deficiencies that would prevent me from wanting to use it or advising others to use it:

  • Active/passive controller design – I mean come on, in 2010 you’re still doing active/passive? They tried to argue the point that you don’t need to “worry” about balancing the load between controllers and then losing that performance when a controller fails. Thanks, but I’ll take the extra performance from the other active controller(s) [with automagic load balancing, no worrying required], and keep performance high with 3PAR Persistent Cache in the event of a controller failure (or software/hardware upgrade/change).
  • Need to reserve space for volumes/snapshots. Hello, 21st century here, we have the technology for reservationless systems; ditching reservations is especially critical when dealing with thin provisioning.
  • Lack of storage pools. This compounds the effects of a reservation-based storage system. Maybe EqualLogic has storage pools; I just did not hear it mentioned in the conference nor anywhere else. Having to reserve space for each and every volume is just stupidly inefficient. At the very least you should be able to reserve a common pool of space and point multiple volumes at it to share. Again, this hints at their lack of a completely virtualized design. You get a sense that a lot of these concepts were bolted on after the fact and not designed into the system when you run into system limitations like this.
  • No global hot spares – so the more shelves you have, the more spindles are sitting there idle, doing nothing. 3PAR by contrast does not use dedicated spares; each and every disk in the system has spare capacity on it. When a RAID failure occurs, the rebuild is many:many instead of many:one, which improves rebuild times by 10x+ (a rough rebuild-time sketch follows this list). Also due to this design, 3PAR can take advantage of the I/O available on every disk in the array. There aren’t even dedicated parity disks; parity is distributed evenly across all drives in the system.
  • Narrow striping. They were talking about how the system distributes volumes over all of the disks in the system, so I asked them how far you can stripe, say, a 2TB volume. They said over all of the shelves if you wanted to, but there is overhead from iSCSI because apparently you need an iSCSI session to each system that is hosting data for the volume; due to this overhead they don’t see people “wide striping” a single volume over more than a few shelves. 3PAR by contrast stripes across every drive in the system by default, and the volume is accessible from any controller (up to 8 in their high end) transparently. Data moves over an extremely high speed backplane to the controller that is responsible for those blocks. In fact the system is so distributed that it is impossible to know where your data actually is (e.g. “data resides on controller 1, so I’ll send my request to controller 1”), and the system is so fast that you don’t need to worry about such things anyways.
  • Cannot easily sustain the failure of a whole shelf of storage. I asked the Dell rep sitting next to me if it was possible; he said it was, but you had to have a special sort of setup, and it didn’t sound like it was going to be something transparent to the host – perhaps involving synchronous replication from one array to another, where in the event of failure you probably had to re-point your systems at the backup. I don’t know, but my point is I have been spoiled by 3PAR: by default their system uses what they call cage level availability, which means data is automatically spread out over the system to ensure the failure of a shelf does not impact system availability. This requires no planning in advance vs other storage systems; it is automatic. You can turn it off if you want, as there are limitations on what RAID levels you can use depending on the number of shelves you have (e.g. you cannot run RAID 5 with cage level availability with only 2 shelves because you need at least 3), and the system will prevent you from making mistakes.
  • One RAID level per array (enclosure), from what the Dell rep sitting next to me said. Apparently even on their high-end 48-drive arrays you can only run a single level of RAID on all of the disks? That seems very limiting for an array that has such grand virtualization claims. 3PAR of course doesn’t limit you in this manner; you can run multiple RAID levels on the same enclosure, you can even run multiple RAID levels on the same DISK, it is that virtualized.
  • Inefficient scale out – while scale out is probably linear, the overhead involved with so many iSCSI sessions across so many arrays has to have some penalty. Ideally what I’d like to see is at least some sort of optional InfiniBand connectivity between the controllers to give them higher bandwidth and lower latency, and then do like 3PAR does – traffic can come in on any port and be routed to the appropriate active controller automatically. But their tiny controllers probably don’t have the horsepower to do that anyways.
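
As mentioned in the hot-spares item above, here is a very rough rebuild-time sketch showing why many:many rebuilds are so much faster than rebuilding onto a single dedicated spare. It only models the write bottleneck, and the 80MB/s sustained rate is an assumption:

```python
# Very rough rebuild-time model: many:one rebuilds funnel all writes to a single
# spare disk, many:many spread the rewritten data across spare space on many disks.
# Only the write bottleneck is modeled; the 80MB/s sustained rate is an assumption.
def rebuild_hours(rebuilt_gb: float, write_mb_s: float, target_disks: int) -> float:
    per_disk_gb = rebuilt_gb / target_disks
    return per_disk_gb * 1024 / write_mb_s / 3600

failed_disk_gb = 1000
one_spare = rebuild_hours(failed_disk_gb, 80, 1)
many_many = rebuild_hours(failed_disk_gb, 80, 40)
print(f"many:one  (1 spare disk):    {one_spare:.1f} hours minimum")
print(f"many:many (40 target disks): {many_many * 60:.0f} minutes minimum")
```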

There might be more, but those are the top offenders on my list. One part of the presentation which I didn’t think was very good was when the presenter streamed a video from the array and tested various failure scenarios. The amount of performance needed to keep a video streaming while a storage array is in a failure condition is trivial, so it’s a very weak illustration of how seamless a failure can be. Pulling a hard disk, or a disk controller, or a power supply really is trivial. To the uninformed I suppose it shows the desired effect (or lack of one), which is why it’s done. A better test, I think, would be running something like IOzone against the array and showing real-time monitoring of IOPS and latency while doing the failure testing (preferably with at least 45-50% of the system loaded).

You never know what you’re missing until you don’t have it anymore. You can become complacent in what you have as being “good enough” because you don’t know any better. I remember feeling this especially strongly when I changed jobs a few years ago and went from managing systems in a good Tier 4 facility to another “Tier 4” facility which had significant power issues (it seemed like at least one major outage a year). I took power for granted at the first facility because we had gone so many years without so much as a hiccup. It’s times like this that I realize (again) the value that 3PAR storage brings to the market, and I am very thankful that I can take advantage of it.

What I’d like to see, though, is some SPC-1 numbers posted for a rack of EqualLogic arrays. They say it is enterprise ready, and they talk about the clouds surrounding iSCSI. Well, put your money where your mouth is and show the world what you can do with SPC-1.

March 11, 2010

Panasas NFS performance posted

Filed under: Storage — Tags: , , , — Nate @ 5:48 pm

I have heard of Panasas on occasion, and for some reason I recently saw a story or a link to them, so I decided to poke around and see what they do. I like technology.

Anyways, I was shocked to see their system design. I mean, I’ve seen systems like Isilon and Xiotech and Pillar which have embedded controllers in each of their storage shelves; that is an interesting concept for boosting performance, though given the added complexity and hardware in each shelf I imagine it can boost the costs by quite a bit too, I don’t know.

But Panasas has taken it to an even further extreme, putting a disk controller on every two disks in the system! I mean, I’m sure it’s great for maximum performance, but wow, it just seems like such massive overkill (which can be good for certain apps, I’m sure). I was/am still shocked 🙂

So today I was poking around again at the latest SPEC SFS results for NFS, and saw they posted some numbers finally.

Fairly impressive numbers, but I just can’t get past the number of CPUs they are using. They posted 77,137 IOPS with 160 disks hosting NAS data (80 SATA and 80 SSD). They used a total of 110 Intel CPUs (80 1.5GHz Celerons and 30 1.8GHz Pentium Ms) and 440 gigabytes of RAM cache.

By contrast, Avere, which I posted about recently (never used their stuff, never talked to them before), posted 131,591 IOPS with 72 disks hosting NAS data (48 15k SAS, 24 SATA), 14 Intel CPUs (2.5GHz quad core, so 56 cores) and 423 gigabytes of RAM cache. This is on a 6-node cluster. This Avere configuration is not using SSD (they have released an SSD version since these results were posted).

The bar certainly is being raised by these players implementing massive caches. NetApp showed off some pretty impressive numbers as well with their PAM last year, with more than 500GB of cache (PAM is a read cache only), though again not nearly as effective as Avere since they came in at 60,507 IOPS with 56 15k RPM disks.
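
To make the efficiency comparison concrete, here is a quick sketch of IOPS per data disk (and per CPU core where I tallied them) from the published figures above; the core counts are my reading of the disclosures, not vendor-confirmed:

```python
# Rough IOPS-per-component math from the published SPEC SFS figures above.
# Core counts are my reading of the disclosures, not vendor-confirmed.
systems = {
    "Panasas":        {"iops": 77_137,  "data_disks": 160, "cpu_cores": 110},  # 80 Celerons + 30 Pentium Ms
    "Avere (6-node)": {"iops": 131_591, "data_disks": 72,  "cpu_cores": 56},   # 14 quad-core CPUs
    "NetApp + PAM":   {"iops": 60_507,  "data_disks": 56,  "cpu_cores": None}, # not tallied here
}
for name, s in systems.items():
    line = f"{name}: {s['iops'] / s['data_disks']:,.0f} IOPS per data disk"
    if s["cpu_cores"]:
        line += f", {s['iops'] / s['cpu_cores']:,.0f} IOPS per core"
    print(line)
```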

March 4, 2010

Dell/Denali Servers/Storage luncheon March 25th

Filed under: Events,Storage — Nate @ 11:06 am

Been a while since I posted an event, but if you’re looking for new servers/storage for your Exchange setup, this event may be a good excuse to get away from work for a while.

Choose the Right Storage Solution for your Microsoft Exchange Environment
Thursday March 25th, 2010
11:30am – 1:30pm
El Gaucho
City Center Plaza
450 108th Ave NE
Bellevue, WA 98004

Join us for a complimentary technical seminar and learn how the Dell EqualLogic PS Series storage solution and Microsoft Exchange, deployed on Dell PowerEdge servers can deliver[..]

Myself, I don’t expect to learn anything, and 3PAR storage can run Exchange for a large number of users (from these numbers you could extrapolate a max of 192,000 mailboxes on a single storage system, each with a heavy I/O profile), so I’m not really in the market for some EqualLogic storage. BUT I like to get away, especially if it’s local. I do find it curious that the event is specifically about Exchange; that is the mindset of storage dedicated to a particular application, when the industry trend seems to be leaning towards storage that is shared amongst many applications. Given that Microsoft doesn’t appear to be an event sponsor, the framing is even more curious.

Thought this was interesting as well: Microsoft recommends RAID 1 for Exchange, but (from one of the links above):

Internal tests performed by 3PAR show that using RAID 5 (7+1)—i.e., seven data blocks per parity block—demonstrated that the same simulated Exchange workload used for Exchange 2007 ESRP testing had disk latencies that were higher than RAID 1 but well within Microsoft’s recommendations[..]

Going from RAID 1+0 to RAID 5 (7+1) is a pretty dramatic shift, showing how fast their “Fast” RAID is, and of course if you find out you laid data out incorrectly you can fix it on the fly. I wonder what Dell will say about their stuff.
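
To see why that shift is so dramatic from a capacity standpoint, here is a simple sketch of usable space under the two layouts (plain parity math only; it ignores spares, chunklets and any vendor-specific overhead):

```python
# Usable fraction of raw capacity for the two layouts discussed above.
# Simple parity math only -- ignores spares, metadata, and vendor specifics.
def usable_fraction(data_disks: int, redundancy_disks: int) -> float:
    return data_disks / (data_disks + redundancy_disks)

raid10 = usable_fraction(1, 1)       # mirroring: one data copy per mirror pair
raid5_7p1 = usable_fraction(7, 1)    # seven data blocks per parity block

raw_tb = 100  # hypothetical raw capacity
print(f"RAID 1+0:     {raid10:.0%} usable -> {raw_tb * raid10:.0f} TB of {raw_tb} TB")
print(f"RAID 5 (7+1): {raid5_7p1:.0%} usable -> {raw_tb * raid5_7p1:.1f} TB of {raw_tb} TB")
```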

March 2, 2010

Avere front ending Isilon

Filed under: Storage — Tags: , , , — Nate @ 1:21 pm

UPDATED

How do all these cool people find our blog? A friendly fellow from Isilon commented that the article from The Register apparently isn’t accurate, in that Avere is front-ending NetApp gear, not Isilon. But in any case I have been thinking about Avere and the Symantec stuff off and on recently anyways. END UPDATE

A really interesting article over at The Register about how Sony has deployed an Avere cluster (or clusters) to front end their Isilon (and perhaps other) gear too. A good quote:

The thing that grabs your attention here is that Avere is being used to accelerate some of the best scale-out NAS on the planet, not bog standard filers with limited scalability.

Avere certainly has some good performance metrics (pay attention to the IOPS per physical disk), and more recently they introduced a model that runs on top of SSD. I haven’t seen any performance results for it yet, but I’m sure it’s a significant boost. As The Register mentions in their article, if this technology really is good enough for this purpose it has the potential (of course) to be extremely disruptive in the industry, wreaking havoc with many of the remaining (and very quickly dwindling) smaller scale-out NAS vendors. Kind of funny, really, seeing how Isilon spun the news.

From Avere’s site, talking about comparing SPEC SFS results:

A comparison of these results and the number of disks required shows that Avere used dramatically fewer disks. BlueArc used 292 disks to achieve 146,076 ops/sec with 3.34 ms ORT. Exanet used 592 disks to achieve 119,550 ops/sec with 2.07ms ORT (overall response time). HP used 584 disks to achieve 134,689 ops/sec and 2.53 ms ORT. Huawei Symantec used 960 disks to achieve 176,728 ops/sec with 1.67ms ORT. NetApp used 324 disks to achieve 120,011 ops/sec with 1.95ms ORT. By contrast, Avere used only 79 drives to achieve 131,591 ops/sec with 1.38ms ORT. Doing a little math, Avere achieves 3.3, 8.2, 7.2, 9.0, and 4.5 times more ops/sec per disk used than the other vendors.
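
Those ratios are easy to sanity check from the figures quoted above; here is a quick sketch reproducing the per-disk math (numbers copied straight from the quote):

```python
# Reproduce Avere's "ops/sec per disk" comparison from the quoted figures.
avere_ops, avere_disks = 131_591, 79
avere_per_disk = avere_ops / avere_disks

others = {
    "BlueArc":         (146_076, 292),
    "Exanet":          (119_550, 592),
    "HP":              (134_689, 584),
    "Huawei Symantec": (176_728, 960),
    "NetApp":          (120_011, 324),
}
for vendor, (ops, disks) in others.items():
    ratio = avere_per_disk / (ops / disks)
    print(f"Avere does {ratio:.1f}x the ops/sec per disk of {vendor}")
```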

Which got me thinking again: Symantec last year released a FileStore product, and my friends over at 3PAR were asking me if I was interested in it. To date I have not been, because the only performance numbers released so far have not been very efficient. And it’s still a new product, so who knows how well it works in the real world, granted that Symantec does have a history in file systems with their Veritas File System (VxFS) product.

Unfortunately there isn’t much technical info on the Filestore product on their web site.

Built to run on commodity servers and most storage arrays, FileStore is an incredibly simple-to-install soft appliance. This combination of low-cost hardware, “pay as you grow” scalability and easy administration give FileStore a significant cost advantage over specialized appliances. With support for both SAN and iSCSI storage, FileStore delivers the performance needed for the most demanding applications.

It claims N-way active-active or active-passive clustering, up to 16 nodes in a cluster, up to 2PB of storage and 200 million files per file system, which for most people is more than enough. I don’t know how it is licensed, though, or how well it scales on a single node; could it run on the aforementioned 48-all-round system?

Where does 3PAR fit into this? Well, Symantec was the first company (so far the only one that I know of) to integrate Thin Reclamation into their file system, and it integrates really well with 3PAR arrays at least. The file system uses some sort of SCSI command which is passed back to the array when files are deleted/reclaimed, so that the I/O never hits the spindles; the array transparently re-maps the blocks to be available for use.

3PAR Thin Reclamation for Veritas Storage Foundation keeps storage volumes thin over time by allowing granular, automated, non-disruptive space reclamation within the InServ array. This is accomplished by communicating deleted block information to the InServ using the Thin Reclamation API. Upon receiving this information, the InServ autonomically frees this allocated but unused storage space. The thin reclamation capabilities provide environments using Veritas Storage Foundation by Symantec an easy way to keep their thin volumes thin over time, especially in situations where a large number of writes and deletes occur.

But I was thinking that you could front end one of these FileStore clusters with an Avere cluster and get some pretty flexible, high-performing storage.

Something I’d like myself to explore at some point.

February 28, 2010

VMware dream machine

Filed under: Networking,Storage,Virtualization — Tags: , , , , , , — Nate @ 12:47 am

(Originally titled fourty eight all round, I like VMware dream machine more)

UPDATED I was thinking more about the upcoming 12-core Opterons and the next generation of HP c-Class blades, and thought of a pretty cool configuration to have; hopefully it becomes available.

Imagine a full-height blade that is quad socket, 48 cores (91-115GHz aggregate), 48 DIMMs (192GB with 4GB sticks), 4x10Gbps Ethernet links and 2x4Gbps fiber channel links (a total of 48Gbps of full duplex bandwidth). The new Opterons support 12 DIMMs per socket, allowing the 48 DIMM slots.

Why 4x10Gbps links? Well, I was thinking why not. With full-height blades you can only fit 8 blades in a c7000 chassis, so if you put in a pair of 2x10Gbps switches that gives you 16 ports, and it’s not much more money to double up on 10Gbps ports. Especially if you’re talking about spending upwards of, say, $20k on the blade (guesstimate) and another $9-15k on vSphere software per blade. And 4x10Gbps links gives you up to 16 virtual NICs per blade using VirtualConnect, each of them adjustable in 100Mbps increments.

Also given the fact that it is a full height blade, you have access to two slots worth of I/O, which translates into 320Gbps of full duplex fabric available to a single blade.

That kind of blade ought to handle just about anything you can throw at it. It’s practically a supercomputer in and of itself. Right now HP holds the top spot for VMmark scores, with an 8-socket, 6-core system (48 total cores) outpacing even a 16-socket, 4-core system (64 total cores).

The 48 CPU cores will give the hypervisor an amazing number of combinations for scheduling vCPUs. Here’s a slide from a presentation I was at last year which illustrates the concept behind the hypervisor scheduling single and multi vCPU VMs:

There is a PDF out there from VMware that talks about the math formulas behind it all, it has some interesting commentary on CPU scheduling with hypervisors:

[..]Extending this principle, ESX Server installations with a greater number of physical CPUs offer a greater chance of servicing competing workloads optimally. The chance that the scheduler can find room for a particular workload without much reshuffling of virtual machines will always be better when the scheduler has more CPUs across which it can search for idle time.

This is even cooler though, honestly I can’t pretend to understand the math myself:

Scheduling a two-VCPU machine on a two-way physical ESX Server hosts provides only one possible allocation for scheduling the virtual machine. The number of possible scheduling opportunities for a two-VCPU machine on a four-way or eight-way physical ESX Server host is described by combinatorial mathematics using the formula N! / (R!(N-R)!) where N=the number of physical CPUs on the ESX Server host and R=the number of VCPUs on the machine being scheduled.1 A two-VCPU virtual machine running on a four-way ESX Server host provides (4! / (2! (4-2)!) which is (4*3*2 / (2*2)) or 6 scheduling possibilities. For those unfamiliar with combinatory mathematics, X! is calculated as X(X-1)(X-2)(X-3)…. (X- (X-1)). For example 5! = 5*4*3*2*1.

Using these calculations, a two-VCPU virtual machine on an eight-way ESX Server host has (8! / (2! (8-2)!) which is (40320 / (2*720)) or 28 scheduling possibilities. This is more than four times the possibilities a four-way ESX Server host can provide. Four-vCPU machines demonstrate this principle even more forcefully. A four-vCPU machine scheduled on a four-way physical ESX Server host provides only one possibility to the scheduler whereas a four-VCPU virtual machine on an eight-CPU ESX Server host will yield (8! / (4!(8-4)!) or 70 scheduling possibilities, but running a four-vCPU machine on a sixteen-way ESX Server host will yield (16! / (4!(16-4)!) which is (20922789888000 / ( 24*479001600) or 1820 scheduling possibilities. That means that the scheduler has 1820 unique ways in which it can place the four-vCPU workload on the ESX Server host. Doubling the physical CPU count from eight to sixteen results in 26 times the scheduling flexibility for the four-way virtual machines. Running a four-way virtual machine on a Host with four times the number of physical processors (16-way ESX Server host) provides over six times more flexibility than we saw with running a two-way VM on a Host with four times the number of physical processors (8-way ESX Server host).

Anyone want to try to extrapolate that and extend it to a 48-core system? 🙂
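
I’ll take my own bait. Here is a quick sketch using the same C(N, R) = N! / (R!(N-R)!) formula from the quote; it only counts raw placement combinations, ignoring whatever other constraints the real scheduler has:

```python
# Scheduling-placement counts for a VM with R vCPUs on a host with N physical cores,
# using the combinatorial formula quoted above: C(N, R) = N! / (R! * (N - R)!).
from math import comb

for vcpus in (1, 2, 4, 8):
    for cores in (8, 16, 48):
        print(f"{vcpus}-vCPU VM on {cores} cores: {comb(cores, vcpus):,} possible placements")
```

So a four-vCPU VM has 194,580 possible placements on a 48-core host versus 1,820 on a 16-way host, roughly 107 times the scheduling flexibility.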

It seems like only yesterday that I was building DL380 G5 ESX 3.5 systems with 8 CPU cores and 32GB of RAM, with 8x1Gbps links, thinking how powerful they were. This would be six of those in a single blade. And it only seems like a couple of weeks ago that I was building VMware GSX systems on dual-socket, single-core servers with 16GB of RAM.

So, HP, do me a favor and make a G7 blade that can do this; that would make my day! I know fitting all of those components on a single full-height blade won’t be easy. Looking at the existing BL685c blade, it looks like they could do it: remove the internal disks (who needs them, boot from SAN or something) and put in an extra 16 DIMMs for a total of 48.

I thought about using 8Gbps fiber channel but then it wouldn’t be 48 all round 🙂

UPDATE Again I was thinking about this and wanted to compare the costs vs existing technology. I’m estimating roughly a $32,000 price tag for this kind of blade plus vSphere Advanced licensing (note you cannot use Enterprise licensing on a 12-core CPU; hardware pricing extrapolated from the existing HP BL685c G6 quad-socket, 6-core blade system with 128GB of RAM). The approximate price of an 8-way, 48-core HP DL785 with 192GB, 4x10GbE and 2x4Gb fiber with vSphere licensing comes to roughly $70,000 (because VMware charges on a per-socket basis, the licensing costs go up fast). Not only that, but you can only fit 6 of these DL785 servers in a 42U rack, while you can fit 32 of these blades in the same rack with room to spare. So less than half the cost, and 5 times the density (for the same configuration). The DL785 has an edge in memory slot capacity, which isn’t surprising given its massive size; it can fit 64 DIMMs vs 48 on my VMware dream machine blade.

Compare that to a trio of HP BL495c blades, each with 12 cores and 64GB of memory: approximate pricing for those plus Advanced vSphere is $31,000, for a total of 36 cores and 192GB of memory. So for $1,000 more you can add an extra 12 cores, cut your server count by 66%, probably cut your power usage by some amount, and improve consolidation ratios.
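
A rough sketch of that comparison using the estimated prices above (all of them guesstimates covering hardware plus vSphere licensing, as noted):

```python
# Cost-per-core and rack-density comparison from the rough estimates above.
# All prices are the post's guesstimates (hardware + vSphere licensing), not quotes.
options = {
    "Dream blade (48 cores, 192GB)": {"price": 32_000, "cores": 48, "per_rack": 32},
    "DL785 (48 cores, 192GB)":       {"price": 70_000, "cores": 48, "per_rack": 6},
    "3x BL495c (36 cores, 192GB)":   {"price": 31_000, "cores": 36, "per_rack": None},
}
for name, o in options.items():
    per_core = o["price"] / o["cores"]
    rack = f", {o['cores'] * o['per_rack']:,} cores per rack" if o["per_rack"] else ""
    print(f"{name}: ${per_core:,.0f} per core{rack}")
```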

So to summarize, two big reasons for this type of solution are:

  • More efficient consolidation on a per-host basis by having less “stranded” resources
  • More efficient consolidation on a per-cluster basis, because you can get more capacity into the 32-node limit of a VMware cluster (assuming you want to build a cluster that big..), again addressing the “stranded capacity” issue. Imagine what a resource pool could do with 3.3THz of compute capacity and 6.1TB of memory, all with line-rate 40Gbps networking throughout, all within a single cabinet?

Pretty amazing stuff to me anyways.

[For reference – Enterprise Plus licensing would add an extra $1250/socket plus more in support fees. VMware support costs not included in above pricing.]

END UPDATE

February 24, 2010

SSD Not ready yet?

Filed under: Storage — Nate @ 7:26 pm

SSD and storage tiering seem to be hot topics these days; certain organizations are pushing them pretty hard, though it seems the “market” is not buying the hype, or doesn’t see the cost benefit (yet).

In the consumer space SSD seems to be problematic, with seemingly widespread firmware issues, performance issues, and even reliability issues. In the enterprise space most storage manufacturers have yet to adopt it, and I’ve yet to see a storage array that has enough oomph to drive SSD effectively (TMS units aside). It seems SSD really came out of nowhere, and none of the enterprise players have systems that can drive the IOPS that SSD can deliver.

And today I see news that STEC stock has tanked because they yet again came out and said EMC customers aren’t buying SSD, so they aren’t selling as much as they thought.

With this delay in adoption in the enterprise space, it makes me wonder if STEC will even be around in the future. HDD manufacturers, like the enterprise storage companies, sort of missed the boat when it came to SSD, but such a slow adoption rate may allow the manufacturers of spinning rust to catch up and win back the business they lost to STEC in the meantime.

Then there’s the whole concept of automagic storage tiering at the sub-volume level. It sounds cool on paper, though I’m not yet convinced of its effectiveness in the real world, mainly due to the delay involved in a system detecting particular hot blocks/regions and moving them to SSD; maybe by the time they are moved the data is no longer needed. I’ve not yet talked with someone with real-world experience with this sort of thing, so I can only speculate at this point. Compellent of course has the most advanced automagic storage tiering today and they promote it pretty heavily, but I’ve only talked to one person who has worked with Compellent, and he said he specifically recommended their gear only for smaller installs. I’ve never seen SPC-1 numbers posted by Compellent, so at least in my mind their implementation remains in question, while the core technology certainly sounds nice.

Coincidentally, Compellent’s stock took a similar 25% haircut recently after their earnings were released; I guess expectations were too high.

I’d like to see a long-running test, along the lines of what NetApp submitted for SPC-1: for the same array, two tests, one with automagic storage tiering turned on and the other without, and see the difference. I’m not sure how SPC-1 works internally, or whether it is a suitable test to illustrate automagic storage tiering, but at least it’s a baseline that can be used to compare with other systems.
