Diggin' technology every day

August 28, 2009

vSphere Storage vMotion for free

Filed under: Storage, Virtualization — Nate @ 2:47 pm

OK, this is slightly obvious, but it may not be to everyone. Among my favorite new abilities in vSphere is the evaluation mode. In ESX 3.x the evaluation mode was fairly locked down: to do anything you had to have a paid license, or a VAR that could give you a temporary license. But not with vSphere.

My company went through a storage array migration earlier this year. I had initially started deploying VMs on top of NFS last year on the previous array. When we got the new array, all new VMs went to it directly and the old VMs hung around. My plan was basically to re-install them from scratch onto the new array so I could take advantage of the thin provisioning. I could save upwards of 90% of my space with thin provisioning, so I didn’t want to just copy the data files over to the VMFS volume (our thin provisioning is dedicate-on-write). With our new array came a pair of NAS heads from another company, so in order to evacuate the old array I did move those data files over to the NFS side of the storage system as a holding area until I could get the time to re-install them onto VMFS volumes.

Then vSphere came out and the clouds parted. The evaluation mode was fully unlocked: every feature (that I know of) was available for use, free, for 60 days. After a few fairly quick tests I started migrating my production hosts to vSphere as quickly as I could, even before I had my replacement license keys, since I had 60 days to get them. I set up an evaluation copy of vCenter and hooked everything up. It was my first real exposure to vCenter. And I took the opportunity to use the free Storage vMotion to migrate those VMs from the NFS data store to the VMFS data store in a “thin” way.

I don’t anticipate having or needing to use Storage vMotion often, but it’s nice to know that if I do need it, I can just fire up a new ESX system under an evaluation license, do my Storage vMotions to my heart’s content and then shut the new box down again. Since all of my systems boot from SAN, I could even do the same in place: evacuate one system, unmap the original LUN, create a new LUN, install ESX on it, do the basics to get it configured, do the vMotions that I need, and then reboot the host, remove the new LUN, reinstate the old LUN and off we go again. Quite a few more steps, but certainly worth it for me if I think I will only need it once or twice per year.

We bought vCenter Standard edition not too long ago, and I still have a couple of vSphere hosts running in evaluation mode even now, in case I want to play with any of the more advanced features. I’ve done a few vMotions and stuff to shift VMs back onto freshly installed vSphere hosts. I only have 1 production ESX 3.5 system left, and it will stay 3.5 for a while at least because of the snapshot issues in vSphere.

Personally my four favorite things in vSphere are: Round Robin MPIO, ESXi boot from SAN, the Essentials license pricing, and the expanded evaluation functionality.

I really didn’t have much interest in most of the other things that VMware has been touting; I have simple requirements. And perhaps as a side effect I’ve had fewer problems than most serious ESX users I’ve talked with. I have heard lots of horror stories about snapshots and VCB, Dave here has even had some issues with zombie VMs and vMotion, and others have had issues with SCSI reservations etc. I haven’t had any of that, I guess because I almost never use any of that functionality. The core stuff is pretty solid; I have yet to see a system crash even. Good design? Luck? Both? Other?

August 27, 2009

Know a Rockin’ SysAdmin?

Filed under: General — @ 7:36 pm

Then enter them to be SysAdmin of the Year Here

August 25, 2009

Cheap vSphere installation manageable by vCenter

Filed under: Virtualization — Nate @ 4:53 pm

UPDATED – I don’t mean to turn this into a VMware blog or a storage blog, as those have been almost all of my posts so far. But as someone who works for a company that hasn’t yet invested too much in VMware (it was hard enough to convince them to buy any VM solution; management wanted the free stuff), I wanted to point out that you can get the “basics” of vSphere in the vSphere Essentials pack. What used to cost about $3k is now about $999, and support is optional. Not only that, but at least in my testing a system running an “Essentials” license is fully able to connect to and be managed by a vCenter “Standard” edition system.

I just wanted to point it out because when I proposed this scenario a month or so ago to my VAR, they wanted to call VMware to see if there were any gotchas. The initial VMware rep we talked to couldn’t find anything that said you could or could not do this specifically, but didn’t believe there was anything in the product that would block you from managing an “Essentials” vSphere host with a “Standard” vCenter server. He spent what seemed like a week trying to track down a real answer but never got back to us. Then we called in again and got another person who said something similar: he couldn’t find anything that would prevent it, but apparently it’s not something that has been proposed too widely before. The quote I got from the VAR, who was still confused, had a note saying that you could not do what I wanted to do, but it does work. Yes, we basically throw out the “free” vCenter “Foundation” edition, but it’s still a lot cheaper than going with vSphere Standard:

vSphere Essentials 6 CPUs = Year 1 – $999 with 1 year subscription, support on per incident basis

vSphere Standard 6 CPUs = Year 1 – $6,408 with 1 year subscription, and gold support

Unless you expect to file a lot of support requests that is.

It is true that you get a few extra things with vSphere Standard over Essentials such as “Thin provisioning” and “High availability”. In my case thin provisioning is built into the storage, so I don’t need that. And High availability isn’t that important either as for most things we have more than 1 VM running the app and load balance using real load balancers for fault tolerance(there are exceptions like DB servers etc).

Something that is kind of interesting is that the “free” vSphere supports thin provisioning, I have 11 hosts running that version with local storage at remote sites. Odd that they throw in that with the free license but not with essentials!

The main reason for going this route, to me at least, is that you can have a real vCenter server and your systems managed by it, have read-write access to the remote APIs, and of course have the option of running the full hefty ESX instead of using the “thin” ESXi. Myself, I prefer the big service console. I know it’s going away at some point, but I’ll use it while it’s there. I have plenty of memory to spare. A good chunk of my production ESX infrastructure is older re-purposed HP DL585G1s with 64GB of memory; they are quad processor, dual core, which makes this licensing option even more attractive for them.

My next goal is to upgrade the infrastructure to HP c-Class blades with either 6 core Opterons or perhaps 12 core when they are out(assuming availability for 2 socket systems), 64GB of memory(the latest HP Istanbul blades have 16 memory slots), 10GbE VirtualConnect and 4Gbps Fiber VirtualConnect, and upgrade to vSphere advanced.  That’ll be sometime in 2010 though. There’s no software “upgrade” path from essentials to advanced, so I’ll just re-purpose essentials to other systems, I have at least 46 sockets in servers running the “free” license as is.

(I still remember how happy I was to pay the $3500 for two socket fee a couple of years ago for ESX “Standard” edition, now it’s about 90% less on a per-socket basis for the same abilities)
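The “about 90% less” figure can be checked with a quick calculation. This is only a sketch using the prices quoted in this post ($3500 for two sockets back then, $999 for six sockets with Essentials); the variable names are made up for the example.

```python
# Per-socket cost comparison using the figures quoted in this post.
old_price, old_sockets = 3500, 2   # ESX "Standard" a couple of years ago
new_price, new_sockets = 999, 6    # vSphere Essentials, 6 CPUs


def per_socket(price, sockets):
    """Price divided evenly across the licensed sockets."""
    return price / sockets


old_ps = per_socket(old_price, old_sockets)
new_ps = per_socket(new_price, new_sockets)
savings_pct = (1 - new_ps / old_ps) * 100

print(f"old: ${old_ps:.2f}/socket, new: ${new_ps:.2f}/socket, "
      f"savings: {savings_pct:.1f}%")
```

With those inputs the per-socket price drops from $1750 to $166.50, which is where the roughly 90% comes from.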

UPDATE – I haven’t done extensive testing yet, but during my quick tests before a more recent entry that I posted I wanted to check whether Essentials could/would boot a VM that was thin provisioned. Since I used Storage vMotion to move some VMs over, it would be annoying if it could not. And it just so happens that I already have a thin provisioned VM running on my one Essentials ESX host! So it appears the license just limits you on the creation of thinly provisioned virtual disks, not the usage of them, which makes sense. It would be an Oracle-like tactic to do the former. And yes, I did power off the VM and power it back on today to verify.

But that’s not all – I noticed what seems to be a loophole in vSphere’s licensing. I mention above that vSphere Essentials does not support thin provisioning, as you can see here in their pricing PDF (and there is no mention of the option in the License configuration page on the host). When I create VMs I always use the Custom option rather than the Typical configuration. Anyways, I found out that if you use Typical when creating a VM with the Essentials license you CAN USE THIN PROVISIONING. I created the disk, enabled the option, and even started the VM (didn’t go beyond that). If you use Custom, the Thin Provisioning option flat out isn’t even offered. I wasn’t expecting the VM to be able to power on. I recall testing another unrelated but still premium option, forgot which one, and when I tried to either save the configuration or power up the VM the system stopped me, saying the license didn’t permit that.

August 19, 2009

Does size matter?

Filed under: Storage, Virtualization — Nate @ 10:30 am

UPDATED – I’ve been a fan of VMware for what seems like more than a decade, still have my VMware 1.0.2 for Linux CD even. I just wanted to dispel a myth that ESXi has a small disk footprint. On VMware’s own site they mention the footprint being 32MB. I believe I saw another number in the ~75MB range or something at a vSphere launch event I attended a few months ago.

Not that it’s a big deal to me, but it annoys me when companies spout bullshit like that. My storage array has thin provisioning technology and dedicates data in 16kB increments as it is written, so I can get a clear view of how big ESXi actually is.

And the number is: ~900 Megabytes for ESXi v4. I confronted a VMware rep on this number at that event I mentioned earlier and he brushed me off, saying the extra space was other required components, not just the hypervisor. In the link above they compare against MS Hyper-V: they take MS’s “full stack” and perhaps compare it to their “hypervisor” (which by itself is unusable, you need those other required components), hence my claim that their number is complete and total bullshit.

This is significantly smaller than the full ESX, which from the range of systems I have installed uses between 3-5 Gigabytes. When I was setting up the network installer for ESX I believe it required at least 25GB for vSphere, which is slightly more than ESX 3.5.  Again with the array technology despite me allocating 25GB worth of data to the volume, vSphere has only written between 3-5GB of it, so that is all that is used. But in both cases I get accurate representations of how much real space each system requires.

ESXi v3.5 was unable to boot directly from SAN so I can’t tell with the same level of accuracy how big it is, (“df” says about 200MB) but I can say that our ESXi v3.5 systems are installed on 1GB USB sticks, and the image I decompressed onto those USB sticks is 750MB(VMware-VMvisor-big-3.5.0_Update_4-153875.i386.dd), regardless, it’s FAR from 32MB or even 75MB, at best it’s 10x larger than what they claim.
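To put the “10x” in context, here is a rough sketch that compares the sizes measured in this post against the two claimed figures. The dictionary labels are just descriptive names for this example; the numbers are the ones quoted above.

```python
# Claimed vs. measured ESXi footprints, in MB, as quoted in this post.
claimed_mb = {"VMware site": 32, "launch event (approx.)": 75}
measured_mb = {"ESXi 4 written to SAN": 900, "ESXi 3.5 USB image": 750}


def ratio(measured, claimed):
    """How many times larger the measured size is than the claim."""
    return measured / claimed


for m_name, m in measured_mb.items():
    for c_name, c in claimed_mb.items():
        print(f"{m_name} ({m} MB) vs {c_name} ({c} MB): {ratio(m, c):.1f}x")
```

The smallest gap is the 750MB image against the ~75MB figure, which is the 10x; against the 32MB claim the ESXi 4 install is closer to 28x.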

So let this one rest VMware, give it up, stop telling people ESXi has such a tiny disk footprint, because it’s NOT TRUE.

You can pry vmware from my cold dead hands, but I still want to dispel this myth on ESXi’s size.

UPDATED – I went back to my storage array again, and found something that didn’t make sense, it’s pretty heavily virtualized itself, but after consulting with the vendor it turns out the volume is in fact 900MB of written space, rather than 1.5GB that I originally posted, if you really want to know I could share the details but I don’t think that’s too important, and without knowing the terminology of their technology it wouldn’t make much sense to anyone anyways!

The first comment I got (thanks!) mentions a significant difference in size between the embedded version of ESXi and the installable (what I’m using). This could be where the confusion lies. I have not used any systems with the embedded ESXi yet (my company is mostly a Dell shop and they charge a significant premium for the embedded ESXi and force you onto a high end support contract, so we decided to install it ourselves for free).

August 18, 2009

It’s not a bug, it’s a feature!

Filed under: Storage, Uncategorized, Virtualization — Nate @ 5:01 pm

I must be among a tiny minority of people who have automated database snapshots moving between systems on a SAN.

Earlier this year I set up an automated snapshot process to snapshot a production MySQL database and bring it over to QA. This runs every day, and runs fine as-is. There is another on-demand process to copy byte-for-byte the same production MySQL DB to another QA MySQL server (typically run once every month or two, and runs fine too!).

I also set up a job to snapshot all of the production MySQL DBs (3 currently), and bring them to a dedicated “backup” VM which then backs up the data and compresses it onto our NFS cluster. This runs every day, and runs fine as-is.


Apparently they introduced new “intelligence” in the vSphere storage system that tries to be smarter about what storage devices are present. This totally breaks these automated processes. Because the data on the LUN is different after I remove the LUN, delete the snapshot, create a new one, and re-present the LUN to vSphere, it says HEY THERE IS DIFFERENT DATA SO I’LL GIVE IT A UNIQUE UUID (never mind the fact that it is the SAME LUN). During that process the guest VM loses connectivity to the original storage (of course) and does not regain connectivity, because VSPHERE thinks the LUN is different and so doesn’t give the VM access to it. The only fix at that point is to power off the VM, delete all of the raw device maps, re-create all of the raw device maps and then power on the VM again. @#)!#$ No, you can’t gracefully halt the guest OS, because there are missing LUNs and the guest will hang on shutdown.
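For anyone trying to picture the refresh cycle, it boils down to four ordered steps. The sketch below is a dry run only: the step strings are plain descriptions, not real array or ESX commands, and the snapshot, volume and host names are made up for illustration.

```python
from typing import Callable, List


def refresh_steps(snapshot: str, source_volume: str, host: str) -> List[str]:
    """Return the ordered steps of one snapshot refresh cycle (descriptions only)."""
    return [
        f"unpresent {snapshot} from {host}",               # guest loses the LUN here
        f"delete snapshot {snapshot}",
        f"create snapshot {snapshot} of {source_volume}",
        f"present {snapshot} to {host}",                   # same LUN, different data
    ]


def run(steps: List[str], execute: Callable[[str], None] = print) -> None:
    """Feed each step to an executor; the default just prints it."""
    for step in steps:
        execute(step)


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    run(refresh_steps("qa_mysql_snap", "prod_mysql_vol", "esx-qa-01"))
```

Under ESX 3.5 the VM picks the LUN back up after the last step; under vSphere the re-presented LUN gets a new identity, which is where the automation falls over.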

So I filed a ticket with VMware. The support team worked on it for a couple of weeks, escalating it everywhere, but as far as anyone could tell it’s “doing what it’s supposed to do”. And they can’t imagine how this process works in ESX 3.5, except for the fact that ESX 3.5 was more “dumb” when it came to this sort of thing.


With ESX Server 2.5, VMware is encouraging the use of raw device mapping in the following
• When SAN snapshot or other layered applications are run in the virtual machine. Raw
device mapping better enables scalable backup offloading systems using the features
inherent to the SAN.



Anyways there are a few workarounds for these processes going forward:
– Migrate these LUNs to use Software iSCSI instead of Fiber channel, there is a performance hit(not sure how much)
– Keep one/more ESX 3.5 systems around for this type of work
– Use physical servers for things that need automated snapshots

The VMware support rep sounded about as frustrated with the situation as I was/am. He did appear to try his best, but this behavior by vSphere is just unacceptable. After all, it works flawlessly in ESX 3.5!

WAIT! This broken-ness extends to NFS as well!

I filed another support request on a kinda-sorta-similar issue a couple of weeks ago regarding NFS data stores. Our NFS cluster operates with multiple IP addresses. Many(all?) active-active NFS clusters have at least two IPs (one per controller). In vSphere it once again assigns a unique ID based on the IP address rather than the host name to identify the NFS system. As a result if I use the host name on multiple ESX servers there is a very high likelihood(pretty much guaranteed) that I will not be able to do a migration of a VM that is on NFS from one host to another, because vSphere identifies the volumes differently because they are accessing it via a different IP. And if I try to rename the volume to match what is on the other system it tells me there is already a volume named that(when there is not) so I cannot rename it. The only workaround is to hard code the IP to each host, which is not a good solution because you lose multi-node load balancing at that point. Fortunately I have a Fiber channel SAN as well and have migrated all of my VMs off of NFS onto Fiber Channel, so this particular issue doesn’t impact me. But I wanted to illustrate this same sort of behavior with UUIDs is not unique to SAN, it can easily affect NAS as well.

You may not be impacted by the NFS stuff if your NFS system is unable to serve out the same file system over multiple controller systems simultaneously. I believe most fall into this category of being limited to 1 file system per controller at any given point in time. Our NFS cluster does not have this limitation.
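The NFS identity problem is easy to show with a toy model. To be clear, this is not VMware’s actual algorithm; it just assumes the datastore identifier is derived from whatever server address the host ended up with plus the export path. The host names, IPs and export path are all made up.

```python
import hashlib


def datastore_id(server: str, export_path: str) -> str:
    """Toy identifier: short hash of the server string and the export path."""
    return hashlib.sha1(f"{server}:{export_path}".encode()).hexdigest()[:8]


# Hypothetical cluster: one host name, but each ESX host resolved it to a
# different controller IP.
resolved = {"esx-a": "10.0.0.11", "esx-b": "10.0.0.12"}
export = "/vol/vmstore"

ids_by_ip = {h: datastore_id(ip, export) for h, ip in resolved.items()}
ids_by_name = {h: datastore_id("nfs-cluster", export) for h in resolved}

print("keyed on IP:  ", ids_by_ip)     # a different ID on each host
print("keyed on name:", ids_by_name)   # the same ID on both hosts
```

Keyed on the IP, the two hosts disagree about what is really the same volume, so a migration between them has nothing to match on. Keyed on the name, they would agree.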

August 17, 2009

FCoE Hype

Filed under: Storage — Nate @ 6:41 pm

I feel like I’ve been bombarded by hype about FCoE (Fiber Channel over Ethernet) over the past five months or so, and wanted to rant a bit about it. Been to several conferences and they all seem to hammer on it.

First a little background on what FCoE is and this whole converged networking stuff that some companies are pushing.

The idea behind it is to combine Fiber Channel and traditional Ethernet networking into something that runs on a single cable. So you have two 10 Gigabit connections coming out of your server for traditional FC, as well as your networking. The HBA presents itself to the server as independent FC and Ethernet connectivity. From a 10,000 foot view it sounds like a really cool thing to have, but then you get into the details.

They re-worked the foundations of ethernet networking to be better suited for storage traffic, which is a good thing, but it simultaneously makes this new FCoE technology incompatible with all existing Ethernet switches. You don’t get a true “converged” network based on Ethernet, you can’t even use the same cabling as you can for 10GbE in many cases. You cannot “route” your storage(FC) traffic across a traditional 10GbE switch despite it running over “Ethernet”.

The way it’s being pitched for the most part is somewhat of an aggregation layer, you link your servers to a FCoE switch, and then that switch splits the traffic out, uplinking 10GbE to 10GbE upstream switches, and FC traffic to FC switches(or FC storage). So what are you left with?

  • You still need two separate networks – one for your regular Ethernet traffic, the other for the FCoE traffic
  • You still need to do things like zone your SAN, as the FCoE presents itself as Fiber Channel HBAs
  • At least right now you end up paying quite a premium for the FCoE technology. From numbers I’ve seen, mostly list pricing on both sides, an FCoE solution can cost 2x more than a 10GbE + 8Gb Fiber Channel solution (never mind that the split solution as an aggregate can get much more performance).
  • With more and more people deploying blades these days, you’re really not cutting much of the cable clutter with FCoE, as your cables are aggregated at the chassis level. I even saw one consultant who seemed to imply that some people use cables to connect their blades to their blade chassis. He sounded very confused. Reduce your cable clutter! Cut your cables in half! Going from four, or even six, cables to two or something really isn’t much to get excited about.

What would I like to see? Let the FCoE folks keep their stuff, if it makes them happy I’m happy for them. What I’d like to see as far as this converged networking goes is more 10GbE iSCSI converged HBAs. I see that Chelsio has one for example, combines 10GbE iSCSI offload and a 10GbE NIC in one package. I have no experience with their products so don’t know how good it is/not. I’m not personally aware of any storage arrays that have 10GbE iSCSI connectivity to them, though I haven’t checked recently. But what I’d like to see as an alternative is more focus on standardized ethernet as a storage transport, rather than this incompatible stuff.

Ethernet switches are so incredibly fast these days, and cheap! Line rate non blocking 1U 10GbE switches are dirt cheap these days, and many of them can even do 10GbE over regular old Cat 5E. Though I’m sure Cat 6A would provide better performance and/or latency. But the point I’m driving towards is not having to care what I’m plugging into, have it just work.

Maybe I’m just mad because I got somewhat excited about the concept of FCoE and feel totally let down by the details.

What I’d really like to see is a HP VirtualConnect 10GbE “converged” iSCSI+NIC. That’d just be cool. Toss onto that the ability to run a mix of jumbo and non jumbo frames on the same NIC(different vlans of course). Switches can do it, NICs should be able to do it too! I absolutely want jumbo frames on any storage network, but I probably do not want jumbo frames on my regular network for compatibility reasons.

August 6, 2009

Spreading the Load

Filed under: Storage — @ 12:24 pm

I’m sure there are a number of articles out there on 3PAR’s Dynamic Optimization but I thought it would be worth adding one more “holy cow this is easy!” post. My company just added 8 more drives to our 3PAR E200, bringing the total spindle count from 24 to 32. In the past, using another vendor’s SAN, taking advantage of the space on these new drives meant carving out a new LUN. If you wanted to use the space on all 32 drives collectively (in a single LUN for example) it would mean copying all the data off, recreating your LUN(s) and copying the data back. Not with 3PAR. First of all, thanks to their “chunklet” technology, carving out LUNs is a thing of the past. You can create and delete multiple virtual LUNs (VLUNs) on the fly. I won’t go into the details of that here but instead want to look at their Dynamic Optimization feature.

With Dynamic Optimization, after adding those 8 new drives I can then rebalance my VLUNs across all 32 drives – taking advantage of the extra spindles and increasing IOPS (and space). Now comes the part about it being easy. It is essentially 3 commands for a single volume – obviously the total number of commands will vary based on your volumes and common provisioning groups (cpgs).

createcpg -t r5 -ha mag -ssz 9 RAID5_SLOWEST_NEW
The previous command creates a new CPG that is spread out across all of the disks. You can do a lot with CPG’s, but we use them in a pretty flat manner and just use them to define the RAID level and where the data resides on the platter (inside or outside). The “-t r5” flag defines the RAID type (RAID5 in this case). The “-ha mag” flag defines the level of redundancy for this CPG (in this case, at the magazine level, which on the E200 equates to disk level). The “-ssz 9” flag defines the set size for the RAID level (in this case 8+1 – obviously a slower RAID level but easy on the overhead). “RAID5_SLOWEST_NEW” is the name I’m assigning to the CPG.

tunevv usr_cpg RAID5_SLOWEST_NEW -f MY_VIRTVOL1
The “tunevv” command is used for each virtual volume I want to migrate to the newly created CPG in the previous command. It tells the SAN to move the virtual volume MY_VIRTVOL1 to the CPG RAID5_SLOWEST_NEW.

Then, once all of your volumes on a particular CPG are moved to a new CPG, the final command (removecpg) deletes the old CPG so you regain your space.

If you start running low on space before you get to the removecpg command (when you’re moving multiple volumes, for example), you can always issue a compactcpg command that will shrink your “old” CPG and free up some space until you finish moving your volumes. Or, if you’re not moving all the volumes off the old CPG, then be sure to issue a compactcpg when you’re finished to reclaim that space.

The Dynamic Optimization can also be used to migrate a volume from one RAID level to another using commands similar to the ones above. At a previous company we moved a number of volumes from RAID1 to RAID5 because we needed the extra space that RAID5 gives. Also, due to the speed of the 3PAR SAN, we hardly noticed a performance hit! And, in this case, the entire DO operation was done by a 3PAR engineer from his BlackBerry while sitting at a restaurant in another state.

Oh, did I mention this is all done LIVE with zero downtime? In fact, I’m doing it on our production SAN right now in the middle of a weekday while the system is under load. There is a performance hit to the system in terms of disk I/O, but the system will throttle the CPG migration as needed to give priority to your applications/databases.

You can queue up multiple tunevv commands at the same time (I think 4 is the max) – each command kicks off a background task that you can check on with the showtask command.

Through this process I’ve created new CPG’s that are configured the same as my old CPG’s (in terms of RAID level and physical location on the platter) except that the new CPG’s are spread across all 32 disks and not just my original 24. Then I moved my VLUNs from the old CPG’s to the new CPG’s. And finally, I deleted the old CPG. Now all of my CPG’s and the VLUNs they contain are spread across all 32 disks, thereby increasing IOPS and space available to the CPG’s.
createcpg -t r5 -ha mag -ssz 9 RAID5_SLOWEST_NEW
tunevv usr_cpg RAID5_SLOWEST_NEW -f MY_VIRTVOL1
# repeat as many times as needed for each virtual volume in that CPG
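If you have a lot of volumes, the command list is easy to generate. The following is a rough sketch, not a 3PAR tool: it only builds the command strings shown in this post and groups the tunevv commands in fours (the queue limit I mentioned above, which I am not certain of). The old CPG name and the bare “removecpg <name>” form are assumptions, so check the CLI reference before running anything it prints.

```python
from typing import Iterable, List


def migration_commands(new_cpg: str, old_cpg: str, volumes: Iterable[str],
                       raid: str = "r5", ha: str = "mag",
                       set_size: int = 9) -> List[str]:
    """Build the createcpg / tunevv / removecpg sequence as plain strings."""
    cmds = [f"createcpg -t {raid} -ha {ha} -ssz {set_size} {new_cpg}"]
    cmds += [f"tunevv usr_cpg {new_cpg} -f {vv}" for vv in volumes]
    # Assumed argument form; old_cpg is a hypothetical name.
    cmds.append(f"removecpg {old_cpg}")
    return cmds


def batches(items: List[str], size: int = 4) -> List[List[str]]:
    """Split a list into consecutive groups of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


if __name__ == "__main__":
    vols = [f"MY_VIRTVOL{n}" for n in range(1, 7)]
    cmds = migration_commands("RAID5_SLOWEST_NEW", "RAID5_SLOWEST", vols)
    tunes = [c for c in cmds if c.startswith("tunevv")]
    print(cmds[0])
    for group in batches(tunes):
        print("--- queue together, then wait (check with showtask) ---")
        print("\n".join(group))
    print(cmds[-1])
```

Nothing here talks to the array; it only prints text you can review and paste into the CLI yourself.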

August 5, 2009

FTP to your tape drive

Filed under: Storage — Nate @ 7:53 pm

Just got done with an evaluation of a new product on the market, a Cache-A Prime Cache tape drive. It is based on the same technology as the Quantum SuperLoader 3A. Quantum as a company hasn’t been doing too hot recently; I was told that they basically let go of the entire team responsible for developing the software for the SuperLoader 3A. Cache-A then went in and either bought or at least licensed the software to continue development at their company. The first product, released in late June 2009, was the Prime Cache, which is a single LTO-4 tape drive hooked up to a small computer running Fedora 10. They have a fairly easy to use UI that is web based.

You can either FTP or use CIFS to interface with the device to upload files. It stages the files on a local internal disk which they call the VTAPE; once the file is uploaded to the share, the system automatically sends it to tape. Eventually it will support NFS as well. It does have the ability to mount remote NFS/CIFS shares and back them up directly, though there are some limitations in the current software release. I was unable to get it to see any files on our main NAS cluster, which runs Samba for CIFS, and was unable to mount NFS volumes: it currently depends on another software package (forgot the name) which broadcasts the available NFS exports to the network for the device to see them, and there is no ability yet to manually input an NFS server/mount point to go to.

I like the concept because being able to use FTP or even smbclient on the command line to tie directly into a tape drive from backup scripts is real handy for me. Add to that, pretty much any system on the network being able to push files to the tape without having to go through special backup software has its appeal as well. Most of our data that needs to be backed up is spread out all over the place, and I have scripts that gather the paths and file names of the files that need backing up. Our MySQL database backups are heavily scripted as well, involving snapshots from the SAN etc. So being able to put a few lines of code in the script to pass the files along to the tape is nice.
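Those “few lines of code” could look something like the sketch below, which uses Python’s standard ftplib. I have not run this against the Prime Cache itself; the host name, login and file path in the commented usage are placeholders, and whether the device wants a particular directory or user is something to check in its docs.

```python
import os


def push_files(ftp, paths):
    """Upload each local file with a binary STOR; return the remote names used.

    `ftp` is a connected, logged-in ftplib.FTP object (or anything else that
    offers the same storbinary(cmd, fileobj) method).
    """
    sent = []
    for path in paths:
        name = os.path.basename(path)
        with open(path, "rb") as fh:
            ftp.storbinary(f"STOR {name}", fh)
        sent.append(name)
    return sent


# Example usage (placeholder host, credentials and path):
#
#   from ftplib import FTP
#   with FTP("primecache.example.com") as ftp:
#       ftp.login("backup", "secret")
#       push_files(ftp, ["/backups/mysql-dump.sql.gz"])
```

Taking the connection as an argument keeps the upload loop separate from the login details, so the same function can be dropped into different backup scripts.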

The system is quite new so it has some bugs, and some things aren’t implemented yet, like the ability to delete files directly from the tape or erase/format the tape without using the WebUI. That is coming (along with an API), though there is no ETA. The device retails for about $7k I believe, which is roughly 1/2 the cost of the SuperLoader 3A. Though this is just one tape drive, no autoloader yet. Though it is LTO-4 and the SuperLoader 3A is LTO-3 (with no expectations of it ever being able to get to LTO-4).

I’ll certainly be following this product/company pretty closely in the future myself as I really like the way they are going, this is certainly a very innovative product, other than the SuperLoader I haven’t seen any other product like it on the market.

August 4, 2009

1 Billion events in Splunk

Filed under: Monitoring — Nate @ 10:43 pm

I was on a conference call with Splunk about a month or so ago, we recently bought it after using it off and on for a while. One thing that stuck out to me on that call was the engineer’s excitement around being able to show off a system that had a billion events in it. I started a fresh Splunk database in early June 2009 I think it was, and recently we passed 1 billion events. The index/DB(whatever you want to call it) just got to about 100GB(the below screenshot is a week or two old). The system is still pretty quick too. Running on a simple dual Xeon system with 8GB memory, and a software iSCSI connection to the SAN.
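For a sense of scale, here is the back-of-the-envelope math on those numbers. It is only a sketch: the 60 days is my rough guess at the span from early June to early August, and “about 100GB” is taken as decimal gigabytes.

```python
events = 1_000_000_000
index_bytes = 100 * 10**9     # "about 100GB", taken as decimal GB
days = 60                     # assumption: early June to early August

events_per_sec = events / (days * 86_400)
bytes_per_event = index_bytes / events

print(f"{events_per_sec:.0f} events/sec on average, "
      f"{bytes_per_event:.0f} bytes of index per event")
```

That works out to a bit under 200 events per second on average, at about 100 bytes of index per event.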

We have something like 400 hosts logging to it (just retired about 100 additional ones about a month ago, going to retire another 80-100 in the coming weeks as we upgrade hardware). It’s still not fully deployed; right now about 99% of the data is from syslog.

Upgraded to Splunk v4 the day it came out, it has some nice improvements, filed a bug the day it came out too(well a few), but the most annoying one is I can’t login to v4 with Mozilla browsers(nobody in my company can). Only with IE. We suspect it’s some behavioral issue with our really basic Apache reverse proxy and Splunk. The support guys are looking at it still. That and both their Cisco and F5 apps do not show any data despite having millions of log events from both Cisco and F5 devices in our index. They are looking into that too.

1 billion logged events

Will it hold?

Filed under: Monitoring, Storage — Nate @ 10:21 pm

I went through a pretty massive storage refresh earlier this year which cut our floorspace in half, power in half, disks in half etc. It also improved performance at the same time. It’s exceeded my expectations. More recently, though, I have gotten worried about how far the cache+disks will scale before they run out of gas. I have plans to increase the disk count by 50% (from 200 to 300) at the end of the year, but will we last until then? My past (admittedly limited) storage experience says we should already be having lots of problems, but we are not. The system’s architecture and large caches are absorbing the hit, and the performance remains high and very responsive to the servers. How long will that hold up though? There are thousands of metrics available to me, but the one metric that is not available is cache utilization. I can get hit ratios on tons of things, but no info on how full the cache is at any particular period of time (for either NAS or SAN).

To illustrate my point, here is a graphic from my in-house monitoring showing sustained spindle response times over 60 milliseconds:

Physical Disk response time

And yet on the front end, response times are typically 2 milliseconds:

Fiber channel response time to NAS cluster

There are spikes of course, there is a known batch job that kicks off tons of parallel writes which blows out the cache on occasion, a big gripe I have with the developers of the app and their inability to(so far) throttle their behavior. I do hold my breath on occasion when I personally witness the caches(if you add up both NAS+SAN caches it’s about 70GB of mirrored memory) getting blown out. But as you can see both on the read and especially write side the advanced controllers are absorbing a huge hit. And the trend over the past few months has been a pretty steep climb upwards as more things run on the system. My hope is things level off soon, that hasn’t happened yet.

The previous arrays I have used would not have been able to sustain this, by any stretch.

Will it hold?
