TechOpsGuys.com Diggin' technology every day

2 Oct 2012

Cisco drops price on Nexus vSwitch to free

TechOps Guy: Nate

I saw news yesterday that Cisco dropped the price of their vSwitch to $free; they still have a premium version with a few more features.

I'm really not all that interested in what Cisco does, but what got me thinking again is the lack of participation by other vendors in making a similar vSwitch, that is, integrating their stack down to the hypervisor itself.

Back in 2009, Arista Networks launched their own vSwitch (though now that I read more on it, it wasn't a "real" vSwitch), but you wouldn't know that by looking at their site today. I tried a bunch of different search terms because I thought they still had it, but it seems the product is dead and buried. I have not heard of any other manufacturers making a software vSwitch of any kind (for VMware at least). I suppose customer demand is not there.

I asked Extreme back then if they would come out with a software vSwitch, and at the time at least they said there were no plans; instead they were focusing on direct attach. That strategy, at least for VMware, appears to be dead for the moment, as the manufacturer of the NICs used to make it happen stopped making NICs about 1-2 years ago. I don't know why they still have the white paper on their site; I guess to show the concept, since you can't build it today.

Direct attach, at least taken to its logical conclusion, is a method to force all inter-VM switching out of the host and into the physical switch layer. I was told that this is possible today with Extreme (and possibly others) on KVM (I don't know the details), just not with VMware.
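To make the difference concrete, here is a toy Python model of where a frame between two VMs on the same host gets switched: by an in-host vSwitch, or, under direct attach, by the physical switch, which then has to send the frame back down the same uplink it arrived on. All names (`Host`, `path`, the hop labels) are hypothetical; this is a sketch of the concept, not any vendor's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    vm_macs: set = field(default_factory=set)
    direct_attach: bool = False  # True: every frame leaves via the uplink


def path(host: Host, src_mac: str, dst_mac: str) -> list:
    """Return the hops a frame takes from a VM on this host to dst_mac."""
    if src_mac not in host.vm_macs:
        raise ValueError("source VM is not on this host")
    if not host.direct_attach and dst_mac in host.vm_macs:
        # Local vSwitch: the hypervisor switches the frame in memory.
        return [src_mac, f"{host.name}-vswitch", dst_mac]
    # Otherwise the frame goes out the NIC and the physical switch forwards it.
    hops = [src_mac, f"{host.name}-uplink", "physical-switch"]
    if dst_mac in host.vm_macs:
        # Same-host destination under direct attach: back down the same uplink.
        hops += [f"{host.name}-uplink", dst_mac]
    else:
        hops.append(dst_mac)
    return hops
```

With `Host("esx1", {"vm-a", "vm-b"})`, traffic from `vm-a` to `vm-b` stays inside the host; flip `direct_attach` to `True` and the same frame crosses the uplink twice, which is the trade-off between the two approaches.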

They do have a switch that runs in VMware, though it's not a vSwitch; it's more of a demo-type thing where you can play with commands. Their switching software has run on Intel CPUs since its initial release in 2003 (and they still have switches today that use Intel CPUs), so I imagine the work involved in making a vSwitch happen would not be herculean if they wanted to do it.

I have seen other manufacturers (Brocade at least if I remember right) that were also looking forward to direct attach as the approach to take instead of a vSwitch. I can never remember the official networking name for the direct attach technology...

With VMware's $1.2B purchase of Nicira it seems they believe the future is not direct attach.

Myself, I like the concept of switching within the host, though I have wanted an actual switching fabric (in hardware) to make it happen. Some day...

Off topic - but it seems the global economic cycle has passed its peak and is now, for sure, headed downhill? One of my friends said yesterday the economy is "complete garbage". I see tech company after company missing or warning, and layoffs abound, whether it's massive layoffs at HP or the smaller layoffs at Juniper that were announced this morning. Meanwhile the stock market is hitting new highs quite often.

I still maintain we are in a great depression. Lots of economists dispute that, though if you took away the social safety nets that we did not have in the '20s and '30s during the last depression, I am quite certain you'd see massive numbers of people lined up at soup kitchens and the like. I think the economists dispute it more because they fear a self-fulfilling prophecy than out of any willingness to have a serious talk on the subject. Whether or not we can get out of the depression, I don't know. We need a catalyst - last time it was WWII. At least the last two major economic expansions were bubbles; it's been a long time since we've had a more normal economy. If we don't get a catalyst, then I see stagnation for another few years, perhaps a decade, while we drift downwards towards a more serious collapse (something that would make 2008 look trivial by comparison).

14 Sep 2012

VMware suggests swapping as a best practice

TechOps Guy: Nate

Just came across this, and going through the PDF it says

Virtualization causes an increase in the amount of physical memory required due to the extra memory needed by ESXi for its own code and for data structures. This additional memory requirement can be separated into two components:
1. A system-wide memory space overhead for the VMkernel and various host agents (hostd, vpxa, etc.).

A new feature in ESXi 5.1 allows the use of a system swap file to reduce this memory overhead by up to 1GB when the host is under memory pressure.

It just scares me that they advocate setting up a swap file to reduce memory usage by up to 1GB. How much memory does the average VMware host have? Maybe 64GB today? So that could save about 1.5% of physical memory, with the potential trade-off of impacting storage performance (assuming no local storage) for all the other systems in the environment.

It scares me just about as much as how 3PAR used to advocate that their storage systems could get double the VM density per server, because you could crank up the swapping and their arrays could take the I/O hit (I don't think they advocate this today though).

Now if you can somehow be sure that the system won't be ACTIVELY swapping, then it's not a big deal, but of course you really don't want to be actively swapping in any situation unless your I/O is basically unlimited. You could go and equip your servers with, say, a pair of SSDs in RAID 1 to do this sort of swapping (remember, it is 1GB), but it's just not worth it. I don't understand why VMware spent the time to come up with such a feature.

If anything, the trend has been toward more memory in hosts, not less; I'd imagine most serious deployments have well over 100GB of memory per host these days.
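The arithmetic behind that 1.5% is simple, and it only shrinks as hosts grow. A quick Python sketch of the fixed 1GB saving against a few host sizes (the host sizes are illustrative):

```python
def saving_pct(saved_gb: float, host_gb: float) -> float:
    """Share of host RAM, in percent, recovered by a fixed-size saving."""
    if host_gb <= 0:
        raise ValueError("host_gb must be positive")
    return 100.0 * saved_gb / host_gb


# The 1GB system swap file saving against a few illustrative host sizes.
for host_gb in (64, 128, 256):
    print(f"{host_gb:>4} GB host: {saving_pct(1, host_gb):.2f}% saved")
```

That works out to roughly 1.6%, 0.8% and 0.4% respectively.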

My best practice is: don't swap - ever. In the environments I have supported, performance/latency is important, so there is really no oversubscription of memory. I've had one time where VMware was swapping excessively at the host level. To me it was a bug, but to VMware it was a feature (the host had tons of free memory). I forget the term, but it was documented behavior of how the hypervisor functions, just not commonly known I guess, and totally not obvious. The performance of the application obviously went in the toilet while this swapping was going on; it felt like the system was running on a 386 CPU.

Windows' memory footprint is significantly different from that of Linux; Linux has represented probably 98 or 99% of my VMs over the years.

Oh, and that transparent page sharing (TPS) VMware touts so much? I just picked one of my servers at random: 31 VMs and 147GB of memory in use, and TPS is saving me a grand 3% of memory. Yay TPS.
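Page sharing works, roughly, by finding guest memory pages with identical contents and backing them with a single copy-on-write physical page. The Python sketch below is a simplification, not VMware's implementation: it hashes each 4 KB page of a memory image and counts the pages that duplicate one already seen (a real implementation also compares the full contents before sharing). Memory full of mostly unique data, like a busy Linux guest's caches, yields a low number, which is at least consistent with the 3% above.

```python
import hashlib
from collections import Counter

PAGE_SIZE = 4096  # assumes 4 KB small pages


def shared_page_savings(memory: bytes) -> float:
    """Fraction of pages that could be backed by an identical, already-present page."""
    pages = [memory[i:i + PAGE_SIZE] for i in range(0, len(memory), PAGE_SIZE)]
    if not pages:
        return 0.0
    # Pages with equal digests are treated as identical (hash stands in for a full compare).
    counts = Counter(hashlib.sha1(page).digest() for page in pages)
    duplicates = sum(n - 1 for n in counts.values())
    return duplicates / len(pages)
```

Three zero-filled pages plus one distinct page give 0.5: two of the four pages could be dropped in favor of a shared copy.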

The cost of I/Os (to spinning rust, or even enterprise SSDs), unless your workload is very predictable and you do not have much active swapping, is just too much to justify the risk of allowing swap in any form, in my experience. In fact, the bulk of the VMs I run do have a local 500MB swap partition, enough for some very light swapping - but I'd rather have the VM fail & crash than have it swap like crazy and take the rest of the systems down with it.
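Rough numbers show why. Assuming each page fault is one random 4 KB read (no clustering or read-ahead), the time to fault a gigabyte back in is just pages divided by IOPS. The device rates below are illustrative assumptions, not measurements, and on shared storage those I/Os also compete with every other VM. A Python sketch:

```python
PAGE_KB = 4  # assumption: one random 4 KB read per page fault, no clustering


def page_in_seconds(mb_to_fault: float, iops: float) -> float:
    """Seconds needed to fault mb_to_fault megabytes back in at a given IOPS rate."""
    pages = mb_to_fault * 1024 / PAGE_KB
    return pages / iops


# Illustrative device rates only.
for label, iops in (("single 15K disk", 180), ("enterprise SSD", 20_000)):
    print(f"1 GB at {label}: {page_in_seconds(1024, iops):,.0f} s")
```

Under these assumptions that is roughly 24 minutes for one fast spindle and about 13 seconds for one SSD.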

But that's me.

27 Aug 2012

vSphere 5.1: what 5.0 should have been!

TechOps Guy: Nate

I'm not going to bore you with all the mundane details about what is new, since so many other folks are doing that, but here is a post from VMware which has links to PDFs describing what is new.

It looks pretty decent, and the licensing change is welcome, though not everyone agrees with that. I find it interesting that the web console is going to be the main GUI to manage vSphere going forward. I found the web console in 5.0 severely lacking, but I'm sure it's improved in 5.1. Anyone happen to know if the new console is backwards compatible with vCenter 5.0? Also, I wonder if this web console applies to managing ESXi hosts directly (without vCenter)? I assume it doesn't apply (yet)?

I don't see myself upgrading anytime before the end of the year, but it does strongly seem to me that this 5.1 release is what 5.0 should have been last year.

I find this race to a million IOPS quite silly. Whether it is VMware's latest claim of 1 million IOPS in a VM, or EMC's latest claim, or HDS's latest claim, everyone is trying to show they can do a million too, and the fine print always seems to point to a 100% read workload. Maybe customers will buy their arrays with their data pre-loaded, so they don't have to do any writes to them.
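The reason that fine print matters: on RAID each front-end write turns into several back-end I/Os. The common rule-of-thumb calculation uses write penalties of 2 for RAID 10, 4 for RAID 5 and 6 for RAID 6; real arrays with write caching and full-stripe writes can do better, so treat this Python sketch as a back-of-the-envelope estimate only.

```python
# Rule-of-thumb back-end I/Os per front-end write (assumed, cache effects ignored).
WRITE_PENALTY = {"raid10": 2, "raid5": 4, "raid6": 6}


def backend_iops(frontend_iops: float, read_fraction: float, raid: str) -> float:
    """Back-end disk IOPS needed to serve a front-end workload with a given read mix."""
    if not 0.0 <= read_fraction <= 1.0:
        raise ValueError("read_fraction must be between 0 and 1")
    reads = frontend_iops * read_fraction
    writes = frontend_iops * (1.0 - read_fraction)
    return reads + writes * WRITE_PENALTY[raid]
```

A 100% read million stays a million, but a 70/30 read/write mix on RAID 5 needs roughly 1.9 million back-end IOPS under these assumptions.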

 

20 Aug 2012

The Screwballs have Spoken

TechOps Guy: Nate

Just got this link from Gabriel (thanks!). It seems the screwball VMware community has spoken, and VMware listened and is going to ditch the controversial vRAM licensing that they introduced last year.

In its upcoming release of vSphere 5.1, VMware is getting rid of vRAM entitlements, which debuted with vSphere 5 and determine how much memory customers are permitted to allocate to virtual machines on the host, according to sources familiar with VMware's plans.
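For anyone who missed it, the vRAM model roughly worked like this: each per-socket license came with a pooled vRAM entitlement, and the memory configured on powered-on VMs had to fit within that pool. The Python sketch below is simplified: it ignores pooling across hosts, averaging over time and any per-VM cap, and the entitlement is a parameter because it varied by edition and was revised after launch (96 GB is only an example value).

```python
import math


def licenses_needed(sockets: int, provisioned_vram_gb: float, entitlement_gb: float) -> int:
    """Per-socket licenses required under a simplified vRAM model.

    At least one license per CPU socket, and enough licenses that their
    combined vRAM entitlement covers the memory configured on VMs.
    """
    by_sockets = sockets
    by_vram = math.ceil(provisioned_vram_gb / entitlement_gb)
    return max(by_sockets, by_vram)


# A hypothetical 2-socket host with 256 GB of VM memory configured.
print(licenses_needed(2, 256, 96))  # -> 3
```

That third license is the crux of the complaint: the hardware didn't change, only the amount of memory handed out to VMs.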

I tried to be a vocal opponent of this strategy and firmly believed it was going to hurt VMware. I haven't seen any hard numbers on the uptake of vSphere 5, but there have been hints that it has not been as fast as VMware had hoped.

I had a meeting with a VMware rep about a year ago and complained about this very issue for at least 30 minutes, but it was like talking to a brick wall. I was told recently that the rep in question isn't with the company anymore.

I have little doubt that VMware was forced into this change because of slow uptake and outright switching to other platforms. They tried to see how much leverage they had at customers and realized they don't have as much as they thought they had.

Now the question is, will they repeat the mistake in the future? Myself, I am pretty excited to hear that Red Hat is productizing OpenStack along with RHEV; that really looks like it has a lot of potential (everything I see today about OpenStack says steer clear unless you have some decent in-house development resources). I don't have any spare gear to play with this stuff on at the moment.

Thanks, VMware, for coming to your senses. The harsh feelings are still there though; can I trust you again after what you tried to pull? Time will tell, I guess.

(In case you're wondering where I got the title of this post, it's from here.)

Marge gets to make her concluding statement, in which she asks all concerned parents to write to I&S and express their feelings. In his office, Mr. Meyers goes through the tons of angry mail he's received... "The screwballs have spoken..."
7 Aug 2012

Adventures with vCenter, Windows and expired Oracle passwords

TechOps Guy: Nate

Today is a day I wish I could have back - it was pretty much a waste/wash.

I'm not a Windows person by trade, of course, but I did have an interesting experience today. I write this in the hope that perhaps it can save someone else the same pain.

Last night I kicked off some Windows updates on a vCenter server; I've done it a bunch of times before and never had an issue. There were only about 6-10 updates to install. It installed them, then rebooted, and was taking a really long time to complete the post-install stuff; after about 30 minutes I gave up and went home. It has always come back when it's done.

I forgot about it until this morning, when I went to go do stuff with vCenter and could not connect. Then I tried to remote desktop into the system and could not (the TCP port was not listening). So I resorted to logging in via the VMware console. I tried resetting remote desktop, to no avail. I went to the control panel to check on Windows Update, and the Windows Update control panel just hung. I went to the 'add/remove programs' thing to roll back some updates, and it hung while looking for the updates.
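A quick way to confirm a "port not listening" symptom from another machine is a plain TCP connect to the Remote Desktop port (3389 by default). A minimal Python sketch; it only tells you whether something accepts connections, not whether the RDP service is healthy.

```python
import socket


def port_listening(host: str, port: int = 3389, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable or unresolvable: treat all as "not listening".
        return False
```

For example, `port_listening("vcenter.example.com")`, where the host name is a placeholder for your own vCenter server.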

I tried firing up IE9, and it didn't start; it just spun an hourglass for a few seconds and stopped. I scoured the event logs and there was really nothing there - no errors. I was convinced at this point that an OS update had gone wrong; I mean, why else would something like IE break? There was an IE update among the updates installed last night, after all.

After some searches I saw some people comment on how some new version of Flash was causing IE to break, so I went to remove Flash (I forget why it was installed, but there was a reason at the time), and could not. In fact, I could not uninstall anything; it just gave me a generic message saying something along the lines of "wait for the system to complete the process before uninstalling this".

I came across a Windows tool called the System Update Readiness Tool, which sounded promising as well. I was unable to launch IE, of course; I did have Firefox and could load the web page, but I was unable to download the software without Firefox hanging (!?). I managed to download it on another computer and copy it over the network to the affected server's hard drive. But when I tried to launch it - sure enough, it hung too, almost immediately.

Rebooting didn't help; shutting down completely and starting up again - no luck, same behavior. After consulting with the IT manager, who spends a lot more time in Windows than I do, we booted into safe mode - it came right up. Windows Update is not available in safe mode, and most services were not started, but I was able to get in and uninstall the hotfix for IE. I rebooted again.

At some point along the line I got the system to where I could remote desktop in, Windows Update looked OK, IE loaded, etc. I called the IT manager over to show him, and decided to reboot to make sure it was OK, only to have it break on me again.

I sat at the post-install screen for the patches (Stage 3 of 3, 0%) for about 30 minutes. At this point I figured I had better start getting prepared to install another vCenter server, so I started that process in parallel, talked a bit with HP/VMware support, and shut off the VM again and rebooted - no difference, it just sat there. So I rebooted again into safe mode, removed the rest of the patches that were installed last night, rebooted again into normal mode, and must've waited 45 minutes or so for the system to boot. It did boot eventually and got past that updates screen, but the system was still not working right: vCenter was hanging and I could not remote desktop in.

About 30 minutes after the system booted I was able to remote desktop in again, not sure why. I kept poking around, not making much progress. I decided to take a VM snapshot (I had not taken one originally, but in the grand scheme of things it wouldn't have helped), re-install those patches, and let the system work through whatever it had to work through.

So I did that, and the system was still wonky.

I looked and looked - vCenter was still hanging, with nothing in the event log and nothing in the vpx vCenter log other than stupid status messages like:

2012-08-08T01:08:01.186+01:00 [04220 warning 'VpxProfiler' opID=SWI-a5fd1c93] VpxUtil_InvokeWithOpId [TotalTime] took 12000 ms
2012-08-08T01:08:12.535+01:00 [04220 warning 'VpxProfiler' opID=SWI-12d43ef2] VpxUtil_InvokeWithOpId [TotalTime] took 12000 ms
2012-08-08T01:08:23.884+01:00 [04356 warning 'VpxProfiler' opID=SWI-f6f6f576] VpxUtil_InvokeWithOpId [TotalTime] took 12000 ms
2012-08-08T01:08:35.234+01:00 [04220 warning 'VpxProfiler' opID=SWI-a928e16] VpxUtil_InvokeWithOpId [TotalTime] took 12000 ms
2012-08-08T01:08:46.583+01:00 [04220 warning 'VpxProfiler' opID=SWI-729134b2] VpxUtil_InvokeWithOpId [TotalTime] took 12000 ms
2012-08-08T01:08:57.932+01:00 [04328 warning 'VpxProfiler' opID=SWI-a395e0af] VpxUtil_InvokeWithOpId [TotalTime] took 12000 ms
2012-08-08T01:09:09.281+01:00 [04220 warning 'VpxProfiler' opID=SWI-928de6d2] VpxUtil_InvokeWithOpId [TotalTime] took 12000 ms
2012-08-08T01:09:20.631+01:00 [04328 warning 'VpxProfiler' opID=SWI-7a5a8966] VpxUtil_InvokeWithOpId [TotalTime] took 12000 ms
2012-08-08T01:09:32.058+01:00 [04220 warning 'VpxProfiler' opID=SWI-524a7126] VpxUtil_InvokeWithOpId [TotalTime] took 12000 ms
2012-08-08T01:09:43.804+01:00 [04328 warning 'VpxProfiler' opID=SWI-140d23cf] VpxUtil_InvokeWithOpId [TotalTime] took 12000 ms
2012-08-08T01:09:55.551+01:00 [04356 warning 'VpxProfiler' opID=SWI-acadf68a] VpxUtil_InvokeWithOpId [TotalTime] took 12000 ms
2012-08-08T01:10:07.297+01:00 [04328 warning 'VpxProfiler' opID=SWI-e42316c] VpxUtil_InvokeWithOpId [TotalTime] took 12000 ms
2012-08-08T01:10:19.044+01:00 [04356 warning 'VpxProfiler' opID=SWI-3e976f5f] VpxUtil_InvokeWithOpId [TotalTime] took 12000 ms
2012-08-08T01:10:30.790+01:00 [04328 warning 'VpxProfiler' opID=SWI-2734f3ba] VpxUtil_InvokeWithOpId [TotalTime] took 12000 ms
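When a log is this repetitive, it can help to tally it rather than read it. Below is a Python sketch that parses lines in the format shown above and counts the reported durations. The regular expression is fitted to these sample lines, not to any documented vCenter log format, so treat it as an assumption.

```python
import re
from collections import Counter

# Fitted to the sample VpxProfiler lines above (assumed format).
LINE = re.compile(
    r"^(?P<ts>\S+) \[(?P<thread>\d+) (?P<level>\w+) '(?P<logger>[^']+)' "
    r"opID=(?P<opid>[^\]]+)\] (?P<func>\S+) \[TotalTime\] took (?P<ms>\d+) ms$"
)


def summarize(lines):
    """Count profiler timing warnings by reported duration in milliseconds."""
    durations = Counter()
    for line in lines:
        match = LINE.match(line.strip())
        if match:
            durations[int(match.group("ms"))] += 1
    return durations
```

Run over the lines above, every entry lands in the 12000 ms bucket; a long run of identical timings is at least worth noticing, even if it doesn't say what is wrong.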

No errors anywhere. I believe I looked at the Tomcat logs a few times, and there were no logs for today.

Finally, I dug into the Tomcat logs from last night and came across this:

Aug 6, 2012 11:27:30 PM com.vmware.vim.common.vdb.VdbODBCConfig isConnectableUrl
SEVERE: Unable to get a connection to: jdbc:oracle:thin:@//DB_SERVER:1521/DB_SERVER as username=VPXADMIN due to: ORA-28001: the password has expired
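In hindsight, a scan of the Tomcat logs for Oracle error codes would have surfaced this right away. A minimal Python sketch of such a scan: it relies only on the standard `ORA-nnnnn` error prefix and leaves opening the files to the caller.

```python
import re

ORA_CODE = re.compile(r"\bORA-(\d{5})\b")


def ora_errors(lines):
    """Yield (line_number, error_code, text) for each log line carrying an ORA- error."""
    for number, line in enumerate(lines, start=1):
        for code in ORA_CODE.findall(line):
            yield number, f"ORA-{code}", line.strip()
```

Usage is along the lines of `with open(path) as f: for hit in ora_errors(f): print(hit)`, with `path` pointing at whichever log you want to check.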

I had encountered a password expiry on my sys account a few weeks ago, but didn't really think much about it at the time. Anyway, I reset the password and vCenter was able to start. I then disabled password expiry per this page, which says the defaults were changed in 11G and passwords do expire now (I have used Oracle 10G and a little of 8/9i and never recall having password expiry issues).
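In 11g the expiry comes from the user's profile: `PASSWORD_LIFE_TIME` sets how many days a password lasts, and `PASSWORD_GRACE_TIME` how many days of warnings follow before logins fail with ORA-28001. The DEFAULT profile's values are commonly cited as 180 and 7 days; check `DBA_PROFILES` on your own instance rather than trusting those numbers, and setting `PASSWORD_LIFE_TIME` to `UNLIMITED` in the relevant profile is the usual way to turn expiry off. The Python sketch below only does the date arithmetic, which is handy for predicting when a service account such as VPXADMIN will stop working.

```python
from datetime import date, timedelta


def password_expiry(last_changed: date, life_days: int = 180, grace_days: int = 7):
    """Return (grace_starts, hard_expiry) for a profile-style password policy.

    Defaults mirror the commonly cited Oracle 11g DEFAULT profile values
    (an assumption; verify against DBA_PROFILES). Oracle starts the grace
    period at the first login after the lifetime runs out, so hard_expiry
    is the earliest date logins can start failing.
    """
    grace_starts = last_changed + timedelta(days=life_days)
    hard_expiry = grace_starts + timedelta(days=grace_days)
    return grace_starts, hard_expiry
```

A password set on 1 January 2012 would, under those defaults, enter its grace period on 29 June and could fail as early as 6 July.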

I have had vCenter fail to start because of DB issues in the past. In fact, because vCenter does not properly release its locks on the Oracle DB when it shuts down, the easiest workaround is to restart Oracle whenever I reboot the vCenter server (since vCenter is the only thing on the Oracle DB, it's just a simpler solution). When vCenter fails in this way, it causes no issues for the rest of the OS, just an error message in the event log saying vCenter failed to start, and a helpful explanation as to why:

Unable to get exclusive access to vCenter repository. Please check if another vCenter instance is running against the same database schema.

What gets me, even now, is how the hell this expired password cascaded into Internet Explorer breaking, remote desktop breaking, Windows Update breaking, etc.? My only guess is that vCenter was perhaps flooding the system with RPC messages, causing other things to break. Again, there was no evidence of any errors in the event log anywhere. I even called a friend who works at Microsoft and deploys hundreds of Windows servers for a living (he works as a Lab Manager), hoping he would have an idea. He said he had seen this behavior several times before but never tried to debug it; he just wiped the system and reinstalled. I was close to doing that today, but fortunately I eventually found a solution, and I guess you could say I learned something in the process?

I don't know.

I have not seriously used Windows since the NT4 days (I have used it casually on the desktop and in some server roles like this vCenter system). Why I stopped using it - well, there were many reasons, and I suppose this was sort of a reminder. I'm not really up for moving to the Linux vCenter appliance yet, as it seems beta-ish, if I ever get to move to that appliance before I upgrade to KVM (at some point, no rush). I have a very vague memory of experimenting one time on NT4, or maybe it was 3.51, where I decided to stop one or more of the RPC services to see what would happen. Havoc, of course. I noticed that one of the services vCenter depends upon, the DCOM Server Process Launcher, seems to be of similar importance in Windows 2008, though 2008 smartly does not allow you to stop it. I chuckled when I saw that the Recovery Action for this service's failure is Restart the Computer. But in this case the service was running... I looked for errors for it in the event log as well, and there were none.

7 Aug 2012

ESXi 5 Uptake still slow?

TechOps Guy: Nate

Just came across this article from our friends at The Register, and two things caught my eye -

HP is about to launch a new 2U quad-socket system, the HP DL560 Gen8, which is what the article is about. I really can't find any information on this server online, so it seems it is not yet officially announced. I came across this PDF from 2005, which says a 560 has existed in the past - though I never recall hearing about it, and I've been using HP gear off and on since before then. Anyway, on the HP site the only 500-series systems I see are the 580 and 585, nothing new there.

HP has taken its sweet time joining the 4-socket 2U gang. I recall Sun was among the first, several years ago, with Opteron; later Dell and others joined in, but HP stayed bulky, with its only quad-socket rack option being 4U.

The more interesting thing to me, though, was the lack of ESXi 5.0 results posted with VMware's own benchmark utilities. Of the 23 results posted since ESXi 5 became generally available, only four are running on the newest hypervisor. I count six systems using ESX 4.1U2 and vCenter 5.0 (a combination I chose for my company's infrastructure). Note I said ESX, not ESXi. I looked at a couple of the disclosure documents and would expect them to specifically call out ESXi if that is in fact what was used.

So not only are they NOT using ESXi 5.0 but they aren't even using ESXi period with these newest results (there is not a single ESXi 4.x system on the site as far as I can tell).

Myself, I find that fascinating. Why would they be testing with an older version of the hypervisor, and not even using ESXi? I have my own reasons for preferring ESX over ESXi, but I'd really expect that for benchmark purposes they'd go with the lighter hypervisor. I mean, it takes significantly less time to install onto a system since it's so small.

I have to assume that they are using this configuration because it's what the bulk of their customers are still deploying today; otherwise it makes no sense to be testing the latest and greatest Intel processors, on server hardware that's not even released yet, on an OS kernel that is going on three years old at this point. I thought there were supposed to be some decent performance boosts in ESXi 5?

I'm not really a fan of the VMmark benchmark itself. The results seem rather confusing to interpret, there are no cost disclosures, and I suspect it only runs on VMware, making it difficult or impossible to compare with other hypervisors. The format of the results is also not ideal; I'd like to see at least CPU/memory/storage benchmarks included so it's easier to tell how each subsystem performed. Testing brand X with processor Y and memory Z against brand W with processor Y and memory Z doesn't seem very useful by itself.

SPEC has another VM benchmark, though its results seem similarly confusing to interpret; at least they have results for more than one hypervisor.

vSphere, aka ESX 4, really was revolutionary when it was released: it ditched the older 32-bit system for a more modern 64-bit one, and introduced a ton of new things as well.

I was totally underwhelmed by ESXi 5, even before the new licensing change was announced. I mean just compare What's New between vSphere 4 and vSphere 5.

7 Apr 2012

Interesting discussion on vSphere vs Hyper-V

TechOps Guy: Nate

I stumbled upon this a few days ago and just got around to reading it now. It came out about a month ago; I forget where I saw it, I think Planet V12n or something.

Anyway, it's two people who sound experienced (I don't see information on their particular backgrounds), each talking up their respective solution. Two things really stood out to me:

  • The guy pimping Microsoft was all about talking about a solution that doesn't exist yet ("it'll be fixed in the next version, just wait!")
  • The guy pimping VMware was all about talking about how cost doesn't matter because VMware is the best.

I think they are both right - and both wrong.

It's good enough

I really believe that in Hyper-V's case, and also in KVM/RHEV's case, these products will be "good enough" for the next generation of projects (in Hyper-V's case, whenever Windows 8 comes out) for a large number (a majority) of the use cases out there. I don't see Linux-centric shops considering Hyper-V, or Windows-centric shops considering KVM/RHEV/etc., so VMware will obviously still have a horse in the race (as the pro-VMware person harps on in the debate).

Cost is becoming a very important issue

One thing that really rubbed me the wrong way was when the pro-VMware person said this:

Some people complain about VMware's pricing but those are not the decision makers, they are the techies. People who have the financial responsibility for SLAs and customers aren't going to bank on an unproven technology.

I'm sorry, but that is just absurd. If cost weren't an issue, then the techies wouldn't be complaining about it, because they know first-hand that it is an issue in their organizations. They know first-hand that they have to justify the purchase to those decision makers. The company I'm at now was in that same situation - the internal IT group could only get the most basic level of vSphere approved for purchase at the time for their internal IT assets (this was a year or two or three ago). I hear them constantly complaining about the lack of things like vMotion, or shared storage, etc. Cost was a major issue, so the environment was built with disparate systems and storage and the cheap version of vSphere.

Give me an unlimited budget and I promise, PROMISE you will NEVER hear me complain about cost. I think the same is true of most people.

I've been there, more than once! I've done that exact same thing (well, in my case I managed to have good storage in most cases).

Those decision makers weigh the costs of maintaining that SLA with whatever solution they're going to provide. Breaking SLAs can be more cost-effective than achieving them, especially if they are absurdly high SLAs. I remember at one company I was at, they signed all these high-level SLAs with their new customers, so I turned around and said: hey, in order to achieve those SLAs we need to do this laundry list of things. I think maybe 5-10% of the list got done before the budget ran out. You can continue to meet those high SLAs if you're lucky, even without the actual ability to sustain a failure and maintain uptime. More often than not, such smaller companies prefer to rely on luck rather than doing things right.
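To put "absurdly high SLAs" in numbers, the downtime an availability percentage allows is simple arithmetic. A Python sketch, using a 365.25-day year:

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # average year including leap days


def downtime_budget_minutes(availability_pct: float,
                            period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Minutes of downtime allowed in a period at the given availability percentage."""
    return period_minutes * (1.0 - availability_pct / 100.0)


for sla in (99.9, 99.99, 99.999):
    print(f"{sla}%: {downtime_budget_minutes(sla):.1f} minutes per year")
```

99.9% allows roughly 8.8 hours a year, 99.99% about 53 minutes and 99.999% about 5 minutes, which leaves essentially no room for a manual recovery.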

Another company I was at had what could have been called a disaster in itself, during the same time I was working on a so-called disaster recovery project (no coincidence). Despite the disaster, at the end of the day management canned the disaster recovery project (which everyone agreed would have saved a lot of everything had it been in place at the time of the disaster). It's not that the budget wasn't approved - it was approved. The problem was management wanting to do another project that they had massively under-budgeted for, so they decided to cannibalize the DR budget to give to this other pet project.

Yet another company I was at signed a disaster recovery contract with SunGard just to tick the check box saying they had DR. The catch was that the entire company knew up front, before they signed, that they would never be able to utilize the service. IT WOULD NEVER WORK. But they signed it anyway because they needed a plan, and they didn't want to pay for a plan that would have worked.

VMware touting VM density as king

I've always found it interesting how VMware touts VM density: they claim an automatic density advantage for VMware, which automatically reduces VMware's effective cost regardless of the competition. This example was posted to one of their blogs a few days ago.

They tout their memory sharing, their memory ballooning, and their memory compression as things that can increase density vs the competition.

My own experience with memory sharing on VMware, at least with Linux, is pretty simple - it doesn't work. It doesn't give results. Looking at one of my ESX 4.1 servers (yes, no ESXi here), which has 18 VMs on it and 101GB of memory in use, how much memory am I saving with transparent page sharing?

3,161 MB - or about 3%. Nothing to write home about.

For production loads, I don't want to be in a situation where memory ballooning kicks in, or where memory compression kicks in. I want to keep performance high, and that means no swapping of any kind from any system. The last thing I want is for my VMs to start thrashing my storage with active swapping. Don't even think about swapping if you're running Java apps either: once garbage collection kicks in, your VM will grind to a halt while it performs that operation.

I would like a method to keep the Linux buffer cache under control, however; whether it is ballooning that specifically targets the file system cache, or some other method, that would be a welcome addition to my systems.
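Whatever mechanism the hypervisor might use, the guest side of the problem is easy to see: Linux reports its buffer and page cache in `/proc/meminfo`, and from the hypervisor's point of view those pages are generally just memory the guest has touched. A small Python sketch that parses that format and reports the cached share (field names are the standard `/proc/meminfo` ones):

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style lines ("Key:   123 kB") into {key: value_in_kB}."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def cache_fraction(info: dict) -> float:
    """Share of MemTotal currently held in Buffers + Cached."""
    return (info.get("Buffers", 0) + info.get("Cached", 0)) / info["MemTotal"]
```

On a Linux guest, `cache_fraction(parse_meminfo(open("/proc/meminfo").read()))` gives the figure directly; on a long-running server it is often a large share of RAM.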

Another welcome addition would be the ability to flag VMs and/or resource pools to proactively use memory compression (regardless of memory usage on the host itself): low-priority VMs, VMs that sit at 1% CPU usage most of the time, VMs where the added latency of compression on otherwise idle CPU cores isn't that important (again - stay away from actively swapping!). As a bonus, provide the ability to limit the CPU capacity consumed by compression activities, such as limiting it to the resource pool that the VM is in, and/or having a per-host setting where you could say: set aside up to 1 CPU core or whatever for compression, and if you need more than that, don't compress unless it's an emergency.

YAWA (yet another welcome addition) with regard to compression would be to provide me with compression ratios - how effective is the compression when it's in use? Recommend to me VMs with low utilization from which I could proactively reclaim memory by compressing them, or maybe only portions of the memory are worth compressing? The hypervisor, with the assistance of VMware Tools, has the ability to see what is really going on in the guest by virtue of having an agent there. The actual capability doesn't appear to exist now, but I can't imagine it being too difficult to implement. Sort of along the lines of proactively inflating the memory balloon.
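The report asked for here is easy to approximate outside the hypervisor. This Python sketch is an assumption-laden stand-in, not how ESX's memory compression works internally: it compresses a memory sample page by page with zlib, counts the pages that shrink to half their size or less (an assumed threshold), and recommends VMs that are both mostly idle and compress well. The input tuples and thresholds are hypothetical.

```python
import zlib

PAGE_SIZE = 4096


def compressible_fraction(memory: bytes, max_ratio: float = 0.5) -> float:
    """Fraction of 4 KB pages that zlib shrinks to max_ratio of their size or less."""
    pages = [memory[i:i + PAGE_SIZE] for i in range(0, len(memory), PAGE_SIZE)]
    if not pages:
        return 0.0
    small = sum(1 for page in pages if len(zlib.compress(page)) <= max_ratio * len(page))
    return small / len(pages)


def compression_candidates(vms, cpu_limit_pct: float = 2.0, min_fraction: float = 0.5):
    """Names of VMs that are mostly idle and whose memory sample compresses well.

    vms: iterable of (name, avg_cpu_pct, memory_sample) tuples (hypothetical input).
    """
    return [name for name, cpu_pct, sample in vms
            if cpu_pct <= cpu_limit_pct and compressible_fraction(sample) >= min_fraction]
```

A per-VM `compressible_fraction` is the "compression ratio" report; `compression_candidates` is the recommendation list, with the CPU threshold standing in for the "sits at 1% most of the time" rule.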

So, for what it's worth, for me you can take any VM density advantages for VMware off the table when it comes to memory. For me, VM density is more about the efficiency of the code and how well it handles all of those virtual processors running at the same time.

Taking the Oracle VM blog post above, VMware points out that Oracle supports only 128 VMs per host vs VMware's 512. Good example, but they really need to show how well all those VMs can work on the same host, and how much overhead there is. If my average VM CPU utilization is 2-4%, does that mean I can squeeze 512 VMs onto a 32-core system (memory permitting, of course), when in theory I should be able to get around 640 (memory permitting again)?
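One way to arrive at a figure like 640 (an assumption about the math, not necessarily how it was derived here): treat each VM as a single vCPU averaging a given percentage of one core, and cap the host at a target utilization such as 80%. A Python sketch:

```python
def theoretical_vm_count(cores: int, per_vm_util_pct: float,
                         target_host_util_pct: float = 80.0) -> int:
    """VMs that fit on CPU alone if each averages per_vm_util_pct of one core.

    Ignores memory, hypervisor overhead and co-scheduling of multi-vCPU VMs,
    which is exactly the overhead a real density test would have to measure.
    """
    if per_vm_util_pct <= 0:
        raise ValueError("per_vm_util_pct must be positive")
    return int(cores * target_host_util_pct / per_vm_util_pct)


print(theoretical_vm_count(32, 4.0))  # 32 cores, 4% per VM, 80% ceiling -> 640
```

At 2% per VM the same math gives 1280, at which point the 512-VM limit, not CPU, would be the ceiling.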

Oh, the number of times I have logged into an Amazon virtual machine that was suffering from CPU problems, only to see that 20-30% of the CPU time was being stolen from the VM by the hypervisor. From the sar man page:

%steal

Percentage of time spent in involuntary wait by the virtual CPU or CPUs while the hypervisor was servicing another virtual processor.
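sar derives %steal from the kernel's CPU counters, and the same number can be computed directly from two snapshots of the aggregate `cpu` line in `/proc/stat` on a Linux guest, where the eighth value is steal time. A Python sketch; it leaves out the trailing guest fields because the kernel already counts them inside user and nice:

```python
def parse_cpu_line(line: str) -> list:
    """Return the numeric counters from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(x) for x in fields[1:]]
    if len(values) < 8:
        raise ValueError("kernel does not report steal time")
    return values


def steal_pct(before: list, after: list) -> float:
    """%steal between two snapshots (user, nice, system, idle, iowait, irq, softirq, steal)."""
    deltas = [a - b for a, b in zip(after[:8], before[:8])]
    total = sum(deltas)
    return 100.0 * deltas[7] / total if total else 0.0
```

Read the first line of `/proc/stat`, sleep a few seconds, read it again, and pass both parsed snapshots to `steal_pct`.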

Not sure if Windows has something similar.

Back to costs vs Maturity

I was firmly in the VMware camp for many years. I remember purchasing ESX 3.1 (Standard edition - no budget for higher versions) for something like $3,500 for a two-socket license. I remember how cheap it felt at the time, given the power it gave us to consolidate workloads. I would have been more than happy (myself at least) to pay double for what we got at the time. I remember the arguments I got into over VMware vs Xen with my new boss at the time, and the stories of the failed attempts to migrate to Xen after I left the company.

The pro-VMware guy in the original ZDNet debate doesn't see the damage VMware is doing to itself when it comes to licensing. VMware can do no wrong in his eyes. I'm sure there are plenty of other die-hards out there in the same boat: the old motto that nobody ever got fired for buying IBM, right? I can certainly respect the angle, though, as much as it pains my own history to admit it, I think the tide has turned, and VMware will have a harder and harder time pitching its wares in the future, especially if it keeps playing games with licensing on a technology which its own founders (I think -- I wish I could find the article) predicted would become a commodity by about now. With the perceived slow uptake of vSphere 5 among users, I think the trend is already starting to form. The problem with the uptake isn't just the licensing, of course; it's that for many situations there isn't a compelling reason to upgrade - "it's good enough" has set in.

I can certainly, positively understand VMware charging premium prices for premium services: an Enterprise Plus Plus, or whatever. But don't vary the price based on provisioned utilization. That's just plain shooting yourself (and your loyal customers) in the feet. The "provisioned" part is another sticking point for me. The hypervisor can measure actual usage, yet VMware ties its model to provisioned capacity, whether or not the VM is actually using the resource. It is a simpler model, but it makes planning more complicated.

The biggest scam of this whole cloud computing era that so many people think we're entering is the vast chasm between provisioned and utilized capacity. Companies want to charge you for provisioned capacity. Customers want to over-provision so they don't have to keep adjusting resources, even though they know they won't use all that capacity up front.
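To put a number on that chasm, here's a small Python sketch comparing a bill based on provisioned capacity with one based on measured usage. The VM names, sizes, and per-GB rate are all made up for illustration and do not reflect any provider's actual pricing:

```python
from dataclasses import dataclass

@dataclass
class VM:
    name: str
    provisioned_gb: float  # capacity the customer asked for (and is billed for)
    used_gb: float         # capacity the hypervisor/array actually sees in use

def billing_gap(vms, rate_per_gb):
    """Return (bill on provisioned, bill on usage, fraction of provisioned in use)."""
    provisioned = sum(vm.provisioned_gb for vm in vms)
    used = sum(vm.used_gb for vm in vms)
    return provisioned * rate_per_gb, used * rate_per_gb, used / provisioned

# Hypothetical fleet and hypothetical $10/GB/month rate.
fleet = [VM("web1", 8, 3), VM("web2", 8, 2.5), VM("db1", 32, 20)]
by_provisioned, by_usage, utilization = billing_gap(fleet, rate_per_gb=10.0)
print(f"provisioned: ${by_provisioned:.2f}  usage: ${by_usage:.2f}  "
      f"utilization: {utilization:.0%}")
```

With these invented numbers, the customer pays for nearly twice the capacity it actually uses.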

The technology exists. It's just that few people are taking advantage of it, and fewer still have figured out how to leverage it (at least in the service provider space, from what I have seen).

Take Terremark (now Verizon), a VMware-based public cloud provider (and one of only two partners listed on VMware's site for this now-old program). They built their systems on VMware and their storage on 3PAR. Yet their vCloud Express offering gives customers no way to use resource pools or thin provisioning. I have to pay attention to exactly how much space I provision up front, and I can't manage it the way I would on my own 3PAR array.

Terremark does have a more flexible enterprise offering with resource pools, but resource pools aren't available on their on-demand offering. I still have the original quote Terremark sent me for the disaster recovery project I was working on at the time, and it makes me want to either laugh or cry to this day. I have to give Terremark credit, though: at least they have an offering that can use resource pools. Most others do not (well, I haven't heard of even one, though I haven't looked recently). (Side note: I hosted my own personal stuff on their vCloud Express platform for a year, so I know it first-hand. It was a pretty good experience. What mainly drove me away was their hourly charge for each and every TCP and UDP port I had open. It's also good not to be on their platform anymore, so I don't risk them killing my system if they see something I say and take it badly.)

Obviously the trend in system design over recent years has cut into the number of licenses VMware can sell. If their claims are even remotely true (that 50% of the world's workloads are virtualized and that they hold 80% market share of those), it's going to be harder and harder to keep growing at a decent rate. They are in quite a pickle. Customers apparently haven't bought into VMware's more premium products outside the hypervisor, so VMware felt pressure to raise the price of the hypervisor itself to keep revenue growing.

Bottom Line

VMware is in trouble.

Simple as that.

15Dec/11Off

VMware increases core counts in 4.1 licensing

TechOps Guy: Nate

I just came across this mention on AMD's blog. They note that vSphere 4.1 Update 2 included a CPU licensing change:

For the AMD Opteron 6200 and 4200 series (Family 15h) processors, ESX/ESXi 4.1 Update 2 treats each core within a compute unit as an independent core, except while applying licenses. For the purpose of licensing, ESX/ESXi treats each compute unit as a core. For example, a processor with 8 compute units can provide the processor equivalent of 16 cores on ESX/ESXi 4.1 Update 2. However, ESX/ESXi 4.1 Update 2 only requires an 8 core license for each 16-core processor.

I had not heard of that before, so it's news to me! So not only is the Opteron 6200 itself cheaper than the 6100, its licensing cost is half as much (per core). AMD's blog post above shows some pretty impressive results: a pair of quad-socket 6200 blades outperforms a pair of quad-socket 10-core Intel blades (two sockets populated per blade), and the 6200 solution costs half as much (per VM). It's also comparing vSphere 4.1 with 5.0, though, since the Opteron 6200 results seem to be the first vSphere 5.0 VMmark results posted. And the Intel solution has twice the RAM of the Opteron one but still loses out.
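The counting rule quoted above fits in a few lines of Python. I'm assuming two cores per compute unit, which holds for the Family 15h parts. For the license check I'm also assuming a 12-cores-per-socket cap, which is what I recall for 4.x Enterprise Plus, so treat that cap as an assumption too. The function names are my own:

```python
# Assumption: Family 15h (Opteron 6200/4200) packs two cores per compute unit.
CORES_PER_COMPUTE_UNIT = 2

def licensed_cores_41u2(physical_cores):
    """Cores ESX/ESXi 4.1 U2 counts for licensing on Family 15h: one per compute unit."""
    if physical_cores % CORES_PER_COMPUTE_UNIT:
        raise ValueError("expected a whole number of compute units")
    return physical_cores // CORES_PER_COMPUTE_UNIT

def fits_socket_license(physical_cores, cores_per_socket_cap, family15h=True):
    """Does one per-socket license cover this processor? (cap is an assumed input)"""
    counted = licensed_cores_41u2(physical_cores) if family15h else physical_cores
    return counted <= cores_per_socket_cap

# A 16-core Opteron 6200 counts as 8 cores, so it fits under an assumed 12-core cap,
# while 16 cores counted one-for-one would not.
print(licensed_cores_41u2(16))                       # 8
print(fits_socket_license(16, 12))                   # True
print(fits_socket_license(16, 12, family15h=False))  # False
```

Halving the counted cores is also why the effective per-core licensing cost drops by half.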

From what I see, VMmark seems more CPU-bound than memory (capacity) bound. I suppose I can understand why, but in the vast majority of situations systems are not CPU-bound. People tend to load up on CPUs so they can get more memory capacity. I won't have real numbers for probably two months, but I'm expecting CPU usage on the new cluster I am building to be at least half the amount of memory usage.

The change sounds Oracle-esque. Oracle has some fairly complicated rules for determining how many "Oracle cores" you have on your physical processor.

I am traveling to Atlanta tonight to deploy a new vSphere cluster with Opteron 6100s. I was going to go with vSphere 5 because of the license limits on vSphere 4.1, which I thought didn't support 16-core processors. Now I see 4.1 does support them, so I have about 48 hours to decide whether to change my mind. I do like vSphere 5's LLDP support and its support for more vCPUs per VM. Still, even after looking through what's in vSphere 5, I don't see anything game-changing. In my opinion there's nothing remotely like the jump from ESX 3.5 to vSphere 4.0.

Weigh the benefits of what's new in vSphere 5 against the ability to have unlimited memory (well, up to 1TB, which for me is unlimited from a practical standpoint) in my hosts for no additional licensing cost...
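vSphere 5's vRAM model makes that trade-off easy to quantify. Here's a simplified Python sketch of the rule as I understand it. You need at least one license per physical socket, and the pooled vRAM entitlement of those licenses must cover the configured memory of powered-on VMs. The 96GB-per-license Enterprise Plus figure is the post-revision entitlement as I remember it, so treat it as an assumption. The sketch also ignores details such as per-VM caps:

```python
import math

def vsphere5_licenses_needed(sockets, vram_in_use_gb, vram_per_license_gb):
    """Simplified vSphere 5 vRAM rule: one license per socket minimum, plus enough
    pooled vRAM entitlement to cover memory configured on powered-on VMs."""
    by_sockets = sockets
    by_vram = math.ceil(vram_in_use_gb / vram_per_license_gb)
    return max(by_sockets, by_vram)

ENT_PLUS_VRAM_GB = 96  # assumed Enterprise Plus entitlement per license

# A hypothetical two-socket host with 512GB of RAM, fully allocated to VMs:
print(vsphere5_licenses_needed(2, 512, ENT_PLUS_VRAM_GB))  # 6
# The same host with only 128GB allocated needs just the per-socket minimum:
print(vsphere5_licenses_needed(2, 128, ENT_PLUS_VRAM_GB))  # 2
```

On 4.1 that same fully loaded two-socket host would need only two licenses, which is the whole appeal.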

I'm already licensed for vSphere 5 since we bought it after the deadline of the end of September.

Mad props to AMD for getting VMware to tweak their licensing.

Decisions, decisions..

4Nov/11Off

Mass defections away from VMware coming?

TechOps Guy: Nate

I have expected as much ever since VMware announced its abrupt licensing changes. Another site has reported on a different aspect of the same survey I commented on last night: nearly 40% of respondents are strongly considering moving away from VMware in the coming year, and 47% of those cite the licensing charges as the cause.

A Gartner analyst questions the numbers, saying the move will be more complicated than people think and that this will help VMware retain share. I don't agree. I suspect that for most customers the move will not be complex at all.

I was just recently digging a bit more into KVM to figure out what it uses for storage. It seems that for block-based systems they use GFS2 (I can't find the link offhand), though I imagine KVM can run on top of NFS too. I wonder what a typical KVM deployment looks like when it comes to storage. Is shared storage widely used, or is it mostly local DAS?

I just read an interesting comment on this topic from a Xen user in a Slashdot thread. (I've never found Xen to be a compelling platform from a technology perspective. My own use of Xen has been mostly indirect, through EC2, which in general is an absolutely terrible experience.)

Hyper-V is about 5 years behind and XenServer is about 3 years behind in terms of functionality and stability, mainly due to the fact that VMWare has been doing it for so long. VMWare is rock-solid and feature rich, and I'd love to use them. Currently we use XenServer, but with Citrix recently closing down their hardware API's and not playing nicely with anyone it looks like it is going to be the first casualty. I've been very upset by XenServer's HA so far, plain and simple it has sucked. I've had hosts reboot from crashes and the virtual machines go down, but the host thinks it has the machines and all of the other hosts think it has the machines. I've done everything XenServer has asked (HA quorum on a separate LUN, patches, etc), but it still just sucks. I've yet to see a host fail and the machines to go elsewhere, and the configuration is absolutely right and has been reviewed by Citrix. Maybe 6.0 will be better, but I just heard of major issues today with it. Hyper-V is really where the competition is going to come from, especially with how engrained it is in everything coming up. Want to run Exchange 2010 SP2? Recommendation is Hyper-V virtual machines.

God I miss VMWare.

I hope VMware comes through for me with a more cost-effective price point for the basic vSphere services. Basically, I'd like to see vSphere Standard edition with something crazy like 256GB/socket of vRAM at current pricing, though I'd settle for Standard with whatever vRAM Enterprise Plus gets.

So you're actually paying more for the features.

I can certainly find ways to "make do" at $1,318/socket for Standard edition (with 1 year of enterprise support, based on this pricing), which includes vMotion and HA, versus $4,369/socket for Enterprise Plus. Two sockets of Standard would be around $2,600. That's less than vSphere 3 cost in 2007, when Standard edition ran $3,000-3,500 per pair of sockets.
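Here is the per-host arithmetic behind those figures as a quick Python sketch. It uses only the prices quoted above and assumes two sockets per host:

```python
# Prices quoted in the post (per socket; Standard includes 1 year of support).
STANDARD_PER_SOCKET = 1318
ENT_PLUS_PER_SOCKET = 4369
VSPHERE3_STD_PAIR_RANGE = (3000, 3500)  # 2007 Standard edition, per pair of sockets

def host_cost(per_socket, sockets=2):
    """License cost for one host at a per-socket price."""
    return per_socket * sockets

standard = host_cost(STANDARD_PER_SOCKET)
ent_plus = host_cost(ENT_PLUS_PER_SOCKET)
print(f"Standard: ${standard:,}  Enterprise Plus: ${ent_plus:,}  "
      f"difference: ${ent_plus - standard:,}")
print("Standard below the 2007 vSphere 3 range:",
      standard < VSPHERE3_STD_PAIR_RANGE[0])
```

That works out to $2,636 for Standard versus $8,738 for Enterprise Plus per two-socket host, a $6,102 gap.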

I'm not holding my breath, though, after being kicked in the teeth by the vSphere 5 licensing changes.

Time will tell whether there are such defections. Unlike Netflix, where the commitment is basically zero, we'll have to wait for the next round of hardware refreshes to see what impact the licensing change has. Speaking of hardware refreshes that need vSphere 5: what the hell is taking so long with the Opteron 6200s, AMD?! I really thought they'd show up in September, then couldn't imagine them not showing up in October. Here we are in November, and still no word.

VMware does need a "Netflix moment", a term that has been used quite a bit recently.

3Nov/11Off

Virtualization Surveys, and insights from Xen creator

TechOps Guy: Nate

Two stories caught my eye today. The first, from our friends at The Register, covers a Veeam Software survey of several hundred companies with more than 1,000 employees. It concluded that the average consolidation ratio was 5.1:1.

Across the four geographic regions and all the companies surveyed, the perceived consolidation ratio was 9.8 virtual machines per physical machine. But if you do the math and calculate the actual penetration ration, companies are actually squeezing only 5.1 virtual machines per host on average.

It could just be that some IT managers garbled their responses and have screwed up the data, but perhaps Veeam is on to something.

I saw that and did a virtual face palm. 5.1:1? Even 9.8:1? I think I was doing about 7-9:1 back in 2007 with my first ESX 3.0 systems on HP DL380 G5s with 8 cores and 16GB of RAM. I came across a screenshot of one of those systems a couple of weeks ago, and it brought back some good memories! (Oh, how iSCSI sucked on ESX 3! Speaking of which, that reminded me of a dinner Dell threw, I think a year or two ago. They were pushing iSCSI for VMware through some third-party storage/VMware consultants. The presenter kept stressing how good iSCSI was and how there was no reason not to use it with VMware. I kept reminding him, in front of the group, how much iSCSI sucked in ESX 3, which is why people were still hesitant to use it even shortly after vSphere came out. He didn't take it well, and it was funny to watch.)

My last VMware projects, always memory-constrained of course, ran at 14:1 on the low end and maybe 24:1 on the high end (64GB of RAM on circa-2006 hardware, the HP DL585 G1). That was without any benefit from transparent page sharing, which never worked for me on Linux anyway, and with no swapping either. It was just a matter of right-sizing the VMs to the workloads, even if that meant as little as 96MB of memory for a VM.
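Because those ratios were memory-bound, they come down to simple division. Here's a Python sketch. The per-host hypervisor reservation and the average VM sizes below are illustrative numbers I picked, not measurements from those systems:

```python
import math

def memory_bound_ratio(host_ram_gb, hypervisor_reserve_gb, avg_vm_gb):
    """VMs per host when memory is the constraint: no page sharing, no swapping,
    no overcommit; just right-sized VMs packed into what the hypervisor leaves free."""
    return math.floor((host_ram_gb - hypervisor_reserve_gb) / avg_vm_gb)

# Illustrative inputs only (assumed reserve and averages, not measured data):
print(memory_bound_ratio(16, 1, 1.75))  # 2007-era 16GB host -> 8
print(memory_bound_ratio(64, 2, 2.5))   # 64GB host -> 24
```

Shrinking the average VM size is the lever. Right-sizing small guests down toward 96MB is what pushes the ratio up.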

On my next project, I'll be surprised if we can't get at least 30-40:1.

Seeing numbers like 5:1 reminds me of when VMware surveyed its customers to see what they were actually using before announcing the new price hikes for memory, then set the license limits to match what the typical customer was using.

5:1? Sad. If you're really running CPU-bound applications, that's fine, but there aren't many of those out there.

The second article quoted one of the "Godfathers" of Xen, who said that one of the great things about virtualization is better security through workload isolation.

Isolation -- the ability to restrict what computing goes on in a given context -- is a fundamental characteristic of virtualization that can be exploited to improve trustworthiness of processes on a physical system even if other processes have been compromised, says Crosby, a creator of the open source hypervisor and a founder of startup Bromium, which is looking to use Xen features to boost security.

I couldn't agree more, which is why per-VM licensing strategies really piss me off. They work in direct opposition to that strategy, since every workload you isolate into its own VM costs you another license. At the very least, offer dual licensing so customers can choose between per-VM and per-hardware licensing.
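Dual licensing would let each customer pick whichever model is cheaper at their VM density. Here's a minimal Python sketch of that choice. Both prices are hypothetical placeholders, not any vendor's list price:

```python
import math

def cheaper_license(vm_count, per_vm_price, sockets, per_socket_price):
    """Pick the cheaper of per-VM and per-socket licensing for one host."""
    per_vm_total = vm_count * per_vm_price
    per_socket_total = sockets * per_socket_price
    if per_vm_total <= per_socket_total:
        return "per-VM", per_vm_total
    return "per-socket", per_socket_total

def breakeven_vms(per_vm_price, sockets, per_socket_price):
    """Largest VM count at which per-VM licensing is still no more expensive."""
    return math.floor(sockets * per_socket_price / per_vm_price)

# Hypothetical prices: $300 per VM vs. $2,000 per socket on a two-socket host.
print(cheaper_license(10, 300, 2, 2000))  # ('per-VM', 3000)
print(cheaper_license(40, 300, 2, 2000))  # ('per-socket', 4000)
print(breakeven_vms(300, 2, 2000))        # 13
```

Past the break-even point, every extra VM you carve out for isolation makes per-VM licensing strictly worse, which is exactly the conflict with the isolation argument.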

On a side note, I just saw an interesting CNBC interview about the Groupon IPO, which I believe is supposed to go live tomorrow. Groupon is apparently trying to raise about $510M in a very paltry offering of under 5% of the company. The funny part is that it apparently owes about $505M in short-term liabilities to vendors and the like. The person being interviewed said that if Groupon doesn't pull this off soon, it'll go broke practically overnight.

They are also reporting there are 11 book runners on the deal, more than any other US IPO in history. I don't know what a book runner is but it sounds fishy to have so many runners for such a small allocation of stock.

Burn, baby, burn.

 
