Diggin' technology every day

April 23, 2012

MS Shooting themselves in their mobile feet again?

Filed under: General,Random Thought — Tags: — Nate @ 10:03 am

I’ve started to feel sorry for Microsoft recently. Back in the 90s and early 00s I was firmly in the anti MS camp, but the past few years I have slowly moved out of that camp mainly because MS isn’t the beast that it used to be. It’s a company that just fumbles about at what it does now and doesn’t appear to be much of a threat anymore. It has a massive amount of cash still but for some reason can’t figure out how to use it. I suppose the potential is still there.

Anyways, I was reading this article on Slashdot just now about Skype on Windows Phone 7. The most immediate complaint was that the design of WP7 prevents Skype from receiving calls while in the background, because with few exceptions (like streaming media and stuff) any background app is suspended. There is no multi tasking on WP7? As some others have noted, I haven’t seen a WP7 phone on anyone yet, so I haven’t seen the platform in action. Back when what was Palm was gutted last year and the hardware divisions shut down, many people were saying how WP7 was a good platform to go to from WebOS, especially the 7.5 release which was pretty new at the time.

I don’t multi task too much on my phone or tablets, but it’s certainly nice to have the option there. WebOS has a nice messaging interface with full Skype integration so Skype can run completely in the background. I don’t use it in this way mainly because the company I’m at uses Skype as a sort of full on chat client, so the device would be hammered by people talking (to other people) in group chats which is really distracting. Add to that the audible notifications for messaging on WebOS apply to all protocols, and I use a very loud panic alarm for SMS messages for my on call stuff, so having that sound off every couple of seconds when a Skype discussion is going is not workable! So I keep it off unless I specifically need it. 99.9% of my Skype activity is work related. Otherwise I wouldn’t even use the thing. Multi tasking has been one of the biggest selling points of WebOS since it was released several years ago, really seeming to be the first platform to support it (why it took even that long sort of baffles me).

So no multi tasking, and apparently no major upgrades coming either – I’ve come across a few articles like this one that say it is very unlikely that WP7 users will be able to upgrade to Windows 8/WP8. Though lack of mobile phone upgrades seems pretty common. Android in particular has had some investigations done to illustrate how much it varies when, or if, the various handsets get upgrades. WebOS was in the same boat here, with the original phones not getting past version 1.4.5 or something, the next generation of phones not getting past 2.x, and only the Touchpad (with a mostly incompatible UI for phones apparently) having 3.x. For me, I don’t see anything in WebOS 3.x that I would need on my WebOS 2.x devices, and I remember when I was on WebOS 1.x I didn’t see anything in 2.x that made me really want to upgrade, the phone worked pretty well as it was. iOS seems to shine the best in this case, providing longer term updates for what has got to be a very mature OS at this point.

But for a company that has as many resources as Microsoft, especially given the fact that they seem to be maintaining tighter control over the hardware the phones run on, it’s really unfortunate that they may not be willing/able to provide the major update to WP8.

Then there was the apparent ban Microsoft put on all players, preventing them from releasing multi core phones in order to give Nokia time to make one themselves. Instead of giving even more resources to making sure Nokia could succeed, they held the other players back, which not only hurt all of their partners (minus Nokia, or not?) but of course hurt the platform as a whole.

I’m stocked up on WebOS devices to last me a while on a GSM network. So I don’t have to think about what I may upgrade to in the future, I suspect my phones might outlive the network technologies they use.

To come back to the original topic – lack of multi tasking – specifically the inability for Skype to operate in the background is really sad. Perhaps the only thing worse is it took this long for Skype to show up on the platform in the first place. Even the zombie’d WebOS has had Skype for almost a year on the Touchpad, and if you happened to have a Verizon Pre2 phone at the time, Skype for that was released just over a year ago (again with full background support). I would have thought, given Microsoft bought Skype about a year ago, that they would have (or could have) had a release for WP7 within a very short period of time (30 days?). But at least it’s here for the 8 people that use the phone, even if the version is crippled by the design of the OS. Even Linux has had Skype (which I use daily) for longer. There have been some big bugs in Skype on WebOS – most of them I think related to video/audio, which doesn’t really impact me since most of my Skype usage is for text chat.

While I’m here chatting about mobile, I find it really funny, and ironic, that apparently Microsoft makes more money off of Android than it does its own platform (estimated to be five times more last year), and Google apparently makes four times more money off of iOS than its own platform.

While there are no new plans for WebOS hardware at this point – it wouldn’t surprise me at all if people inside HP were working to make the new ARM-based WP8 tablets hackable in some way to get a future version of WebOS on them, even though MS is going to do everything they can to prevent that from happening.

April 20, 2012

Oracle not afraid to leverage Intel architecture

Filed under: Storage — Tags: , — Nate @ 11:28 am

I have bitched and griped in the past about how some storage companies waste their customers’ time, money and resources by not leveraging the Intel/Commodity CPU architecture that some of them tout so heavily.

Someone commented on here in response to my HP SPC-2 results, pointing out that the new Oracle 7240 ZFS system has some new SPC-2 results that are very impressive, and today I stumbled upon an article from our friends at The Register which talks about a similar 7240 system being tested in the SPECsfs benchmark with equally impressive results.

The main thing missing to me with the NFS results is the ability to provide them over a single file system (not just a global name space as NetApp tries to advertise but truly a single file system), oh, and of course the disclosure of costs with the test.

This 7240 system must be really new, when I went to investigate it recently the product detail pages on Oracle’s own site were returning 404s, but now they work.

I’ll come right out and say it – I’ve always been a bit leery of the ZFS offerings for a true high availability solution, I wrote a bit about this topic a while ago. Though that article focused mainly on people deploying ZFS on cheap crap hardware because they think they can make an equivalent enterprise offering by slapping some software on top of it.

I’m also a Nexenta customer for a very small installation (NAS only, back end is 3PAR). I know Nexenta and Oracle ZFS are worlds apart but at least I am getting some sort of ZFS exposure. ZFS has a lot of really nice concepts, it’s just a matter of how well it works in practice.

For example I was kind of shocked to learn that if a ZFS file system gets full you can’t delete files off of it. I saw one post of a person saying they couldn’t even mount the file system because it was full. Recently I noticed on one of my Nexenta volumes a process that kicks in when a volume gets 50% full. They create a quota’d file system on the volume of 100MB in size, so that when/if the file system is full you can somehow remove this data and get access to your file system again. Quite a hack.

I’ve seen another thread or two about existing Sun ZFS customers who have gotten very frustrated with the lack of support Oracle has given them since Oracle took the helm.

ANYWAYS, back to the topic of exploiting x86-64 architecture. Look at this –

[Image: ZFS Storage array base specifications]

Clearly Oracle is embracing the processing and memory power that is available to them and I have to give them mad props for that – I wish other companies did the same, the customer would be so much better off.

They do it also by keeping the costs low (relative to the competition anyways), which is equally impressive. Oracle is a company of course that probably likes to drive margins more than most any other company out there, so it is nice to see them doing this.

My main question is – what of Pillar? What kind of work is being done there? I haven’t noticed anything since Pillar went home to the Larry E mothership. Is it just dying on the vine? Are these ZFS systems still not suitable for certain situations which Pillar is better at supporting?

Anyways, I can’t believe I’m writing about an Oracle solution twice in the same week but these are both nice things to see come out of one of the bigger players out there.

April 19, 2012

A Terabit of application switching throughput

Filed under: Networking — Tags: — Nate @ 6:02 pm

That’s a pretty staggering number to me. I had some friends that worked at a company that is now defunct (acquired by F5) called Crescendo Networks.

One of their claims to fame was the ability to “cluster” their load balancers so that you could add more boxes on the fly and it would just go faster, instead of having to rip and replace, or add more boxes and do funky things with DNS load balancing in trying to balance between multiple groups of load balancers.

Crescendo's Scale out design - too bad the company didn't last long enough to see anyone leverage a 24-month expansion

Another company, A10 Networks (who is still around, though I think Brocade and F5 are trying to make them go away), introduced similar technology about a year ago called virtual chassis (details are light on their site). There may be other companies that have similar things too – they all seem to be gunning for the F5 VIPRION, which is a monster system. F5 took a chassis approach and supports up to 4 blades of processing power, then does load balancing of the blades themselves to distribute the load. I have a long history in F5 products and know them pretty well, going back to their 4.x code base which was housed in (among other things) generic 4U servers with BSDI and generic motherboards.

I believe Zeus does it as well, I have used Zeus but have not gone beyond a 2 node cluster. I forgot, Riverbed bought them and changed the name to Stingray. I think Zeus sounds cooler.

With Crescendo the way they implemented their cluster was quite basic, it was very similar to how other load balancing companies improved their throughput for streaming media applications – some form of direct response from the server to the client, instead of having the response go back through the load balancer a second time. Here is a page from a long time ago on some reasons why you may not want to do this. I’m not sure how A10 or Zeus do it.

I am a Citrix customer now, having heard some good things about them over the years, but never having tried the product. I found it curious why the likes of Amazon and Google gobble up Netscaler appliances like M&Ms when for everything else they seem to go out of their way to build things themselves. I know Facebook is a big user of the F5 VIPRION system as well.

You’d think (or at least I think) companies like this – if they could leverage some sort of open source product and augment it with their own developer resources they would – I’m sure they’ve tried – and maybe they are using such products in certain areas. My information about who is using what could be out of date. I’ve used haproxy (briefly) and nginx (more), at least as load balancers, and wasn’t happy with either product. Give me a real load balancer please! Zeus seems to be a pretty nice platform – and open enough that you can run it on regular server hardware, rather than being forced into buying fixed appliances.

Anyways, I had a ticket open with Citrix today about a particular TLS issue regarding SSL re-negotiation, after a co-worker brought it to my attention that our system was reported as vulnerable by her browser / plugins. During my research I came across this excellent site which shows a ton of useful info about a particular SSL site.

I asked Citrix how I could resolve the issues the site was reporting and they said the only way to do it was to upgrade to the latest major release of code (10.x). I don’t plan to do that, resolving this particular issue doesn’t seem like a big deal (though it would be nice, it’s not worth the risk of using this latest code so early after its release for this one reason alone). Add to that our site is fronted by Akamai (which actually posted poorer results on the SSL check than our own load balancers). We even had a “security scan” run against our servers for PCI compliance and it didn’t pick up anything related to SSL.

Anyways, back on topic. I was browsing through the release notes for the 10.x code branch and saw that Netscaler now supports clustering as well:

You can now create a cluster of nCore NetScaler appliances and make them work together as a single system image. The traffic is distributed among the cluster nodes to provide high availability, high throughput, and scalability. A NetScaler cluster can include as few as 2 or as many as 32 NetScaler nCore hardware or virtual appliances.

With their top end load balancers tapping out at 50Gbps, that comes to 1.6Tbps with 32 appliances. Of course you won’t reach top throughput depending on your traffic patterns so taking off 600Gbps seems reasonable, still 1Tbps of throughput! I really can’t imagine what kind of service could use that sort of throughput at one physical site.
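The back-of-the-envelope math above is simple enough to write down. This is just a sketch of that arithmetic using the figures from this post (32 nodes, 50Gbps each, and my rough 600Gbps allowance for real world traffic patterns); the variable names are made up for illustration.

```python
# Back-of-the-envelope cluster throughput, using the figures from the post.
MAX_NODES = 32        # largest cluster size quoted in the release notes
PER_NODE_GBPS = 50    # rating of the top end appliance mentioned above
DERATE_GBPS = 600     # rough allowance for real traffic patterns (a guess)

theoretical_gbps = MAX_NODES * PER_NODE_GBPS
realistic_gbps = theoretical_gbps - DERATE_GBPS

print(f"theoretical: {theoretical_gbps / 1000:.1f} Tbps")    # 1.6 Tbps
print(f"after derating: {realistic_gbps / 1000:.1f} Tbps")   # 1.0 Tbps
```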

It seems, at least compared to the Crescendo model, that the Citrix model is a lot more like a traditional cluster, probably a lot more like a VIPRION design –

The NetScaler cluster uses Equal Cost Multiple Path (ECMP), Linksets (LS), or Cluster Link Aggregation Group (CLAG) traffic distribution mechanisms to determine the node that receives the traffic (the flow receiver) from the external connecting device. Each of these mechanisms uses a different algorithm to determine the flow receiver.
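To give a rough feel for what hash based distribution like ECMP does, here is a minimal sketch. It is not Citrix's algorithm (with ECMP the hashing actually happens on the upstream router, and I don't know the details of Linksets or CLAG); the function name, node names and addresses are all made up. The point is only that the same flow always lands on the same node.

```python
import hashlib

def pick_flow_receiver(src_ip, src_port, dst_ip, dst_port, proto, nodes):
    """Hash the connection 5-tuple and map it onto one of the cluster nodes."""
    key = f"{src_ip}:{src_port}>{dst_ip}:{dst_port}/{proto}".encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]

nodes = ["node-0", "node-1", "node-2", "node-3"]  # hypothetical cluster members

# The same flow hashed twice picks the same flow receiver.
a = pick_flow_receiver("203.0.113.7", 51000, "198.51.100.10", 443, "tcp", nodes)
b = pick_flow_receiver("203.0.113.7", 51000, "198.51.100.10", 443, "tcp", nodes)
print(a, a == b)
```

A plain modulo like this reshuffles most flows when a node is added or removed, which is one reason real implementations tend to be fancier.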


[Image: Citrix Netscaler Traffic Flow]

The flow reminds me a lot of the 3PAR cluster design actually.

My Thoughts on Netscaler

My experience so far with the Netscalers is mixed, some things I really like such as an integrated mature SSL VPN (note I said mature! well at least for windows – nothing for Linux and their Mac client is buggy and incomplete), application aware MySQL and DNS load balancing, and a true 64-bit multithreaded, shared memory design. I also really like their capacity on demand offering as well. These boxes are always CPU bound, so to have the option to buy a technically lower end box with the same exact CPU setup as a higher end box (that is rated for 2x the throughput) is really nice. It means I can turn on more of those CPU heavy features without having to fork over the cash for a bigger box.

[Image: Citrix nCore]

Meanwhile, for the most part, at least last I checked – F5 was still operating on 32-bit TMOS (on top of 64-bit Linux kernels), leveraging a multi process design instead of a multi threaded design. So they were forced to add some hacks to load balance across multiple CPUs in the same physical load balancer in order to get the system to scale more (and there have been limitations over the years as to what could actually be distributed over multiple cores and what features were locked to a single core – as time has gone on they have addressed most of those that I am aware of). One in particular I remember (which may be fixed now, I’m not sure – I would be curious to know how they fixed it if so) was that each CPU core had its own local memory with no knowledge of other CPUs, which means when doing HTTP caching each CPU had to cache the content individually – massively duplicating the cache and slashing the effectiveness of the memory you had on the box. This was further compounded by the 32-bitness of TMM itself and its limited ability to address larger amounts of memory.
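To put a number on the cache duplication problem, here is a tiny worst case sketch. The box size (4GB of cache memory, 8 cores) is hypothetical, and it assumes every core ends up caching the same hot objects, which is the pessimistic end of what I described above.

```python
def unique_cache_mb(total_cache_mb, cores, shared):
    """Worst case: with per-core caches, every core holds the same objects."""
    return total_cache_mb if shared else total_cache_mb / cores

total_mb, cores = 4096, 8   # hypothetical box: 4GB of cache RAM, 8 cores

print(unique_cache_mb(total_mb, cores, shared=True))    # 4096
print(unique_cache_mb(total_mb, cores, shared=False))   # 512.0
```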

In any case the F5 design is somewhat arcane, they chose to bolt on software features early on instead of re-building the core. The strategy seems to have paid off though from a market share and profits standpoint, just from a technical standpoint it’s kinda lame 🙂

To be fair there are some features available in the older legacy code that are not available in the multi threaded Citrix Netscaler.

Things I don’t like about the Netscaler include their Java GUI which is slow as crap (they are working on an HTML 5 GUI – maybe that is in v10?), I mean it can literally take about 4 minutes to load up all of my server groups (Citrix term for F5 Pools). On F5 I can load them in about half a second. I think the separation of services with regards to content switching on Citrix is, well, pretty weird to say the least. If I want to do content filtering I have to have an internal virtual server and an external virtual server, the external one does the content filtering and forwards to the internal one. With F5 it was all in one (same for Zeus too). The terminology has been difficult to adjust to vs my F5 (and some Zeus) background.

I do miss the Priority Activation feature F5 has, there is no similar feature on Citrix as far as I know (well I think you can technically do it but the architecture of the Netscaler makes it a lot more complex). This feature allows you to have multiple pools of servers within a single pool at different priorities. By default the load balancer sends to the highest (or lowest? I forgot, it’s been almost 2 years) group of servers, if that group fails then it goes to the next, and the next. I think you can even specify the minimum number of nodes to have in a group before it fails over entirely to the next group.
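Here is a minimal sketch of the selection logic as I described it above, going from memory. It is not F5's implementation: the function name and data layout are made up, "higher number wins" is an assumption, and the real feature may well add the next group alongside the first rather than switch over to it entirely.

```python
def pick_active_group(groups, min_active=1):
    """Return the healthy members of the first priority group that still
    has at least `min_active` healthy members.

    `groups` is a list of (priority, members) pairs and each member is a
    (name, is_healthy) pair. Higher priority number wins in this sketch.
    """
    for _priority, members in sorted(groups, key=lambda g: g[0], reverse=True):
        healthy = [name for name, is_healthy in members if is_healthy]
        if len(healthy) >= min_active:
            return healthy
    return []  # nothing usable at any priority

groups = [
    (10, [("web1", False), ("web2", True)]),  # primary tier, one node down
    (5, [("web3", True), ("web4", True)]),    # standby tier
]

print(pick_active_group(groups, min_active=2))  # ['web3', 'web4']
print(pick_active_group(groups, min_active=1))  # ['web2']
```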

Not being able to re-use objects with the default scripting language just seems brain dead to me, so I am using the legacy scripting language.

So I do still miss F5 for some things, Zeus for some other things, though Netscaler is pretty neat in its own respects. F5 obviously has a strong presence where I spent the last decade of my life in and around Seattle, being that it was founded and has its HQ in Seattle. Still have a buncha friends over there. Some pretty amazing stories I’ve heard come out of that place, they grew so fast, it’s hard to believe they are still in one piece after all they’ve been through, what a mess!

If you want to futz around with a Netscaler you have the option of downloading their virtual appliance (VPX) for free – I believe it has a default throughput limit of 1Mbit. Upgrades to as high as 3Gbps. Though the VPX is limited to two CPU cores last I recall. F5 and A10 have virtual appliances as well.

Crescendo did not have a virtual appliance, which is one of the reasons I wasn’t particularly interested in pursuing their offering back when they were around. The inside story of the collapse of Crescendo is the stuff geek movies are made out of. I won’t talk about it here but it was just amazing to hear what happened.

The load balancing market is pretty interesting to see the different regions and where various players are stronger vs weaker. Radware for example is apparently strong over on the east coast but has much less presence in the west. Citrix did a terrible job marketing the Netscaler for many years (a point they acknowledged to me), then there are those folks out there that still use Cisco (?!) which just confuses me. Then there are the smaller guys like A10, Zeus, Brocade – Foundry Networks (acquired by Brocade, of course) really did themselves a disservice when they let their load balancing technology sit for a good five years between hardware refreshes, they haven’t been able to recover from that from what I’ve seen/heard. They tried to pitch me their latest iron a couple of years ago after it came out – only for me to find out that it didn’t support SSL at the time – I mean come on. Of course they later fixed that lack of a feature but it was too late for my projects.

And in case you didn’t know – Extreme used to have a load balancer WAY BACK WHEN. I never used it. I forget what it’s called off the top of my head. Extreme also partnered with F5 in the early days and integrated F5 code into their network chipsets so their switches could do load balancing too (the last switch that had this was released almost a decade ago – nothing since). Though the code in the chipsets was very minimal and not useful for anything serious.

April 12, 2012

3PAR Zero detection and snapshots

Filed under: Storage — Tags: , — Nate @ 9:00 am


I’ve been holding onto this post to get confirmation from HP support. Now that I have it, here goes something.




A commenter, Karl, made a comment that got my brain thinking about this another way, and it turns out I was stupid in my original assessment of what data the system was storing for the snapshot. For some reason I was thinking the system was storing the zeros, but rather it was storing the data the zeros were replacing.

So I’m a dumbass for thinking that. homer moment *doh*

BUT the feature still appears broken, in that what should happen is: if there is in fact 200GB of data written to the snapshot, that implies that zeros overwrote 200GB worth of non zero’d data – and that data should have been reclaimed from the source volume. In this case it was not, only a tiny fraction was (1,536MB of logical storage, or 12x128MB chunks of data). So at the very least the bulk of the data usage should have been moved from the source volume to the snapshot (snapshot space is allocated separately from the source volume so it’s easy to see which is using what). CURRENTLY the volume is showing 989GB of reserved space on the array with 120GB of written snapshot data and 140GB of file system data, or around 260GB of total data, which should come out to around 325GB of physical data in RAID 5 3+1, not 989GB. But reclaiming that space falls to another feature, thin copy reclamation, which reclaims space from deleted snapshots.
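For what it's worth, here is the raw capacity math spelled out, as a sketch that assumes nothing but plain 3 data + 1 parity overhead (no sparing or other array overhead). Worked strictly, 260GB of data needs about 347GB raw, a bit more than the rough 325GB figure above (which is what you get by just adding 25%). Either way it is nowhere near 989GB.

```python
def raid5_raw_gb(data_gb, data_disks=3, parity_disks=1):
    """Raw capacity needed to hold `data_gb` of user data on RAID 5."""
    return data_gb * (data_disks + parity_disks) / data_disks

snapshot_gb = 120     # written snapshot data reported by the array
filesystem_gb = 140   # data in the file system
reserved_gb = 989     # reserved space the array is showing

data_gb = snapshot_gb + filesystem_gb
raw_gb = raid5_raw_gb(data_gb)

print(data_gb, round(raw_gb, 1))        # 260 346.7
print(round(reserved_gb / raw_gb, 2))   # 2.85, i.e. almost 3x what is expected
```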

So, sorry for being a dumbass for the original premise of the post, for some reason my brain got confused by the results of the tests, and it wasn’t until Karl’s comment that it made me think about it from the other angle.

I am talking to some more technical / non support people on Monday about this.

And thanks Karl 🙂



I got some good information from senior technical folks at 3PAR and it turns out the bulk of my problems are related to bugs in how one part reports raw space utilization (resulting in wildly inaccurate info), and a bug with regards to a space reclamation feature that was specifically disabled by a software patch on my array in order to fix another bug with space reclamation. So the fix for that is to update to a newer version of code which has that particular problem fixed for good (I hope?). I don’t think I would ever have gotten that kind of information out of the technical support team.

So in the end not much of a big issue after all, just confused by some bugs and functionality that was disabled and me being stupid.





A lot of folks over the years have tried to claim I am a paid shill for 3PAR, or Extreme or whatever. All I can say is I’m not compensated by them for my posts in any direct way (maybe I get better discounts on occasion or something I’m not sure how that stuff factors in but in any case those benefits go to my companies rather than me).

I do knock them when I think they need to be knocked though. Here is something that made me want to knock 3PAR, well more than knock, more like kick in the butt, HARD and say W T F.

I was a very early adopter of the T-class of storage system, getting it in house just a couple months after it was released. It was the first system from them which had the thin built in – the thin reclamation and persistence technology integrated into the ASIC – only I couldn’t use it because the software didn’t exist at the time.


3PAR "Stay Thin" strategy - wasn't worth $100,000 in licensing for the extra 10% of additional capacity savings for my first big array.


That was kind of sad but it could have been worse – the competition that we were evaluating was Hitachi, who had just released their AMS2000-series of products, literally about a month after the T-class was released. Hitachi had no thin provisioning support whatsoever on the AMS2000 when it was launched. That came about seven months later. If you required thin provisioning at the time you had to buy a USP or (more common for this scenario due to costs at the time) a USP-V, which supported TP, and put the AMS2000 behind it. Hitachi refused to even give us a ballpark price as to the cost of TP on the AMS2000 whenever it was going to be released. I didn’t need an exact price, just tell me is it going to be $5,000, or $25,000 or maybe $100,000 or more? Should be a fairly simple process, at least from a customer perspective, especially given they already had such licensing in place on their bigger platform. In the end I took that bigger platform’s licensing costs (since they refused to give that to me too) and extrapolated what the cost might look like on the AMS line. I got the info from Storage Mojo‘s price list and basically took their price and cut it in half to take into account discounts and stuff. We ended up obviously not going for HDS so I don’t know what it would have really cost us in the end.

OK, steering the tangent closer to the topic again..bear with me.

Which got me wondering – given it is an ASIC and not an FPGA, they really have to be damn sure it works when they ship product, otherwise it can be an expensive proposition to replace the ASICs if there is a design problem with the chip. After all, the CPUs aren’t really in the data path of the stuff flowing through the system, so it would be difficult to work around ASIC faults in software (if it was possible at all).

So I waited, and waited for the new thin stuff to come out, thinking since I had thin provisioning licensed already I would just install the new software and get the new features.

Then it was released – more than a year after I got the T400 but it came with a little surprise – additional licensing costs associated with the software – something nobody ever told me of (sounds like it was a last minute decision). If I recall right, for the system I had at the time if we wanted to fully license thin persistence it was going to be an extra $100,000 in software. We decided against it at the time, really wasn’t worth the price for what we’d reclaim. Later on 3PAR offered to give us the software for free if we bought another storage array for disaster recovery (which we were planning to) – but the disaster recovery project got canned so we never got it.

Another licensing feature of this new software was that in order to get to the good stuff, the thin persistence, you had to license another product – Thin Conversion, whether you wanted it or not (I did not – really you might only need Thin Conversion if you’re migrating from a non thin storage system).

Fast forward almost two years and I’m at another company with another 3PAR. There was a thin provisioning licensing snafu with our system, so for the past few months (and for the next few) I’m operating on an evaluation license which basically has all the features unlocked – including the thin reclamation tech. I had noticed recently that some of my volumes are getting pretty big. Per the request of the DBA we have, I agreed to make these volumes quite large – 400GB each. What I normally do is create the physical volume at 1 or 2TB (in this case 2TB), then create a logical volume that is more in line with what the application actually needs (which may be as low as say 40GB for the database), then grow it on line as the space requirements increase.

3PAR’s early marketing at least tried to communicate that you can do away with volume management altogether.  While certainly technically possible, I don’t recommend that you take that approach. Another nice thing about volume management is being able to name the volumes with decent names, which is very helpful when working with moving snapshots between systems, especially with MPIO and multiple paths and multiple snapshots on one system, with LVM it’s simple as can be, without – I really don’t want to think about it. Only downside is you can’t easily mount a snapshot back to the originating system because the LVM UUID will conflict and changing that ID is not (or was not, been a couple years since I looked into it) too easy, blocking access to the volume. Not a big deal though the number of times I felt I wanted to do that was once.

This is a strategy I came up with going back almost six years to my original 3PAR box and it has worked quite well over the years. Originally, resizing was an off line operation since the kernel that we had at the time (Red Hat Enterprise 4.x) did not support on line file system growth. It has for a while now, I think since maybe 4.4/4.5 and certainly ever since v5.
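To make the "big thin LUN, small logical volume, grow later" approach concrete, here is a sketch that only prints the LVM commands involved rather than running anything. The device path, volume group and logical volume names are made up, the sizes are examples, the mkfs step is left out, and `resize2fs` assumes an ext3/ext4 file system, so treat it as an outline and not a recipe.

```python
import shlex

def initial_layout(device, vg, lv, size_gb):
    """Commands to put LVM on the big thin LUN and carve a small LV out of it."""
    return [
        ["pvcreate", device],
        ["vgcreate", vg, device],
        ["lvcreate", "-L", f"{size_gb}G", "-n", lv, vg],
    ]

def grow_online(vg, lv, extra_gb):
    """Commands to extend the LV, then grow the mounted file system to match."""
    path = f"/dev/{vg}/{lv}"
    return [
        ["lvextend", "-L", f"+{extra_gb}G", path],
        ["resize2fs", path],  # assumes ext3/ext4
    ]

# Hypothetical names: a 2TB thin LUN seen as /dev/mapper/mpath0, 40GB to start.
plan = initial_layout("/dev/mapper/mpath0", "vg_db", "lv_mysql", 40)
plan += grow_online("vg_db", "lv_mysql", 20)

for argv in plan:
    print(shlex.join(argv))
```

Since the array only dedicates space as it is written, the unused part of the 2TB costs nothing until the logical volume actually grows into it.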

Once you have a grasp as to the growth pattern of your application it’s not difficult to plan for. Getting the growth plan in the first place could be complex though given the dedicate on write technology, you had to (borrowing a term from Apple here) think different. It obviously wasn’t enough to just watch how much disk space was being consumed on average, you had to take into account space being written vs being deleted and how effective the file system was at re-utilizing deleted blocks. In the case of MySQL – being as inefficient as it is, you had to also take into account space utilization required by things like ALTER TABLE statements, in which MySQL makes a copy of the entire table with your change then drops the original. Yeah, real thin friendly there.

Given this kind of strategy it is more difficult to gauge just exactly how much you’re saving with thin provisioning. I mean on my original 3PAR I was about 300-400% over subscribed (which at the time was considered extremely high – I can’t count the number of hours I spent achieving that), and I think I recall at that conference I was at David Scott saying the average customer was 300% oversubscribed. On my current system I am 1300% over subscribed. Mainly because I got a bunch of databases and I make them all 2TB volumes. I can say with a good amount of certainty that they will probably never get to remotely 2TB in size but it doesn’t affect me otherwise so I give it what I can (all my boxes on this array are VMware ESX 4.1 which of course has a 2TB limit – the bulk of these volumes are raw device mapped to leverage SAN-based snapshots as well as, to a lesser extent, individually manage and monitor space and i/o metrics).

At the time my experience was compounded by the fact that I was still very green when it came to storage (I’d like to think I am more blue now at least whatever that might mean). Never really having dabbled much in it prior, choosing instead to focus on networking and servers. All big topics, I couldn’t take them all on at once 🙂

So my point is – even though 3PAR has had this technology for a while now – I really have never tried it. In the past couple months I have run the Microsoft sdelete tool on the 3 Windows VMs I do have to support my vCenter stuff (everything else is Linux) – but honestly I don’t think I bothered to look to see if any space was reclaimed or not.

Now back on topic

Anyways, I have this one volume that was consuming about 300GB of logical space on the array when it had maybe 140GB of space written to the file system (which is 400GB). Obviously a good candidate for space reclamation, right? I mean the marketing claims you can gain 10% more space, in this case I’m gaining a lot more than that!

So I decided – hey how bout I write a basic script that writes out a ton of zeros to the file system to reclaim this space (since I recently learned that the kernel code required to do fancier stuff like fstrim [updated that post with new information at the end since I originally wrote it] doesn’t exist on my systems). So I put a basic looping script in to write 100MB files filled with zeros from /dev/zero.
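For anyone wanting to try the same thing, this is roughly the shape of such a script, written here as a Python sketch rather than the exact script I ran. The file naming, the `pause` parameter (for spacing out the writes) and the `max_files` limit are all assumptions added for illustration; the file size is left configurable.

```python
import os
import time

CHUNK = 1024 * 1024  # write zeros 1 MiB at a time


def write_zero_file(path, size_bytes):
    """Write size_bytes of zeros to path and flush them to disk."""
    zeros = bytes(CHUNK)
    written = 0
    with open(path, "wb") as f:
        while written < size_bytes:
            n = min(CHUNK, size_bytes - written)
            f.write(zeros[:n])
            written += n
        f.flush()
        os.fsync(f.fileno())
    return written


def zero_fill(directory, file_size=100 * 1024 * 1024, max_files=None, pause=0.0):
    """Create zero-filled files until the file system is full or max_files is hit.

    Returns the list of file paths so they can be deleted afterwards.
    """
    paths = []
    i = 0
    while max_files is None or i < max_files:
        path = os.path.join(directory, f"zerofill_{i:06d}.bin")
        try:
            write_zero_file(path, file_size)
        except OSError:
            # Most likely "no space left on device": keep the partial file
            # in the list (its blocks are zeros too) and stop.
            paths.append(path)
            break
        paths.append(path)
        i += 1
        time.sleep(pause)  # space out the writes to go easy on the storage
    return paths


def cleanup(paths):
    """Delete the zero files once the array has had a chance to see them."""
    for p in paths:
        if os.path.exists(p):
            os.remove(p)
```

Usage would be something like `paths = zero_fill("/mnt/somevolume", pause=5)` followed by `cleanup(paths)`, where the mount point is a placeholder. Filling a file system to 100% is obviously not something to do casually on a busy production volume.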

I watched it as it filled up the file system over time (I spaced out the writing as to not flood my front end storage connections), watching it reclaim very little space – at the end of writing roughly 200GB of data it reclaimed maybe 1-2GB from the original volume. I was quite puzzled to say the least. But that’s not the topic of this post now is it.

I was shocked, awed, flabbergasted by the fact that my operation actually CONSUMED an additional 200GB of space on the system (space filled with zeros). Why did it do this? Apparently because I created a snapshot of the volume earlier in the day and the changes were being kept track of thus consuming the space. Never mind the fact that the system is supposed to drop the zeros even if it doesn’t reclaim space – it doesn’t appear to do so when there is a snapshot(s) on the volume, so the effects were a double negative – didn’t reclaim any space from the original, and actually consumed a ton more space (more than 2x the original volume size) due to the snapshot.

Support claims minimal space was reclaimed by the system because I wrote files in 100MB blocks instead of 128MB blocks. I find it hard to believe out of 200GB of files I wrote that there was not more 128MB contiguous blocks of space of zeros. But I will try the test again with 128MB files on that specific volume after I can contact the people that are using the snapshot to delete the snapshot and re-create it to reclaim that 200GB of space. Hell I might as well not even use the snapshot and create a full physical copy of the volume.

Honestly I’m sort of at a loss for words as to how stupid this is. I have loved 3PAR through thick and thin for a long time (and I’ve had some big thicks over the years that I haven’t written about here anyways..), but this one I felt compelled to. A feature so heavily marketed, so heavily touted on the platform is rendered completely ineffective when a basic function like snapshots is in use. Of course the documentation has nothing on this, I was looking through all the docs I had on the technology when I was running this test on Thursday and basically what it said was enable zero detection on the volume (disabled by default) and watch it work.

I’ve heard a lot of similar types of things (feature heavily touted but doesn’t work under load or doesn’t work period) on things like NetApp, EMC etc. This is a rare one for 3PAR in my experience at least. My favorite off the top of my head was NetApp’s testing of an EMC CX-3 performance with snapshots enabled. That was quite a shocker to me when I first saw it. Roughly a 65% performance drop over the same system without snapshots.

Maybe it is a limitation of the ASIC itself – going back to my speculation about design issues and not being able to work around them in software. Maybe this limitation is not present in the V-class which is the next generation ASIC. Or maybe it is, I don’t know.

HP Support says this behavior is as designed. Well I’m sure more than one person out there would agree it is a stupid design if so. I can’t help but think it is a design flaw, not an intentional one – or a design aspect they did not have time to address in order to get the T-series of arrays out in a timely manner (I read elsewhere that the ASIC took much longer than they thought to design, which I think started in 2006 – and was at least partially responsible for them not having PCI express support when the ASIC finally came out). I sent them an email asking if this design was fixed in the V-Class, will update if they respond. I know plenty of 3PAR folks (current and former) read this too so they may be able to comment (anonymously or not..).

As for why more space was not reclaimed in the volume, I ran another test on Friday on another volume without any snapshots, which should have reclaimed a couple hundred gigs, but according to the command line it reclaimed nothing. Support points me to logs saying 24GB was reclaimed, but that is not reflected in the command line output showing the raw volume size on the system. Still working with them on that one. My other question to them is why 24GB? I wrote zeros to the end of the file system, there were 0 bytes left. I have some more advanced logging things to do for my next test.

While I’m here I might as well point out some of the other 3PAR software or features I have not used, let’s see

  • Adaptive optimization (sub LUN tiering – licensed separately)
  • Full LUN-based automated tiering (which I believe is included with Dynamic optimization) – all of my 3PAR arrays to-date have had only a single tier of storage from a spindle performance perspective though had different tiers from RAID level perspectives
  • Remote Copy – for the situations I have been in I have not seen a lot of value in array-based replication. Instead I use application based. The one exception is if I had a lot of little files to replicate, using block based replication is much more efficient and scalable. Array-based replication really needs application level integration, and I’d rather have real time replication from the likes of Oracle (not that I’ve used it in years, though I do miss it, really not a fan of MySQL) or MySQL than having to co-ordinate snapshots with the application to maintain consistency (and in the case of MySQL there really is no graceful way to take snapshots, again, unlike Oracle – I’ve been struggling recently with a race condition somewhere in an App or in MySQL itself which pretty much guarantees MySQL slaves will abort with error code 1032 after a simple restart of MySQL – this error has been shown to occur upwards of 15 minutes AFTER the slave has gotten back in sync with the master – really frustrating when trying to deal with snapshots and getting those kinds of issues from MySQL). I have built my systems, for the most part, so they can be easily rebuilt so I really don’t have to protect all of my VMs by replicating their data, I just have to protect/replicate the data I need in order to reconstruct the VM(s) in the event I need to.
  • Recovery manager for Oracle (I licensed it once on my first system but never ended up using it due to limitations in it not being able to work with raw device maps on vmware – I’m not sure if they have fixed that by now)
  • Recovery manager for all other products (SQL server, Exchange, and VMware)
  • Virtual domains (useful for service providers I think mainly)
  • Virtual lock (used to lock a volume from having data deleted or the volume deleted for a defined period of time if I recall right)
  • Peer motion

3PAR Software/features I have used (to varying degrees)

  • Thin Provisioning (for the most part pretty awesome but obviously not unique in the industry anymore)
  • Dynamic Optimization (oh how I love thee) – the functionality this provides I think for the most part is still fairly unique, pretty much all of it being made possible by the sub disk chunklet-based RAID design of the system. Being able to move data around in the array between RAID levels, between tiers, between regions of the physical spindles themselves (inner vs outer tracks), really without any limit as to how you move it (e.g. no limitations like aggregates in the NetApp world), all without noticeable performance impact is quite amazing (as I wrote a while back I ran this process on my T400 once for four SOLID MONTHS 24×7 and nobody noticed).
  • System Tuner (also damn cool – though never licensed it only used it in eval licenses) – this looks for hot chunklets and moves them around automatically. Most customers don’t need this since the system balances itself so well out of the box. If I recall right, this product was created in response to a (big) customer’s request mainly to show that it could be done, I am told very few license it since it’s not needed. In the situations where I used it it ended up not having any positive(or negative) effect on the situation I was trying to resolve at the time.
  • Virtual Copy (snapshots – both snapshots and full volume copies) – written tons of scripts to use this stuff mainly with MySQL and Oracle.
  • MPIO Software for MS Windows – worked fine – really not much to it, just a driver. Though there was some licensing fee 3PAR had to pay to MS for the software or development efforts they leveraged to build it – otherwise the drivers could have been free.
  • Host Explorer (pretty neat utility that sends data back through the SCSI connection from the server to the array including info like OS version, MPIO version, driver versions etc – doesn’t work on vSphere hosts because VMware hasn’t implemented support for those SCSI commands or something)
  • System Reporter – Collects a lot of data, though from a presentation perspective I much prefer my own cacti graphs
  • vCenter Plugin for the array – really minimal set of functionality compared to the competition – a real weak point for the platform. Unfortunately it hasn’t changed much in the almost two years since it was released – hoping it gets more attention in the future, or even in the present. As-is, I consider it basically useless and don’t use it. I haven’t taken advantage of the feature on my own system since I installed the software to verify that it’s functional.
  • Persistent Cache – an awesome feature in 4+ node systems that allows re-mirroring of cache to another node in the system in the event of planned or unplanned downtime on one or more nodes in the cluster (while I had this feature enabled – it was free with the upgrade to 2.3.1 on systems with 4 or more nodes I never actually had a situation where I was able to take advantage of it before I left the company with that system).
  • Autonomic Groups – group volumes and systems together and make managing mappings of volumes to clusters of servers very easy. The GUI form of this is terrible and they are working to fix it. I literally practically wiped out my storage system when I first tried this feature using the GUI. It was scary the damage I did in the short time I had this (even more so given the number of years I’ve used the platform for). Fortunately the array that I was using was brand new and had really no data on it (literally). Since then – CLI for me, safer and much more clear as to what is going on. My friends over at 3PAR got a lot of folks involved over there to drive a priority plan to fix this functionality which they admit is lacking. What I did wipe out were my ESX boot volumes, so I had to re-create the volumes and re-install ESX. Another time I wiped out all of my fibre channel host mappings and had to re-establish those too. Obviously on a production system this would have resulted in massive data loss and massive downtime. Fortunately, again it was still at least 2 months from being a production system and had a trivial amount of data. When autonomic groups first came out I was on my T400 with a ton of existing volumes, and migrating existing volumes to groups likely would have been disruptive, so I only used groups for new resources and didn’t get much exposure to the feature at the time.

That turned out to be A LOT longer than I expected.

This is probably the most negative thing I’ve said about 3PAR here. This information should be known though. I don’t know how other platforms behave – maybe it’s the same. But I can say in the nearly three years I have been aware of this technology this particular limitation has never come up in conversations with friends and contacts at 3PAR. Either they don’t know about it either or it’s just one of those things they don’t want to admit to.

It may turn out that using SCSI UNMAP to reclaim space, rather than writing zeros is much more effective thus rendering the additional costs of thin licensing worth while. But not many things support that yet. As mentioned earlier, VMware specifically recommends disabling support for UNMAP in ESX 5.0 and has disabled it in subsequent releases because of performance issues.

Another thing that I found interesting is that on the CLI itself, 3PAR specifically recommends keeping zero detection disabled unless you’re doing data migration because under heavy load it can cause issues –

Note: there can be some performance implication under extreme busy systems so it is recommended for this policy to be turned on only  during Fat to Thin and re-thinning process and be turned off during normal operation.

Which to some degree defeats the purpose? Some 3PAR folks have told me that information is out of date and only related to legacy systems. Which didn’t really make sense since there are no legacy systems that support zero detection as it is hard wired into the ASIC. 3PAR goes around telling folks that zero detection on other platforms is no good because of the load it introduces but then says that their system behaves in a similar way. Now to give them credit I suspect it is still quite likely a 3PAR box can absorb that hit much better than any other storage platform out there, but it’s not as if you’re dealing with a line rate operation, there clearly seems to be a limit as to what the ASIC can process. I would like to know what an extremely busy system looks like – how much I/O as a percentage of controller and/or disk capacity?

Bottom line – at this point I’m even more glad I didn’t license the more advanced thinning technologies on my bigger T400 way back when.

I suppose I need to go back to reclaiming space the old fashioned way – data migration.

4,000+ words woohoo!

April 10, 2012

Oracle first to release 10GbaseT as standard ?

Filed under: Networking — Tags: , , — Nate @ 2:21 pm

Sun has had some innovative x86-64 designs in the past, particularly on the AMD front. Of course Oracle dumped AMD a while back and focused on Intel; despite that, their market share continues to collapse (in good part probably because they screwed over many of their partners from what I recall by going direct with so many customers, among other things).

In any case they launched a new server line up today, which otherwise is not really news since who uses Sun/Oracle x86-64 boxes anyways? But I thought the news was interesting since it seems to include 4 x 10GbaseT ports on board as standard.

Rear of Sun Fire X4170 M3 Server

The Sun Fire X4170 M3 and the X4270 M3 systems both appear to have quad port 10GbaseT on the motherboard. I haven’t heard of any other servers yet that have this as standard. Out of curiosity, if you know of others I’d be interested to hear who they are.

The data sheet is kind of confusing, saying it has 4 onboard 10GbE ports but then it says Four 100/1,000/10 Base-T Ethernet ports in the network section below. Of course it was frequent to have 10/100/1000 BaseT before, so after seeing the physical rear of the system it seems convincing that they are using 10GbaseT.

Nice goin’ Oracle.


What’s wrong with this picture?

Filed under: Datacenter,Virtualization — Tags: , — Nate @ 7:36 am

I was reading this article from our friends at The Register which has this picture for an entry level FlexPod from NetApp/Cisco.

It just seems wrong. I mean the networking stuff. Given NetApp’s strong push for IP-based storage, one would think an entry level solution would simply have 2×48 port 10gig stackable switches, or whatever Cisco’s equivalent is(maybe this is it).

This solution is supposed to provide scalability for up to 1,000 users – what those 1,000 users are actually doing I have no idea, does it mean VDI? Database? web site users? File serving users? ?????????????

It’s also unclear in the article if this itself scales to that level or it just provides the minimum building blocks to scale to 1,000 users (I assume the latter) – and if so what does 1,000 user configuration look like? (or put another way how many users does the below picture support)

I’ll be the first to admit I’m ignorant as to the details and the reason why Cisco needs 3 different devices with these things but whatever the reason seems major overkill for an entry level solution assuming the usage of IP-based storage.

The new entry level flex pod

The choice of a NetApp FAS2000 array seems curious to me – at least given the fact that it does not appear to support that Flex Cache stuff which they like to tout so much.

April 7, 2012

Interesting discussion on vSphere vs Hyper-V

Filed under: Virtualization — Tags: , — Nate @ 1:52 pm

I stumbled upon this a few days ago and just got around to reading it now. It came out about a month ago, I forgot where I saw it, I think from Planet V12n or something.

Anyways it’s two people who sound experienced(I don’t see information on their particular backgrounds) each talking up their respective solution. Two things really stood out to me:

  • The guy pimping Microsoft was all about talking about a solution that doesn’t exist yet (“it’ll be fixed in the next version, just wait!”)
  • The guy pimping VMware was all about talking about how cost doesn’t matter because VMware is the best.

I think they are both right – and both wrong.

It’s good enough

I really believe that in Hyper-V’s case and also in KVM/RHEV’s case that for the next generation of projects these products will be “good enough” (in Hyper-V’s case – whenever Windows 8 comes out) for a large(majority) number of use cases out there. I don’t see Linux-centric shops considering Hyper-V or Windows-centric considering KVM/RHEV/etc so VMware will still obviously have a horse in the  race (as the pro-VMware person harps on in the debate).

Cost is becoming a very important issue

One thing that really got me the wrong way was when the pro-VMware person said this

Some people complain about VMware’s pricing but those are not the decision makers, they are the techies. People who have the financial responsibility for SLAs and customers aren’t going to bank on an unproven technology.

I’m sorry but that is just absurd. If cost wasn’t an issue then the techies wouldn’t be complaining about it because they know, first hand, that it is an issue in their organizations. They know, first hand, that they have to justify the purchase to those decision makers. The company I’m at now was in that same situation – the internal IT group could only get the most basic level of vSphere approved for purchase at the time for their internal IT assets (this was a year or two or three ago). I hear them constantly complaining about the lack of things like vMotion, or shared storage etc. Cost was a major issue so the environment was built with disparate systems and storage and the cheap version of vSphere.

Give me an unlimited budget and I promise, PROMISE you will NEVER hear me complain about cost. I think the same is true of most people.

I’ve been there, more than once! I’ve done that exact same thing (Well in my case I managed to have good storage in most of the cases).

Those decision makers weigh the costs of maintaining that SLA with whatever solution they’re going to provide. Breaking SLAs can be more cost effective than achieving them. Especially if they are absurdly high SLAs. I remember at one company I was at they signed all these high level SLAs with their new customers – so I turned around and said – hey, in order to achieve those SLAs we need to do this laundry list of things. I think maybe 5-10% of the list got done until the budget ran out. You can continue to meet those high SLAs if you’re lucky, and don’t actually have the ability to sustain failure and maintain uptime. More often than not such smaller companies prefer to rely on luck than doing things right.

Another company I was at had what could have been called a disaster in itself, during the same time I was working on a so-called disaster recovery project (no coincidence). Despite the disaster, at the end of the day the management canned the disaster recovery project (which everyone agreed would have saved a lot of everything had it been in place at the time of the disaster). It’s not that budget wasn’t approved – it was approved. The problem was management wanting to do another project that they massively under budgeted for and decided to cannibalize the budget from DR to give to this other pet project.

Yet another company I was at signed a disaster recovery contract with SunGard just to tick the check box to say they have DR. The catch was – the entire company knew up front before they signed – that they would never be able to utilize the service. IT WOULD NEVER WORK. But they signed it anyways because they needed a plan, and they didn’t want to pay for a plan that would have worked.

VMware touting VM density as king

I’ve always found it interesting how VMware touts VM density, they show an automatic density advantage to VMware which automatically reduces VMware’s costs regardless of the competition. This example was posted to one of their blogs a few days ago.

They tout their memory sharing, their memory ballooning, their memory compression all as things that can increase density vs the competition.

My own experience with memory sharing on VMware at least with Linux is pretty simple – it doesn’t work. It doesn’t give results. Looking at one of my ESX 4.1 servers (yes, no ESXi here) which has 18 VMs on it and 101GB of memory in use, how much memory am I saving with the transparent page sharing?

3,161 MB – or about 3%. Nothing to write home about.
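For what it’s worth, that percentage is simple arithmetic. A quick sketch, assuming 1GB = 1024MB and that the 101GB of memory in use is the right thing to compare against (the function name is made up for illustration):

```python
def sharing_savings_percent(shared_mb, in_use_gb):
    """Memory saved by page sharing as a percentage of memory in use."""
    in_use_mb = in_use_gb * 1024  # assumption: binary gigabytes
    return 100.0 * shared_mb / in_use_mb


# Figures from the ESX 4.1 host described above.
print(round(sharing_savings_percent(3161, 101), 1))  # 3.1
```

Using decimal gigabytes instead barely moves the result, it is still about 3% either way.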

For production loads, I don’t want to be in a situation where memory ballooning kicks in, or when memory compression kicks in, I want to keep performance high – that means no swapping of any kind from any system. Last thing I want is my VMs to start thrashing my storage with active swapping. Don’t even think about swapping if you’re running Java apps either, once that garbage collection kicks in your VM will grind to a halt while it performs that operation.

I would like a method to keep the Linux buffer cache under control however, whether it is ballooning that specifically targets file system cache, or some other method, that would be a welcome addition to my systems.

Another welcome addition would be the ability to flag VMs and/or resource pools to pro-actively utilize memory compression (regardless of memory usage on the host itself). Low priority VMs, VMs that sit at 1% cpu usage most of the time, VM’s where the added latency of compression on otherwise idle CPU cores isn’t that important (again – stay away from actively swapping!). As a bonus provide the ability to limit the CPU capacity consumed by compression activities, such as limiting it to the resource pool that the VM is in, and/or having a per-host setting where you could say – set aside up to 1 CPU core or whatever for compression, if you need more than that, don’t compress unless it’s an emergency.

YAWA with regards to compression would be to provide me with compression ratios – how effective is the compression when it’s in use? Recommend to me VMs that have low utilization that I could pro-actively reclaim memory by compressing these, or maybe only portions of the memory are worth compressing? The Hypervisor with the assistance of the vmware tools has the ability to see what is really going on in the guest by nature of having an agent there. The actual capability doesn’t appear to exist now but I can’t imagine it being too difficult to implement. Sort of along the lines of pro-actively inflating the memory balloon.

So, for what it’s worth for me, you can take any VM density advantages for VMware off the table when it comes from a memory perspective. For me and VM density it’s more about the efficiency of the code and how well it handles all of those virtual processors running at the same time.

Taking the Oracle VM blog post above, VMware points out Oracle supports only 128 VMs per host vs VMware at 512. Good example – but they really need to show how well all those VMs can work on the same host, and how much overhead there is. If my average VM CPU utilization is 2-4% does that mean I can squeeze 512 VMs on a 32-core system (memory permitting of course) – when in theory I should be able to get around 640 – memory permitting again?
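A rough sketch of that kind of CPU-only density estimate follows. It assumes utilization is measured as a percentage of one core per VM, that load spreads perfectly across cores, and that the hypervisor adds zero overhead, none of which holds in practice. The function name is made up for illustration.

```python
import math


def cpu_bound_vm_ceiling(cores, avg_vm_util_percent):
    """Theoretical number of VMs a host's CPUs could carry, ignoring
    memory, scheduling overhead and bursts (assumptions, not reality)."""
    return math.floor(cores * 100 / avg_vm_util_percent)


for util in (2, 4, 5):
    print(util, cpu_bound_vm_ceiling(32, util))
# 2 1600
# 4 800
# 5 640
```

Under these assumptions the figure of around 640 corresponds to an average of about 5% of a core per VM, while the 2-4% range gives an even higher theoretical ceiling. Either way the hard 512 limit sits below what the CPU math alone would suggest, which is why the real question is how much overhead all those virtual processors add.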

Oh the number of times I was logged into an Amazon virtual machine that was suffering from CPU problems only to see that 20-30% of the CPU usage was being stolen from the VM by the hypervisor. From the sar man page


Percentage of time spent in involuntary wait by the virtual CPU or CPUs while the hypervisor was servicing another virtual processor.

Not sure if Windows has something similar.
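On Linux the steal number sar reports can also be pulled straight from /proc/stat, where steal is the eighth counter on the aggregate "cpu" line. A minimal sketch that works out steal percentage between two samples; the helper names are made up for illustration, and the sample lines in the usage note are invented numbers, not from a real system.

```python
FIELDS = ["user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal"]


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into a dict of counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("not the aggregate cpu line")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    # Very old kernels report fewer columns; treat missing ones as zero.
    values += [0] * (len(FIELDS) - len(values))
    return dict(zip(FIELDS, values))


def steal_percent(before, after):
    """Percentage of CPU time stolen by the hypervisor between two samples."""
    total = sum(after[k] - before[k] for k in FIELDS)
    if total <= 0:
        return 0.0
    return 100.0 * (after["steal"] - before["steal"]) / total


def read_cpu_sample(path="/proc/stat"):
    """Read the current counters (Linux only)."""
    with open(path) as f:
        return parse_cpu_line(f.readline())
```

For example, with two made-up samples `parse_cpu_line("cpu 100 0 50 800 10 0 5 35 0 0")` and `parse_cpu_line("cpu 200 0 100 1500 20 0 10 170 0 0")`, `steal_percent` comes out at 13.5. On a live guest you would call `read_cpu_sample()` twice a few seconds apart.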

Back to costs vs Maturity

I was firmly in the VMware camp for many years, I remember purchasing ESX 3.1 (Standard edition – no budget for higher versions) for something like $3,500 for a two-socket license. I remember how cheap it felt at the time given the power it gave us to consolidate workloads. I would have been more than happy (myself at least) to pay double for what we got at the time. I remember the arguments I got in over VMware vs Xen with my new boss at the time, and the stories of the failed attempts to migrate to Xen after I left the company.

The pro-VMware guy in the original ZDNet debate doesn’t see the damage VMware is doing to itself when it comes to licensing. VMware can do no wrong in his eyes. I’m sure there are plenty of other die hards out there that are in the same boat. The old motto of you never got fired for buying IBM, right. I can certainly respect the angle, though as much as it pains my own history to admit, I think the tides have changed and VMware will have a harder and harder time pitching its wares in the future, especially if it keeps playing games with licensing on a technology which its own founders (I think – I wish I could find the article) predicted would become commodity by about now. With the perceived slow uptake of vSphere 5 amongst users I think the trend is already starting to form. The problem with the uptake isn’t just the licensing of course, it’s that for many situations there isn’t a compelling reason to upgrade – it’s good enough has set in.

I can certainly, positively understand VMware providing premium pricing for premium services, an Enterprise Plus Plus ..or whatever. But don’t vary the price based on provisioned utilization that’s just plain shooting yourself (and your loyal customers) in the feet. The provisioned part is another stickler for me – the hypervisor has the ability to measure actual usage, yet they stick their model to provisioned capacity – whether or not the VM is actually using the resource. It is a simpler model but it makes planning more complicated.

The biggest scam in this whole cloud computing era so many people think we’re getting into is the vast chasm between provisioned vs utilized capacity. With companies wanting to charge you for provisioned capacity and customers wanting to over provision so they don’t have to make constant changes to manage resources, knowing that they won’t be using all that capacity up front.

The technology exists, it’s just that few people are taking advantage of it and fewer yet have figured out how to leverage it (at least in the service provider space from what I have seen).

Take Terremark (now Verizon), a VMware-based public cloud provider (and one of only two partners listed on VMware’s site for this now old program). They built their systems on VMware, they build their storage on 3PAR. Yet for this vCloud express offering there is no ability to leverage resource pools, no ability to utilize thin provisioning (from a customer standpoint). I have to pay attention to exactly how much space I provision up front, and I don’t have the option to manage it like I would on my own 3PAR array.

Now Terremark has an enterprise offering that is more flexible and does offer resource pools, but this isn’t available on their on demand offering. I still have the original quote Terremark sent me for the disaster recovery project I was working on at the time, it makes me want to either laugh or cry to this day. I have to give Terremark credit though, at least they have an offering that can utilize resource pools; most others (well I haven’t heard of even one – though I haven’t looked recently) do not. (Side note: I hosted my own personal stuff on their vCloud express platform for a year so I know it first hand – it was a pretty good experience, what drove me away primarily was their billing for each and every TCP and UDP port I had open on an hourly rate. Also good to not be on their platform anymore so I don’t risk them killing my system if they see something I say and take it badly).

Obviously the trend in system design over recent years has bitten into the number of licenses that VMware is able to sell – and if their claims are remotely true – that 50% of the world’s workloads are virtualized and of that they have 80% market share – it’s going to be harder and harder to maintain a decent growth rate. It’s quite a pickle they are in, customers in large part apparently haven’t bought into the more premium products VMware has provided (that are not part of the Hypervisor), so they felt the pressure to increase the costs of the Hypervisor itself  to drive that growth in revenue.

Bottom Line

VMware is in trouble.

Simple as that.

April 6, 2012

Flash from the past – old game review

Filed under: General,Random Thought — Tags: — Nate @ 11:39 pm

I was talking to a friend that I’ve known for more than 20 years a few days ago (these posts really are making me feel old 🙂 ) and we were talking about games, Wing Commander Saga among them and he brought up an old game that we tried playing for a while he couldn’t remember the name, but I did. It was Outpost. A Sci-fi simcity or civilization style game from 1994. It has amazing visuals, being one of the earlier CDROM-based games. I bought it after seeing the visuals and really wanted to like it but no matter what I couldn’t get very far without losing the game, no matter what I did I would run out of resources, air, food water, whatever it was, all my people would die and I would have to start again. Rinse and repeat a few times and I gave up on it eventually. I really liked the concept being a long time sci fi fan (well hard core sci fi fans would probably refer to me as a space opera fan).

ANYWAYS, I hadn’t looked for anything about this game since well probably the mid 90s. I ran a basic search earlier today and came across this 10-minute video review from an interactive CDROM magazine published back in 1994. I had no idea how much of a mess the game really was, the review was incredibly funny to watch as they rip the game apart. They are in awe of the visuals like I was but other than that the game was buggy and unfinished. They make it sound like it was one of the most unfinished games of all time. They talk about entire sections of the strategy guide that are completely left out of the game, functions that are mentioned that just don’t exist. In-game artificial intelligence that gives absolutely worthless data, no explanation on how to plan to play the game up front. It’s quite humorous! I’m going to go watch it again.

From Wikipedia

Initial reviews of Outpost were enthusiastic about the game. Most notoriously, the American version of PC Gamer rated the game at 93%, one of its highest ratings ever for the time. It was later made known that the reviewers had in fact played beta versions of the game, and had been promised certain features would be implemented, but never were.


Following the release of the game, the game’s general bugginess and perceived mediocre gameplay, along with the lack of features described in most of the game’s reviews and the game’s own documentation led to a minor backlash against the computer game magazines of the time by consumers who bought the game based on their reviews.

April 5, 2012

Built a new computer – first time in 10 years

Filed under: linux,Random Thought — Tags: , — Nate @ 8:51 am

This morning I was thinking about the last time I built a computer from scratch, and I think it was about ten years ago, maybe longer – I remember the processor was a Pentium 3 800MHz. It very well may have been almost 12 years ago. Up until around the 2004 time frame I had built and re-built computers re-using older parts and some newer components, but as far as going out and buying everything from scratch, it was 10-12 years ago.

I had two of them: one was a socket-based system, the other was a “Slot 2”-based system. I also built a couple of systems around the dual-slot (Slot 1) Pentium 2 Intel L440GX+ motherboard (probably my favorite motherboard of all time). For those of you who think that I use nothing but AMD, I’ll remind you that aside from the AMD K6-3 series I was an Intel fanboy up until the Opteron 6100 was released. I especially liked the K6-3 for its on-chip L2 cache; combined with 2 Megabytes of L3 cache on the motherboard it was quite zippy. I still have my K6-3 CPU itself in a drawer around here somewhere.

So I decided to build a new computer to move my file serving functions out of my HP xw9400 workstation, which I bought about a year and a half ago, into something smaller, so I could turn the HP box into something less serious for playing some games on my TV (like WC: Saga!). Maybe I’ll get a better video card for it, I’m not sure.

I have a 3Ware RAID card and 4x2TB disks in my HP box so I needed something that could take that. This is what I ended up going with, from Newegg –

Seemed like an OK combination. The case is pretty nice, having a 5-port hot swap SATA backplane and supporting up to 7 HDDs. PC Power & Cooling (I used to swear by them, so I thought I might as well go with them again) had a calculator that said to get a 500W unit for as many HDDs as I had, so I got that.

There are a lot of things here that are new to me, and it’s been interesting to see how technology has changed since I last did this in the Pentium 3 era.

Mini ITX. Holy crap is that small. I knew it was small based on dimensions but it really didn’t set in until I held the motherboard box in my hand and it seemed about the same size as a retail box for a CPU 10 years ago. It’s no wonder the board uses laptop memory. The number of integrated features on it is just insane as well, from ethernet to USB 2 and USB 3, eSATA, HDMI, DVI, PS/2, optical audio output, analog audio out, and even wireless, all crammed into that tiny thing. Oh! Bluetooth is thrown in as well. During my quest to find a motherboard I even came across one that had a parallel port on it – I thought those died a while ago. The thing is just so tiny and packed.

On the subject of motherboards – the very advanced overclocking functions are just amazing. I will not overclock since I value stability over performance, and I really don’t need the performance in this box. I took the overclocking friendliness of this board to hopefully mean higher quality components and the ability to run more stable at stock speeds. Included tweaking features –

  • 64-step DRAM voltage control
  • Adjustable CPU voltage at 0.00625V increments (?!)
  • 64-step chipset voltage control
  • PCI Express frequency tuning from 100MHz up to 150MHz in 1MHz increments
  • HT Tuning from 100MHz to 550MHz in 1MHz increments
  • ASUS C.P.R. (CPU Parameter Recall) – no idea what that is
  • Option to unlock the 4th CPU core on my CPU
  • Options to run on only 1, 2, or all 3 cores

The last time I recall overclocking stuff there were maybe 2-3 settings for voltage and the difference was typically at least 5-15% between them. I remember the only CPU I ever overclocked was a Pentium 200 MMX (o/c to 225MHz – no voltage changes needed, ran just fine).
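To put that 0.00625V increment in perspective against the old 5-15% jumps, here is a quick back-of-the-envelope sketch. The 1.40V nominal CPU voltage is purely an assumption for illustration (it is not from the board manual or from anything above), and the function names are made up.

```python
# Rough comparison of voltage step sizes: old boards vs. this one.
# ASSUMPTION: 1.40 V nominal CPU voltage is an illustrative figure only,
# not a value taken from the motherboard manual or the post.
NOMINAL_VCORE = 1.40   # volts (hypothetical)
FINE_STEP = 0.00625    # volts, the increment listed in the feature list


def step_as_percent(step_volts: float, nominal_volts: float) -> float:
    """Return one voltage step expressed as a percentage of nominal."""
    return step_volts / nominal_volts * 100.0


def steps_to_match(percent_jump: float, step_volts: float, nominal_volts: float) -> float:
    """Return how many fine steps fit inside one jump of percent_jump percent."""
    return (percent_jump / 100.0 * nominal_volts) / step_volts


if __name__ == "__main__":
    pct = step_as_percent(FINE_STEP, NOMINAL_VCORE)
    print(f"one fine step is {pct:.2f}% of the assumed nominal voltage")
    for jump in (5, 15):
        count = steps_to_match(jump, FINE_STEP, NOMINAL_VCORE)
        print(f"an old-style {jump}% jump spans about {count:.0f} fine steps")
```

Under that assumed voltage, a single step on this board is well under half a percent, so even the smallest old-style jump covers around a dozen of the new steps.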

I seem to recall from a PCI perspective, back in my day there were two settings for PCI frequencies: whatever the normal was, and one setting higher (which was something like 25-33% faster).

ASUS M4A88T-I Motherboard

Memory – wow it’s so cheap now, I mean 8GB for $45?! The last time I bought memory was for my HP workstation, which requires registered ECC – and it was not so cheap! This system doesn’t use ECC of course. Though given how dense memory has been getting, and the possibility of memory errors only increasing, I would think at some point soon we would want some form of ECC across the board? It was certainly a concern 10 years ago when building servers with even say 1-2GB of memory; now we have many desktops and laptops coming standard with 4GB+. Yet we don’t see ECC on the desktops and laptops. I know that’s because of cost, but what I find interesting is that there doesn’t appear to be a significant (or perhaps in some cases even noticeable) hit in reliability on these systems with larger amounts of memory and no ECC.

Another thing I noticed was how massive some video cards have become, consuming as many as 3 PCI slots in some cases for their cooling systems. Back in my day the high end video cards didn’t even have heat sinks on them! I was a big fan of Number Nine back in my day and had both their Imagine 128 and Imagine 128 Series 2 cards, with a whole 4MB of memory (512kB chips if I remember right on the series 2, they used double the number of smaller chips to get more bandwidth). Those cards retailed for $699 at the time, a fair bit higher than today’s high end 3D cards (CAD/CAM workstation cards excluded in both cases).

Modular power supplies – the PSU I got was only partially modular but it was still neat to see.

I really dreaded the assembly of the system since it is so small. I knew the power supply was going to be an issue, as someone on Newegg said that you really don’t want a PSU that is longer than 6″ because of how small the case is. I think PC Power & Cooling said mine was about 6.3″ (with wiring harness). It was gonna be tight, and it was tight. I tried finding a shorter power supply in that class range but could not. It took a while to get the cables all wrapped up. My number one fear of course was doing all that work, hitting the power button and finding out there’s a critical problem (bought the wrong RAM, bad CPU, bad board, plugged in the power button the wrong way, whatever).

I was very happy to see that when I turned it on for the first time it lit up and the POST screen came right up on the TV. There was a bad noise coming from one of the fans because a cable was touching it, so I opened it up again and tweaked the cables more so they weren’t touching the fan, and off I went.

That first boot was without any HDs, just to make sure it turned on, the keyboard worked, I could get into the BIOS screen etc. All that worked fine, so I opened up the case again and installed an old 750GB HD in one of the hot swap slots. I hooked up a USB CDROM with a CD of Ubuntu 10.04 64-bit and installed it on the HD.

Since this board has built-in wireless I was looking forward to trying it out – didn’t have much luck. It could see the 50 access points in the area but it was not able to log in to mine for some reason. I later found that it was not getting a DHCP response, so I hard-coded a static IP and it worked, but then other issues came up like DNS not working and very very slow transfer speeds (as in sub 100 BYTES per second). After troubleshooting for about 20 minutes I gave up and went wired, and it was fast. I upgraded the system to the latest kernel and stuff but that didn’t help the wireless. Whatever, not a big deal, I didn’t need it anyways.

I installed SSH, logged into it from my laptop, shut off X-Windows, and installed the Cerberus Test Suite (something else I used to swear by back in the mid 00s). Fortunately there is a packaged version of it for Ubuntu since, last I checked, it hasn’t been maintained in about seven years. I do remember having problems compiling it on a 64-bit RHEL system a few years ago (though 32-bit worked fine and the resulting binaries worked fine on 32-bit too).

Cerberus Test Suite (or ctcs as I call it) is basically a computer torture test. A very effective one, the most effective I’ve ever used myself. I found that if a computer can survive my custom test (which is pretty basic) for 6 hours then it’s good. I’ve run the tests for as long as 72 hours and never saw a system fail after more than 6 hours. Normally a failure would show up within a few minutes to a couple of hours. It would find problems with memory that memtest wouldn’t find after 24 hours of testing.

What cerberus doesn’t do is tell you what failed or why; if your system just freezes up you still have to figure it out. On one project I worked on that had a lot of “white box” servers in it, we deployed them about a rack at a time, and I would run this test. Maybe 85% of them would pass, and the others had some problem, so I told the vendor: go fix it, I don’t know what it is but these are not behaving like the others so I know there is an issue. Let them figure out what component is bad (90% of the time it was memory).

So I fired up ctcs last night, and watched it for a few minutes, wondering if there is enough cooling on the box to keep it from bursting into flames. To my delight it ran great, with the CPU topping out at around 54C (honestly have no idea if that is good or not, I think it is OK though). I ran it for 6 hours overnight and there were no issues when I got up this morning. I fired it up again for another 8 hours (the test automatically terminates after a predefined interval).

I’m not testing the HD, because it’s just a temporary disk until I move my 3ware stuff over. I’m mainly concerned about the memory and CPU/MB/cooling. The box is still running silent (I have other stuff in my room so I’m sure it makes some noise but I can’t hear it). It has 4 fans in it including the CPU fan: a 140mm, a 120mm and the PSU fan, whose size I’m not sure of.

My last memory of ASUS was running an Athlon on an A7A-266 motherboard (I think in 2000), and that combination didn’t last long. The IDE controller on the thing corrupted data like nobody’s business. I would install an OS, and everything would seem fine, then the initial reboot kicked in and everything was corrupt. I still felt that ASUS was a pretty decent brand; maybe that was just specific to that board or something. I’m so out of touch with PC hardware at this level, with the different CPU sockets and CPU types. I remember knowing everything backwards and forwards in the Socket 7 days, back when things were quite interchangeable. Then there was my horrible year or two with the ABIT BP6, a somewhat experimental dual socket Celeron system. What a mistake that was, oh what headaches that thing gave me. I think I remember getting it based on a recommendation at Tom’s Hardware Guide, a site I used to think had good information (maybe it does now, I don’t know). But that experience with the BP6 fed back into my thoughts about Tom’s Hardware and I really didn’t go back to that site ever again (sometimes these days I stumble upon it by accident). I noticed a few minutes ago that Abit as a company is out of business now; they seemed to be quite the innovator back in the late 90s.

Maybe this weekend I will move my 3ware stuff over and install Debian (not Ubuntu) on the new system and set it up. While I like Red Hat/CentOS for work stuff, I like Debian for home. It basically comes down to if I am managing it by hand I want Debian, if I’m using tools like CFEngine to manage it I want RH. If it’s a laptop, or desktop then it gets Ubuntu 10.04 (I haven’t seen the nastiness in the newer Ubuntu release(s) so not sure what I will do after 10.04).

I really didn’t think I’d ever build a computer again, until this little side project came up.

Another reason I hate SELinux

Filed under: linux,Random Thought — Tags: , , — Nate @ 7:43 am

I don’t write too much about Linux either but this is sort of technical I guess.

I’ve never been a fan of SELinux. I’m sure it’s great if you’re in the NSA, or the FBI, or some other 3 letter agency, but for most of the rest of the people it’s a needless pain to deal with, and provides little benefit.

I remember many moons ago, back when I dealt with NT4, encountering situations where I, as an administrator, could not access a file on the NTFS file system. It made no sense – I am administrator – get me access to that file – but no, I could not get access. HOWEVER, I could change the security settings and take ownership of the file, and NOW I can get access. Since I have that right to begin with it should just give me access and not make me jump through those hoops. That’s what I think at least. I recall someone telling me back in the 90s that Netware was similar and went to even further extremes, where you could lock the admin out of files entirely, and in order to back data up you had another backup user which the backup program used and that was somehow protected too. I can certainly understand the use case, but it makes things frustrating. I’ve never been at a company that needed anywhere remotely near that level of control (I go out of my way to avoid them actually, since I’m sure that’s only a small part of the frustrations of being there).

By the same token I have never used (for more than a few minutes anyways) file system ACLs on Linux/Unix platforms either. I really like the basic permissions system; it has worked for 99.9% of my own use cases over the years, and is very simple to manage.

I had a more recent experience that was similar, but even more frustrating on Windows 7. I wanted to copy a couple files into the system32 directory, but no matter what I did (including take ownership, change permissions etc) it would not let me do it. It’s my #$#@ computer you piece of #@$#@.

Such frustration is not limited to Windows however. Linux has its own similar functionality called SELinux, which by default is turned on in many situations. I turn it off everywhere, so when I encounter it I am not expecting it to be on, and the resulting frustration is annoying to say the least.

A couple weeks ago I installed a test MySQL server, and exposed a LUN to it which had a snapshot of a MySQL database from another system. My standard practice is to turn /var/lib/mysql into a link which points to this SAN mount point. So I did that, and started MySQL … failed. MySQL complained about not having write access to the directory. So I spent the next probably 25 minutes fighting this thing, only to discover it was SELinux that was blocking access to the directory. I disabled SELinux, rebooted, and MySQL came up fine w/o issue. #@$#@!$
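Since the time sink here was not realising SELinux was on in the first place, a quick check up front can help. Below is a minimal sketch, not a definitive tool: it reads the kernel's enforce flag from the two selinuxfs mount points I know of (/selinux on CentOS/RHEL 6 era systems, /sys/fs/selinux on newer ones). The function name and the idea of passing the paths in as a parameter are my own choices for illustration.

```python
# Report the SELinux mode by reading the kernel's "enforce" flag.
# Older releases (CentOS/RHEL 6 era) mount selinuxfs at /selinux,
# newer ones at /sys/fs/selinux, so both locations are tried.
ENFORCE_FILES = ("/sys/fs/selinux/enforce", "/selinux/enforce")


def selinux_mode(enforce_files=ENFORCE_FILES) -> str:
    """Return 'enforcing', 'permissive', or 'disabled'."""
    for path in enforce_files:
        try:
            with open(path) as fh:
                flag = fh.read().strip()
        except OSError:
            continue  # selinuxfs is not mounted here, try the next location
        return "enforcing" if flag == "1" else "permissive"
    # No enforce file at any known location: treat SELinux as disabled.
    return "disabled"


if __name__ == "__main__":
    print(f"SELinux is {selinux_mode()}")
```

This should report the same thing as the `getenforce` command; the script form is only handy if you want the check inside a build or provisioning script.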

Yesterday I had another, more interesting encounter with SELinux. I installed a few CentOS 6.2 systems to put an evaluation of Vertica on. These were all built by hand since we have no automation stuff to deal with CentOS/RH; everything we have is Ubuntu. So I did a bunch of basic things including installing some SSH keys so I could log in as root w/my key. Only to find out that didn’t work. No errors in the logs, nothing, it just rejected my key. I fired up another SSH daemon on another port and my key was accepted no problem. I put the original SSH daemon in debug mode and it gave nothing either, just said it rejected my key. W T F.

After fighting for another probably 10 minutes I thought, HEY maybe SELinux is blocking this, and I checked and SELinux was in enforcing mode. So I disabled it and rebooted – now SSH works again. I didn’t happen to notice any logs anywhere related to SELinux and how/why it was blocking this, and only blocking it on port 22 and not on any other ports (I tried two other ports), but there you have it, another reason to hate SELinux.
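On the missing logs: SELinux denials normally land in the audit log as AVC records rather than in the SSH daemon's own output, typically /var/log/audit/audit.log when auditd is running (otherwise they tend to show up in the kernel log). The sketch below pulls the interesting fields out of such records. It is a rough filter under those assumptions, not a replacement for the real audit tools; the regular expression and the function names are mine, and the sample record in the usage part is made up to show the shape of a record, not taken from the systems described above.

```python
import re

# Usual audit log location on Red Hat style systems when auditd is running.
AUDIT_LOG = "/var/log/audit/audit.log"

# Capture the denied permissions, the command name, and the source and
# target security contexts from an AVC denial record.
AVC_RE = re.compile(
    r'avc:\s+denied\s+\{\s*(?P<perms>[^}]*?)\s*\}'
    r'.*?comm="(?P<comm>[^"]*)"'
    r'.*?scontext=(?P<scontext>\S+)'
    r'.*?tcontext=(?P<tcontext>\S+)'
)


def parse_avc_denials(lines):
    """Yield a dict of fields for every line that looks like an AVC denial."""
    for line in lines:
        match = AVC_RE.search(line)
        if match:
            yield match.groupdict()


def denials_for(command, log_path=AUDIT_LOG):
    """Return the denials in the audit log whose comm field equals command."""
    with open(log_path, errors="replace") as fh:
        return [d for d in parse_avc_denials(fh) if d["comm"] == command]


if __name__ == "__main__":
    # Made-up sample record, for illustration only.
    sample = (
        'type=AVC msg=audit(1333640000.123:456): avc:  denied  { read } for  '
        'pid=1234 comm="sshd" name="authorized_keys" dev=dm-0 ino=12345 '
        'scontext=system_u:system_r:sshd_t:s0 '
        'tcontext=unconfined_u:object_r:admin_home_t:s0 tclass=file'
    )
    for denial in parse_avc_denials([sample]):
        print(denial)
```

Running something like `denials_for("sshd")` as root right after a rejected login would at least show whether SELinux was involved and which file context it objected to, before reaching for the off switch.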

You can protect your system against the vast majority of threats fairly easily. I mean, the last system I dealt with that got compromised was a system that sat out on the internet (with tons of services running) that hadn’t had an OS upgrade in at least 3 years. The system before that I recall was another Linux host (internet-connected as well – it was a firewall) – this time back in 2001, and it probably hadn’t had upgrades in a long time either. The third was a FreeBSD system that was hacked because of me really – I told my friend who ran it to install SSH as he was using telnet to manage it. So he installed SSH and SSH got exploited (back in 2000-2001). I’ve managed probably 900-1000 different hosts over that time frame without an issue. I know there is value in SELinux, just not in the environments I work in.

Oh and while I’m here, I came across a new feature in CentOS 6.2 yesterday which I’m sure probably applies to RHEL too. When formatting an ext4 file system, by default it discards unused blocks. The man page says this is good for thin provisioned file systems and SSDs. Well I’ll tell you it’s not good for thin provisioned file systems: the damn thing sent 300 Megabytes a second of data (450,000-500,000+ sectors per second according to iostat) to my little storage array with a block size of 2MB (never seen a block size that big before), which had absolutely no benefit other than to flood the storage interfaces and possibly fill up the cache. I ran this on three different VMs at the same time. After a few seconds my front end latency on my storage went from 1.5-3ms to 15-20ms. And the result on the volumes themselves? Nothing, there was no data being written to them. So what’s the point? My point is: disable this stupid function with the -K option when running mke2fs on CentOS 6.2. Ubuntu 10.04 (what we use primarily) uses ext4 too, but it does not perform this function when a file system is created.
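As a sketch of the -K advice, plus a sanity check on the iostat numbers, here is a small helper that builds (but deliberately does not run) the mke2fs command line and converts sectors per second to MB/s. The device name /dev/sdX1 is a placeholder and the function names are made up. One caveat I am fairly sure of but have not checked against every release: newer e2fsprogs versions spell the same thing as `-E nodiscard`, so check the local man page first.

```python
import shlex

SECTOR_BYTES = 512  # Linux iostat counts sectors in 512-byte units


def mkfs_ext4_command(device: str, discard: bool = False) -> list:
    """Build, without running, an mke2fs command line for an ext4 file system.

    With discard=False the -K flag is added, which asks the mke2fs shipped
    with CentOS 6.2 to keep blocks instead of discarding them at mkfs time.
    """
    cmd = ["mke2fs", "-t", "ext4"]
    if not discard:
        cmd.append("-K")
    cmd.append(device)
    return cmd


def sectors_to_mb_per_sec(sectors_per_sec: float) -> float:
    """Convert an iostat sectors/second figure to MB/s (1 MB = 10**6 bytes)."""
    return sectors_per_sec * SECTOR_BYTES / 1_000_000


if __name__ == "__main__":
    # /dev/sdX1 is a placeholder, not a device from the systems described above.
    print(" ".join(shlex.quote(part) for part in mkfs_ext4_command("/dev/sdX1")))
    for rate in (450_000, 500_000):
        print(f"{rate} sectors/s is roughly {sectors_to_mb_per_sec(rate):.0f} MB/s")
```

The conversion puts 450,000-500,000 sectors per second at roughly 230-256 MB/s, which is in the same ballpark as the 300 Megabytes a second seen on the array side given the "+" on the iostat figure.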

Something strange happened when this operation ran, and I have a question for my 3PAR friends on it: the performance statistics for the virtual LUN showed absolutely no data flowing through the system, but the performance stats for the volume itself were there (a situation I have never seen before in 6 years on 3PAR), and the performance stats of the fibre channel ports were there. There was no noticeable hit on back end I/O that I could see, so the controllers were eating it. My only speculation is that because RHEL/CentOS 6 has built-in support for SCSI UNMAP, these commands were actually UNMAP commands rather than actual data. I’m not sure though.
