Diggin' technology every day

September 21, 2012

Need New Colo! NOW

Filed under: Datacenter — Nate @ 1:36 pm

UPDATE – I have finished my move! woohoo. This is my first visit to this Hurricane Electric facility it’s not really what I expected but it is an, interesting place. It seems to be entirely shared hosting, well at least the room I am in right now. All of the racks are the same and things look fairly uniform. I can tell the racks are pretty old tech, reminds me of the racks I used to have at Internap. I wouldn’t want to host mission critical stuff here, or expensive stuff, but for my little 1U box it is OK. There is a cabinet one cabinet away from mine in my row where they removed the rear door to fit the equipment in. They have a C7000 blade enclosure  and tons of CAT5 cabling all over the place,  totally not secure, but I guess people here are mindful which is good. They look to be running what I assume is a 120V 20A circuit based on the size of the power cable coming out of the PDU, and there is a little meter there on the PDU itself that reads 20 …  The PDU is ziptied to the rear of the rack because there is no place to mount it in the rack (they had to extend the rack rails to the max to fit the blades in). Anyways off I go…


[UPDATED] My hosting provider broke the news to me yesterday (but didn’t see the email till today) that they are terminating their co-location on Oct 1st. So I have to get out and find somewhere else to put my server (which hosts this blog along with my email, my DNS etc..).

I reached out to Hurricane Electric where my current provider hosts, as well as another place called Silicon Valley Web Hosting, which seems decent. If you know of another provider in the bay area (I want it local in case I have hardware issues), please let me know!

This is what I have today:

  • 1U Server
  • 1 network feed
  • 5 static IPs
  • 100Mbit unlimited bandwidth
  • $100/mo

I don’t use much bandwidth, I haven’t kept track in a while but I suspect it’s well below 100GB/mo.

SV Web hosting looks to cost about $170/mo for at a facility they are hosted at down in San Jose. I think I’d be OK up to $190/mo, beyond that I’d have to think harder about making more compromises.

I’d like to be hosted in a good facility, and am totally willing to pay more for it – no fly by night operations, nobody that attracts a lot of spammers/DOS attack type stuff.

I don’t need 100% uptime, network outages here and there aren’t a big deal, hopefully power outages will be really rare. Oh and no facilities that run on flywheel UPSs now that I’m thinking about power. Redundant power would be nice (my server has it), not required, redundant network would be nice, not required.

So if you know of any good places I should check out let me know!

(for those of you who may not know I did try the cloud route for a year while I was in between servers – it really wasn’t for me – especially now that I have 4TB of usable storage on my system for backups and stuff)


UPDATE – after some hard thinking I decided to go with Hurricane Electric, the same building that I am hosted in now so the move should be pretty painless. For $200/mo I get 7U of space, with 240W of power and 100Mbit unmetered bandwidth. I think the extra space will give me some flexibility, at some point I plan to get a pair of these, and rotate them between home and colo, 1TB is more than enough, with my ~1Mbit upstream it would take me 4 months to upload 1TB(the best upstream I can get is “up to 5Mbps” for an extra $55/mo – currently my “up to 1.5Mbps” seems to tap out below 1.1Mbps), vs a 30 minute drive and probably 12 hours of data copying with this method. Then probably either move my backup Soekris OpenBSD firewall or get another one so I can better protect the ESXi and ipmi management interfaces on my server (the VM that runs this blog already sits behind a virtual OpenBSD firewall).

Longer term I can see building a new server to replace what I have, something that is bigger, supports more disks, but still efficient enough to fit in that 240W power envelope. Longer term still who knows maybe I will upgrade to 100Mbps cable modem for $160/mo (currently pay about $45 for 16Mbps), and just store everything in my own little cloud. Stream everything..

September 14, 2012

VMware suggests swapping as a best practice

Filed under: Virtualization — Tags: — Nate @ 11:26 am

Just came across this, and going through the PDF it says

Virtualization causes an increase in the amount of physical memory required due to the extra memory needed by ESXi for its own code and for data structures. This additional memory requirement can be separated into two components:
1. A system-wide memory space overhead for the VMkernel and various host agents (hostd, vpxa, etc.).

A new feature in ESXi 5.1 allows the use of a system swap file to reduce this memory overhead by up to 1GB when the host is under memory pressure.

That just scares me that the advocate setting up a swap file to reduce memory usage by up to 1GB. How much memory does the average VMware system have? Maybe 64GB today? So that could save 1.5% of physical memory, with the potential trade off of impacting storage performance (assuming no local storage) for all other systems in the environment.

Scares me just about as much as how 3PAR used to advocate their storage systems can get double the VM density per server because you can crank up the swapping and they can take the I/O hit (I don’t think they advocate this today though).

Now if you can somehow be sure that the system won’t be ACTIVELY swapping then it’s not a big deal, but of course you don’t want to actively swap really in any situation, unless your I/O is basically unlimited. You could go and equip your servers with say a pair of SSDs in RAID 1 to do this sort of swapping(remember it is 1GB). But it’s just not worth it. I don’t understand why VMware spent the time to come up with such a feature.

If anything the trend has been more memory in hosts not less, I’d imagine most serious deployments have well over 100GB of memory per host these days.

My best practice is don’t swap – ever. In the environments I have supported performance/latency is important so there is really no over subscription for memory, I’ve had one time where Vmware was swapping excessively at the host level, and to me it was a bug, but to Vmware it was a feature(there was tons of memory available on the host), I forgot the term but it was a documented behavior on how the hypervisor functions, just not commonly known I guess, and totally not obvious. The performance of the application obviously went in the toilet when this swapping was going on, it felt like the system was running on a 386 CPU.

Windows memory footprint is significantly different than that of Linux, Linux represents probably 98 or 99% of my VMs over the years.

Oh and that transparent page sharing VMware touts so much? I just picked one of my servers at random, 31 VMs and 147GB of memory in use, TPS is saving me a grand 3% of memory, yay TPS.

The cost of I/Os(to spinning rust, or even enterprise SSDs), unless your workload is very predictable and you do not have much active swapping, is just too much to justify the risk in allowing swap in any form in my experience. In fact the bulk of the VMs I run do have a local 500MB swap partition, enough for some very light swapping – but I’d rather have the VM fail & crash, then have it swap like crazy and take the rest of the systems down with it.

But that’s me

September 12, 2012

Data Center reminder: deploy environmental sensors

Filed under: Datacenter — Tags: — Nate @ 8:54 pm

I feel like I am almost alone in the world when it comes to deploying environmental sensors around my equipment. I first did it at home back around 2001 when I had a APC SmartUPS and put a fancy environmental monitoring card in it, which I then wrote some scripts for and tied it into MRTG.

A few years later I was part of a decently sized infrastructure build out that had a big budget so I got one of these, and 16 x environmental probes each with 200 foot cables (I think the probes+cables alone were about $5k(the longest cables they had at the time, which were much more expensive than the short ones, I wasn’t sure what lengths I needed so I just went all out), ended up truncating the ~3200 feet of cables down to around ~800 feet I suspect). I focused more on cage environmental than per rack, I would of needed a ton more probes if I had per rack. Some of the sensors went into racks, and there was enough slack on the end of the probes to temporarily position them anywhere within say 10 feet of their otherwise fixed position very easily.

The Sensatronics device was real nifty, so small, and yet it supported both serial and ethernet, had a real basic web server, was easily integrated to nagios (though at the time I never had the time to integrate it so relied entirely on the web server). We were able to prove to the data center at the time their inadequate cooling and they corrected it by deploying more vented tiles. They were able to validate the temperature using one of those little laser gun things.

At the next couple of companies I changed PDU manufacturers and went to ServerTech instead, many (perhaps all?) of their intelligent PDUs come with ports for up to two environmental sensors. Some of their PDUs require an add-on to get the sensor integration.

The probes are about $50 a piece and have about a 10 foot cable on them. Typically I’d have two PDUs in a rack and I’d deploy four probes (2 per PDU). Even though environmental SLAs only apply to the front of the racks, I like information so I always put two sensors in front and two sensors in rear.

I wrote some scripts to tie this sensor data into cacti (the integration is ugly so I don’t give it out), and later on I wrote some scripts to tie this sensor data into nagios (this part I did have time to do). So I could get alerts when the facility went out of SLA.

Until today the last time I was at a facility that was out of SLA was in 2009, when one of the sensors on the front of the rack was reporting 87 degrees. The company I was at during that point had some cheap crappy IDS systems deployed in each facility, and this particular facility had a high rate of failures for these IDSs. At first we didn’t think *too* much of it, then I had the chance to hook up the sensors and wow, was I surprised. I looked at the temperatures inside the switches and compared it to other facilities (can’t really extrapolate ambient temp from inside the switch), and confirmed it was much warmer there than at our other locations.

So I bitched to them and they said there was no problem, after going back and forth they did something to fix it – this was a remote facility – 5,000 miles away and we had no staff anywhere near it, they didn’t tell us what they did but the temp dropped like a rock, and stayed within (barely) their SLA after that – it was stable after that.

Cabinet Ambient Temperature

There you have it, oh maybe you noticed there’s only one sensor there, yeah the company was that cheap they didn’t want to pay for a second sensor, can you believe that, so glad I’m not there anymore (and oh the horror stories I’ve heard about the place since! what a riot).

Anyways so fast forward to 2012.  Last Friday we had a storage controller fail (no not 3PAR, another lower end HP storage system), with a strange error message, oddly enough the system did not report there was a problem in the web UI (system health “OK”), but one of the controllers was down when you dug into the details.

So we had that controller replaced (yay 4 hour on site support), the next night the second controller failed with the same reason. HP came out again and poked at it, at one point there was a temperature alarm but the on site tech said he thought it was a false alarm, they restarted the controller again and it’s been stable since.

So today I finally had some time to start hooking up the monitoring for the temperature sensors in that facility, it’s a really small deployment, just 1 rack, so 4 sensors.

I was on site a couple of months ago and at the time I sent an email noting that none of the sensors were showing temperatures higher than 78 degrees (even in the rear of the rack).

So imagine my surprise when I looked at the first round of graphs that said 3 of the 4 sensors were now reporting 90 degrees or hotter temperature, and the 4th(near the floor) was reporting 78 degrees.

Wow, that is toasty, freakin hot more like it. So I figured maybe one of the sensors got moved to the rear of the rack, I looked at the switch temperatures and compared them with our other facility, the hotter facility was a few degrees hotter (4C), not a whole lot.

The servers told another story though.

Before I go on let me say that in all cases the hardware reports the systems are “within operating range”, everything says “OK” for temperature – it’s just way above my own comfort zone.

Here is a comparison of two servers at each facility, the server configuration hardware and software is identical, the load in both cases is really low, actually load at the hot facility would probably be less given the time of day (it’s in Europe so after hours). Though in the grand scheme of things I think the load in both cases is so low that it wouldn’t influence temperature much between the two. Ambient temperature is one of 23 temperature sensors on the system.

Data CenterDeviceLocationAmbient Temperature Fan Speeds (0-100%)
[6 fans per server]
Hot Data CenterServer XRoughly 1/3rd from bottom of rack89.6 F90 / 90 / 90 / 78 / 54 / 50
Normal Data CenterServer XRoughly 1/3rd from bottom of rack66.2 F60 / 60 / 57 / 57 / 43 / 40
Hot Data CenterServer YRoughly 1/3rd from bottom of rack87.8 F90 / 90 / 72 / 72 / 50 / 50
Normal Data CenterServer YBottom of Rack66.2 F59 / 59 / 57 / 57 / 43 / 40

That’s a pretty stark contrast, now compare that to some of the external sensor data from the ServerTech PDU temperature probes:

LocationAmbient Temperature (one number per sensor)Relative Humidity (one number per sensor)
Hot Data Center - Rear of Rack95 / 8828 / 23
Normal Data Center - Rear of Rack84 / 84 / 76 / 8044 / 38 / 35 / 33
Hot Data Center - Front of Rack90 / 7942 / 31
Normal Data Center - Front of Rack75 / 70 / 70 / 7058 / 58 / 58 / 47

Again pretty stark contrast. Given that all equipment (even the storage equipment that had issues last week) is in “normal operating range” there would be no alerts or notification, but my own alerts go off when I see temperatures like this.

The on site personnel used a hand held meter and confirmed the inlet temperature on one of the servers was 30C (86 F), the server itself reports 89.6, I am unsure as to the physical location of the sensor in the server but it seems reasonable that an extra 3-4 degrees from the outside of the server to the inside is possible. The data center’s own sensors report roughly 75 degrees in the room itself, though I’m sure that is due to poor sensor placement.

Temperature readout using a hand held meter

I went to the storage array, and looked at it’s sensor readings – the caveat being I don’t know where the sensors are located (trying to find that out now), in any case:

  • Sensor 1 = 111 F
  • Sensor 2 = 104 F
  • Sensor 3 = 100.4 F
  • Sensor 4 = 104 F

Again the array says everything is “OK”,  I can’t really compare to the other site since the storage is totally different(little 3PAR array), but I do know that the cooler data center has a temperature probe directly in front of the 3PAR controller air inlets, and that sensor is reading 70 F. The only temperature sensors I can find on the 3PAR itself are on the physical disks, which range from 91F to 98F, the disk specs say operating temperature from 5-55C (55C = 131F).

So the lesson here is, once again – invest in your own environmental monitoring equipment – don’t rely on the data center to do it for you, and don’t rely on the internal temperature sensors of the various pieces of equipment (because you can’t extract the true ambient temperature and you really need that if your going to tell the facility they are running too hot).

The other lesson is, once you do have such sensors in place, hook them up to some sort of trending tool so you can see when stuff changes.

PDU Temperature Sensor data

The temperature changes in the image above was from when the on site engineer was poking around.

Some sort of irony here the facility that is running hot is a facility that has a high focus on hot/cold isle containment (though the row we are in is not complete so it is not contained right now), they even got upset when I told them to mount some equipment so the airflow would be reversed. They did it anyway of course, that equipment generates such little heat.

In any case there’s tons of evidence that this other data center is operating too hot! Time to get that fixed..

September 5, 2012

Setting expectations for WP8 and BB10

Filed under: General — Tags: — Nate @ 9:54 am

The more I read about Windows Phone 8, and BlackBerry 10 the more I am reminded of WebOS, especially the die hard community around both platforms.

I like WebOS myself, I still use my devices daily, despite my fondness for the platform I was not delusional about it – if HP had released the Pre3 when they were planning on it last year (right about this time last year), it would of been destroyed by the iPhone (I believed this long before I owned the hardware, now that I’ve been using it for the past year my thoughts haven’t changed at all). It would of been just embarrassing to see. Especially after the lackluster performance of the HP Veer, the world’s smallest smart phone at the time (perhaps still is), as far as I know the Veer never received even a single software update post launch which was sad(and there are some bad bugs in it). By contrast the Pre3 received several software updates even though it was never officially launched in the U.S. and had a tiny launch in Europe.

Anyways back on topic, Windows Phone. I have been loosely following the Windows Phone Central site where many of the die hard WP8 fans seem to hang out at for good reason. They were so excited about the launch of the newest Nokia phone, they raged against users who had lost faith in the platform, even raged against manufacturers that seem to be losing faith in the platform.

Microsoft and Nokia have tried to hype up the announcement that came today, and as I’m sure many expected, they over promised and way under delivered. This is the exact same thing HP/Palm was doing (I remember one comment from a HP/Palm person forgot who it was, who said they weren’t going to launch a product that wasn’t perfect – of course the only time they did that was when they shut down the Pre3 before it fully launched).

I feel they failed to truly impress at this event. All the leaks ruined it for me personally. All the new info was boring IMHO

Another user posted

So basically waking up and watching this event was pointless. Nearly everything that was “announced” has already been leaked anyways…seriously, their employees and partners are like swiss cheese when it comes to non-disclosure agreements.

In another article people reacted to the fact that there is no release date, no price, and no firm date for the release of Windows Phone 8 itself. The WP Central site tries to spin the news as positively as it can (much as the Pre Central, oh I’m sorry WebOS Nation site does and did for WebOS). One user wrote

this reveal was choreographed and edited to death.  really a bad demo.  Joe Belfiore just pisses me off the more he comes on stage.  Build 2012 in F-ing november…that’s over a month’s loss of sales opportunities.

I just don’t understand why these other players think announcing (or releasing) products around the time Apple does so is a good idea, it sounded incredibly stupid to me for HP and the Pre3 last year, and it’s even worse this year with Microsoft, Amazon and others trying to steal Apple’s thunder. Samsung did a great job in their latest Galaxy SIII releasing it in June. I have to assume it’s because they have been unable to adjust their product cycles to off set them enough with Apple, or perhaps they just want to try to drive some hype around the holiday season, but if your going up against Apple you really have to bring it. Microsoft/Nokia talk the talk, but they haven’t shown they can hold a candle up to an Apple product launch, so it’s sad to see them even try.

I just saw an interview with the Nokia CEO on CNBC and the only phone he picked up and sort of showed off was not a phone that was announced today, he seemed to focus on their dwindling leadership in the low end phone race.

RIM was sort of saved by further product delays, they did want to launch this fall, but due to problems with the platform they’ve had to postpone the launch yet again to sometime in 2013, more than a year later than the dates I heard originally tossed around a while ago. RIM is busy trying to keep their hype machine primed, offering to essentially bail out (for lack of a better term) developers that make $1,000 or more on applications to the tune of up to $9,000 (for a total of $10,000). If that doesn’t tell you they are bleeding developers like crazy I’m not sure what will. But kudos to them for going the extra mile to try to retain them.

Hopefully, for their sake, RIM can over deliver on their promises, but given how they’ve been for the past year I wouldn’t hold my breath. Nokia seemed to let out another massive disappointment with their announcement today, knecapping Windows Phone 8 before it even gets out of the gate.

One thing Nokia fans can get excited about I suppose is the Nokia touchstone, I mean wireless charging. The Palm wireless charging technology I’ve been using for the past three years is one of the key things I like about the platform. The main downside to it from a mass market perspective from HP/Palm at least is the wireless charging base station was not a cheap accessory, so I suspect many non techies did not opt for it due to the price (which could easily be $50-60 at product launch).

I really would like one of these platforms to do well, trust me I am not a fan of Android nor iOS, it’s just sad to see history repeating itself.

September 2, 2012

Some reasons why Linux on the desktop failed

Filed under: linux — Tags: — Nate @ 10:40 pm

The most recent incarnation of this debate seemed to start with a somewhat interesting article over at Wired who talked to Miguel de Icaza who is a pretty famous desktop developer in Linux, mostly famous for taking what seemed to be controversial stances on implementing Microsoft .NET on Linux in the form of Mono.

And he thinks the real reason Linux lost is that developers started defecting to OS X because the developers behind the toolkits used to build graphical Linux applications didn’t do a good enough job ensuring backward compatibility between different versions of their APIs. “For many years, we broke people’s code,” he says. “OS X did a much better job of ensuring backward compatibility.”

It has since blown up a bit more with lots more people giving their two cents. As a Linux desktop user (who is not a developer) for the past roughly 14 years I think I can speak with some authority based on my own experience. As I think back, I really can’t think of anyone I know personally who has run Linux on the desktop for as long as I have, or more to the point hasn’t tried it and given up on it after not much time had passed – for the most part I can understand why.

For the longest time Linux advocates hoped(myself included) Linux could establish a foot hold as something that was good enough for basic computing tasks, whether it’s web browsing, checking email, basic document writing etc. There are a lot of tools and toys on Linux desktops, most seem to have less function than form(?) at least compared to their commercial counterparts. The iPad took this market opportunity away from Linux — though even without iPad there was no signs that Linux was on the verge of being able to capitalize on that market.

Miguel’s main argument seems to be around backwards compatibility, an argument I raised somewhat recently, backwards compatibility has been really the bane for Linux on the desktop, and for me at least it has been just as much to do with the kernel and other user space stuff as it does the various desktop environments.

Linux on the desktop can work fine if:

  • Your hardware is well supported by your distribution – this will stop you before you get very far at all
  • You can live within the confines of the distribution – if you have any needs that aren’t provided as part of the stock system you are probably in for a world of hurt.

Distributions like Ubuntu, and SuSE before it (honestly not sure what, if anything has replaced Ubuntu today) have made tremendous strides in improving Linux usability from a desktop perspective. Live CDs have helped a lot too, to be able to give the system a test run without ever installing it to your HD is nice.

I suspect most people today don’t remember the days when the installer was entirely text based and you had to fight with XFree86 to figure out the right mode lines for your monitor for X11 to work, and fortunately I don’t think really anyone uses dial up modems anymore so the problems we had back when modems went almost entirely to software in the form of winmodems are no longer an issue. For a while I forked out the cash for Accelerated X, a commercial X11 server that had nice tools and was easy to configure.

The creation of the Common Unix Printer System or CUPS was also a great innovation, printing on Linux before that was honestly almost futile with basic printers, I can’t imagine what it would of been like with more complex printers.

Start at the beginning though – the kernel. The kernel is not, and never really has maintained a stable binary interface for drivers over the years. To the point where I can take a generic driver for say a 2.6.x series kernel and use the same driver (singular driver) on Ubuntu, Red Hat, Gentoo or whatever. I mean you don’t have to look further than how many binary kernel drivers VMware includes with their vmware tools package to see how bad this is, on the version of vmware tools I have on the server that runs this blog there are 197 — yes 197 different kernels supported in there

  • 47 for Ubuntu
  • 55 For Red Hat Enterprise
  • 57 for SuSE Linux Enterprise
  • 39 for various other kernels

In an ideal world I would expect maybe 10 kernels for everything, including kernels that are 64 vs 32 bit.

If none of those kernels work, then yes, vmware does include the source for the drivers and you can build it yourself(provided you have the right development packages installed the process is very easy and fast). But watch out, the next time you upgrade your kernel you may have to repeat the process.

I’ve read in the most recent slashdot discussion where the likes of Alan Cox (haven’t heard his name in years!) said the Linux kernel did have a stable interface as he can run the same code from 1992 on his current system. My response to that is..then why do we have all these issues with drivers.

One of the things that has improved the state of Linux drivers is virtualization – it slashes the amount of driver code needed by probably 99% running the same virtual hardware regardless of the underlying physical hardware. It’s really been nice not to have to fight hardware compatibility recently as a result of this.

There have been times where device makers have released driver disks for Linux, usually for Red Hat based systems, however these often become obsolete fairly quickly. For some things perhaps like video drivers it’s not the end of the world, for the experienced user anyways you still have the ability to install a system and get online and get new drivers.

But if the driver that’s missing is for the storage controller, or perhaps the network card things get more painful.

I’m not trying to complain, I have dealt with these issues for many years and it hasn’t driven me away — but I can totally see how it would drive others away very quickly, and it’s too bad that the folks making the software haven’t put more of an effort into solving this problem.

The answer is usually make it open source – in the form of drivers at least, if the piece of software is widely used then making it open source may be a solution, but I’ve seen time and time again source released and it just rots on the vine because nobody has interest in messing with it (can’t blame them if they don’t need it). If the interface was really stable the driver could probably go unmaintained for several years without needing anyone to look at it (at least through the life of say the 2.6.x kernel)

When it comes to drivers and stuff – for the most part they won’t be released as open source, so don’t get your hopes up. I saw one person say that their company didn’t want to release open source drivers because they feared that they might be in violation of someone else’s patents and releasing the source would make it easier for their competition to be able to determine this.

The kernel driver situation is so bad in my opinion that distributions for the most part don’t back port drivers into their previous releases. Take Ubuntu for example, I run 10.04 LTS on my laptop that I am using now as well as my desktop at work.  I can totally understand if the original released version doesn’t have the latest e1000e (which is totally open source!) driver for my network card at work. But I do not understand that more than a year after it’s release it still doesn’t have this driver. Instead you either have to manage the driver yourself (which I do – nothing new for me), or run a newer version of the distribution (all that for one simple network driver?!). This version of the distribution is supported until April 2013. Please note I am not complaining, I deal with the situation – I’m just stating a fact. This isn’t limited to Ubuntu either, it has applied to every Linux distribution I’ve ever used for the most part. I saw Ubuntu recently updated Skype to the latest version for Linux recently on 10.04 LTS, but they still haven’t budged on that driver (no I haven’t filed a bug/support request, I don’t care enough to do it – I’m simply illustrating a problem that is caused by the lack of a good driver interface in the kernel  – I’m sure this applies to FAR more than just my little e1000e).

People rail on the manufacturers for not releasing source, or not releasing specs. This apparently was pretty common back in the 70s and early 80s. It hasn’t been common in my experience since I have been using computers (going back to about 1990). As more and more things are moved from hardware to software I’m not surprised that companies want to protect this by not releasing source/specs. Many manufacturers have shown they want to support Linux, but if you force them to do so by making them build a hundred different kernel modules for the various systems they aren’t going to put the effort in to doing it. Need to lower the barrier of entry to get more support.

I can understand where the developers are coming from though, they don’t have incentive to make the interfaces backwards and forwards compatible since that does involve quite a bit more work(much of it boring), instead prefer to just break things as the software evolves. I had been hoping that as the systems matured more this would become less common place, but it seems that hasn’t been the case.

So I don’t blame the developers…

But I also don’t blame people for not using Linux on the desktop.

Linux would of come quite a bit further if there was a common way to install drivers for everything from network cards to storage controllers, to printers, video cards, whatever, and have these drivers work across kernel versions, even across minor distribution upgrades. This has never been the case though (I also don’t see anything on the horizon, I don’t see this changing in the next 5 years if it changes ever).

The other issue with Linux is working within the confines of the distribution, this is similar to the kernel driver problem – because different distros are almost always more than a distribution with the same software, the underlying libraries are often incompatible between distributions so a binary on one, especially a complex one that is say a KDE or Gnome application won’t work on another, there are exceptions like Firefox, Chrome etc – though other than perhaps static linking in some cases not sure what they do that other folks can’t do. So the amount of work to support Linux from a desktop perspective is really high. I’ve never minded static linking, to me it’s a small price to pay to improve compatibility with the current situation. Sure you may end up loading multiple copies of the libraries into memory(maybe you haven’t heard but it’s not uncommon to get 4-8GB on a computer these days), sure if there is a security update you have to update the applications that have these older libraries as well. It sucks I suppose, but from my perspective it sucks a lot less than what we have now.  Servers – are an entirely different beast, run by(hopefully) experienced people who can handle this situation better.

BSD folks like to tout their compatibility – though I don’t think that is a fair comparison, comparing two different versions of FreeBSD against Red Hat vs Debian are not fair, comparing two different versions of Red Hat against each other with two different versions of FreeBSD (or NetBSD or OpenBSD or DragonFly BSD …etc)  is more fair. I haven’t tried BSD on the desktop since FreeBSD 4.x, for various reasons it did not give me any reasons to continue using it as a desktop and I haven’t had any reasons to consider it since.

I do like Linux on my desktop, I ran early versions of KDE (pre 1.0), up until around KDE 2, then switched to AfterStep for a while, eventually switching over to Gnome with Ubuntu 7 or 8, I forget. With the addition of an application called Brightside, GNOME 2.x works really well for me. Though for whatever reason I have to launch Brightside manually each time I login, setting it to run automatically on login results in it not working.

I also do like Linux on my servers, I haven’t compiled a kernel from scratch since the 2.2 days, but have been quite comfortable working with the issues operating Linux on the server end, the biggest headaches were always drivers with new hardware, though thankfully with virtualization things are much better now.

The most recent issue I’ve had with Linux on servers has been some combination of Ubuntu 10.04, with LVM and ext4 along with enterprise storage. Under heavy I/O have have seen many times ext4 come to a grinding halt. I have read that Red Hat explicitly requires that barriers be disabled with ext4 on enterprise storage, though that hasn’t helped me. My only working solution has been to switch back to ext3 (which for me is not an issue). The symtoms are very high system cpu usage, little to no i/o (really any attempts to do I/O result in the attempt freezing up), and when I turn on kernel debugging it seems the system is flooded with ext4 messages. Nothing short of a complete power cycle can recover the system in that state. Fortunately all of my root volumes are ext3 so it doesn’t prevent someone from logging in and poking around. I’ve looked high and low and have not found any answers. I had never seen this issue on ext3, and the past 9 months has been the first time I have run ext4 on enterprise storage. Maybe a bug specific to Ubuntu, am not sure. LVM is vital when maximizing utilization using thin provisioning in my experience, so I’m not about to stop using LVM, as much as 3PAR’s marketing material may say you can get rid of your volume managers – don’t.


Powered by WordPress