TechOpsGuys.com Diggin' technology every day

August 17, 2013

Happy Birthday Debian: 20 years old

Filed under: linux — Tags: — Nate @ 4:10 pm
Techopsguys is Debian Powered

The big 2-0. Debian was the 2nd Linux I cut my teeth on, the first being Slackware 3.x. I switched to Debian 2.0 (hamm) in 1998 when it first came out. This was before apt existed (I think apt arrived with Debian 2.2, but I'm not sure). I still remember the torture that was dselect, and much to my own horror dselect apparently still lives, though I had to apt-get install it. It was torture because I literally spent 4-6 hours going through the packages, selecting them one at a time. There may have been an easier way to do it back then, I'm not sure; I was still new to the system.

I have been with Debian ever since; hard to believe it's been about 15 years since I first installed it. With only one exception I have stuck to stable the entire time. The exception, I think, was in between 2.2 and 3.0 – that release gap was quite large, so I spent some time on the testing distribution. Unlike my early days running Linux I no longer care about the bleeding edge, perhaps because the bleeding edge isn't as important as it once was (to get basic functionality out of the system, for example).

Debian has never failed me during a software update, or even a major software upgrade. Some of the upgrades were painful (not Debian's fault – going from Cyrus IMAP 1.x to 2.x, for example, was really painful). I do not have any systems that have lasted long enough to traverse more than one or two major system upgrades; hardware always gets retired. But unlike some other distributions, major upgrades were fully supported and worked quite well.

I intentionally avoided Red Hat in my early days specifically because it was deemed easier to use. I started with Slackware, and then Debian. I spent hours compiling things, whether it was X11, KDE 0.x, QT, GTK, Gnome, GIMP.. I built my own kernels from source, even with some custom patches (I haven't seriously done this since Linux 2.2). I learned a lot, the hard way you could say. Which is in part why I struggle when advising people who want to learn Linux on what the best way is (books, training etc). I don't know, since I did it another way, a way that takes many years. Most people don't have that kind of patience. At the time of course I really didn't realize those skills would become so valuable later in life; it was more of a personal challenge for myself I suppose.

I have used a few variants/forks of Debian over the years, most recently of course being Ubuntu. I have used Ubuntu exclusively on my laptops going back several years (perhaps even to 2006, I don't remember). I have supported Ubuntu in server environments for roughly the past three years. I mainly chose Ubuntu for the laptops and desktops for the obvious reason – hardware compatibility. Debian (stable) of course tends to lag behind on hardware support. Though these days I'm still happy running the Ubuntu 10.04 LTS desktop, which is EOL now. I haven't decided what my next move is; I'm not really thinking about it since what I have still works fine. I'll probably think about it more whenever I get my next hardware refresh.

I also briefly used Corel Linux, of which I still have the inflatable Corel penguin sitting on my desk at work. It has followed me to every job for the past 13 years and still keeps its air. I don't know why I have kept it for so long. Corel Linux was interesting in that they ported some of their own Windows apps over to Linux with Wine, their office suite and some graphics programs. They made a custom KDE file manager if I recall right, with built in CIFS/SMB support. Other than that it wasn't much to write home about. Like most things on Linux the desktop apps were very fragile, and being closed source they did not last long after Corel Linux folded (compatibility wise you could not run them on other systems). My early Debian systems that I used as desktops at least got butchered by me installing custom stuff on top of them. Linux works best when you stick with the OS packages, and that's something I did not do in the early days. These days I go to semi extreme lengths to make sure everything (within my abilities) is packaged in a Debian package before installation.

I used to participate a lot in the debian-user mailing list eons ago, though I haven't since due to lack of time. At the time at least that list had massive volume; it was just insane the amount of email I got from it. Looking now: August 2013 had roughly 1,300 messages, versus almost 6,000 in August 2001! Even more insane was the spam I got long after I unsubscribed. It persisted for years until I terminated the email address associated with that list. I credit one job offer, a bit over ten years ago now, to my participation on that (and other) mailing lists at the time, as I specifically called them out in my references.

That being said, despite my devotion to Debian on my home systems (servers at least – this blog runs on Debian 7), I still prefer Red Hat for commercial/larger scale stuff. Even though the past three years supporting Ubuntu have been OK, I still like RH more. At the same time I do not like RH for my own personal use. It basically comes down to how the system is managed. I was going to go into reasons why I like RH more for this or that, but decided not to since it is off topic for this post.

I've never seen Toy Story – the movie whose characters Debian has used to name its releases after since at least 2.0, perhaps longer. Not really my kind of flick; I have no intention of ever seeing it really.

Here's a really old screen shot from my system back in the day. I don't remember if this is Slackware or Debian; the kernel being compiled (2.1.121) came out in September 1998, so right about the time I made the switch. Looks like I am compiling Gimp 1.0.1, some version of XFree86, and downloading a KDE snapshot (I think all of that was pre-1.0 KDE). And look, xfishtank in the background! I miss that. These days Gnome and KDE take over the root window, making things like xfishtank not visible when using them (last I tried at least). xpenguins is another cool one that does still work with GNOME.

REALLY Old Screenshot

So, happy 20th birthday Debian. It has been interesting to watch you grow up, and it's nice to see you're still going strong.

June 4, 2013

Infoworld suggests radical Windows 8 changes

Filed under: General — Tags: , — Nate @ 8:46 am

Saw this come across on slashdot, an article over at Infoworld on how Microsoft can fix Windows 8.

They suggest ripping out almost all of the new stuff (as defaults) and replacing it with a bunch of new options that users can pick from.

Perhaps I am lucky in that I've never used Windows 8 (I briefly held an MS Surface RT in my hands; a friend who is an MS employee got one for free (as did all employees I believe) and handed it to me to show me some pictures on it).

Some of the suggestions from Infoworld sound pretty good to me, though it's hard to have a firm opinion since I've never used the Metro UI (oh, sorry, they changed the name to something else).

Windows 8 (as it stands today) certainly sounds pretty terrible from a UI standpoint. The only positive I have read about Windows 8 is people saying it is faster. Which isn't worth much these days; machines have been fast enough for many years (which at least in part has led to the relative stagnation of the PC market). My computers have been fast enough for years (the laptop I am typing on is almost 3 years old, and I plan to keep it around for at least another year as my primary machine – I have another year of on site support so I'm covered from that angle).

It has been interesting to see that really since XP was released there hasn't been anything really exciting on the Windows desktop front; it's a mature product (the results have shown it – much like the economy, pretty much every OS launch they've done has had a weaker reception than the previous one. Windows 7 was sort of an exception among the hard core community, but in a broader sense it still seemed weak). It's come a long way from the mess many of us dealt with in the 90s (and instability in NT4 was one big driver for me to attempt Linux on my primary desktop 15 years ago – and I'm still with Linux today).

I don't use Windows enough to be able to leverage the new features. I'm still used to the XP interface, so I am not fond of many of the new UI things that MS has come up with over the years. Since I don't use it much, it's not critical.

The last time I used Windows seriously was at a few different companies where I had Windows as my primary desktop. But you probably wouldn't know it if you saw it. It was customized with cygwin and Blackbox for Windows. The most recent was about three years ago (the company was still on XP at the time). Most of the time my screen was filled with rxvt X terminals (there is a native Windows port of rxvt in cygwin that works wonderfully) and Firefox. Sometimes I had Outlook open, or Visio, or in rare cases IE.

Not even the helpdesk IT guy could figure my system out ("Can you launch control panel for me?"). It gave it a nice Linux look & feel (I would have killed for proper virtual desktop edge flipping but I never found a solution for that) with the common Windows apps.

Ironically enough I've purchased more copies of Windows 7 (I think I have 7 now – 2 or 3 licenses are not in use yet – I stocked up so I wouldn't have to worry about Windows 8 for a long time) than all previous MS operating systems combined. I've bought more Microsoft software in the past 3-4 years (Visio Pro 2010 is another one) than in the previous decade combined. As my close friends will attest I'm sure – I have not been a "hater" of Microsoft for some time now. (12 years ago I threatened to quit if they upgraded from NT4 to Windows 2000 – and they didn't, at least not as long as I was there. Those were the days when I was pretty hard core anti-MS – I was working on getting Samba-tng and LDAP to replace NT4. I never deployed the solution, and today of course I wouldn't bother.)

Some new Linux UIs suck too

Microsoft is not alone in crappy UIs though. Linux is right up there too (many would probably argue it always was, and that very well could be true, though myself I was fine with what I have used over the years, from KDE 0.x to AfterStep to GNOME 1.x/2.x). GNOME 3 (and the new Unity stuff from Ubuntu) looks at least as terrible as the new Microsoft stuff (if not more so).

I really don't like how organizations are trying to unify the UI between mobile and PC. Well, maybe if they did it right I'd like it (not that I know what "right" would be myself).

By the same notion I find it ludicrous that LG would want to put WebOS on a TV! Maybe they know something I don't though, and they are actually going to accomplish something positive. Don't get me wrong, I love WebOS (well, the concept – the implementation needs a lot of work and billions of investment to make it competitive), but I just don't see how there is any advantage to WebOS on a device like a TV. The one exception is ecosystem – if there is an ecosystem of WebOS devices that can seamlessly inter-operate with each other. There isn't such an ecosystem today; what's left has been a rotting corpse for the past two years (yes, I still use my HP Pre3 and Touchpad daily). There's no sign LG has a serious interest in making such an ecosystem, and even if they did, there's no indication they have the resources to pull it off (I'd wager they don't).

I haven't used Unity, but last weekend I did install Debian 7 on my server at home (upgraded from 6). 99% of the time, from a UI perspective, this system just cycles through tens of thousands of images as a massive slide show (at some point I plan to get a 40″+ screen and hang it on my wall as a full sized digital picture frame; I downloaded thousands of nice 1080p images from interfacelift as part of the library).

I was happy to see Debian 7 included a "GNOME 2 like" option, as a moderately customized Gnome 2 is really what I am used to, and I have absolutely, positively no interest in changing it.

It gets mostly there, maybe 50-75% of the way. The first thing I noticed was that the new Gnome did not seem to import any of the previous settings. I got a stock look – stock wallpaper, stock settings, and no desktop icons(?). I tried to right click on the desktop to change the wallpaper – that didn't work either. I tried to right click on the menu bar to add some widgets – that didn't work either. I went from 0 to very annoyed almost immediately. This was with the "compatibility" Gnome desktop! Imagine if I had tried to log in to regular GNOME 3, I probably would have thrown my laptop against the wall before it finished logging in! 🙂 (fortunately for my laptop's sake I have never gotten to that point)

Eventually I found the way to restore the desktop icons and the right click on the desktop, and I managed to set one of my wonderful high res NSFW desktop backgrounds. I still can't add widgets to the menu bar; I assume it is not possible. I haven't checked whether I can do virtual desktop edge flipping with brightside (or with something built in), but I'd wager that doesn't work either.

I'm not sure what I will do on my main laptop/desktop, which are on Ubuntu 10.04, now unsupported. I hear there are distros/packages out there that are continuing to maintain/upgrade the old Gnome 2 stuff (or have replaced Gnome 3's UI with Gnome 2), so I will probably have to look into that; maybe it will be easy to integrate into Debian or Ubuntu 12.04 (or both).

I saw a fantastic comment on slashdot recently that so perfectly describes the typical OSS developer on this stuff:

[..]

What X11 is, is old. And developers are bored with it. And they want something new and shiny and a chance to play with the hardware without abstraction throwing a wet blanket over their benchmark scores.

The benchmark of success for Wayland is that _users_ don’t actually notice that anything changed. They’ll fall short of that benchmark because too many people like using X11, and even the backward compatibility inevitably will cause headaches.

But developers will enjoy it more, and in the FOSS world those are the only consumers that matter.

(the last sentence especially)

That was in a conversation about replacing X11 (the main GUI base for Linux) with something completely different (apparently being developed by some of the same folks that worked on X11) that has been under development for many, many years. Myself, I have no issues with X11; it works fine for me. The last time I had major issues with X11 was probably 10+ years ago.

As someone who has worked closely with developers for the past 13 years now, I see a lot of this first hand. Often the outcome is good, many other times not so much.

One system I worked with was so architecturally complex that two people on my team left the company within a year of starting, and their primary complaint was that the application was too complicated to learn (they had been working with it almost daily for their time there). It was complex for sure (many, many sleepless nights and long outages too) – though it didn't make me want to throw my laptop against the wall like Chef does.

In the case of Microsoft, I found it really funny that one of (if not the) main managers behind Windows 8 suddenly resigned mere weeks after the Windows 8 launch.

October 23, 2012

Should System admins know how to code?

Filed under: linux — Tags: — Nate @ 11:57 am

Just read the source article, and the discussion on slashdot was far more interesting.

It’s been somewhat of a delicate topic for myself, having been a system admin of sorts for about sixteen years now, primarily on the Linux platform.

For me, more than anything else, you have to define what code is. Long ago I drew a line in the sand that I have no interest in being a software developer; I do plenty of scripting in Perl & Bash, primarily for monitoring purposes and to aid in some of the more basic areas of running systems.

Since this blog covers 3PAR I suppose I should start there – I've written scripts to do snapshots and integrate them with MySQL (still in use today) and Oracle (haven't used that side of things since 2008). This is a couple thousand lines of script (I don't like to use the word code because to me it implies some sort of formal application). I'd wager 99% of that is to support the Linux end of things and 1% to support 3PAR. At one company I left, I turned these scripts over to the people who were going to try to take on my responsibility. The folks had minimal scripting experience and their eyes glazed over pretty quickly while I walked them through the process. They feared the 1,000 line script, even though for the most part the system was very reliable and not difficult to recover from failures, even if you had no scripting experience. In this case, to manage snapshots with MySQL (integrated with a storage platform) I'm not aware of any out of the box tool that can handle it, so you sort of have no choice but to glue your own together. With Oracle and MSSQL such tools are common, maybe even with DB2 – but MySQL is left out in the cold.
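To give a rough idea of what that glue looks like, here is a minimal sketch of the pattern (the array address, volume names and the exact 3PAR CLI invocation are assumptions for illustration, not the actual script):

#!/bin/sh
# Hold a MySQL read lock only for as long as it takes the array to
# create the snapshot, then release it. Names here are illustrative.
mysql <<EOF
FLUSH TABLES WITH READ LOCK;
FLUSH LOGS;
system ssh 3paradm@array01 createsv -ro mysql-snap mysql-data-vv
UNLOCK TABLES;
EOF

The point of the mysql client's "system" command here is that the lock is only held while the session stays open, so the array snapshot has to be taken from inside that session.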

I wrote my own perl-based tool to log in to 3PAR arrays, get their metrics and populate RRD files (I use cacti to present that data since it has a nice UI, but cacti could not collect the data the way I can, so that stuff is run outside of cacti). Another thousand lines of script here.
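The RRD side of that is just standard rrdtool plumbing; a rough sketch of the idea (file names, data source names and the collector command are placeholders, not the actual tool):

# one-time: create a round-robin database with a 5 minute step and one gauge metric
rrdtool create 3par-array01.rrd --step 300 \
    DS:total_iops:GAUGE:600:0:U \
    RRA:AVERAGE:0.5:1:8928
# from cron: poll the array and feed the value in (poll-3par.pl is a placeholder)
IOPS=$(./poll-3par.pl array01 total_iops)
rrdtool update 3par-array01.rrd N:$IOPS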

Perhaps one of the coolest things I think I wrote was a file distribution system a few years ago to replace a product we used in house called R1 Repliweb. Though it looks like they got acquired by somebody else. Repliweb is a fancy file distribution system that primarily ran on Windows, but the company I was at was using the Linux agents to pass files around. I suppose I could write a full ~1200 word post about that project alone (if you're interested in hearing that let me know), but basically I replaced it with an architecture of load balancers, VMs, a custom version of SSH, rsync, and some help from CFengine, plus about 200 lines of script, which not only dramatically improved scalability but also took reliability literally to 100%. I never had a single failure (the system was self healing – though I did have to turn off rsync's auto resume feature because it didn't work for this project) while I was there (the system was in place about 12-16 months when I left).
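Conceptually the replacement boiled down to a loop like the one below, run from the distribution hosts behind the load balancers (the host list, paths and restricted SSH key are placeholders for this sketch):

#!/bin/sh
# Push the content tree to every edge node over a restricted SSH key.
# A failed push is logged and simply retried on the next run rather
# than resumed mid-file (rsync's resume behavior was disabled).
SRC=/srv/content/
KEY=/etc/distribution/distribution_rsa
for HOST in $(cat /etc/distribution/hosts.txt); do
    rsync -a --delete --timeout=60 -e "ssh -i $KEY" "$SRC" "$HOST:/srv/content/" \
        || echo "$(date) push to $HOST failed" >> /var/log/distribution.log
done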

So back to the point – to code or not to code. I say not to code (again, back to what code means – in my context it means programming: if you're directly using APIs then you're programming, if you're using tools to talk to APIs then you're scripting) – for the most part at least. Don't make things too complicated. I've worked with a lot of system admins over the years and the number that can script well, or code, is very small. I don't see that number increasing. Network engineers are even worse – I've never seen a network engineer do anything other than completely manually. I think storage is similar.

If you start coding your infrastructure you start making it even more difficult to bring new people on board, to maintain this stuff, and to run it moving forward. If you happen to be in an environment that is experiencing explosive growth and you're adding dozens or hundreds of servers constantly, then yes, this can make a lot of sense. But most companies aren't like that and never will be.

It's hard enough to hire people these days; if you go about raising the bar to even higher levels you're never going to find anyone. Look at the Hadoop end of the market – those folks are always struggling to hire because the skill is so specialized, and there are so few people out there who can do it. Most companies can't compete with the likes of Microsoft, Yahoo and other big orgs with their compensation and benefits packages.

You will, no doubt, spend more on things like software and hardware for things that some fancy DevOps god could do in 10 lines of ruby in their sleep. Good luck finding and retaining such a person though, and if you feel you need redundancy so someone can take a real vacation, yeah, that's gonna be tough. There is a lot more risk, in my opinion, in having a lot of code running things if you lack the resources to properly maintain it. This is a problem even at scale. I've heard on several occasions that the big Amazon themselves customized CFengine v1 way back when with so much extra stuff. Then v2 (and since, v3) came around with all sorts of new things, and guess what – Amazon couldn't upgrade because they had customized it too much. I've heard similar things about other technologies Amazon has adopted. They are stuck because they customized it too much and can't upgrade.

I've talked to a ton of system admin candidates over the past year, and the number that I feel comfortable could take over the "code" on our end is, I think it's fair to say, zero. Granted, not even I can handle the excellent code written by my co-worker. I like to tell people I can do simple stuff in 10 minutes in CFengine, while it will take me four hours to do the same thing the Chef way in Chef, and my eyes will bleed and my blood will boil in the process.

The method I'd use on CFengine you could say "sucks" compared to Chef, but it works, and is far easier to manage. I can bring almost anyone up to speed on the system in a matter of hours, whereas Chef takes a strong Ruby background to use (I am going on nearly two and a half years with Chef myself and I haven't made much progress, other than feeling I can speak with authority on how complex it is).

Sure, it can be nice to have APIs for everything and fancy automation everywhere – but you need to pick your battles. When you're dealing with a cloud organization like Amazon you almost have to code – to deal with all of their faults and failures and just overall stupid broken designs and everything that goes along with it. Learning to code most likely takes the experience from absolutely infuriating (where I stand) to almost manageable (costs and architecture aside here).

When you're dealing with your own stuff, where you don't have to worry about IPs changing at random because some host has died, or where you can change your CPU or memory configuration with a few mouse clicks and not have to re-build your system from scratch, the amount of code you need shrinks dramatically, lowering the barriers to entry.

After having worked in the Amazon cloud for more than two years, both I and my co-workers (who have much more experience in it than me) believe that it actually takes more effort and expertise to properly operate something in there than doing it on your own. It's the total opposite of how cloud is viewed by management.

Obviously it is easier said than done; just look at the sheer number of companies that go down every time Amazon has an outage or their service is degraded. The most recent one was yesterday. It's easy for some to blame the customer for not doing the right thing, but at the end of the day most companies would rather work on the next feature to attract customers and let something else handle fault tolerance. Only the most massive companies have the resources to devote to true "web scale" operation. Shoe horning such concepts onto small and medium businesses is just stupid, and the wrong set of priorities.

Someone made a comment recently that made me laugh (not at them, more at the situation). They said they performed some task to make my life easier in the event we need to rebuild a server (a common occurrence in EC2). I couldn't help but laugh because we hadn't rebuilt a single server since we left EC2 (coming up on one year in a few months here).

I think it’s great that equipment manufacturers are making their devices more open, more programmatic. Adding APIs, and other things to make automation easier. I think it’s primarily great because then someone else can come up with the glue that can tie it all together.

I don’t believe system admins should have to interact with such interfaces directly.

At the same time I don't expect developers to understand operations in depth. Hopefully they have enough experience to be able to handle basic concepts like load balancing (e.g. store session data in some central place, preferably not a traditional SQL database). The whole world often changes from running an application in a development environment to running it in production. The developers take their experience to write the best code that they can, and the systems folks manage the infrastructure (whether it is cloud based or home grown) and operate it in the best way possible. Whether that means separating out configuration files so people can't easily see passwords, inserting load balancers in between tiers, splitting out how application code is deployed, or something as simple as log rotation scripts.
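Even the log rotation example is worth spelling out, since it is exactly the kind of glue the operations side ends up owning; a minimal sketch (the application path and retention here are assumptions):

# drop a rotation policy for the app's logs: rotate daily, keep two weeks
cat > /etc/logrotate.d/myapp <<'EOF'
/var/log/myapp/*.log {
    daily
    rotate 14
    compress
    missingok
    notifempty
}
EOF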

If you were to look at my scripts you may laugh (depending on your skill level) – I try to keep them clean but they are certainly not up to programmer standards; no, I've never used "use strict" in Perl, for example. My scripting is simple, so doing things sometimes takes me many more lines than it would take someone more experienced in the trade. This has its benefits though – it makes it easier for more people to be able to follow the logic should they need to, and it still gets the job done.

The original article seemed to focus more on scripts, while the discussion on slashdot at some points really got into programming, with one person saying they wrote Apache modules?!

As one person in the discussion thread on slashdot pointed out, heavy automation can hurt just as much as it helps. One mistake in the wrong place and you can take systems down far faster than you can recover them. This has happened to me on more than one occasion of course. One time in particular I was looking at a CFengine configuration file, saw some logic that appeared to be obsolete, and removed a single character (a "!" which told CFengine not to apply that configuration to that class); CFengine then went and wiped out my apache configurations. When I made the change I was very sure that what I was doing was right, but in the end it wasn't. That happened seven years ago but I still remember it like it was yesterday.

System administrators should not have to program – scripting certainly is handy and I believe it is important (not critical – it's not at the top of my list of skills when hiring); just keep an eye out for complexity and supportability when you're doing that stuff.

October 15, 2012

Ubuntu 10.04 LTS upgrade bug causes issues

Filed under: linux — Tags: , — Nate @ 11:41 am

[UPDATE] – after further testing it seems it is machine specific; I guess my RAM is going bad. Dag nabbit.

 

I've been using Ubuntu for about five years, and Debian since 1998.

This is a first for me. I came into the office and Ubuntu was prompting to upgrade some packages; I run 10.04 LTS, which is the stable build. I said go for it, and it tried, and failed.

I tried again, and failed, and again and failed.

I went to the CLI and it failed there too – dpkg/apt was segfaulting:

[1639992.836460] dpkg[31986]: segfault at 500006865496 ip 000000000040b7bf sp 00007fff71efdee0 error 4 in dpkg[400000+65000]
[1640092.698567] dpkg[32069] general protection ip:40b7bf sp:7fff73b2f750 error:0 in dpkg[400000+65000]
[1640115.056520] dpkg[32168]: segfault at 500008599cb2 ip 000000000040b7bf sp 00007fff20fc2da0 error 4 in dpkg[400000+65000]
[1640129.103487] dpkg[32191] general protection ip:40b7bf sp:7fffd940d700 error:0 in dpkg[400000+65000]
[1640172.356934] dpkg[32230] general protection ip:40b7bf sp:7fffbb361e80 error:0 in dpkg[400000+65000]
[1640466.594296] dpkg-preconfigu[32356]: segfault at d012 ip 00000000080693e4 sp 00000000ff9d1930 error 4 in perl[8048000+12c000]
[1640474.724925] apt-get[32374] general protection ip:406a67 sp:7fffea1e6c68 error:0 in apt-get[400000+1d000]
[1640920.178714] frontend[720]: segfault at 4110 ip 00000000080c50b0 sp 00000000ffa52ab0 error 4 in perl[8048000+12c000]

I have a 32-bit chroot to run things like 32-bit Firefox, and I had the same problem there. For a moment I thought maybe I had bad RAM or something, but it turned out that was not the case. There is some sort of bug in the latest apt, 0.7.25.3ubuntu9.14 (I did not see a report on it, though the UI for Ubuntu bugs seems more complicated than Debian's bug system), which causes this. I was able to get around it by:

  • Manually downloading the older apt package (0.7.25.3ubuntu9.13)
  • Installing the package via dpkg (dpkg -i <package>)
  • Exporting the list of packages (dpkg --get-selections > selections)
  • Editing the list, changing apt from install to hold
  • Importing the list of packages (dpkg --set-selections < selections)
  • apt-get works fine now on 64-bit
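For what it's worth, the hold can also be set with a one-liner instead of exporting and editing the whole selection list (same package version as above; grab the .deb from whatever mirror you like):

dpkg -i apt_0.7.25.3ubuntu9.13_amd64.deb
echo "apt hold" | dpkg --set-selections
dpkg --get-selections apt    # should now show "hold"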

However my 32-bit chroot has a hosed package status file, something else that has never happened to me in the past 14 years on Debian. So I will have to figure out how to correct that, or in the worst case I suppose wipe out the chroot and reinstall it; since it is a chroot, it's not a huge deal. Fortunately the corruption didn't hit the 64-bit status file. There is a backed-up status file but it was corrupt too (I think because I tried to run apt-get twice).

64-bit status file:

/var/lib/dpkg/status: UTF-8 Unicode English text, with very long lines

32-bit status file:

/var/lib/dpkg/status: data
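If I do end up repairing it rather than rebuilding the chroot, the usual approach is to fall back to one of the copies dpkg and the daily cron job keep around (these are the standard Debian paths; which backup is still intact obviously varies):

# inside the chroot
ls /var/lib/dpkg/status-old /var/backups/dpkg.status.*
cp /var/lib/dpkg/status /var/lib/dpkg/status.broken    # keep the damaged copy around
cp /var/backups/dpkg.status.0 /var/lib/dpkg/status     # or an older backup if .0 is also bad
dpkg --audit                                           # sanity check the result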

I'm pretty surprised that this bug (or bugs) got through. Not the quality I've come to know and love..

September 2, 2012

Some reasons why Linux on the desktop failed

Filed under: linux — Tags: — Nate @ 10:40 pm

The most recent incarnation of this debate seemed to start with a somewhat interesting article over at Wired, who talked to Miguel de Icaza, a well known Linux desktop developer, perhaps most famous for taking what seemed to be controversial stances on implementing Microsoft .NET on Linux in the form of Mono.

And he thinks the real reason Linux lost is that developers started defecting to OS X because the developers behind the toolkits used to build graphical Linux applications didn’t do a good enough job ensuring backward compatibility between different versions of their APIs. “For many years, we broke people’s code,” he says. “OS X did a much better job of ensuring backward compatibility.”

It has since blown up a bit more, with lots more people giving their two cents. As a Linux desktop user (who is not a developer) for the past roughly 14 years, I think I can speak with some authority based on my own experience. As I think back, I really can't think of anyone I know personally who has run Linux on the desktop for as long as I have – or, more to the point, anyone who hasn't tried it and given up on it after not much time had passed. For the most part I can understand why.

For the longest time Linux advocates (myself included) hoped Linux could establish a foothold as something that was good enough for basic computing tasks, whether it's web browsing, checking email, basic document writing etc. There are a lot of tools and toys on Linux desktops, though most seem to have less function than form(?), at least compared to their commercial counterparts. The iPad took this market opportunity away from Linux – though even without the iPad there were no signs that Linux was on the verge of being able to capitalize on that market.

Miguel's main argument seems to be around backwards compatibility, an argument I raised somewhat recently. Backwards compatibility has really been the bane of Linux on the desktop, and for me at least it has had just as much to do with the kernel and other user space stuff as with the various desktop environments.

Linux on the desktop can work fine if:

  • Your hardware is well supported by your distribution – this will stop you before you get very far at all
  • You can live within the confines of the distribution – if you have any needs that aren’t provided as part of the stock system you are probably in for a world of hurt.

Distributions like Ubuntu, and SuSE before it (honestly I'm not sure what, if anything, has replaced Ubuntu today), have made tremendous strides in improving Linux usability from a desktop perspective. Live CDs have helped a lot too; being able to give the system a test run without ever installing it to your HD is nice.

I suspect most people today don't remember the days when the installer was entirely text based and you had to fight with XFree86 to figure out the right mode lines for your monitor for X11 to work. Fortunately I don't think anyone really uses dial up modems anymore, so the problems we had back when modems went almost entirely to software in the form of winmodems are no longer an issue. For a while I forked out the cash for Accelerated X, a commercial X11 server that had nice tools and was easy to configure.

The creation of the Common Unix Printing System, or CUPS, was also a great innovation. Printing on Linux before that was honestly almost futile with basic printers; I can't imagine what it would have been like with more complex printers.

Start at the beginning though – the kernel. The kernel has never really maintained a stable binary interface for drivers over the years, not to the point where I could take a generic driver built for, say, a 2.6.x series kernel and use that same (singular) driver on Ubuntu, Red Hat, Gentoo or whatever. You don't have to look further than how many binary kernel drivers VMware includes with their vmware tools package to see how bad this is; in the version of vmware tools I have on the server that runs this blog there are 197 – yes, 197 – different kernels supported in there:

  • 47 for Ubuntu
  • 55 For Red Hat Enterprise
  • 57 for SuSE Linux Enterprise
  • 39 for various other kernels

In an ideal world I would expect maybe 10 kernels for everything, including separate 64 and 32 bit kernels.

If none of those kernels match, then yes, vmware does include the source for the drivers and you can build them yourself (provided you have the right development packages installed, the process is very easy and fast). But watch out: the next time you upgrade your kernel you may have to repeat the process.
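Roughly what that rebuild looks like on a Debian/Ubuntu guest of that era (the package names are the usual ones; the configuration script ships with VMware Tools):

# install a toolchain plus headers matching the running kernel
apt-get install build-essential linux-headers-$(uname -r)
# re-run the VMware Tools configuration script, which compiles the modules
# from the bundled source when no prebuilt module matches the kernel
vmware-config-tools.pl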

I've read in the most recent slashdot discussion that the likes of Alan Cox (haven't heard his name in years!) said the Linux kernel does have a stable interface, as he can run the same code from 1992 on his current system. My response to that is: then why do we have all these issues with drivers?

One of the things that has improved the state of Linux drivers is virtualization – it slashes the amount of driver code needed by probably 99%, by presenting the same virtual hardware regardless of the underlying physical hardware. It's really been nice not having to fight hardware compatibility recently as a result of this.

There have been times where device makers have released driver disks for Linux, usually for Red Hat based systems; however these often become obsolete fairly quickly. For some things, perhaps like video drivers, it's not the end of the world – the experienced user at least still has the ability to install a system, get online and get new drivers.

But if the driver that's missing is for the storage controller, or perhaps the network card, things get more painful.

I'm not trying to complain; I have dealt with these issues for many years and it hasn't driven me away – but I can totally see how it would drive others away very quickly, and it's too bad that the folks making the software haven't put more of an effort into solving this problem.

The answer that's usually given is to make it open source – for drivers at least. If the piece of software is widely used then making it open source may be a solution, but I've seen time and time again source get released and just rot on the vine because nobody has any interest in messing with it (can't blame them if they don't need it). If the interface were really stable, the driver could probably go unmaintained for several years without needing anyone to look at it (at least through the life of, say, the 2.6.x kernel).

When it comes to drivers and the like, for the most part they won't be released as open source, so don't get your hopes up. I saw one person say that their company didn't want to release open source drivers because they feared they might be in violation of someone else's patents, and releasing the source would make it easier for their competition to determine this.

The kernel driver situation is so bad, in my opinion, that distributions for the most part don't back port drivers into their previous releases. Take Ubuntu for example. I run 10.04 LTS on the laptop I am using now as well as my desktop at work. I can totally understand if the originally released version doesn't have the latest e1000e driver (which is totally open source!) for my network card at work. But I do not understand that more than a year after its release it still doesn't have this driver. Instead you either have to manage the driver yourself (which I do – nothing new for me), or run a newer version of the distribution (all that for one simple network driver?!). This version of the distribution is supported until April 2013. Please note I am not complaining, I deal with the situation – I'm just stating a fact. This isn't limited to Ubuntu either; it has applied to just about every Linux distribution I've ever used. I saw Ubuntu recently updated Skype to the latest Linux version on 10.04 LTS, but they still haven't budged on that driver (no, I haven't filed a bug/support request, I don't care enough to do it – I'm simply illustrating a problem that is caused by the lack of a good driver interface in the kernel – I'm sure this applies to FAR more than just my little e1000e).
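"Managing the driver yourself" here just means rebuilding Intel's out-of-tree e1000e source against whatever kernel the distribution ships; roughly like this (the tarball version is a placeholder):

apt-get install build-essential linux-headers-$(uname -r)
tar xzf e1000e-1.9.5.tar.gz && cd e1000e-1.9.5/src
make
make install                            # drops e1000e.ko under /lib/modules/$(uname -r)
depmod -a
modprobe -r e1000e && modprobe e1000e   # reload it (do this from the console, it drops the network)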

People rail on the manufacturers for not releasing source, or not releasing specs. This apparently was pretty common back in the 70s and early 80s. It hasn't been common in my experience since I have been using computers (going back to about 1990). As more and more things are moved from hardware to software, I'm not surprised that companies want to protect this by not releasing source/specs. Many manufacturers have shown they want to support Linux, but if you force them to do so by making them build a hundred different kernel modules for the various systems, they aren't going to put the effort into doing it. The barrier to entry needs to be lowered to get more support.

I can understand where the developers are coming from though; they don't have incentive to make the interfaces backwards and forwards compatible since that involves quite a bit more work (much of it boring), and instead prefer to just break things as the software evolves. I had been hoping that as the systems matured this would become less commonplace, but it seems that hasn't been the case.

So I don’t blame the developers…

But I also don’t blame people for not using Linux on the desktop.

Linux would have come quite a bit further if there were a common way to install drivers for everything from network cards to storage controllers, to printers, video cards, whatever, and have those drivers work across kernel versions, even across minor distribution upgrades. This has never been the case though (I also don't see anything on the horizon – I don't see this changing in the next 5 years, if it changes ever).

The other issue with Linux is working within the confines of the distribution. This is similar to the kernel driver problem – different distros are almost always more than just a repackaging of the same software; the underlying libraries are often incompatible between distributions, so a binary built on one, especially a complex one such as a KDE or Gnome application, won't work on another. There are exceptions like Firefox, Chrome etc – though other than perhaps static linking in some cases I'm not sure what they do that other folks can't do. So the amount of work to support Linux from a desktop perspective is really high. I've never minded static linking; to me it's a small price to pay to improve compatibility over the current situation. Sure, you may end up loading multiple copies of the libraries into memory (maybe you haven't heard, but it's not uncommon to get 4-8GB in a computer these days), and sure, if there is a security update you have to update the applications that bundle those older libraries as well. It sucks I suppose, but from my perspective it sucks a lot less than what we have now. Servers are an entirely different beast, run by (hopefully) experienced people who can handle this situation better.

BSD folks like to tout their compatibility – though I don't think that is a fair comparison. Comparing two different versions of FreeBSD against Red Hat versus Debian is not fair; comparing two different versions of Red Hat against each other, alongside two different versions of FreeBSD (or NetBSD or OpenBSD or DragonFly BSD, etc.), is more fair. I haven't tried BSD on the desktop since FreeBSD 4.x; for various reasons it did not give me any reason to continue using it as a desktop and I haven't had any reason to consider it since.

I do like Linux on my desktop. I ran early versions of KDE (pre-1.0) up until around KDE 2, then switched to AfterStep for a while, eventually switching over to Gnome with Ubuntu 7 or 8, I forget. With the addition of an application called Brightside, GNOME 2.x works really well for me. Though for whatever reason I have to launch Brightside manually each time I log in; setting it to run automatically on login results in it not working.

I also like Linux on my servers. I haven't compiled a kernel from scratch since the 2.2 days, but I have been quite comfortable working with the issues of operating Linux on the server end. The biggest headaches were always drivers with new hardware, though thankfully with virtualization things are much better now.

The most recent issue I've had with Linux on servers has been some combination of Ubuntu 10.04, LVM and ext4 along with enterprise storage. Under heavy I/O I have seen ext4 come to a grinding halt many times. I have read that Red Hat explicitly requires that barriers be disabled with ext4 on enterprise storage, though that hasn't helped me. My only working solution has been to switch back to ext3 (which for me is not an issue). The symptoms are very high system CPU usage and little to no I/O (really, any attempt to do I/O results in the attempt freezing up), and when I turn on kernel debugging it seems the system is flooded with ext4 messages. Nothing short of a complete power cycle can recover the system in that state. Fortunately all of my root volumes are ext3, so it doesn't prevent someone from logging in and poking around. I've looked high and low and have not found any answers. I had never seen this issue on ext3, and the past 9 months have been the first time I have run ext4 on enterprise storage. Maybe it is a bug specific to Ubuntu, I am not sure. LVM is vital when maximizing utilization using thin provisioning in my experience, so I'm not about to stop using LVM – as much as 3PAR's marketing material may say you can get rid of your volume managers, don't.
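For reference, disabling barriers is just a mount option, so testing that advice looks something like this (the mount point is illustrative, and it only makes sense on arrays with protected write cache):

# try it on a live filesystem first
mount -o remount,barrier=0 /data
# then make it permanent via the options field in /etc/fstab, e.g.
#   /dev/vg00/data  /data  ext4  defaults,barrier=0  0  2
grep data /etc/fstab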

 

July 27, 2012

Microsoft Licenses Linux to Amdocs

Filed under: General — Tags: , — Nate @ 3:00 pm

Microsoft has been fairly successful in strong-arming licensing fees from various Android makers, though less successful in getting fees directly from operators of Linux servers.

It seems one large company, Amdocs, has caved in though.

The patent agreement provides mutual access to each company’s patent portfolio, including a license under Microsoft’s patent portfolio covering Amdocs’ use of Linux-based servers in its data centers.

I almost worked for Amdocs way back in the day. A company I was at was acquired by them, I want to say less than two months after I left. Fortunately I still had the ability to go back and buy my remaining stock options and got a little payout from it. One of my former co-workers said that I walked away from a lot of money. I don't know how much he got, but he assured me he spent it quickly and was broke once again! I don't know many folks at the company still since I left it many years ago, but everything I heard sounds like the company turned out to be as bad as I expected, and I don't think I would have been able to put up with the politics or red tape during the retention periods following the acquisition, as it was already bad enough to drive me away from the company before they were officially acquired.

I am not really surprised Amdocs licensed Linux from Microsoft. I was told an interesting story a few years ago about the same company. They were a Red Hat Enterprise Linux customer, and Oracle enticed them to switch to Oracle Enterprise Linux for half the cost they were paying Red Hat. So they opted to switch.

The approval process had to go through something like a dozen layers before being processed, and at one point it ended up on the desk of the head legal guy at Amdocs corporate. He quickly sent an email to the new company they had acquired about a year earlier saying that the use of Linux or any open source software was forbidden and they had to immediately shut down any Linux systems they had. If I recall right this was on the day before a holiday weekend. My former company was sort of stunned, laughed a bit, and had to send another letter up the chain of command, which I think reached the CEO (or the person immediately below the CEO) of the big parent, who went to the lawyer and explained that they couldn't shut down their Linux systems because all of the business flowed through Linux, and they weren't about to shut down the business on a holiday weekend – well, that and the thought of migrating to a new platform so quickly was sort of out of the question given all the other issues going on at the time.

So they got a special exclusion to run Linux and some other open source software, which I assume is still in place to this day. It was the first of three companies (in a row, no less) that I worked at that started out as Microsoft shops, then converted to Linux (in all three cases I was hired a minimum of 6-12 months after they made the switch).

Another thing the big parent did when they came over to take over the corporate office was to re-wire everything into separate secure and insecure networks. The local Linux systems were not allowed on the secure network, only the insecure one (and people couldn't do things like check email from the insecure network). They tried re-wiring it over a weekend, and if I recall right they were still having problems a week later.

Fun times I had at that company. I like to tell people I took 15 years of experience and compressed it into three, although given some of the resumes I have come across recently, 15 years may not even be long enough. It was a place of endless opportunity, and endless work hours. I'd do it again if I could go back, I don't regret it, though it came at a very high personal cost which took literally a good five years to recover from fully after I left (I'm sure some of you know the feeling).

I wouldn't repeat the experience now though – I'm no longer willing to put up with outages that last for 10+ hours (we had a couple that lasted more than 24 hours), or work weeks that extend into the 100 hour range with no end in sight. If I could go back in time and tell myself whether or not to do it, I'd say do it – but I would not accept a position at a company today to repeat the experience after having gone through that; it's just not worth it. A few years ago some of the execs from that company started a new company in a similar market and tried to recruit a bunch of us former employees, pitching the idea "it'll be like the good ol' days". They didn't realize how much of a turn off that was to so many of us!

I'd be willing to bet the vast majority of Linux software at Amdocs is run by the company I was at; at last check I was told it was in the area of 2,000 systems (all of which ran in VMware) – and they had switched back to Red Hat Enterprise again.

June 30, 2012

Synchronized Reboot of the Internet

Filed under: linux — Tags: — Nate @ 7:37 pm

[UPDATE – I've been noticing some people claim that kernels newer than 2.6.29 are not affected. Well, I got news for you: I have 200+ VMs that run 2.6.32 that say otherwise (one person in the comments mentions kernel 3.2 is impacted too!) 🙂 ]

[ UPDATE 2 – this is a less invasive fix that my co-worker has tested on our systems:

date -s "`date -u`"

]
I've been fighting a little fire that I'm sure hundreds if not thousands of others are fighting as well. It happened just before midnight UTC, when a leap second was inserted into our systems, and that seemed to trip a race condition in Linux that I assume most thought was fixed, but I guess people didn't test it.

[3613992.610268] Clock: inserting leap second 23:59:60 UTC

 

The behavior, as I'm sure you're all aware of by now, is a spike in CPU usage. Normally our systems run on average under 8% CPU usage, and this pegged them up tenfold. Fortunately vSphere held up and we had the capacity to eat it; the resource pools helped make sure production had its share of CPU power. There was only minimal impact to the customers, and our external alerting never even went off, which was a good sign.

CPU spike on a couple hundred VMs all at the same time (the above cluster has 441GHz of CPU resources)

We were pretty lost at first. Fortunately my co-worker had a thought that maybe it was leap second related; we dug into things more and eventually came across this page (thanks Google for being up to date), which confirmed the theory and confirmed we weren't the only ones impacted by it. Fortunately our systems were virtualized on a platform that was not impacted by the issue, so we did not experience any issues on the bare metal, only in the VMs. From the page:

Just today, Sat June 30th – starting soon after the start of the day GMT. We’ve had a handful of blades in different datacentres as managed by different teams all go dark – not responding to pings, screen blank.

They’re all running Debian Squeeze – with everything from stock kernel to custom 3.2.21 builds. Most are Dell M610 blades, but I’ve also just lost a Dell R510 and other departments have lost machines from other vendors too. There was also an older IBM x3550 which crashed and which I thought might be unrelated, but now I’m wondering.

It wasn't long after that we started getting more confirmations of the issue from pretty much everyone out there. We haven't dug into more of a root cause at this point; we've been busy rebooting Linux VMs, which seems to be a good workaround (we didn't need the steps indicated on the page). Even our systems that are up to date with kernel patches, as recent as a month ago, were impacted. Red Hat apparently is issuing a new advisory for their systems since they were impacted as well.

Some systems behaved well under the high load, others were so unresponsive they had to be power cycled. There was usually one process chewing through an abnormal amount of CPU; for the systems I saw it was mostly Splunk and autofs. I think that was just coincidence though – perhaps those were the processes using CPU at the instant the leap second was inserted into the system.

The internet is in the midst of a massive reboot. I pity the foo who has a massive number of systems and has to co-ordinate some complex massive reboot (unless there is another way – for me reboot was simplest and fastest).
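If rebooting everything is not an option, the date workaround from the update at the top can be pushed out with a simple loop (the host list and root SSH access are assumptions of this sketch):

# reset each host's clock from itself, which clears the stuck leap second
# state without a reboot
for HOST in $(cat hosts.txt); do
    ssh root@$HOST 'date -s "$(date -u)"' && echo "$HOST done"
done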

I for one was not aware that a leap second was coming, or of the potential implications; it's obvious I'm not alone. I do recall leap seconds in the past not causing issues for any of the systems I managed. I logged into my personal systems, including the one that powers this blog, and there are no issues on them. My laptop runs Ubuntu 10.04 as well (same OS rev as the servers I've been rebooting for the past 2 hours) and no issues there either (I've been using it all afternoon).

Maybe someday someone will explain to me, in a way that makes sense, why we give a crap about adding a second. I really don't care if the world is out of sync by a few seconds with the rest of the known universe; if it's that important we should have a separate scientific time or something, and let the rest of the normal folks go about their way. Same goes for daylight savings time. Imagine the power bill as a result of this fiasco, with thousands to hundreds of thousands of servers spiking to 100% CPU usage all at the same time.

Microsoft will have a field day with this one I’m sure 🙂

 

June 17, 2012

The old Linux ABI compatibility argument

Filed under: Random Thought — Tags: — Nate @ 12:45 pm

[WARNING: Rambling ahead]

Was just reading a somewhat interesting discussion over on slashdot (despite the fact that I am a regular there I don’t have an account and have posted as an Anonymous Coward about 4 times over the past twelve years).

The discussion is specifically about Linus slamming NVIDIA for their lack of co-operation in open sourcing their drivers and integrating them into the kernel.

As an NVIDIA customer going back to at least 1999, I think, when I got a pair of cheap white-boxed TNT2 PCI graphics cards from Fry's Electronics, I can say I've been quite happy with their support of Linux. Sure, I would love it if it were more open source and integrated and such, but it's not that big of a deal for me to grab the binary drivers from their site and install them.

I got a new Dell desktop at work late last year, and specifically sought out a model that had Nvidia in it because of my positive experience with them (my Toshiba laptop is also Nvidia-based). I went ahead and installed Ubuntu 10.04 64-bit on it, and it just so happens that the Nvidia driver in 10.04 did not support whatever graphics chip was in the Dell box – it worked OK in safe mode but not in regular 3D/high performance/normal mode. So I went to download the driver from Nvidia's site, only to find I had no network connectivity. It seems the e1000e driver in 10.04 also did not support the network chip that happened to be in that desktop. So I had to use another computer to track down the source code for the driver and copy it over via USB or something, I forget. Ever since then, whenever Ubuntu upgrades the kernel on me I have to boot to text mode to recompile the e1000e driver and re-install the Nvidia driver. As an experienced Linux user this is not a big deal to me. I have read too many bad things about Ubuntu and Unity, so I would much rather put up with the pain of the occasional driver re-install than have constant pain because of a messed up UI. A more normal user should perhaps use a newer version of the distro that hopefully has built in support for all the hardware (perhaps one of the Ubuntu offshoots that doesn't have Unity – I haven't tried any of the offshoots myself).
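The routine after each kernel update is mundane but worth writing down; it amounts to something like this from a text console (the paths and driver versions are placeholders):

# rebuild the out-of-tree network driver first so the box is reachable again
cd /usr/src/e1000e-*/src && make && make install && depmod -a
# then re-run the NVIDIA installer for the new kernel and reboot
sh /root/NVIDIA-Linux-x86_64-295.59.run
reboot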

One of the other arguments is that the Nvidia code taints the kernel, making diagnostics harder – this is true – though I struggle to think of a single time I had a problem where I thought the Nvidia driver was getting in the way of finding the root cause. I tend to run a fairly conservative set of software (I recently rolled back from Firefox 13 64-bit on my Ubuntu at work to Firefox 3.6 32-bit due to massive stability problems with the newer Firefox – 5 crashes in the span of about 3 hours), so system crashes and such really aren't that common.

It's sad that the state of ATI video drivers on Linux is apparently still so poor despite significant efforts over the years in the open source community to make it better. I believe I am remembering right that in the late 90s Weather.com invested a bunch of resources in getting ATI drivers up to speed to power the graphics on their sets. AMD seems to have contributed quite a bit of stuff themselves. But the results still don't seem to cut it. I've never, to my knowledge at least, used an ATI video card in a desktop/laptop setting on one of my own systems. I keep watching to see if their driver/hardware situation on Linux is improving, but haven't seen much to get excited about over the years.

From what I understand Nvidia's drivers are fairly unified across platforms, and a lot of their magic sauce is in the drivers, less of it in the chips. So I can understand them wanting to protect that competitive edge – provided they keep supplying quality product anyway.

Strangely enough the most recent kernel upgrade didn't impact the Nvidia driver, but of course it still broke the e1000e driver. I'm not complaining about that though, it comes with the territory (my Toshiba laptop on the other hand is fully supported by Ubuntu 10.04, no special drivers needed – though I do need to restart X11 after suspend/resume if I expect to get high performance video, mainly in intensive 3D games). My laptop doesn't travel much and stays on 24×7, so it's not a big deal.

The issue, more than anything else, is that even now, after all these years, there isn’t a good enough level of compatibility across kernel versions, or even across user land. So many headaches for the user would be fixed if this were made more of a priority. The counter argument of course is: open source the code and integrate it and it will be better all around. Except unless the product is particularly popular, it’s much more likely (even if it is open source) that it will just die on the vine, unable to compile against more modern libraries, with the binaries themselves just ending up segfaulting. “Use the source, Luke” comes to mind here – I could technically try to hire someone to fix it for me (or learn to code myself), but it’s not that important. I wish product X would still work, and there isn’t anything I can realistically do to make it work.

But even if the application (or game or whatever) is old and not being maintained anymore, it may still be useful to people. Microsoft has obviously done a really good job in this department over the years. I was honestly pretty surprised when I was able to play the game X-Wing vs. TIE Fighter (1997) on my dual processor Opteron with XP Professional (and reports say it works fine in Windows 7, provided you install it using another OS, because the installer is 16-bit, which doesn’t work in 64-bit Windows 7). I could very well be wrong, but 1997 may have been even before Linux moved from libc5 to glibc.

I had been quietly hoping that as time went on some of these interfaces would stabilize as being good enough, but it doesn’t seem to be happening. One thing that does seem to have stabilized is the use of iptables as the firewall of choice on Linux. I of course went through ipfwadm in kernel 2.0 and ipchains in 2.2, and by the time iptables came out I had basically moved on to FreeBSD for my firewalls (later OpenBSD when pf came out). I still find iptables quite a mess compared to pf, but about the most complicated thing I have to do with it is transparent port redirection, and for that I just copy/paste examples of config I have from older systems. It doesn’t bug me much since I rarely end up needing it.
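
For reference, the transparent redirection I’m talking about is nothing fancier than the stock REDIRECT and DNAT targets. A minimal sketch, with made-up interfaces, ports and addresses:

    # bounce incoming web traffic on the inside interface to a local proxy on port 3128
    iptables -t nat -A PREROUTING -i eth1 -p tcp --dport 80 -j REDIRECT --to-ports 3128

    # or hand it off to another box entirely
    iptables -t nat -A PREROUTING -i eth1 -p tcp --dport 80 -j DNAT --to-destination 192.168.1.10:8080
    iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE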

Another piece of software that I have grown to like over the years – this one really is open source – is xmms (version 1). Basically a lookalike of the popular Winamp software, xmms v1 is a really nice, simple MP3/OGG player. I even used it in its original binary-only incarnation. Version 1 was abandoned years ago (they list Red Hat 9 binaries, if that gives you an idea), and version 2 seems to be nothing remotely similar to version 1, so I’ve tried to stick with version 1. With today’s screen resolutions I like to keep it in double size mode. There’s a bug report on Debian from 2005 to give you an idea how old this issue is, but fortunately the workaround still works. Xmms still compiles (though I did need to jump through quite a few hoops if I recall right) – for how long, I don’t know.

I remember a few months ago wanting to hook up my old Slingboxes again, which are used to stream TV over the internet (especially since I was going to be doing some traveling late last year and this year). I bought them probably 6-8 years ago and have not had them hooked up in years. Back then, in 2006, I was able to happily use WINE to install the Windows-based SlingPlayer and watch video. I tried again earlier this year and it doesn’t work anymore. The same version of SlingPlayer (the same .exe from 5+ years ago) doesn’t work on today’s WINE. I wasn’t the only one – a lot of other people had problems too (I could not find any reports of it working for anyone). Of course it still worked in XP. I keep the Slingbox turned off so it doesn’t wear out prematurely unless I plan to use it. Of course I forgot to plug it in before my recent trip to Amsterdam.

I look at a stack of old Linux games from Loki Software and am quite certain none of them will ever run again, while the Windows versions of those same games will still happily run (some of them even under Wine, of all things). It’s disappointing to say the least.

I’m sure I am more committed to Linux on the desktop/laptop than most Linux folks out there (who, more often than not, are using OS X these days), and I don’t plan to change – just keep chugging along, from the early days of staying up all night compiling KDE 0.something on Slackware to what I have in Ubuntu today.

I’m grateful that Nvidia has been able to put out such quality drivers for Linux over the years, and as a result I opt for their chipsets in my Linux laptops and desktops at every opportunity. I’ve been running Linux on the desktop since, I want to say, 1998, when my patience with NT4 finally ran out. Linux was the first system I was exposed to at a desktop level that didn’t seem to slow down or become less stable the more software you loaded on it (that stays true for me today as well). I never quite understood what I was doing, or what the OS was doing, that would prompt me to re-install Windows from the ground up at least once a year back in the mid 90s.

I don’t see myself ever going to OS X. I gave it an honest run for about two weeks a couple years ago and it was just so different from what I’m used to that I could not continue using it. Even putting Ubuntu as the base OS on the hardware didn’t work, because I couldn’t stand the track pad (I like the nipple, who wouldn’t like a nipple? My current laptop has both and I always use the nipple) and the keyboard had a bunch of missing keys. I’m sure if I tried to forget all of the habits I have developed over the years and did things the Apple way it could have worked, but going and buying a Toshiba and putting Ubuntu 10.04 on it was (and remains) the path of least resistance for me to becoming productive on a new system (the second-least resistance, next to Linux, being a customized XP).

I did use Windows as my desktop at work for many years, but it was heavily, heavily customized with Blackbox for Windows as well as cygwin and other tools – so much so that the IT departments didn’t know how to use my system (no Explorer shell, no Start menu). But it gave Windows a feel familiar from Linux, with mouse-over window activation (via XP PowerToys – another feature OS X lacked, outside of the terminal emulator anyways) and virtual desktops (though no edge flipping). It took some time to configure, but once up and going it worked well. I don’t know how well it would work in Windows 7; the version of Blackbox I was using came out in the 2004/2005 time frame, though there are newer versions.

I do fear what may be coming down the pike from a Linux UI perspective though, so I plan to stick with Ubuntu 10.04 for as long as I can. The combination of Gnome 2 plus some software called brightside, which allows for edge flipping (otherwise I’d be in KDE), works pretty well for me – even though I have to manually start brightside every time I log in, because when it starts automatically it doesn’t work for some reason. The virtual desktop implementation isn’t as good as Afterstep, something I used for many years, but Gnome makes up for it in other areas where Afterstep fell short.
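
One workaround I keep meaning to try (untested, so treat it as a sketch) is letting Gnome autostart brightside with a short delay, on the theory that it is losing a race with the rest of the session at login:

    # drop a delayed autostart entry into the Gnome session
    cat > ~/.config/autostart/brightside-delayed.desktop <<'EOF'
    [Desktop Entry]
    Type=Application
    Name=Brightside (delayed)
    Exec=sh -c "sleep 15 && brightside"
    X-GNOME-Autostart-enabled=true
    EOF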

I’ve gotten even more off topic than I thought I would.

So – thanks Nvidia for making such good drivers over the years; they have made Linux on the desktop/laptop that much easier for me to deal with. The only annoying issue I recall having was on my M5 laptop, and that issue wasn’t limited to Linux and didn’t seem specific to Nvidia (or Toshiba).

Also thank you to Linus for making Linux and getting it to where it is today.

April 5, 2012

Built a new computer – first time in 10 years

Filed under: linux,Random Thought — Tags: , — Nate @ 8:51 am

I was thinking this morning about the last time I built a computer from scratch, and I think it was about ten years ago, maybe longer – I remember the processor was a Pentium 3 800Mhz, so it may well have been almost 12 years ago. Up until around the 2004 time frame I had built and re-built computers re-using older parts and some newer components, but as far as going out and buying everything from scratch, it was 10-12 years ago.

I had two of them: one was a socket-based system, the other was a “Slot 2“-based system. I also built a couple systems around dual-slot (Slot 1) Pentium 2 setups with the Intel L440GX+ motherboard (probably my favorite motherboard of all time). For those of you who think I use nothing but AMD, I’ll remind you that aside from the AMD K6-3 series I was an Intel fanboy up until the Opteron 6100 was released. I especially liked the K6-3 for its on-chip L2 cache; combined with 2 megabytes of L3 cache on the motherboard it was quite zippy. I still have my K6-3 CPU in a drawer around here somewhere.

So I decided to build a new computer to move my file serving functions out of my HP xw9400 workstation (which I bought about a year and a half ago) into something smaller, so I could turn the HP box into something less serious to play some games on my TV with (like WC: Saga!). Maybe get a better video card for it, I’m not sure.

I have a 3Ware RAID card and 4x2TB disks in my HP box, so I needed something that could take those. This is what I ended up going with, from Newegg –

It seemed like an OK combination. The case is pretty nice, with a 5-port hot swap SATA backplane and support for up to 7 HDDs. PC Power & Cooling (I used to swear by them, so I thought I might as well go with them again) had a calculator that said, for as many HDDs as I have, to get a 500W unit, so I got that.

There are a lot of things here that are new to me anyways, and it’s been interesting to see how technology has changed since I last did this in the Pentium 3 era.

Mini ITX. Holy crap is that small. I knew it was small based on the dimensions, but it really didn’t sink in until I held the motherboard box in my hand and it seemed about the same size as a retail box for a CPU 10 years ago. It’s no wonder the board uses laptop memory. The amount of integrated features on it is just insane as well, from ethernet to USB 2 and USB 3, eSATA, HDMI, DVI, PS/2, optical audio output, analog audio out, and even wireless, all crammed into that tiny thing. Oh, and Bluetooth is thrown in as well. During my quest to find a motherboard I even came across one that had a parallel port on it – I thought those died a while ago. The thing is just so tiny and packed.

On the subject of motherboards – the very advanced overclocking functions are just amazing. I will not overclock, since I value stability over performance, and I really don’t need the performance in this box. I took the overclocking friendliness of this board to hopefully mean higher quality components and the ability to run more stably at stock speeds. Included tweaking features –

  • 64-step DRAM voltage control
  • Adjustable CPU voltage at 0.00625V increments (?!)
  • 64-step chipset voltage control
  • PCI Express frequency tuning from 100Mhz up to 150Mhz in 1Mhz increments
  • HT Tuning from 100Mhz to 550Mhz in 1Mhz increments
  • ASUS C.P.R. (CPU Parameter Recall) – no idea what that is
  • Option to unlock the 4th CPU core on my CPU
  • Option to run on only 1, 2 or all 3 cores.

The last time I recall overclocking stuff there were maybe 2-3 settings for voltage, and the difference between them was typically at least 5-15%. The only CPU I remember ever overclocking was a Pentium 200 MMX (overclocked to 225Mhz – no voltage changes needed, it ran just fine).

From a PCI perspective, I seem to recall back in my day there were two settings for PCI frequencies: whatever the normal was, and one setting higher (which was something like 25-33% faster).

ASUS M4A88T-I Motherboard

Memory – wow, it’s so cheap now. I mean, 8GB for $45?! The last time I bought memory was for my HP workstation, which requires registered ECC – and that was not so cheap! This system doesn’t use ECC of course. Though given how dense memory has been getting, and the possibility of memory errors only increasing, I would think at some point soon we would want some form of ECC across the board. It was certainly a concern 10 years ago when building servers with even 1-2GB of memory, and now we have many desktops and laptops coming standard with 4GB+. Yet we don’t see ECC on desktops and laptops – I know it’s because of cost, but what’s interesting is that there doesn’t appear to be a significant (or perhaps in some cases even noticeable) hit in the reliability of these systems with larger amounts of memory and no ECC.

Another thing I noticed was how massive some video cards have become, consuming as many as 3 PCI slots in some cases for their cooling systems. Back in my day the high end video cards didn’t even have heat sinks on them! I was a big fan of Number Nine back then and had both their Imagine 128 and Imagine 128 Series 2 cards, with a whole 4MB of memory (512kB chips, if I remember right, on the Series 2 – they used double the number of smaller chips to get more bandwidth). Those cards retailed for $699 at the time, a fair bit higher than today’s high end 3D cards (CAD/CAM workstation cards excluded in both cases).

Modular power supplies – the PSU I got was only partially modular but it was still neat to see.

I really dreaded the assembly of the system since it is so small. I knew the power supply was going to be an issue, as someone on Newegg said you really don’t want a PSU longer than 6″ because of how small the case is; I think PC Power & Cooling said mine was about 6.3″ (with wiring harness). It was going to be tight, and it was tight. I tried finding a shorter power supply in that class range but could not. It took a while to get the cables all wrapped up. My number one fear, of course, after doing all that work, was hitting the power button and finding out there’s a critical problem (bought the wrong RAM, bad CPU, bad board, plugged in the power button the wrong way, whatever).

I was very happy to see, when I turned it on for the first time, that it lit up and the POST screen came right up on the TV. There was a bad noise coming from one of the fans because a cable was touching it, so I opened it up again and tweaked the cables so they weren’t touching the fan, and off I went.

First I ran it without any HDs, just to make sure it turned on, the keyboard worked, I could get into the BIOS screen, etc. All that worked fine, so then I opened up the case again and installed an old 750GB HD in one of the hot swap slots, hooked up a USB CD-ROM with a CD of Ubuntu 10.04 64-bit and installed it on the HD.

Since this board has built-in wireless I was looking forward to trying it out – didn’t have much luck. It could see the 50 access points in the area but it was not able to log in to mine for some reason. I later found that it was not getting a DHCP response, so I hard wired an IP and it worked — but then other issues came up, like DNS not working and very, very slow transfer speeds (as in sub-100 BYTES per second). After troubleshooting for about 20 minutes I gave up, went wired, and it was fast. I upgraded the system to the latest kernel and such but that didn’t help the wireless. Whatever, not a big deal, I didn’t need it anyways.
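
Hard wiring an IP on 10.04 is just the usual /etc/network/interfaces stanza. A sketch with made-up addresses, using the typical eth0 name for the wired side:

    # /etc/network/interfaces
    auto eth0
    iface eth0 inet static
        address 192.168.1.50
        netmask 255.255.255.0
        gateway 192.168.1.1

    # then apply it:
    #   sudo /etc/init.d/networking restart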

I installed SSH, logged into it from my laptop, shut off X Windows, and installed the Cerberus Test Suite (something else I used to swear by back in the mid 00s). Fortunately there is a packaged version of it for Ubuntu since, last I checked, it hasn’t been maintained in about seven years. I do remember having problems compiling it on a 64-bit RHEL system a few years ago (though the 32-bit build worked fine and the resulting binaries ran fine there too).

Cerberus test suite (or ctcs as I call it) is basically a computer torture test – a very effective one, the most effective I’ve ever used myself. I found that if a computer can survive my custom test (which is pretty basic) for 6 hours then it’s good. I’ve run the tests for as long as 72 hours and never saw a system fail after more than 6 hours; normally it would fail within a few minutes to a couple of hours. It would find problems with memory that memtest wouldn’t find even after 24 hours of testing.

What cerberus doesn’t do is tell you what failed or why; if your system just freezes up you still have to figure it out. On one project I worked on that had a lot of “white box” servers in it, we deployed them about a rack at a time and I would run this test. Maybe 85% of them would pass, and the others had some problem, so I told the vendor to go fix it – I don’t know what it is, but these are not behaving like the others, so I know there is an issue. Let them figure out which component is bad (90% of the time it was memory).

So I fired up ctcs last night and watched it for a few minutes, wondering if there is enough cooling in the box to keep it from bursting into flames. To my delight it ran great, with the CPU topping out at around 54C (I honestly have no idea if that is good or not, though I think it is OK). I ran it for 6 hours overnight and there were no issues when I got up this morning. I fired it up again for another 8 hours (the test automatically terminates after a predefined interval).
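
To keep an eye on temperatures while the burn-in runs, lm-sensors is enough, assuming the board’s sensor chip is supported (I haven’t checked this particular ASUS board):

    sudo apt-get install lm-sensors
    sudo sensors-detect       # answer the prompts, load the suggested modules
    watch -n 10 sensors       # refresh the readings every 10 seconds in another terminal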

I’m not testing the HD, because it’s just a temporary disk until I move my 3ware stuff over. I’m mainly concerned about the memory, CPU, motherboard and cooling. The box is still running silent (I have other stuff in my room so I’m sure it makes some noise, but I can’t hear it). It has 4 fans in it including the CPU fan: a 140mm, a 120mm and the PSU fan, which I am not sure how big it is.

My last memory of ASUS was running an Athlon on an A7A-266 motherboard (I think in 2000), and that combination didn’t last long – the IDE controller on the thing corrupted data like nobody’s business. I would install an OS and everything would seem fine, then the initial reboot kicked in and everything was corrupt. I still felt ASUS was a pretty decent brand; maybe that was just specific to that board. I’m so out of touch with PC hardware at this level – the different CPU sockets, the CPU types. I remember knowing everything backwards and forwards in the Socket 7 days, back when things were quite interchangeable. Then there was my horrible year-or-two experience with the ABIT BP6, a somewhat experimental dual socket Celeron system. What a mistake that was, oh what headaches that thing gave me. I think I remember getting it based on a recommendation at Tom’s Hardware Guide, a site I used to think had good information (maybe it does now, I don’t know). But that experience with the BP6 fed back into my impression of Tom’s Hardware and I really didn’t go back to that site ever again (sometimes these days I stumble upon it by accident). I noticed a few minutes ago that ABIT as a company is out of business now; they seemed to be quite the innovator back in the late 90s.

Maybe this weekend I will move my 3ware stuff over and install Debian (not Ubuntu) on the new system and set it up. While I like Red Hat/CentOS for work stuff, I like Debian for home. It basically comes down to this: if I am managing it by hand I want Debian, if I’m using tools like CFEngine to manage it I want RH. If it’s a laptop or desktop then it gets Ubuntu 10.04 (I haven’t seen the nastiness in the newer Ubuntu release(s), so I’m not sure what I will do after 10.04).

I really didn’t think I’d ever build a computer again, until this little side project came up.

Another reason I hate SELinux

Filed under: linux,Random Thought — Tags: , , — Nate @ 7:43 am

I don’t write too much about Linux either but this is sort of technical I guess.

I’ve never been a fan of SELinux. I’m sure it’s great if you’re in the NSA, or the FBI, or some other three letter agency, but for most of the rest of us it’s a needless pain to deal with and provides little benefit.

I remember many moons ago, back when I dealt with NT4, encountering situations where I, as an administrator, could not access a file on the NTFS file system. It made no sense – I am the administrator, give me access to that file – but no, I could not get access. HOWEVER, I could change the security settings and take ownership of the file, and NOW I can get access. Since I have that right to begin with, it should just give me access and not make me jump through those hoops. That’s what I think at least. I recall someone telling me back in the 90s that Netware was similar and even went to further extremes, where you could lock the admin out of files entirely, and in order to back data up you had a separate backup user which the backup program used, and that was somehow protected too. I can certainly understand the use case, but it certainly makes things frustrating. I’ve never been at a company that needed anywhere remotely that level of control (I go out of my way to avoid them actually, since I’m sure that’s only a small part of the frustrations of being there).

By the same token, I have never used (for more than a few minutes anyways) file system ACLs on Linux/Unix platforms either. I really like the basic permissions system; it has worked for 99.9% of my own use cases over the years and is very simple to manage.

I had a more recent experience that was similar, but even more frustrating, on Windows 7. I wanted to copy a couple of files into the system32 directory, but no matter what I did (including taking ownership, changing permissions, etc.) it would not let me do it. It’s my #$#@ computer, you piece of #@$#@.

Such frustration is not limited to Windows, however. Linux has its own similar functionality called SELinux, which by default is turned on in many situations. I turn it off everywhere, so when I encounter it I am not expecting it to be on, and the resulting frustration is annoying to say the least.

A couple weeks ago I installed a test MySQL server and exposed a LUN to it which had a snapshot of a MySQL database from another system. My standard practice is to turn /var/lib/mysql into a symlink which points to the SAN mount point. So I did that and started MySQL… failed. MySQL complained about not having write access to the directory. I spent the next probably 25 minutes fighting this thing, only to discover it was SELinux that was blocking access to the directory. Disable SELinux, reboot, and MySQL came up fine without issue. #@$#@!$
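
For what it’s worth, the less drastic fix (had I wanted to leave SELinux on) is to label the SAN mount point with the MySQL type instead of disabling enforcement. A sketch, using a hypothetical mount point of /san/mysql:

    # check whether SELinux is what's in the way
    getenforce
    # flip to permissive temporarily to confirm the diagnosis
    setenforce 0
    # the proper fix: label the non-standard datadir so mysqld may use it
    # (semanage comes from the policycoreutils-python package)
    semanage fcontext -a -t mysqld_db_t "/san/mysql(/.*)?"
    restorecon -Rv /san/mysql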

Yesterday I had another, more interesting encounter with SELinux. I installed a few CentOS 6.2 systems to put an evaluation of Vertica on. These were all built by hand, since we have no automation to deal with CentOS/RH; everything we have is for Ubuntu. So I did a bunch of basic things, including installing an SSH key so I could log in as root with my key. Only to find out that didn’t work. No errors in the logs, nothing, it just rejected my key. I fired up another SSH daemon on another port and my key was accepted no problem. I put the original SSH daemon in debug mode and it gave nothing either, just said it rejected my key. W T F.

After fighting it for probably another 10 minutes I thought, HEY, maybe SELinux is blocking this, and I checked and SELinux was in enforcing mode. So I disabled it and rebooted – now SSH works again. I didn’t happen to notice any logs anywhere related to SELinux and how or why it was blocking this, or why it blocked it only on port 22 and not on any other ports (I tried two others), but there you have it, another reason to hate SELinux.
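
In hindsight the denials were probably sitting in the audit log rather than syslog, and a mislabeled .ssh directory is a common way for key auth to break under SELinux. A sketch of what I would check next time before disabling it outright:

    # SELinux denials land in auditd, not /var/log/secure
    ausearch -m avc -ts recent
    # put the expected labels back on root's key material
    restorecon -R -v /root/.ssh
    # or drop to permissive mode while testing
    setenforce 0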

You can protect your system against the vast majority of threats fairly easily. I mean, the last system I dealt with that got compromised was a system that sat out on the internet (with tons of services running) and hadn’t had an OS upgrade in at least 3 years. The one before that, I recall, was another Linux host (internet-connected as well – it was a firewall), this time back in 2001, and it probably hadn’t had upgrades in a long time either. The third was a FreeBSD system that was hacked because of me, really – I told my friend who ran it to install SSH, as he was using telnet to manage it. So he installed SSH and SSH got exploited (back in 2000-2001). I’ve managed probably 900-1000 different hosts over that time frame without an issue. I know there is value in SELinux, just not in the environments I work in.

Oh, and while I’m here, I came across a new behavior in CentOS 6.2 yesterday which I’m sure probably applies to RHEL too. When formatting an ext4 file system, by default it discards unused blocks. The man page says this is good for thin provisioned file systems and SSDs. Well, I’ll tell you it’s not good for thin provisioned file systems – the damn thing sent 300 megabytes a second of data (450-500,000+ sectors per second according to iostat) to my little storage array with a block size of 2MB (I’ve never seen a block size that big before), which had absolutely no benefit other than to flood the storage interfaces and possibly fill up the cache. I ran this on three different VMs at the same time. After a few seconds the front end latency on my storage went from 1.5-3ms to 15-20ms. And the result on the volumes themselves? Nothing, there was no data being written to them. So what’s the point? My point is: disable this stupid function with the -K option when running mke2fs on CentOS 6.2. Ubuntu 10.04 (what we use primarily) uses ext4 too, but it does not perform this discard when a file system is created.
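
Concretely, the flag in question, on a hypothetical /dev/sdb1:

    # skip the discard pass at format time on a thin provisioned LUN
    mkfs.ext4 -K /dev/sdb1
    # equivalent invocation via mke2fs directly
    mke2fs -t ext4 -K /dev/sdb1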

Something strange happened when this operation ran, and I have a question for my 3PAR friends about it: the performance statistics for the virtual LUN showed absolutely no data flowing through the system, but the performance stats for the volume itself were there (a situation I have never seen before in 6 years on 3PAR), and the performance stats of the fibre channel ports were there too. There was no noticeable hit on back end I/O that I could see, so the controllers were eating it. My only speculation is that because RHEL/CentOS 6 has built-in support for SCSI UNMAP, these commands were actually UNMAP commands rather than actual data. I’m not sure though.
