TechOpsGuys.com Diggin' technology every day

February 17, 2012

Random Irony – MySQL wanting .NET

Filed under: Random Thought — Tags: — Nate @ 10:22 pm

So I am here late at night waiting for Citrix to call me back about an issue on one of my Netscalers, so I went to install an eval of the 3PAR System Reporter — last time I used it, it was Windows-only, even though it was based on Apache, MySQL and some CGI. Out of instinct I installed a Windows VM to put it on, and since it is an eval I am just going to put MySQL on the same VM, because our array is small and there just isn't enough data to tax it. Later I found out I can use Linux now! But on Linux it needs 32-bit, so for this eval I'll stick with Windows since I'm almost done setting it up and I don't have a 32-bit Linux system handy at the moment.

I've never been a big fan of System Reporter; it seems very much like the tools from EMC, and I've heard similar things about the tools from HDS. So I guess it is typical, which is probably why I am not fond of it. But installing it is faster and easier than getting my home-grown stuff online, so in the meantime I'll use it until I get some spare time to set up my custom stuff (which uses Perl scripts, rrdtool and Cacti to display the data – along with a lot of time to create the graphs).

So I was installing some things and went to install MySQL, and much to my surprise, it wanted .NET. Oracle owns MySQL, they own Java, MySQL is of course open source, and I'm sure it has far more deployments on Linux than on Windows – so it puzzles me that, after all that, the MySQL folks go out of their way to use .NET for whatever they are using it for.

Thought it was funny anyways

Where's my NET?

Little update – got a cryptic error while trying to install System Reporter, so I guess here comes another support case for HP!

February 16, 2012

30 billion packets per second

Filed under: Networking — Tags: — Nate @ 11:28 pm

Still really busy getting ready to move out of the cloud. Ran into a networking issue today: I noticed my 10GbE NICs are going up and down pretty often. Contacted HP, and apparently these Qlogic 10GbE NICs have overheating issues; the thing that has fixed it for other customers is to set the BIOS to increased cooling – basically ramp up those fan RPMs (along with a firmware update, which is already applied). Our systems may have more issues given that each server has two of these NICs right on top of each other, so maybe we need to split them apart – waiting to see what support says. There haven't been any noticeable problems from the link failures since everything is redundant across both NICs (both ports on the same card seem to go down at about the same time).
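For what it's worth, a quick and dirty way I keep an eye on how often the links are bouncing is just grepping the kernel log. The exact message text depends on the driver, and the interface names and log path here are placeholders for whatever your setup uses:

# count link state transitions per NIC port (message wording varies by driver)
for NIC in eth2 eth3 eth4 eth5; do
  echo -n "$NIC: "
  egrep -c "$NIC.*Link is (Up|Down)" /var/log/kern.log
done

Crude, but it gives you a number you can watch over time instead of guessing from monitoring alerts.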

Anyways, you know I’ve been talking about it a lot here, and it’s finally arrived (well a few days ago), the Black Diamond X-Series (self proclaimed world’s largest cloud switch). Officially announced almost a year ago, their marketing folks certainly liked to grab the headlines early. They did the same thing with their first 24-port 10GbE stackable switch.

The numbers on this thing are just staggering – I mean 30 billion packets per second, and a choice of switching fabric modules based on either 2.5Tbps (meaning up to 10Tbps in the switch) or 5.1Tbps (up to 20Tbps in the switch). They are offering 48x10GbE line cards as well as 12- and 24-port 40GbE line cards (all line rate, duh). You already know it has up to 192x40GbE, or 768x10GbE via breakout cables, in a 14U footprint – half the size of the competition, which was already claiming to be a fraction of the size of their other competition.

5Tbps Switching fabric module (max 4 per switch)

 

Power rated at 5.5 watts per 10GbE port or 22W per 40GbE port.
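Just to sanity-check those headline numbers with some quick back-of-the-envelope math (my arithmetic, not Extreme's):

awk 'BEGIN {
  printf "fabric : 4 modules x 5.1 Tbps = %.1f Tbps\n", 4 * 5.1;              # the ~20Tbps figure
  printf "ports  : 768 x 10GbE          = %.2f Tbps full duplex\n", 768 * 10 * 2 / 1000;
  printf "power  : 768 x 5.5 W          = %.0f W fully loaded\n", 768 * 5.5;
}'

So even a chassis stuffed full of 10GbE ports stays comfortably inside the big fabric, and by that per-port rating a fully loaded box works out to a bit over 4kW.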

They are still pushing their Direct Attach approach, which still hasn't exactly caught on in the industry. I didn't really want Direct Attach, but I can see the value in it. They apparently have a solution for KVM ready to go, though their partner in crime that supported VMware abandoned the 10GbE market a while ago (it required special NIC hardware/drivers for VMware). I'm not aware of anyone else who has stepped in to replace that functionality.

I found it kind of curious that they rev'd their operating system three major versions for this product (from version 12 to 15). They did something similar when they re-wrote their OS about a decade ago (jumping from version 7 to 10).

Anyways, not much else to write about. I do plan to write more on the cloud soon, I've just been so busy recently – this is just a quick post / random thought.

So the clock starts – how long until someone else comes out with a midplane-less chassis design?

February 3, 2012

Don’t push code on a Friday damnit

Filed under: General,Random Thought — Tags: — Nate @ 9:35 pm

I hate it when people want to push code on a Friday. Here it is, Friday: I was working to wrap up a few last tasks before going home when my phone went off saying part of my company's site was not working right.

After some investigation with a developer we discovered it was an issue with Facebook (ugh, how I hate thee); their code was breaking because Facebook was broken (they weren't aware this could happen – I am sure they will handle it going forward).

So while they were working around the issue I ran a search for it, and it seems to be a more widespread problem, caused by a Facebook software deployment done today – Friday, at nearly 7PM!

F F S

My monitor first detected the failure at 6:49 PM, so they weren't even done deploying by the time it failed. It was intermittent for a few minutes, then went hard down at around 7:11PM.

@$#$ facebook. Thanks for screwing me, and who knows how many others, by deploying code on a Friday night.

Of course the developers on our end deserve some of the heat as well. But I am nit-picking about code deployments on a Friday here, not code bugs.

IBM shows it still has some horses left

Filed under: Storage — Tags: , , — Nate @ 11:15 am

I noticed a few days ago that IBM posted some new SPC-1 results based on their SVC system, this time using different back-end storage – their Storwize product (something I had not heard of before, but I don't pay too close attention to what IBM does; they have so many things it's hard to keep track).

The performance results are certainly very impressive, coming in at over 520,000 IOPS at a price of $6.92 per IOP. This is the sort of result I was kind of expecting from the Hitachi VSP a while back. IBM tested with 1,920 drives, the same number as the 3PAR V800, and bested the 3PAR performance by a good 70,000 IOPS with half the latency, on the same number of disks and less data cache.

The capacity numbers were, and still are, sort of difficult to interpret; they seem to give conflicting information. IBM is using ~138TB of disk space to protect ~99TB, while 3PAR is using ~263TB of disk space to protect ~263TB. Both results say there is 30TB+ of "unused storage" in that protection scheme.

Bottom line: the IBM box is presented with roughly 280TB of storage, and of that, 100TB is usable, or about 35%. That brings their cost per usable TB to $36,881/TB, vs roughly $12,872/TB for the 3PAR V800. The V800's cost per IOP was $6.59, which IBM comes real close to.
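Working backwards from the published numbers (I'm inferring the V800 at roughly 450,000 IOPS from the 70,000 IOPS gap, so take the second line as approximate):

awk 'BEGIN {
  ibm  = 520000 * 6.92;   # total price implied by the $/IOP figure
  v800 = 450000 * 6.59;   # V800 IOPS inferred from the ~70K gap, so this is a rough estimate
  printf "IBM SVC/Storwize : ~$%.2fM total, / $36881 per TB = ~%.0f TB usable\n", ibm  / 1e6, ibm  / 36881;
  printf "3PAR V800        : ~$%.2fM total, / $12872 per TB = ~%.0f TB usable\n", v800 / 1e6, v800 / 12872;
}'

That works out to roughly $3.6M for about 98TB usable on the IBM side (which lines up with the ~99TB figure) versus roughly $3M for well over twice the usable capacity on the V800, which is where the big $/TB gap comes from.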

IBM has apparently gone the same route as HDS, in that the only 3.5″ drives they support on their Storwize systems are 3TB SATA disks. They hamper their own cost structure by not supporting larger 3.5″ 15k RPM SAS disks, which just doesn't make sense to me. There are 300GB 15k SAS drives out there and Storwize doesn't support those either (yet, at least).

It took about five pages of scripting to configure the system from the looks of the full disclosure report.

Certainly looks like a halfway decent system. I mean, if you compare it to the VSP for example: it has the same array virtualization abilities via the SVC, it is sporting almost double the number of disk drives and almost double the raw performance, and configuration at least appears to be less complicated. It uses those power-efficient 2.5″ disks just like the VSP. It also costs quite a bit less than the VSP on both a per-IOP and a per-TB basis, and it appears to have mainframe support for those that need it. From the looks of Seagate's 15k RPM disks, at least, the 2.5″ drives average about 15% less latency for random reads and writes than their 3.5″ counterparts. I thought the difference might be bigger than that, given how much less distance the disk heads have to travel.

If I were in the market for such a big system, these results wouldn't lead me away from 3PAR, at least based on the pricing disclosed for each system (and the level of complexity to configure). I was interviewing a candidate a few weeks ago, a guy with a strong storage background. He had worked for Symantec, I think, and for a while he was doing some sort of storage consulting at various companies. I asked him how he provisioned storage and what his strategies were. His response was quite surprising: he said usually the vendors come out, deploy their systems and provision everything up front, and all he does is carve out LUNs and present them to users. He had never been involved in the architecture, planning or deployment of a storage system. He acted as if what he was doing was standard practice (maybe it is at large companies – I've never worked at such an organization), and that it was perfectly normal.

But it certainly seems like a good system when put up against at least the VSP, and probably the V-MAX too.

I've always been interested in the SVC by itself; it certainly seems like a cool concept. I've never used one of course, but having the ability to cluster at that intermediate level (in this case an 8-node cluster, which may be the max, I'm not sure) and then scale out storage behind it is appealing. Clearly they've shown with this that you can pump one hell of a lot of I/O through the thing. They also seem to have SSD tiering support built into it, which is nice as well.

Hopefully HP can come up with something similar at some point, as much as they talk smack about the likes of SVC today.

Making the easy stuff hard, the hard stuff possible

Filed under: linux — Tags: — Nate @ 5:16 am

First off, sorry for being away for so long; I've been really, really busy preparing a new data center deployment to migrate my company into, out of the cloud. The last time I did anything remotely resembling this was in 2007, though this time there are some extra layers involved that I didn't have back then. It is certainly an interesting experience, configuring the software and infrastructure from the absolute ground up with nothing to base it on (other than past experience, obviously!). I mean, we have our stuff in a public cloud now, but there are so many things that are different from an infrastructure perspective that little of it transfers over.

I wanted to write about a sort of topic I haven't really written about before: a systems management tool named Chef, from a Seattle-based company named Opscode. It's a next-generation tool that is supposed to make your life easier – more advanced than older tools like Puppet and Cfengine.

I'll start off by saying I have a very strong background in Cfengine, having used it since late 2004 at three different companies. My techniques and approaches evolved significantly over the years, and my last deployment was quite good in my opinion, considering I had to adapt an existing Cfengine deployment made by folks who didn't know what they were doing into something that worked well, and do so in a four-nines environment. That was not easy; as you know, one wrong command or config in one of these tools can wreak havoc, as I know first hand. I grew to like Cfengine a lot, and there was really nothing I needed it to do that it couldn't do for me. I knew its limitations well and it was simple to use.

I was introduced to Chef in the summer of 2010, when I went to the headquarters of Opscode and met their senior staff, including one of the co-founders I believe. They gave us their PowerPoint presentation on what Chef was, how it worked, what it could do and why it exists.

It certainly came across as a very impressive tool, able to do tons of things that Cfengine could not do, with a lot of concepts that sounded like they could be useful. At the same time, however, it looked incredibly complicated.

I raised my concerns with their senior staff on that very first day and we had about a 15 minute discussion on it. I'm not a programmer, nor do I ever intend to be. There is a very big line that I refuse to cross, from scripting tools in Perl & bash to help make my life easier, over to full-on code. A developer at my company constantly jokes that I say I am not a programmer yet I come up with complicated regexes and scripts to do things they don't know how to come up with on their own.

They tried to reassure me that learning Chef is no different from learning the syntax of an Apache configuration file, or DNS, or something like that. I didn't really buy it, but I was still willing to give the tool a shot, since it sounded like there was a nice level of systems management you could achieve with it. I still joke with my co-workers and current boss (who was my boss at the time too) about this very topic; they all remember that conversation to this day.

Chef is written in Ruby, and is very Ruby-centric. I guess you could say I am very biased against Ruby given my past experience supporting Ruby (on Rails) applications.

So here I am, almost 18 months later and things haven’t changed much. My dislike of Ruby continues, and is perhaps even stronger now having used Chef.

My first Chef implementation about a year ago was fraught with frustration at almost every turn. I could (and still can) see the promise in the tools it provides, but it's just so difficult to work with, especially coming from a Cfengine background (and a lack of programming experience), that for my first iteration I dumbed it down a whole bunch, making the logic as Cfengine-like as I could. I didn't use any data bags, any attributes, no templates, nothing like that. I had (and still have) a very hard time finding usable examples for many things in Chef. They have a big repository of sample cookbooks – but to me, for the most part, those are not usable, because while they are examples they don't spell out what each line of code literally does. Chef apparently uses Ruby's ERB for its template language; I looked at it a couple of times and really could not make heads or tails of it.

I like to tell people that Chef makes the easy things hard, and the hard things possible. It seems very clear to me that they attacked the hard things in systems management first, before addressing the easy things. I remember seeing something in their documentation describing the single-instance copy as a sort of holy grail, which fits along those lines. The idea is you have one small bit of code that can be adapted to (m)any environments and situations, using templates to pull attributes and values from data bags or other sources to build something on the fly.

The concept is novel for sure. Coming from a Cfengine background, I am very used to duplicating config stanzas for different environments and making static config files, one for each environment or something like that. I've been doing it so long it's second nature.

Where the Opscode folks and I seem to part ways is our priorities. Their priority is to turn systems management into code and automate it to the point where it scales to a million systems. Mine is less ambitious: I want it to be easy to manage, and to scale to a few thousand systems at most, since going beyond that gets so cookie-cutter that it's not fun anymore. I can certainly see the value of such an approach when dealing with massive environments that are changing all the time. At most companies, though, this situation doesn't exist – at most companies things are fairly static: you get a new system here and there, you get a new environment maybe once a quarter at the most. Maybe some big project comes along that increases your system count by a large amount for some special purpose.

I have absolutely no problem maintaining separate config files for each environment and having different config stanzas in the config management tool to push those files out. Not only is this approach simpler (in my view), it gives much more insight – perhaps that's the right word – into what is actually happening. I mean, if you have a template filled with things that pull values dynamically from a half dozen or more different sources, you really have no idea what that file looks like until it lands on the server in question. I like to be able to open the file and look at the settings, rather than hunt down the various flags and values that can come from the various sources Chef provides.

I'm not building new environments every day, and the level of change in general is quite small (as it has been over the past decade at the companies I have worked at), so I don't need the level of dynamic ability that Chef provides – it just doesn't help me that much.

I came up with a new saying a few months ago after dealing with Chef: if it's not friends with sed, awk and grep, then it's not friends with me. Chef, being very developer-centric, uses a lot of JSON to store and manage its various configurations. JSON is very much not friendly to sed, awk and grep, so it frustrates me greatly whenever I have to deal with it.

Because we are moving into a self-managed data center environment we needed a way to provision systems. My background is Red Hat/CentOS, Kickstart and Cfengine. We have Ubuntu, <nothing>, and Chef. I came up with a system that for now uses VMware templates (my first ever use of VMware templates) and some custom scripting to integrate with Chef and do other provisioning tasks. It works; it's not as nice as Kickstart, but it works. Speaking of this and JSON, there is a bootstrap process Chef needs to go through in order to register itself with the Chef service, and it involves creating a bit of JSON that Chef can read. The standard Chef bootstrap is a sort of push approach, where a management agent waits for a system to be provisioned, then ssh's to the system and runs a bunch of stuff. I wanted a pull approach, where the system is provisioned, boots up and configures itself. So I came up with this little bash snippet to construct that JSON file:

echo -n "Making first-boot.json ..."
echo -n "{ \"run_list\": [ ">/etc/chef/first-boot.json;
export ROLES=`grep ROLE /root/00-50-* |head -n 1 | sed s'/.*=//'g | sed s'/,/ /'g` &&
for ROLE in $ROLES; do echo -n \"role[${ROLE}]\",;done | sed s'/\,$//'g >>/etc/chef/first-boot.json;
echo -n " ] }" >>/etc/chef/first-boot.json

That /root/00-50-* file is a configuration file named after the MAC address of the VM. This is based on my older Kickstart stuff, which has been extended to support Chef. It stores things like the IP address, host name and default gateway for the network, then the Chef environment, Chef role(s) and Chef organization. It's a simple text file format that looks like VARIABLE=value, one VARIABLE per line – something along the lines of the made-up example below.
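The variable names here are invented for illustration (the real files just follow the same VARIABLE=value pattern, keyed off the VM's MAC address):

# /root/00-50-56-xx-yy-zz (named after the VM's MAC address)
IPADDR=10.10.5.21
NETMASK=255.255.255.0
GATEWAY=10.10.5.1
HOSTNAME=web01
CHEF_ENVIRONMENT=production
ROLE=base,webserver
CHEF_ORG=mycompany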

My point in pasting that code is to show the ugly lengths I have to go to in order to produce valid JSON output with my regular tool set. Remember, I am NOT a programmer!

The scripting works fine (at least so far – I've built a dozen or so different roles and systems with it), but it shouldn't be that complicated.

For those of you more experienced with VMware templates: I noticed there is the ability to customize a template so that VMware can set the IP address, host name, etc. of the guest OS. When I saw this I spent a good two hours trying to get it to work, but no matter what I tried VMware said my configuration was not supported and it would not let me customize. I have read conflicting reports as to whether or not it is possible on Ubuntu. I am running ESX 4.1 with vCenter 5.0; I think if I were running vCenter 4.x it would work fine, but Ubuntu and other "non tier 1" operating systems are no longer supported for template customization in the 5.0 products. Often when I see "not supported", especially for something that used to work, it means it might work but don't ask us for help if it blows up. Maybe it's coincidence or not, but as I said, no matter what I did the customization boxes were greyed out and I could not get vCenter 5.0 to do it with Ubuntu.

At the end of the day it doesn't matter, though; I had what was, to me at least, a good provisioning process I could adapt from my Kickstart days – a process that works well on both physical and virtual machines, and that leverages the MAC address or the serial number (in the case of physical machines) for unique identification.

As for how I used to do things with Cfengine, it was simpler than Chef. Cfengine operates more on trust than Chef does. Chef uses public/private keys to authenticate systems, and those keys have to be in the right place in order for a system to get registered. This is good for untrusted networks, like public clouds (ugh). With Cfengine you can (or at least I did) assign network ranges where the IPs are trusted, and a new system could just register itself without any special configuration; the keys were generated automatically and exchanged between the Cfengine client and server. My Cfengine configuration was, for the most part, dynamic based on the host name of the server. Most of my major Cfengine classes ran a simple grep against the file that contains the host name; if the host name matched a particular pattern, the system was automatically included in the right classes. With Chef life is different – I can't do that. I have to specifically define up front which role(s) or recipes a system has, because the system will only download the cookbooks it is specifically configured to use. This isn't a big deal, but it is an extra step that I'm not used to having to do.

Sample Cfengine class definition:

ENV_CORPDMZ     = ( ReturnsZero(/bin/egrep -q "^HOSTNAME=corpdmz" /etc/sysconfig/network) )
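That ReturnsZero() bit is literally just running the shell test and keying the class off its exit status – the same check from the command line looks like this:

# exit status 0 means this host would pick up the ENV_CORPDMZ class
/bin/egrep -q "^HOSTNAME=corpdmz" /etc/sysconfig/network && echo "in corpdmz" || echo "not in corpdmz"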

With Cfengine, prior to implementing the hostname-based approach, adding a new server involved manually editing the master Cfengine configuration so that it was aware of the new system about to come online. I still had to edit that file on occasion, if special configs were needed for a server, but for the most part – for like systems, web servers and such – I did not.

Which sort of brings me to the next topic – recruiting talent that can use Chef. I've been managing server systems for about 17 years now (wow, has it really been that long?). It's clear to me after 18 months of Chef that I lack the knowledge to use the tool effectively (though that hasn't stopped me from using it at this point). Knowing that, and having worked with people at my previous company who used Chef and seen the tool present them with a similar level of frustration (if not more), I can see Chef being a real sticking point in finding talent capable of managing it. My company is actively recruiting senior systems people (well, one person), and between the candidates I have spoken with so far and candidates I have spoken to in the past, I can honestly think of perhaps one or two people over the years who could handle Chef – and one of them is a full-time programmer now (when I met him he was hired to be on my operations team back in 2003).

Well, aside from the co-worker I have now, who does quite a wonderful job deploying and managing Chef and who wrote the vast majority of the Chef stuff at my current company. It's really well done, but even with a lot of the hard work already done by him, in a very Chef-like way, I constantly struggle to add new stuff or change existing things because it's so dynamic. I see a value for something – where is it coming from? Is it from the node? The environment? A data bag? An attribute? Something else?

So I see Chef somewhat like I see Hadoop as far as what skill sets are needed and who can provide them. One of my previous companies was working on migrating towards Hadoop, and a big complaint I heard from them about Hadoop (and I have heard it from others since) is finding talent that knows the product. The likes of Yahoo, Google, and other big companies with very deep pockets and big-data aspirations can afford to pay out the wazoo for Hadoop talent – something small companies just can't compete with. The number of people qualified to do Hadoop right vs. the number of people who can do SQL – well, it's obvious, right?

I see the same with Chef. It's a powerful tool, but it's just not there yet with regard to usability. I can see it being a very useful tool for those same kinds of companies, who manage huge fleets of systems and have very dynamic environments. One such place is HP – someone I know is going to work for HP Cloud because he knows Chef. I assume he is probably pretty good at Chef by now, though the caveat with him is that he has a strong Ruby programming background, so it's no real surprise that he could pick Chef up.

I filed several feature requests and bug reports on the Chef support site about a year ago when I was first interacting with it, though I don't think much of it made it through. One thing I'd really like is a good way to do in-line editing of text files. At least at the time, the Chef mantra was "find another way to do it", which a friend of mine says is the same thing Puppet people say. So how do I go about adding an entry to /etc/hosts?
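For the record, the sort of thing I have in mind is trivial with the standard tools – a (hypothetical) host entry added idempotently looks like this:

# append the entry only if it isn't already there (the IP and names are made-up examples)
grep -q '^10\.1\.2\.3[[:space:]]' /etc/hosts || echo '10.1.2.3  db01.example.com db01' >> /etc/hosts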

Another thing I'd like to be able to do is bulk file copies from a cookbook that preserve ownership and permissions from the source files (e.g. having a directory tree with various owners/groups/permissions and copying it all at once); I don't think that is possible still. At the time, the Opscode people suggested I use rsync for that.
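Which, to be fair, does the job – a single rsync pass over a staged tree keeps all the ownership and permission bits intact (the paths here are just examples):

# -a = archive mode: recursive, preserves owners, groups, permissions, symlinks and timestamps
rsync -a /srv/config-trees/webserver/etc/ /etc/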

Another thing I'd like is to be able to host cookbooks internally while using the external service for everything else. This is mainly for security purposes: I feel more at ease when my core data stays within the confines of my network, on systems under my direct control.

Another thing I'd like to see, which I have mentioned to Opscode in one way or another as well, is a more abstracted configuration language. I think I called it "idiot mode" or something. The Ruby syntax they use, while I'm sure it's great for Ruby people, really sucks for people like me. I'm fine with the reduced subset of functionality an idiot mode would provide, because it's likely I wouldn't use the extra functionality to begin with (at least not initially). Make the learning curve for actually using the tool less steep.

At one point Opscode was interested in talking to me about a full-time position as an advocate for their platform. I just couldn't go through with it; I can't get excited about the platform after all the frustration it has given me. I certainly see the promise and will keep trying, but I think some fundamental things need to be done to the system to make it more usable.

So, in the end, I see Chef as a very powerful tool – a very useful tool for those with the skills to handle the power it gives you. If I were deploying a new environment today I would certainly NOT use Chef; I would use Cfengine. I don't want to discourage people from using Chef – it is a good tool – just realize the much higher level of investment you need in order to leverage it properly, and weigh that against the benefits. For me, the hard things that Chef makes possible really only come up for a trivial amount of my time. I dare say I have spent FAR more time trying to work with Chef on these hard things (understanding the concepts, the code, etc.) than just flat-out doing them by hand the old-fashioned way.

You might want to ask: why haven't I tried Puppet? My answer would be that to date I haven't had a reason to. I've had a few brief discussions with people who use Puppet over the years (including some who have used Cfengine as well) and asked them why I should use Puppet over Cfengine. For the most part the response was that there's nothing really revolutionary in Puppet, so if you're happy with Cfengine then stick with it. There are a few things Puppet apparently does better (what they are I don't remember), but in my talks with people there wasn't anything – anything – that made me want to jump on Puppet. There were things that sounded nice (like Chef has), but not enough return to justify the time investment of a migration when, as I mentioned earlier, Cfengine does pretty much everything I need it to do.

With Cfengine I could probably train a systems person up on the basics in literally an afternoon. My Cfengine configurations were not complicated. With Chef, well here I am at 18 months and still lost.

3,400 words, I think that’s a record for me for a published blog post. Should get back to sleep now, started writing this at about 3:30AM.
