Opscode Chef folks still have a lot to learn

April 9, 2013

Filed under: Random Thought — Tags: chef — Nate @ 8:01 pm

The theme for this post is: BASIC STUFF. This is not rocket science.

A while back I wrote a post (wow, has it really been over a year since that post!) about Chef and my experience with it over what was, at the time, the past two years. I think I chose a good title for it:

Making the easy stuff hard, and the hard stuff possible

Which still sums up my thoughts today. This post was inspired by something I just read on the Opscode Chef status site.

While I’m on the subject of that damn status site I’ll tell you what – I filed a support ticket with them back in AUGUST 2012 – yes people, that is EIGHT MONTHS ago – to report to them that their status site doesn’t #@$@ work. Well, at least most of the time it doesn’t #@$@! work. You see, a lot of the time the site returns an invalid Location: header which is relative instead of absolute, and standards-based browsers (e.g. Firefox) don’t like that, so I get a pretty error message that says, basically, the site is down. I can usually get it to load after forcing a refresh 5-25 times.

This is not the kind of message you want to serve from your "status" site
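For anyone who wants to check a site for the same problem, here is a minimal sketch in Python. It is not the test that was actually run against the Opscode status site; the function names and the example.com URLs are placeholders made up for illustration. It only shows how to tell a relative Location value from an absolute one (the HTTP/1.1 spec of the time, RFC 2616, defined Location as an absolute URI) and how a forgiving client would resolve a relative one.

```python
from urllib.parse import urljoin, urlsplit


def is_absolute_location(location: str) -> bool:
    """Return True if a Location header value carries its own scheme and host."""
    parts = urlsplit(location)
    return bool(parts.scheme and parts.netloc)


def resolve_location(request_url: str, location: str) -> str:
    """Resolve a (possibly relative) Location value against the requested URL."""
    return urljoin(request_url, location)


if __name__ == "__main__":
    # Placeholder URL, not the real status site.
    requested = "http://status.example.com/some/page"
    for header in ("/maintenance", "http://status.example.com/maintenance"):
        print(header, is_absolute_location(header), resolve_location(requested, header))
```

A monitoring script could log every redirect whose Location fails `is_absolute_location`, which is enough evidence to attach to a support ticket.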

I first came across this site when Opscode was in the midst of a fairly major outage, so naturally I feel it’s important that the web site that hosts your status page works properly. So I filed the ticket. After going back and forth with support, I determined the reason for the browser errors and they said they’d look into it. They claimed there wasn’t a lot they could do because the site was hosted with another provider (Tumblr or something??).

That’s no excuse.

So time passes, and nothing gets done. I mentioned a while back that I met some of the senior Opscode staff a few years ago, so I reached out directly to the Chief Operating Officer of Opscode (who is a very technical guy himself) to plead with him: FIX THE DAMN SITE. If Tumblr is not working then host it elsewhere. It is trivial to set up that sort of site, I mean just look at the content on the site! I was polite in my email to him. He responded and thanked me.

So more time passes, and nothing happens. So in early January I filed another support ticket outlining the reason behind their web site errors and asked that they fix their site. This time I got no reply.

More time passes. I was bored tonight so I decided to hit the site again, guess what? Yeah, they haven’t done squat.

How incompetent are these people? Sorry, maybe it is not incompetence but laziness. If you can’t be bothered to properly operate the site, take the site down.

So anyway I was on their site and noticed this post from last week

Chef 0.9.x Client EOL

Since we stopped supporting Chef 0.9.x June 11, 2012 we decided it is a good time to stop all API support for Chef 0.9.x completely.

Starting tomorrow the api.opscode.com service will no longer support requests from Chef 0.9.x clients.

ref: http://www.opscode.com/blog/2012/05/10/chef-0-9-eol/

I mean, it doesn’t take a rocket scientist to read that and immediately think how absurd it is. It’s one thing to say you are going to stop supporting something; that is fine. It’s another to say OH WE DECIDED TO STOP SUPPORT, TODAY IS YOUR LAST DAY.

So I go to the page they reference above and it says

On or after June 11th, we’ll deploy a change to Hosted Chef that will disable all access to Hosted Chef for 0.9 clients, so you will want to make sure you’ve upgraded before then.

Last I checked, it is nowhere near June 11th (now that I think of it, maybe they meant last year; they don’t say for sure). In any case there was extremely poor notification on this – and how much work does it take to maintain servers running Chef 0.9? So you can stop development on it, no new patches. Big deal.

This has absolutely no impact on anything I do because we have been on Chef 0.10 forever. But the fact they would even consider doing something like this just shows how poorly run things are over there.

How can they expect customers to take them seriously by doing stuff like this? This is BASIC STUFF. REAL BASIC.

Something else that caught my eye recently as I was doing some stuff in Chef, was their APIs seemed to be down completely. So I hopped on over to the status site after forcing a refresh a dozen or more times to get it to load and saw

Hosted Chef Maintenance Underway

The following systems are unavailable while Hosted Chef is migrated from MySQL to PostgreSQL.

– The Hosted Chef Platform including the API and Management Console

– Opscode Support Ticketing System

– Chef Community Site

Apparently they had announced it on the site one or more days prior (can’t tell for sure now since both posts say posted 1 week ago). But they took the APIs down at 2:00 PM Pacific time! (They are based in Seattle, so that’s local time for them.) Who in their right mind takes their stuff down in the middle of the afternoon intentionally for a data migration? BASIC STUFF PEOPLE. And their method of notification was poor as well: nobody at my company (we are a paying customer) had any idea it was happening. Fortunately it had only a minor impact on us. I just got lucky when I happened to try to use their API at the exact moment they took it down.

Believe me, there are plenty of times when one of our developers comes up to me and says OH #@$ WE NEED THIS CONFIGURATION SETTING IN PRODUCTION NOW! As you might imagine most of that is in Chef, so we rely on that functioning for us at all times. Unscheduled down time is one thing, but this is not excusable. At the very least you could migrate customers in smaller batches (with downtime for any given customer measured in seconds – maybe the really big customers take longer, but they can work with those individually to schedule a good time). If they didn’t build the product to do that they should go back to the drawing board.
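To make the "smaller batches" idea concrete, here is a rough sketch in Python. Everything in it is hypothetical: `migrate_one` stands in for whatever per-customer migration step a hosted service would actually run, and nothing here reflects how Hosted Chef is built. The point is only that each customer is unavailable for the length of its own migration, and a failure for one customer does not block the rest.

```python
from typing import Callable, Iterator, List, Sequence, Tuple


def batches(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def migrate_in_batches(
    customers: Sequence[str],
    migrate_one: Callable[[str], None],  # hypothetical per-customer step
    batch_size: int = 25,
) -> Tuple[List[str], List[str]]:
    """Migrate customers batch by batch; return (migrated, failed)."""
    migrated: List[str] = []
    failed: List[str] = []
    for batch in batches(customers, batch_size):
        for customer in batch:
            try:
                migrate_one(customer)
                migrated.append(customer)
            except Exception:
                # Leave this customer on the old backend and keep going.
                failed.append(customer)
    return migrated, failed
```

The failed list is what gets retried or scheduled individually, which is where the really big customers would end up.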

My co-worker was recently playing around with a slightly newer build of Chef 0.10.x that he thinks we should upgrade to (ours is fairly out of date – primarily because we had some major issues on a newer build at the time). He ran into a bunch of problems including Opscode changing some major things around within a minor release breaking a bunch of stuff. Just more signs of how cavalier they are, typical modern “web 2.0” developer types, that don’t know anything about stability.

Maybe I was lucky, I don’t know. But I basically ran the same version of CFengine v2 for nearly 7 years without any breakage (hell, I can’t remember encountering a single issue I considered a bug!), across three different companies. I want my configuration system to be stable, fast and simple to manage. Chef is none of those; the more I use it the more I dislike it. I still believe it is a good product and has its niche, but it’s got a looooooooong way to go to win over people like me.

As a CFengine employee put it in my last post, Chef views things as configuration as code, and CFengine views them as configuration as documentation. I’m far in the documentation camp. I believe in proper naming conventions whether it is servers, or load balancer addresses, or storage volumes, mount points on servers etc. Also I believe strongly in a good descriptive domain name (have always used the airport codes like most other folks). None of this randomly generated crap (here’s looking at you Amazon). If you are deploying 10,000 servers that are doing the same thing you can still number them in some sort of sane manner. I’ve always been good at documentation. It does take work, and I find more often than not most people are overwhelmed by what I write (you may get the idea with what I have written here) so they often don’t read it – but it is there and I can direct them to it. I take lots of screen shots and do a lot of step by step guides.

On a side note, this configuration as documentation is a big reason why I do not look forward to IPv6.

Chef folks will say go read the code! That can be a pretty dangerous thing to say, really, it is. I mean just yesterday, or was it the day before, I was trying to figure out how a template on a server was getting a particular value. Was it coming from the cookbook attributes? From the role? From the environment? I looked everywhere and I could not find the values that were being populated – and the values I specified were being ignored. So I passed this task to my co-worker, who I have to acknowledge is a master in Chef. He has written most of what we have, and while I can manage to tweak stuff here and there, the difficult stuff I give to him, because if I don’t, my fist will go through the desk or perhaps the monitor (desk is closer) after a couple of hours working with Chef. A tool is not supposed to make you this frustrated.

So I ask him to look into it, and quickly I find HIM FIGHTING CHEF! OH MY THE IRONY. He was digging up and down and trying to set things but Chef was undoing them and he was cursing and everything. I loved it. It’s what I go through all the time. After some time he eventually found the issue: the values were being set in another cookbook and they conflicted.
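The hunt described above boils down to one question: of all the places that set a value, which one wins? Here is a small Python sketch of that idea. It is a toy model, not Chef’s actual attribute system: Chef’s real precedence table has more levels than this, and the layer names and the `port` key below are invented for illustration.

```python
from typing import Any, Dict, List, Tuple

# A layer is (name, settings). Layers are listed lowest precedence first.
Layer = Tuple[str, Dict[str, Any]]


def trace_attribute(key: str, layers: List[Layer]) -> List[Tuple[str, Any]]:
    """Return every (layer name, value) that sets `key`, lowest precedence first."""
    return [(name, values[key]) for name, values in layers if key in values]


def explain(key: str, layers: List[Layer]) -> str:
    """Describe which layer wins for `key` and which layers it shadows."""
    hits = trace_attribute(key, layers)
    if not hits:
        return f"{key}: not set in any layer"
    winner, value = hits[-1]
    shadowed = ", ".join(name for name, _ in hits[:-1]) or "nothing"
    return f"{key} = {value!r} (from {winner}; shadows {shadowed})"


if __name__ == "__main__":
    # Hypothetical layers: two cookbooks setting the same key is the conflict.
    layers = [
        ("cookbook_a attributes", {"port": 80}),
        ("role", {}),
        ("cookbook_b attributes", {"port": 8080}),
    ]
    print(explain("port", layers))
```

With a flat report like this, a second cookbook quietly setting the same key shows up right away instead of after hours of digging.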

So he worked on it for a bit, and decided to hard code the values for a time while he looked into a better solution. So he deployed this better solution and it had more problems. The most recent thing is for some reason Chef was no longer able to successfully complete a run on certain types of servers(other types were fine though). He’s working on fixing it.

I know he can do it, he’s a really smart guy I just wanted to write about that story – I’m not the only one that has these problems.

Sure, I’d love to replace Chef with something else. But it’s not a priority I want to try to shove in my boss’ face (who likes the concept of Chef). I have other fish to fry, and as long as I have this guy doing the dirty work, well, it’s not as much of a pain for me.

Tracking down conflicting things in CFengine was really simple for me – probably because I wasn’t trying anything too over the top with configuration. Opscode guys liked to say, oh wouldn’t it be great if you could have one configuration stanza that could adapt to ANY SITUATION.

I SAY NO. IT! IS! NOT! GREAT!

It might be nice in some situations but in many others it just gives me a headache. I like to be able to look at a config and say THAT IS GOING TO SERVER X, EXACTLY HOW IT SITS NOW. Sure I have to duplicate configs and files for different environments and such but really at the end of the day – at all of the companies I have worked at — IT’S NOT A BIG DEAL. In the grand scheme of things. If your configuration is so complex that you need all of this maybe you should step back and consider if you are doing something wrong – does it really need to be that complex? Why?

Oh and don’t get me started on that #$@ damn ruby syntax in the Chef configuration files. Oh you need quotes around a string that is nothing more than a word? You puke with a cryptic stack trace if you don’t have that? Oh you puke with a cryptic stack trace unless these two configuration settings are on their own lines? Come on, this is stupid. I go back to this post on Ruby, how I am reminded of it almost every time I use Chef. I had to support Ruby+Rails apps back from 2006-2008 and it was a bad experience. Left a bad taste in my mouth for Ruby. Chef just keeps on piling on the crap. I’ll fully admit I am very jaded against Ruby (and Chef for that matter). I think for good reason. How’s that saying go? Burn me once shame on you, burn me 500 times shame on me?

With the background that some of these folks have at Opscode, it’s absolutely stunning to me the number of times they have shot themselves in the foot over the past few years, on such BASIC THINGS. Maybe that’s how things are done at the likes of Amazon, I don’t know, never worked there (knew many that did and do though; general consensus is stay away).

In my neck of the woods people take more care in what they do.

I’ll end this again by mentioning I could train someone on CFEngine in an afternoon, Chef – here I am 2 and a half years later and still struggling.

(In case you’re wondering, YES I run Ubuntu 10.04 LTS on my laptop and desktop (guess what – it is about to go EOL too) – I have no plans to change, because it’s stable, and it does the job for me. I run Debian STABLE on my servers because – IT’S STABLE. No testing, no unstable, no experimental. Tried and true. The new UI stuff in the newer Ubuntu is just scary for me, I have no interest in trying it.)

Ok that’s enough for this rant I guess. Thanks for listening.

6 Comments

[…] few months ago I wrote a long rant on how Opscode has a lot to learn about running […]

Pingback by Opscode is learning « TechOpsGuys.com — July 24, 2013 @ 10:11 am

You are complaining because you have to put a quote around a string? This is standard programming, not just Ruby… that said, I do wish Chef supported YAML instead of just JSON for config data (YAML can be converted to JSON pretty easily once the YAML has been read in).

Yes, new lines signify something new in the code, or you can use a semicolon instead of a new line and put it all on one. This is generally how things work in a programming language.

It honestly sounds more like you don’t have control over how your Chef environment is setup. Chef doesn’t keep you from making bad choices.

There are tons of places in Chef where you can store various bits of data (databags, encrypted databags, various levels of node json, hardcoded in a cookbook, etc). You just have to know which is the right place to store that information for a given problem (hint, 9/10 it should be in an attribute), and then in your cookbooks you should endeavor to always solve that problem the same way (heck, even write a library to do it the same way every time)

I am not the biggest Chef fan. The reasons you list here are not reasons to dislike Chef but rather should raise questions “How are we using Chef wrong?”.

Comment by cpuguy83 — August 6, 2013 @ 1:22 pm

Hey there!

Yes I am complaining about putting a quote around a very basic string, e.g. standard alphanumeric characters. Not something that has special characters that may need escaping. Think along the lines of setting a variable in a shell: if it’s just one word there are no quotes required. I also think back to all my years with CFengine, and it was very, very rare I had to enclose strings in quotes, unless of course there were special characters. This is config I am talking about here, not code. If the “config” is code then to me that is a failure right there, full stop.

The point I try to make is the whole ‘programming’ thing is bad for configuration. It’s too complicated for basic things. I wrote an earlier post “Making the easy stuff hard and making the hard stuff possible”.

http://www.techopsguys.com/2012/02/03/making-the-easy-stuff-hard-the-hard-stuff-possible/

In the post above someone from CFengine commented and made a good point, Chef views configuration as code, and CFengine views configuration as documentation. I’ve always been a staunch supporter of good documentation, good naming conventions etc. I see a very big line in between scripting code and programming code that myself I will not cross. My interests go far beyond just system stuff, not enough room in my brain for programming as well as everything else that I spend time on (the bulk of this blog is devoted towards storage topics).

I think that (post above) suits Chef quite well. There are tons of things that Chef makes possible that I could not do in CFengine. By contrast there are tons of things I could do in CFengine very easily that are very difficult in chef. Those things that are made possible are not things I find especially useful at smaller scale. I’m sure if you have thousands to tens of thousands of nodes in a highly fluid configuration it makes a lot of sense but most folks aren’t running like that. Certainly none of the web companies I have worked at over the past decade were anything like that.

Using Chef wasn’t my decision, and I have mentioned in the past (perhaps not in this post, I forget) that I don’t have to interface with it as much (my co-worker does most of that and is pretty adept at it; he writes stuff in Chef I can stare at for an hour and not make sense of, but it works). So I haven’t tried to take on the battle of replacing Chef with something else.

“How are we using Chef wrong?” is a good question. I think the entire premise of using Chef at anything other than a massive scale is a massive mistake (at this point in Chef’s life at least). I have absolutely not a shadow of a doubt that if the operations team left my company (3 people including the manager), whoever came in would have to rip out all the Chef stuff and replace it with something simpler to manage. The number of folks that I have known over the past decade who are in my line of work that can work effectively with Chef I can probably count on one hand (I am not included in that list, I freely admit, despite nearly 20 years of managing systems – I can make Chef do some things, but most of it is just too frustrating to use so I end up bypassing the “proper/chef” way of doing things). Even the examples in the public cookbook repositories are too complex to effectively use unless you’re a programmer (for the most part – last I checked anyway, which was about a year ago).

I have been using Chef for three years now. I recall meeting with the founders of Opscode 3 years ago and how they said to me with a straight face – you know apache configs, you know sendmail, bind etc configs. Chef is no different, you’ll get used to it in no time. I didn’t buy it then, and time has proven me right over and over and over again. I begged them back then to make an “idiot” mode for the Chef config. They expressed no interest in such a thing. Ironically enough, at one point the company expressed interest in bringing me on as an evangelist. I don’t know where they got the idea from, but I politely declined.

I don’t try to say Chef is a bad tool – I just think it is the wrong tool in most circumstances. The capabilities it provides are not critical to most organizations. I still regularly encounter companies that maintain everything by hand or w/various scp/rsync scripts.

I would not wish Chef upon someone for system management any more than I would wish HP Openview on someone for system monitoring. Nagios is simpler. More limited for sure, but I really have never needed the power of something like Openview — nor have I needed the power of something like Chef.

Why use it then? Again just goes back to the decision was made before I started, it’s not a critical issue since I don’t have to deal with it *too* often. But I do feel it’s useful to get this sort of information out there so people can see another perspective of someone who uses Chef.

thanks for the comment & thanks for reading!

Comment by Nate — August 6, 2013 @ 2:50 pm

I guess I come from more of a programming mindset, so Chef makes tons of sense to me. Though I do like stuff that SaltStack does with just defining a YAML file with what you want (e.g. Apache installed with a specific config file), etc…. haven’t looked much into CFEngine, but it seems they follow the same kind of pattern. Instead of code you are writing documents that describe your infrastructure (though Salt does let you define with code if needed).

The reason I use Chef is because I am 1 guy doing software dev (mostly Ruby) and IT operations. I don’t have a massive scale at all, just my IT infrastructure.

I totally agree about Opscode. They seem to be extremely overwhelmed and really need to just add non-Opscode committers to their Github repos so these issues can be addressed quickly, and free up the Opscode team to work on their hosted infrastructure.

Comment by cpuguy83 — August 7, 2013 @ 6:55 am

That very well could be the case (being overwhelmed), it took em 8+ months to fix their status site after all. One more point I didn’t touch on, which I think is important, is the concept of maintaining cookbooks and even creating cookbooks. One of the advantages Chef folks touted is that ops folks can have developers create and perhaps even maintain cookbooks for certain things. This sounds like a nice idea in theory. The downside to this of course is standardization and different ways of doing things (for the ops teams I am on I have always felt we knew the “right” way of doing things, and in general the developers don’t have the operational experience to think about things in that way).

So for example, say you’re deploying some new application that needs apache customization, some php modules, memcache, perhaps some custom packages or whatever. Doing this in one cookbook would probably be a bad thing. You’d have to be touching multiple cookbooks (assuming this is all new stuff) and multiple recipes. Who knows, maybe you’ll need to adjust some existing logic in another recipe that happens to be included by default so it doesn’t conflict with some new thing you’re adding.

Also the whole approach to installing software is often different. As an ops person I have strict standards on how software is installed – it must be in an operating system package. Since you do Ruby software development you must be familiar with gems. Back (2007-2008) when I supported environments that used Ruby (and Rails, ugh), the developers would constantly come to me and say “Oh, now we need this gem..” (which BTW has a half dozen dependencies). They would advocate just “gem install whatever”. I think even Chef has this built in so you can call gem install directly from Chef. Terrible..terrible idea.. fine for hacks, or for really small scale stuff.

But as an ops person, two of the things you want are reproducibility and as much control as possible. Control comes from hosting whatever software dependencies you need in your infrastructure so you are not blocked by what people in the outside world do (e.g. remove a version, site is down, they get hacked, whatever). Reproducibility comes from (in one case) sticking to a specific version and keeping it in an OS package. Back then, there weren’t a whole lot of options. So I took the time to manually build gems from source on a set of test systems (at one point I was supporting 4 different operating system flavors, each needing its own unique packages). I had basic commands that ran pre/post-install which would give me a list of all files that were changed on the system. I then built tarballs from these changed files. Then I took these tarballs and made RPMs out of them (I use a Debian package named Alien to do this; it drastically simplifies the process vs making a formal package.. end result is quite similar).
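The snapshot-before, snapshot-after, tar-up-the-difference approach described above can be sketched in a few lines of Python. This is a stand-in written for illustration, not the actual commands that were used: the function names are made up, it hashes file contents rather than comparing timestamps, and it reads each file fully into memory, which is fine for a test directory but a simplification for a whole system.

```python
import hashlib
import os
import tarfile
from typing import Dict, List


def snapshot(root: str) -> Dict[str, str]:
    """Map each regular file under `root` (relative path) to a SHA-256 digest."""
    state: Dict[str, str] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full) or not os.path.isfile(full):
                continue  # skip symlinks and special files
            with open(full, "rb") as handle:
                digest = hashlib.sha256(handle.read()).hexdigest()
            state[os.path.relpath(full, root)] = digest
    return state


def changed_files(before: Dict[str, str], after: Dict[str, str]) -> List[str]:
    """Return files that are new or whose contents differ, sorted by path."""
    return sorted(path for path, digest in after.items() if before.get(path) != digest)


def build_tarball(root: str, paths: List[str], out_path: str) -> None:
    """Pack the given files (relative to `root`) into a gzip-compressed tarball."""
    with tarfile.open(out_path, "w:gz") as archive:
        for rel in paths:
            archive.add(os.path.join(root, rel), arcname=rel)
```

Usage would be `before = snapshot(prefix)`, run the install, `after = snapshot(prefix)`, then `build_tarball(prefix, changed_files(before, after), "changes.tar.gz")`. Turning the tarball into an RPM or .deb is a separate step that this sketch does not cover.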

Looking at my CFengine configs, there were ~250 packages that I built for these 5 operating system flavors (we had a hard time getting rid of legacy stuff back then).

I encountered something similar again recently, this time with PHP. The developer said, here, run this random shell script from this website I found and it will download and install the dependencies. This thing (which is “composer.phar”; I’m not sure if that is in Ruby land or if it is PHP specific) has a version string that is nothing more than a 40-character hex hash (literally it says Composer version 06dff68ce78cd6a2cef1da61ec749c47c73d5769). If the developer was writing this cookbook they would have done a bad thing had they gone this route. It may not have bitten them right away, but quite likely over time there would have been a problem, as the versions would not be consistent. So once again I built an OS package for them and installed it in a central location on the server (normally they would have it within their source code tree after checkout). I gave them a set of basic links to create (post code checkout) so their software would work with this composer in this other location, and things worked out.

Not everyone goes to the lengths I do to try to do things clean (in fact I haven’t worked with anyone else that I can think of that goes to such lengths); it can in some cases take quite a bit of work.

But those are a couple of examples of why, beyond the most basic of cookbooks, I don’t believe it is wise to hand off development of such things to developers (I’ve really only worked with about four developers, out of all the ones I have known, that *really* think about how their stuff will run in production (and be operated by ops) when they build their stuff).

I’ve been in what you could say is a “DevOps” (ugh, I hate that term) role – meaning working closely with the developers – for 10 years now (literally – my first role in that was May 2003). Most of them don’t know, or don’t care, or in some cases think they know and actually make something far worse because of it. It’s not a huge deal to me – I don’t expect developers to know how to do real ops inside and out, that’s not their job. I don’t go around telling them how to develop their app unless the flaw is so huge that it causes major issues for us.

I appreciate your response – and I’m glad that I was able to explain myself in a way that was coherent enough that it made some amount of sense – we come at the problem from opposite directions. Both directions are valid for various use cases.

Comment by Nate — August 7, 2013 @ 7:53 am

On your note about gems, you can run your own gem server, or even point bundler (apt for rubygems, …basically), at git repos, file paths, lock in versions, different versions for different apps on the same server, etc. So this becomes extremely easy, no need to build these manually anymore (unless you need a c-extension… should be using jruby in prod anyway).
I wouldn’t use the chef rubygems resource except to install the low-level stuff (e.g., pretty much just the bundler gem)

Unfortunately there is no easy button for this stuff… big hopes for Docker.io to help the situation.

Comment by cpuguy83 — August 7, 2013 @ 8:19 am
