TechOpsGuys.com Diggin' technology every day

July 24, 2013

Opscode is learning

Filed under: Random Thought — Tags: — Nate @ 10:11 am

A few months ago I wrote a long rant on how Opscode has a lot to learn about running operations.

Problems included:

  • status.opscode.com site returning broken HTTP Location headers, which broke standards-compliant browsers (I had reported this issue to them through multiple channels and support tickets for a good 7 months)
  • Taking scheduled downtime in the middle of a business day

It APPEARS someone read that post, because recently the status site was fixed. It now redirects to opscode.tumblr.com and since that time I have seen no issues with the site.

Also I see they have a scheduled downtime for some of their databases, and they are scheduling it for 9PM Pacific time (Opscode is HQ’d in Pacific time) instead of, say, one in the afternoon. Obviously people in far-off time zones may not like that as much, but it makes sense for their U.S. customers (which I’d imagine is the bulk of their customer base, but I don’t know).

They’ve also gone through some effort to post analysis on outages/performance issues recently as well which is nice.

I have four remaining requests, in case Opscode is reading:

  • Schedule downtime further in advance. The most recent announcement provides about 48 hours of notification; I think it’d be better to provide one week’s notice. Take a look at what other service providers do for planned outages. My experience says 48 hours is not sufficient notice for scheduled downtime. If it’s an emergency, then obviously a shorter window is acceptable: just say it’s an emergency and try to explain why.
  • Provide actual dates and times for the posts on the status site. Now it just says things like “17 hours ago” or “5 days ago”.
  • Be consistent on the time zone used. Some posts use UTC, others (scheduled events) refer to Pacific time. I don’t care which myself (well, honestly I prefer Pacific since I am in that zone, but I can understand using UTC too).
  • Provide proactive notification of any customer-impacting maintenance. Maybe all of their other customers follow them on Twitter, I don’t know. I don’t use Twitter myself. So an email notification option (perhaps opt in by default), sent to the customer addresses registered with the platform, would be good to consider.

Now as for Chef, there are tons of things that could be improved to make it easier to use. My latest issue is that whenever I pull up the JSON to edit an environment, or a node, or whatever, the order of the fields is not consistent. My co-worker says the data is not ordered, and it has never been consistent for him; for me the issue only started somewhere between a few weeks and a couple of months ago. It’s quite annoying. For example, if I want to change the role of a node, I would knife node edit <hostname>, then skip to the end of the file and change the role. Now sometimes the role is at the top of the file, other times it is at the bottom (it has never shown up in the middle).

Pick a way to display the information and display it consistently! How hard is that to do? It’s not as if I can pipe the JSON to the sort command and have it sort for me. I’ve never liked JSON for that reason. My saying is: if it’s not friendly with grep and sed, it’s not a friend to me. Or something like that. JSON seems to be almost exactly the opposite of what Linux admins want to deal with. It’s almost as bad as binary data, and I just hate it. If I don’t have to deal with it (e.g. it’s used in applications and I never see it), fine, go nuts. Same goes for XML. I used to support a big application whose developers were gung ho for XML config files; we literally had several hundred. It would take WEEKS (literally) of configuration auditing (line by line) prior to deployment, and even then there were still problems. It was a massive headache. Using JSON just brings me back to those days.
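To be concrete about what "display it consistently" could mean, here is a minimal sketch in Python that re-serializes JSON with the keys sorted at every level, so the same data always comes out in the same order. This is not how knife works internally; the function name normalize_json and the sample node data are made up for illustration.

```python
import json


def normalize_json(text: str) -> str:
    """Parse JSON text and re-emit it with keys sorted at every nesting level."""
    data = json.loads(text)
    return json.dumps(data, sort_keys=True, indent=2)


# Hypothetical node data: same content, different key order.
node_a = '{"run_list": ["role[web]"], "name": "host1", "chef_environment": "prod"}'
node_b = '{"name": "host1", "chef_environment": "prod", "run_list": ["role[web]"]}'

# After normalization both documents are byte-for-byte identical,
# so "skip to the end of the file" always lands on the same field.
print(normalize_json(node_a) == normalize_json(node_b))  # True
```

With sorted keys, run_list would always end up after chef_environment and name in this example. On Python 3.5 and later, python -m json.tool --sort-keys does the same kind of normalization from the command line.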

The syntax is so delicate as well: one extra comma, a missing quote, or anything like that, and the whole thing blows up. It wouldn’t be so bad if the tool ran a simple syntax check, pointed out what the error was and what line it was on, and returned you to the editor to fix it. Instead it just bombs and you lose all your changes. Opscode folks, look at visudo; it does this better.
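The visudo-style behavior I am asking for can be sketched in a few lines of Python: check the edited text, report the line of the first syntax error, and hand the same text back to the editor instead of throwing it away. This is only an illustration under my own assumptions, not Chef or knife code; find_json_error, edit_until_valid and the reopen_editor callback are hypothetical names.

```python
import json
from typing import Callable, Optional


def find_json_error(text: str) -> Optional[str]:
    """Return a 'line N, column M: message' string, or None if the JSON parses."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return f"line {err.lineno}, column {err.colno}: {err.msg}"
    return None


def edit_until_valid(
    text: str,
    reopen_editor: Callable[[str, str], str],
    max_attempts: int = 5,
) -> str:
    """Keep handing the user's text back to the editor until it parses.

    reopen_editor(text, problem) is a hypothetical callback that shows the
    problem, reopens the user's editor on their unsaved text, and returns
    the new text. The edits are never discarded along the way.
    """
    problem = find_json_error(text)
    attempts = 0
    while problem is not None:
        if attempts >= max_attempts:
            raise ValueError(f"still invalid after {max_attempts} attempts: {problem}")
        text = reopen_editor(text, problem)
        attempts += 1
        problem = find_json_error(text)
    return text
```

The point of the design is that a parse failure costs the user one more trip to the editor, with a line number to go to, rather than all of their changes.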

The only thing worse (off the top of my head) than the syntax of the Chef recipes themselves is the JSON. Or maybe that should be vice versa.

Opscode and Chef are improving, I guess, is the point. Maybe in the next decade or so it will become a product that I would consider usable by mere mortals.

2 Comments

  1. Those were the days, nothing like spending over 120hrs over two weeks to get a release out. Happy Admin day Nate!

    Cheers
    D

    Comment by D — July 26, 2013 @ 3:04 pm

  2. D!!! holy shit long time no see..

    I just had another flashback to one of those deployments where we couldn’t figure out why credit card processing wasn’t working so we just decided to keep the CC simulator open so all credit cards were accepted and people were going to sort the mess out later 🙂

    Those were some crazy times for sure.. learned a lot though.

    Comment by Nate — July 26, 2013 @ 3:40 pm
