TechOpsGuys.com Diggin' technology every day

May 7, 2013

Internet Hippies at it again

Filed under: Networking — Nate @ 8:50 am

I was just reading a discussion on Slashdot about IPv6 again. So apparently BT has announced plans to deploy carrier grade NAT (CGN) for some of their low tier customers, which is of course just NAT deployed at a larger scope and higher scale.

I knew how the conversation would go but I found it interesting regardless. The die hard IPv6 folks came out crying foul:

Killing IPv4 is the only solution. This is a stopgap measure like carpooling and congestion charges that don’t actually fix the original problem of a diminishing resource.

(disclaimer – I walk to work)

[..]how on earth can you make IPv6 a premium option if you don’t make IPv4 unbearably broken and inconvenient for users?

These same folks often cry out about how NAT will break the internet, because they can’t do peer to peer stuff (less easily in some cases; in others it may not be possible at all). At the same time they advocate a solution (IPv6) that will break FAR more things than NAT could ever hope to break. At least an order of magnitude more.

They feel the only way to make real progress is essentially to tax the usage of IPv4 high enough that people are discouraged from using it, thus somehow bringing immediate global change to the internet and getting everyone to switch to IPv6. Which brings me to my next somewhat related topic.

Maybe they are right – I don’t know. I’m in no hurry to get to IPv6 myself.

Stop! Tangent time.

The environmentalists are of course doing the same thing — not long ago a law took effect here in the county I live in banning plastic bags at grocery stores and such. You can still get paper bags at a cost of $0.10/bag, but no more plastic. I was having a brief discussion on this with a friend last week and he was questioning the stores for charging folks; he didn’t know it was the law that was mandating it. I have absolutely not a shred of doubt that if the environmentalists could have their way they would have banned all disposable bags. That is their goal – the tax is only $0.10 now but it will go up in the future; they will push it as high as they can for the same reason: to discourage use. Obviously customers were already paying for plastic and paper bags before – the cost was built into the margins of the products they buy – just like they were paying for the electricity to keep the dairy products cool.

In Washington state I believe there were one or two places that actually tried to ban ALL disposable bags. I don’t remember if the laws passed or not, but I remember thinking that I wanted to just go to one of their grocery stores, load up a cart full of stuff, and go to checkout. Then they’d tell me I have to buy bags and I would just walk out. I wanted to so badly, though I am more polite than that, so I didn’t.

Safeway gave me 3 “free” reusable bags the first time I was there after the law passed and I have bought one more since. I am worried about contamination more than anything else; there have been several reports of the bags being contaminated, mainly by meat, because people don’t clean them regularly.

I’ll admit (as much as it pains me) that there is one good reason to use these bags over the disposable ones that didn’t really hit me until I went home that first night – they are a lot stronger, so they hold more. I was able to get a full night’s shopping into 3 bags, and those were easier to carry than the ~8 or so disposables that would otherwise have been used.

I think it’s terrible to have the tax on paper since that is relatively much greener than plastic. I read an article one time about paper vs plastic and which is greener in the various regions of our country. The answer was that it varied: on the coasts, like where I live, paper is greener; in the middle parts of the country plastic was greener. I forget the reasons given but they made sense at the time. I haven’t been able to dig up the article; I have no idea where I read it.

I remember living in China almost 25 years ago now, and noticing how everyone was using reusable bags, similar to what we have now but, from what I remember, more like knitted plastic. They used them, I believe, mainly because they didn’t have an alternative – they didn’t have the machines to cheaply mass produce disposable bags. I believe I remember reading at some point that usage of disposable bags really went up in the following years before reversing course again towards reusables.

Myself I have recycled my plastic bags (at Safeway) for a long time now, as long as I can remember.  Sad to see them go.

I’ll end with a quote from Cartman (probably not a direct quote, I tried checking):

Hippies piss me off

(Hey Hippies – go ban bottled water now too while you’re at it – I go through about 60 bottles a week myself. I’ve been stocking up recently because it was cheap(er than normal); I think I have more than 200 bottles in my apartment now – I like the taste of Arrowhead water. I don’t drink much soda at home these days, having basically replaced it with bottled water, so I think cost wise it’s an improvement 🙂 )

(Same goes for those die hard IPv6 folks – you can go ahead and slap CGNAT on my internet connection at home – I don’t care. I already have CGNAT on my cell phone (it has a 10.x IP) and when it is in hotspot mode I notice nothing is broken. The only thing I do that is peer to peer is Skype (for work, I don’t use it otherwise); everything else is pure client-server.) I have a server (a real server, that this blog is hosted on) in a data center (a real data center, not my basement) with 100Mbps and unlimited bandwidth to do things that I can’t do on my home connection (mainly due to bandwidth constraints and a dynamic IP).
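If you’re curious whether your own connection is already behind NAT or CGN, here is a quick Python sketch (my own illustration, not anything BT or any carrier publishes) that tests whether your outbound interface address falls in RFC 1918 private space or in the RFC 6598 shared space (100.64.0.0/10) that was set aside specifically for CGN:

```python
import ipaddress
import socket

# Address blocks you will typically see behind a NAT:
# RFC 1918 private space (like my phone's 10.x address), and
# RFC 6598 shared address space reserved specifically for CGN.
NAT_BLOCKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
    ipaddress.ip_network("100.64.0.0/10"),  # CGN shared space
]

def local_address():
    """Find the local address used for outbound traffic (no packets sent)."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.connect(("8.8.8.8", 53))  # UDP connect() just picks a route
    addr = s.getsockname()[0]
    s.close()
    return ipaddress.ip_address(addr)

addr = local_address()
if any(addr in block for block in NAT_BLOCKS):
    print(f"{addr}: behind NAT (possibly carrier grade)")
else:
    print(f"{addr}: looks like a public address")
```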

I proclaim IPv6 die hards as internet hippies!

My home network has a site to site VPN with the data center, and if I need to access my home network remotely, I just VPN to the data center and access it that way. If you don’t want to host a real server (it’s not cheap), there are other cheaper solutions like a VPS that are available for pennies a day.

April 30, 2013

OpenFlow inventor admits SDN is hype

Filed under: Networking — Nate @ 8:25 am

The whole SDN thing has bugged me for a long time now. The amount of hype behind it has driven me mad. I have asked folks to explain to me what SDN is, and I have never really gotten a good answer. I have a decent background in networking but it’s never been my full time responsibility (nor do I wish it to be).

I am happy to report that it seems I am not crazy. Yesterday I came across an article on Slashdot from the inventor of OpenFlow, the same guy who sold his little networking startup Nicira for a cool $1.2B (and people thought HP paid too much for 3PAR).

He admits he doesn’t know what SDN is either anymore.

Reading that made me feel better 🙂

On top of that our friends over at El Reg recently described SDN as an industry “hype gasm”. That too was refreshing to see. Finally more people are starting to cut through the hype.

I’ve always felt that the whole SDN thing that is going on is entirely too narrow in vision – seemingly focused almost entirely on switching & routing. Most of the interesting stuff happens higher up, in advanced layer 7 load balancing, where you have more insight as to what is actually traversing the wire from an application perspective.

I have no doubt that the concepts behind SDN will be/are very useful for massive scale service providers and such (though somehow they have managed without it, at least as it is being defined now). I don’t see it as very useful for most of the rest of organizations, unlike say virtualized storage.

I cringed badly when I first saw the term software defined storage last year; it just makes me shudder to think of the amount of hype people might try to pump into that. HP seems to be using this term more and more often. I believe others are too, though I can’t bring myself to google the term.

April 18, 2013

Giant Explosion of companies moving off of AWS

Filed under: Datacenter — Nate @ 10:24 am

Maybe I’m not alone in this world after all. I have ranted and raved about how terrible Amazon’s cloud is for years now. I used it at two different companies for around two years (it’s been almost a year since I last used it at this point) and it was, by far, the worst experience of my professional career. I could go on for an entire afternoon listing all the problems and missing abilities, features etc. that I experienced — not to mention the costs.

But anyway, on to the article, which I found on Slashdot. It made my day. Well, the day is young still, so perhaps something better will come along.

A week ago I quoted Piston’s CTO saying that there was a “giant explosion” of companies moving off of Amazon Web Services (AWS). At the time, I noted that he had good reason to say that, since he started a company that builds software used by companies to build private clouds.

[..]

Enterprises that are moving to private clouds tend to be those that had developers start using the cloud without permission.

[..]

Other businesses are “trying to get out,” he said. AWS has made its compute services very sticky by making them difficult for users to remove workloads like databases to run them elsewhere.

Myself, I know of several folks who have stuff in Amazon, and for the most part their complaints are similar to mine. Very few that I have heard of are satisfied. The people that seem to be satisfied (in my experience) are those that don’t see the full picture, or don’t care. They may be satisfied because they don’t want to worry about infrastructure no matter the cost, or they want to be able to point the finger at an external service provider when stuff breaks (Amazon’s support is widely regarded as worthless).

“We have thousands of AWS customers, and we have not found anyone who is happy with their tech support,” says Laderman.

I was at a company paying six figures a month in fees and they refused to give us any worthwhile support. Any company in the enterprise space would have been more than happy to permanently station an employee on site to keep the customer happy for those kinds of payments. Literally everyone in the company who used the Amazon stuff hated it, and the company wanted Amazon to come help show us the way — and they said no.

I am absolutely convinced (as I’ve seen it first and second hand) that in many cases the investors in startups have conflicts of interest and want their startups to use Amazon because the investors benefit from Amazon growing as well. Amazon then uses this marketing stuff to pimp to other customers. This of course happens all over the place with other companies, but relatively speaking there are a lot of folks invested in Amazon compared to most other companies.

There’s no need for me to go into specifics as to why Amazon sucks here – for those you can see some of the past posts. This is just a quickie.

Anyway, that’s it for now.. I saw the article and it made me smile.

April 10, 2013

HP Project Moonshot micro servers

Filed under: Datacenter — Nate @ 12:11 am

HP made a little bit of headlines recently when they officially unveiled their first set of ultra dense micro servers, under the product name Moonshot. Originally speculated as likely being an ARM platform, it seems HP has surprised many by making this first round of products Intel Atom based.

Picture of HP Moonshot chassis with 45 servers

They are calling it the world’s first software defined server. Ugh. I can’t tell you how sick I feel whenever I hear the term software defined <insert anything here>.

In any case I think AMD might take issue with that claim, with their SeaMicro unit which they acquired a while back. I was talking with them as far back as 2009 I believe, when they had their high density 10U virtualized Intel Atom-based platform (I have never used SeaMicro, though I knew a couple of folks who worked there). Complete with integrated switching, load balancing and virtualized storage (the latter two of which HP is lacking).

Unlike legacy servers, in which a disk is unalterably bound to a CPU, the SeaMicro storage architecture is far more flexible, allowing for much more efficient disk use. Any disk can mount any CPU; in fact, SeaMicro allows disks to be carved into slices called virtual disks. A virtual disk can be as large as a physical disk or it can be a slice of a physical disk. A single physical disk can be partitioned into multiple virtual disks, and each virtual disk can be allocated to a different CPU. Conversely, a single virtual disk can be shared across multiple CPUs in read-only mode, providing a large shared data cache. Sharing of a virtual disk enables users to store or update common data, such as operating systems, application software, and data cache, once for an entire system
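To make the disk-carving idea in that quote concrete, here is a toy model in Python (purely illustrative; this is not SeaMicro’s actual implementation or API): physical disks get sliced into virtual disks, a writable slice belongs to exactly one CPU, and a read-only slice can be shared by many.

```python
from dataclasses import dataclass, field

@dataclass
class PhysicalDisk:
    name: str
    size_gb: int
    allocated_gb: int = 0

@dataclass
class VirtualDisk:
    name: str
    size_gb: int
    backing: PhysicalDisk
    read_only: bool = False
    attached_cpus: list = field(default_factory=list)

def carve(disk, name, size_gb, read_only=False):
    """Slice a virtual disk out of a physical one."""
    if disk.allocated_gb + size_gb > disk.size_gb:
        raise ValueError("not enough free space on " + disk.name)
    disk.allocated_gb += size_gb
    return VirtualDisk(name, size_gb, disk, read_only)

def attach(vdisk, cpu):
    """A writable slice belongs to one CPU; read-only slices can be shared."""
    if vdisk.attached_cpus and not vdisk.read_only:
        raise ValueError(vdisk.name + " is writable and already attached")
    vdisk.attached_cpus.append(cpu)

d = PhysicalDisk("disk0", 500)
os_image = carve(d, "os-image", 20, read_only=True)  # shared OS image
scratch = carve(d, "cpu7-scratch", 50)               # private slice
for cpu in ("cpu1", "cpu2", "cpu3"):
    attach(os_image, cpu)  # fine: read-only, shared data cache
attach(scratch, "cpu7")    # fine: writable, one owner
```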

Really, the technology that SeaMicro has puts the Moonshot Atom systems to shame. SeaMicro has the advantage that this is their 2nd or 3rd (or perhaps more) generation product. Moonshot is on its first gen.

Picture of Seamicro chassis with 256 servers

Moonshot provides 45 hot-pluggable single-socket dual-core Atom processors, each with 8GB of memory and a single local disk, in a 4.5U package.

SeaMicro provides up to 256 sockets of dual-core Atom processors, each with 4GB of memory and virtualized storage. Or you can opt for up to 64 sockets of either quad-core Intel Xeon or eight-core AMD Opteron, with up to 64GB per system (32GB max for Xeon). All of this in a 10U package.

Let’s expand a bit more – Moonshot can get 450 servers (900 cores) and 3.6TB of memory in a 47U rack. SeaMicro can get 1,024 servers (2,048 cores) and 4TB of memory in a 47U rack. If that is not enough memory you could switch to Xeon or Opteron with a similar power profile – at the high end, 2,048 Opteron cores (AMD uses a custom Opteron 4300 chip in the SeaMicro system – a chip not available for any other purpose) with 16TB of memory. Or maybe you mix/match. There are also fewer systems to manage – HP having 10 chassis per rack, and SeaMicro having 4. I harped on HP’s SL-series a while back for similar reasons.
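The rack math above is straightforward; here is the back-of-the-envelope calculation in Python, using the chassis figures from this comparison (rounding chassis-per-rack down to whole units):

```python
RACK_U = 47

# name: (chassis U, servers per chassis, cores per server, GB RAM per server)
systems = {
    "HP Moonshot (Atom)": (4.5, 45, 2, 8),
    "SeaMicro (Atom)":    (10, 256, 2, 4),
    "SeaMicro (Opteron)": (10, 64, 8, 64),
}

for name, (u, servers, cores, ram_gb) in systems.items():
    chassis = int(RACK_U // u)  # whole chassis that fit in one rack
    total = chassis * servers
    print(f"{name}: {chassis} chassis, {total} servers, "
          f"{total * cores} cores, {total * ram_gb} GB RAM")

# HP Moonshot (Atom): 10 chassis, 450 servers, 900 cores, 3600 GB RAM
# SeaMicro (Atom): 4 chassis, 1024 servers, 2048 cores, 4096 GB RAM
# SeaMicro (Opteron): 4 chassis, 256 servers, 2048 cores, 16384 GB RAM
```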

SeaMicro also has dedicated external storage which I believe extends the virtualization layer within the chassis, but I am not certain.

All in all it appears SeaMicro was years ahead of Moonshot before Moonshot ever hit the market. Maybe HP should have scrapped Moonshot and taken out SeaMicro when they had the chance.

At the end of the day I don’t see anything to get excited about with Moonshot – unless perhaps it’s really cheap (relative to SeaMicro anyway). The micro server concept is somewhat risky in my opinion. I mean, if you really have your workload nailed down to something specific and you can fit it into one of these designs then great. Obviously the flexibility of such micro servers is very limited. SeaMicro of course wins here too, given that an 8 core Opteron with 64GB of memory is quite flexible compared to the tiny Atom with its tiny memory.

I have seen time and time again people get excited about this and say how many more servers per watt they can get vs the higher end chips. Most of the time they fail to realize how few workloads are CPU bound, and that simply slapping a hypervisor on top of a system with a boatload of memory can get you significantly more servers per watt than a micro server could hope to achieve. HOWEVER, if your workload can effectively exploit the micro servers, drive utilization up etc., then it can be a real good solution — in my experience those sorts of workloads are the exception rather than the rule, I’ll put it that way.
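To illustrate that consolidation point with deliberately made-up numbers (every figure below is an assumption for the sake of the arithmetic, not a measured spec):

```python
# All numbers are illustrative assumptions, not vendor specs.
atom_node_watts = 20       # assumed draw of one micro server node
big_host_watts = 400       # assumed draw of a loaded 2-socket hypervisor host
vms_per_host = 60          # plausible when guest workloads are mostly idle

micro_servers_per_watt = 1 / atom_node_watts              # 0.050
virtual_servers_per_watt = vms_per_host / big_host_watts  # 0.150

print(f"micro server: {micro_servers_per_watt:.3f} servers/watt")
print(f"hypervisor:   {virtual_servers_per_watt:.3f} servers/watt")
# With these assumptions the oversubscribed hypervisor host delivers
# 3x the servers per watt -- unless the workload is truly CPU bound.
```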

It seems that HP is still evaluating whether or not to deploy ARM processors in Moonshot – in the end I think they will – but they won’t have a lot of success – the market is too niche. You really have to go to fairly extreme lengths to have a true need for something as specialized as ARM. The complexities in software compatibility are not trivial.

I think HP will not have an easy time competing in this space. The hyper scale folks like Rackspace, Facebook, Google, Microsoft etc. all seem to be doing their own thing, and are unlikely to purchase much from HP. At the same time there is of course SeaMicro, amongst other competitors (Dell DCS etc.) making similar systems. I really don’t see anything that makes Moonshot stand out, at least not at this point. Maybe I am missing something.

April 9, 2013

Influx of SPAM – batten down the hatches!

Filed under: Random Thought — Nate @ 9:09 pm

I don’t know what is going on, but for some reason this blog has been getting a lot more SPAM comments recently. Normally Akismet takes care of everything and MAYBE one gets through a MONTH; eleven have gotten through today alone (update: now 14).

I haven’t been keeping track, but that little counter on the right side is up to almost 75,300 now — the last time I recall noticing it I thought it was below 30,000..

The Akismet plugin says it is operational, the API key I am using is valid, and all servers are reachable.

I wonder what is going on, maybe today is just my lucky day.

Opscode Chef folks still have a lot to learn

Filed under: Random Thought — Nate @ 8:01 pm

The theme for this post is: BASIC STUFF. This is not rocket science.

A while back I wrote a post (wow, has it really been over a year since that post!) about Chef and my experience with it over what was at the time the past two years. I think I chose a good title for it:

Making the easy stuff hard, and the hard stuff possible

Which still sums up my thoughts today. This post was inspired by something I just read on the Opscode Chef status site.

While I’m on the subject of that damn status site I’ll tell you what – I filed a support ticket with them back in AUGUST 2012 – yes people, that is EIGHT MONTHS ago – to report to them that their status site doesn’t #@$@ work. Well, at least most of the time it doesn’t #@$@! work. You see, a lot of the time the site returns an invalid Location: header which is relative instead of absolute, and standards based browsers (e.g. Firefox) don’t like that, so I get a pretty error message that says the site is down, basically. I can usually get it to load after forcing a refresh 5-25 times.

This is not the kind of message you want to serve from your "status" site
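For the curious, the breakage is easy to demonstrate. Under HTTP/1.1 (RFC 2616) the Location header was required to be an absolute URI; a bare path is exactly what trips up strict clients. A minimal Python check (the hostname here is a placeholder, not Opscode’s actual status host):

```python
# Fetch the page without following redirects and inspect the Location
# header: a value with no scheme/host (e.g. "/maintenance") is a
# relative Location, which RFC 2616 did not allow.
import http.client
from urllib.parse import urlparse

conn = http.client.HTTPConnection("status.example.com")  # placeholder host
conn.request("GET", "/")
resp = conn.getresponse()

location = resp.getheader("Location")
if location and not urlparse(location).netloc:
    print(f"relative Location header: {location!r} (invalid per RFC 2616)")
```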

I first came across this site when Opscode was in the midst of a fairly major outage. So naturally I feel it’s important that the web site that hosts your status page work properly. So I filed the ticket, and after going back and forth with support, I determined the reason for the browser errors and they said they’d look into it. There wasn’t a lot they claimed they could do because the site was hosted with another provider (Tumblr or something??).

That’s no excuse.

So time passes, and nothing gets done. I mentioned a while back that I met some of the senior Opscode staff a few years ago, so I directly reached out to the Chief Operating Officer of Opscode (who is a very technical guy himself) to plead with him to FIX THE DAMN SITE. If Tumblr is not working then host it elsewhere; it is trivial to set up that sort of site, I mean just look at the content on the site! I was polite in my email to him. He responded and thanked me.

So more time passes, and nothing happens. So in early January I filed another support ticket outlining the reason behind their web site errors and asked that they fix their site. This time I got no reply.

More time passes. I was bored tonight so I decided to hit the site again, guess what? Yeah, they haven’t done squat.

How incompetent are these people? Sorry, maybe it is not incompetence but laziness. If you can’t be bothered to properly operate the site, take the site down.

So anyway I was on their site and noticed this post from last week

Chef 0.9.x Client EOL

Since we stopped supporting Chef 0.9.x June 11, 2012 we decided it is a good time to stop all API support for Chef 0.9.x completely.

Starting tomorrow the api.opscode.com service will no longer support requests from Chef 0.9.x clients.

ref: http://www.opscode.com/blog/2012/05/10/chef-0-9-eol/

It doesn’t take a rocket scientist to read that and immediately see how absurd it is. It’s one thing to say you are going to stop supporting something – that is fine. But to say OH WE DECIDED TO STOP SUPPORT, TODAY IS YOUR LAST DAY is something else entirely.

So I go to the page they reference above and it says

On or after June 11th, we’ll deploy a change to Hosted Chef that will disable all access to Hosted Chef for 0.9 clients, so you will want to make sure you’ve upgraded before then.

Last I checked, it is nowhere near June 11th (now that I think of it, maybe they meant last year; they don’t say for sure). In any case there was extremely poor notification on this – and how much work does it take to maintain servers running Chef 0.9? You can stop development on it, no new patches. Big deal.

This has absolutely no impact on anything I do because we have been on Chef 0.10 forever. But the fact they would even consider doing something like this just shows how poorly run things are over there.

How can they expect customers to take them seriously by doing stuff like this? This is BASIC STUFF. REAL BASIC.

Something else that caught my eye recently as I was doing some stuff in Chef was that their APIs seemed to be down completely. So I hopped on over to the status site, after forcing a refresh a dozen or more times to get it to load, and saw

Hosted Chef Maintenance Underway

The following systems are unavailable while Hosted Chef is migrated from MySQL to PostgreSQL.

– The Hosted Chef Platform including the API and Management Console

– Opscode Support Ticketing System

– Chef Community Site

Apparently they had announced it on the site one or more days prior (I can’t tell for sure now since both posts say posted 1 week ago). But they took the APIs down at 2:00 PM Pacific time! (They are based in Seattle so that’s local time for them.) Who in their right mind takes their stuff down in the middle of the afternoon intentionally for a data migration? BASIC STUFF PEOPLE. And their method of notification was poor as well; nobody at my company (we are a paying customer) had any idea it was happening. Fortunately it had only a minor impact on us. I just got lucky when I happened to try to use their API at the exact moment they took it down.

Believe me, there are plenty of times when one of our developers comes up to me and says OH #@$ WE NEED THIS CONFIGURATION SETTING IN PRODUCTION NOW! As you might imagine most of that is in Chef, so we rely on it functioning for us at all times. Unscheduled down time is one thing, but this is not excusable. At the very least you could migrate customers in smaller batches, with downtime for any given customer measured in seconds (maybe the really big customers take longer, but they can work with those individually to schedule a good time) – see the sketch below. If they didn’t build the product to do that they should go back to the drawing board.
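A minimal sketch of what I mean by batching (every helper function here is a hypothetical stub; I have no idea how Hosted Chef is actually built internally):

```python
# Migrate customers a few at a time, so any one customer is only offline
# for the duration of their own copy, not for the whole fleet's migration.
BATCH_SIZE = 10

def lock_customer(c):    print(f"locking {c}")        # stub: downtime starts
def migrate_data(c):     print(f"migrating {c}")      # stub: per-customer copy
def unlock_customer(c):  print(f"unlocking {c}")      # stub: downtime ends
def verify_batch(batch): print(f"verifying {batch}")  # stub: sanity check

def migrate_in_batches(customers):
    for i in range(0, len(customers), BATCH_SIZE):
        batch = customers[i:i + BATCH_SIZE]
        for customer in batch:
            lock_customer(customer)    # seconds of downtime begin here
            migrate_data(customer)     # e.g. MySQL -> PostgreSQL, one customer
            unlock_customer(customer)  # downtime over for this customer
        verify_batch(batch)            # check the batch before moving on

migrate_in_batches([f"customer{i}" for i in range(25)])
```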

My co-worker was recently playing around with a slightly newer build of Chef 0.10.x that he thinks we should upgrade to (ours is fairly out of date – primarily because we had some major issues on a newer build at the time). He ran into a bunch of problems, including Opscode changing some major things around within a minor release, breaking a bunch of stuff. Just more signs of how cavalier they are – typical modern “web 2.0” developer types who don’t know anything about stability.

Maybe I was lucky, I don’t know. But I basically ran the same version of CFEngine v2 for nearly 7 years without any breakage (hell, I can’t remember encountering a single issue I considered a bug!), across three different companies. I want my configuration system to be stable, fast and simple to manage. Chef is none of those; the more I use it the more I dislike it. I still believe it is a good product and has its niche, but it’s got a looooooooong way to go to win over people like me.

As a CFEngine employee put it in my last post, Chef views things as configuration as code, and CFEngine views them as configuration as documentation. I’m firmly in the documentation camp. I believe in proper naming conventions, whether for servers, load balancer addresses, storage volumes, mount points on servers, etc. Also I believe strongly in good descriptive domain names (I have always used the airport codes like most other folks). None of this randomly generated crap (here’s looking at you, Amazon). If you are deploying 10,000 servers that are doing the same thing you can still number them in some sort of sane manner, for example as sketched below. I’ve always been good at documentation. It does take work, and I find more often than not most people are overwhelmed by what I write (you may get the idea from what I have written here) so they often don’t read it — but it is there and I can direct them to it. I take lots of screen shots and do a lot of step by step guides.
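For example, a sane convention and a trivial check for it might look like this (my own illustration; this has nothing to do with Chef or CFEngine syntax):

```python
import re

# Convention: <airport code>-<role>-<number>, e.g. sea-web-003
HOSTNAME_RE = re.compile(r"^(?P<site>[a-z]{3})-(?P<role>[a-z]+)-(?P<num>\d{3})$")

def parse_hostname(name):
    """Split a hostname into its documented parts, or complain loudly."""
    m = HOSTNAME_RE.match(name)
    if not m:
        raise ValueError(f"{name!r} does not follow the naming convention")
    return m.groupdict()

print(parse_hostname("sea-web-003"))
# {'site': 'sea', 'role': 'web', 'num': '003'}
```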

On a side note, this configuration as documentation is a big reason why I do not look forward to IPv6.

Chef folks will say go read the code! That can be a pretty dangerous thing to say – really, it is. I mean just yesterday, or was it the day before, I was trying to figure out how a template on a server was getting a particular value. Was it coming from the cookbook attributes? From the role? From the environment? I looked everywhere and I could not find where the populated values were coming from — and the values I specified were being ignored. So I passed the task to my co-worker, who I have to acknowledge has been a master at Chef; he has written most of what we have, and while I can manage to tweak stuff here and there, the difficult stuff I give to him, because if I don’t, after a couple hours working with Chef my fist will go through the desk or perhaps the monitor (the desk is closer). A tool is not supposed to make you this frustrated.

So I ask him to look into it, and quickly I find HIM FIGHTING CHEF! OH MY, THE IRONY. He was digging up and down and trying to set things, but Chef was undoing them, and he was cursing and everything. I loved it. It’s what I go through all the time. After some time he eventually found the issue: the values were being set in another cookbook and they conflicted.

So he worked on it for a bit, and decided to hard code the values for a time while he looked into a better solution. Then he deployed this better solution and it had more problems. The most recent thing is that for some reason Chef was no longer able to successfully complete a run on certain types of servers (other types were fine though). He’s working on fixing it.

I know he can do it – he’s a really smart guy. I just wanted to write about that story; I’m not the only one that has these problems.

Sure, I’d love to replace Chef with something else. But it’s not a priority I want to shove in my boss’ face (he likes the concept of Chef). I have other fish to fry, and as long as I have this guy doing the dirty work it’s not as much of a pain for me.

Tracking down conflicting things in CFEngine was really simple for me – probably because I wasn’t trying anything too over the top with configuration. The Opscode guys liked to say: oh, wouldn’t it be great if you could have one configuration stanza that could adapt to ANY SITUATION?

I SAY NO — IT! IS! NOT! GREAT!

It might be nice in some situations but in many others it just gives me a headache. I like to be able to look at a config and say THAT IS GOING TO SERVER X, EXACTLY HOW IT SITS NOW. Sure, I have to duplicate configs and files for different environments and such, but really, at the end of the day – at all of the companies I have worked at — IT’S NOT A BIG DEAL. In the grand scheme of things. If your configuration is so complex that you need all of this, maybe you should step back and consider whether you are doing something wrong – does it really need to be that complex? Why?

Oh, and don’t get me started on that #$@ damn Ruby syntax in the Chef configuration files. Oh, you need quotes around a string that is nothing more than a word? You puke with a cryptic stack trace if you don’t have that? You puke with a cryptic stack trace unless these two configuration settings are on their own lines? Come on, this is stupid. I go back to this post on Ruby, and how I am reminded of it almost every time I use Chef. I had to support Ruby+Rails apps from 2006 to 2008 and it was a bad experience. It left a bad taste in my mouth for Ruby. Chef just keeps piling on the crap. I’ll fully admit I am very jaded against Ruby (and Chef for that matter). I think for good reason. How does that saying go? Burn me once, shame on you; burn me 500 times, shame on me?

With the background that some of these folks have at Opscode, it’s absolutely stunning to me the number of times they have shot themselves in the foot over the past few years, on such BASIC THINGS. Maybe that’s how things are done at the likes of Amazon, I don’t know – I never worked there (I knew many that did and do though; the general consensus is stay away).

In my neck of the woods people take more care in what they do.

I’ll end this by again mentioning that I could train someone on CFEngine in an afternoon; with Chef, here I am two and a half years later and still struggling.

(In case you’re wondering, YES, I run Ubuntu 10.04 LTS on my laptop and desktop (guess what – it is about to go EOL too) – I have no plans to change, because it’s stable and it does the job for me. I run Debian STABLE on my servers because – IT’S STABLE. No testing, no unstable, no experimental. Tried and true. The new UI stuff in the newer Ubuntu is just scary to me; I have no interest in trying it.)

Ok that’s enough for this rant I guess.  Thanks for listening.

April 7, 2013

Upgraded to 64-bit Debian

Filed under: Random Thought — Nate @ 2:31 pm

Just a quick note — I am in the midst of upgrading this server from 32-bit Debian to 64-bit. I really didn’t think I needed 64-bit, but as time has gone on the processes on this system seem to have outgrown the 32-bit kernel. I recently doubled the memory size on the host server to 16GB, so there’s plenty of RAM to go around for the moment.

If you see anything around here that appears more broken than normal let me know, thanks.

April 1, 2013

Public cloud will grow when I die

Filed under: Random Thought — Nate @ 8:27 am

El Reg a few days ago posted some commentary from the CTO of Rackspace

Major adoption of public cloud computing services by large companies won’t happen until the current crop of IT workers are replaced by kiddies who grew up with Facebook, Instagram, and other cloud-centric services

Which is true to some extent — though I still feel the bigger issues with the public cloud are cost and features. If a public cloud company can offer comparable capabilities vs operating in house, at a comparable (or lower – given the cloud company should have bigger economies of scale) cost, then I can see cloud really taking off.

As I’ve harped on again and again – one of the key cost things would be billing based on utilization, not on what is provisioned (you could have a minimum commit rate, as is often negotiated in deals for internet bandwidth). If you provision a 2 CPU VM with 6GB of memory and 90% of the time it sits at 1% CPU usage and 1GB of memory, then you must not be charged the same as if you were consuming 100% CPU and 95% memory.
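Here is the arithmetic on that example, with made-up rate constants just to show the gap (the rates are assumptions, not any provider’s real pricing):

```python
# Illustrative numbers from the example above: a 2 CPU / 6GB VM that
# idles at 1% CPU and 1GB RAM most of the time. Rates are made up.
CPU_RATE_PER_CORE_HOUR = 0.05  # assumed $/core-hour
RAM_RATE_PER_GB_HOUR = 0.01    # assumed $/GB-hour
HOURS = 730                    # roughly one month

def provisioned_bill(cores, ram_gb):
    return (cores * CPU_RATE_PER_CORE_HOUR
            + ram_gb * RAM_RATE_PER_GB_HOUR) * HOURS

def utilization_bill(cores, ram_gb, cpu_util, ram_util):
    return (cores * cpu_util * CPU_RATE_PER_CORE_HOUR
            + ram_gb * ram_util * RAM_RATE_PER_GB_HOUR) * HOURS

print(f"provisioned: ${provisioned_bill(2, 6):.2f}")             # $116.80
print(f"utilization: ${utilization_bill(2, 6, 0.01, 1/6):.2f}")  # $8.03
```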

Some folks think it is a good idea to host non production stuff in a cloud and host production in house — to me non production is where even more of the value comes from. Most of the time the non production environments (at least at the companies I have worked at in the past decade) operate at VERY low utilization rates 99.9% of the time. So they can be oversubscribed even more. At my organization for example we started out with basically two or three non production environments; now we are up to 10, and the costs to support the extra 7-8 were minimal (relative to hosting them in a public cloud). For the databases I set up a snapshot system for these environments, so not only can we refresh the data with minimal downtime to the environments (about 2 minutes each vs. a full day each), but each environment typically consumes less than 10% of the disk space that would normally be consumed had the environment had a full independent copy of the data.

Another thing: give the customers the benefit of things like thin provisioning, data compression, and deduplication. Some workloads behave better than others; present this utilization data to the customer and include it in the billing. Myself, I like to provision multi TB volumes for almost everything, and I use LVM to restrict their growth. So if the time comes and some volume needs to get bigger, I just lvextend the volume and resize the file system (both are online operations) – I don’t have to touch the hypervisor, the storage, or anything (a sketch of the workflow is below). If some application needs a massive amount of storage (I have not had one yet that used storage through the hypervisor) — as in many many TB — then I could allocate many volumes at once to the system, and grow them the same way over time. Perhaps a VM would have 2 to 10TB of space provisioned to it but may only use a few hundred gigs for the next year or so — nothing is wasted, because the excess is not used. There’s no harm in doing that. Though I have not seen or heard of a cloud company that offers something like this; I think a large chunk of the reason is the technology doesn’t exist yet to do it for hundreds or thousands of small/medium customers.
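For what it’s worth, the grow-on-demand step is a two-command affair; here is a minimal sketch assuming an LVM logical volume carrying an ext4 filesystem, both of which can be resized online (the paths and sizes are examples):

```python
# Extend an LVM logical volume and grow the ext4 filesystem on it,
# both online. Run as root; lvextend and resize2fs are standard tools.
import subprocess

def grow_volume(lv_path, add_size):
    """Extend a logical volume and the filesystem on it, online."""
    subprocess.run(["lvextend", "-L", f"+{add_size}", lv_path], check=True)
    subprocess.run(["resize2fs", lv_path], check=True)

# e.g. give /dev/vg0/appdata another 500GB without touching the
# hypervisor or the storage array:
grow_volume("/dev/vg0/appdata", "500G")
```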

Most important of all – the cloud needs to be a true utility – 99.99% uptime demonstrated over a long period of time. No requirements for “built to fail”; all failures should be transparent to the end user. Bonus points if you have the ability to offer VMware-style fault tolerance (though something that can support multiple CPUs) with millisecond fail over and no data loss. It will take a long time for the IaaS providers of the world to get there, but perhaps SaaS can be there already. PaaS I’m not sure; I’ve never dealt with that. All of the major IaaS companies have had large scale outages and/or degraded performance.

The one area where public cloud does well is the ability to get something up and going from nothing quickly, or perhaps up and going in a part of the country or world where you don’t have a facility. Though the advantage there isn’t all that great. Even at my company, back when we were hosted at Amazon on the east coast, when the time came to bring up a site for our UK customers we decided to host it on the east coast anyway, because the time frame to adapt everything (Chef etc.) to work properly in another Amazon region was too tight to pull off. So we never used that region. Eventually we provisioned real equipment, which I deployed in Amsterdam last summer to replace the last of our Amazon stuff.

Another article on a similar topic, this time from ComputerWorld, noted the shift from in house data centers to service providers, though it seems more focused on literally in house data centers (vs. “in house” with a co-location provider). They cite a lack of available talent to manage these resources; such employees would rather work for a larger organization with more career opportunities than a small shop.

I’m sort of the opposite — I would not like to work for a large company of any kind. Much prefer small companies, with small teams. The average team size I have worked in since 2006 has been 3 people. The amount of work required to maintain our own infrastructure is actually quite a bit less than managing cloud stuff.

I guess I am the exception rather than the rule again here. I had my annual review recently, and in it I wrote there was no career advancement for me at the current company; I had had higher growth expectations of the company — but I am not complaining. I’ll admit that the stuff I am doing now is not as exciting as it has been in the past. I’m fairly certain we could not hire someone else into the team because they would get bored and leave quickly. Me — at least for now — I don’t mind being bored. It is a good change of pace after my run in the trenches along the front lines of technology from 2003-2011. I could do this for another year I imagine (if not longer).

As I watch the two previous companies I worked for wither and die slow deaths (and the one before them died years ago — so basically all the jobs I had from 2006-2011 were at companies that are dead or dying), it’s a good reminder to be thankful for where I am. Still a small growing company with a good culture, good people, and everything runs really, really well (sometimes so well it sort of scares me for some reason).

Another good reminder: I had lunch with a couple of friends while up in Seattle — they work for a company that has been on its death bed for years now. I asked them what is keeping the company going and they said hope, or something like that (I also never knew why they have stuck around as long as they have). Not long after I left, the company laid off a bunch of folks (they were not included in the layoff). The company is dying every bit as much as the other two companies I worked for. I guess the main difference is I decided to jump ship long ago while they stuck it out for reasons unknown.

Time to close techopsguys?

I apologize again for not posting nearly as much as I used to — there just continues to be a dearth of topics and news that I find interesting in recent months. I am not sure if it is me that is changing or if things have really gotten boring since the new year. I have contemplated closing the blog entirely, just to lower people’s expectations (or eliminate them) about seeing new stuff here. I’ve poured myself out all over this site the past few years and it’s just become really hard to find things to write about now. I don’t want the site to turn into a blog that is updated a couple of times a year.

So I will likely close it in the coming months unless the situation changes. It has been a good run: from an idea of my former co-workers’ that I thought I’d be a minor contributor on, to a full fledged site where I wrote nearly 400 articles and a few hundred thousand words. Wow, that is a lot. My former co-workers bailed on the site years ago citing lack of time. Time is certainly something I have; what I lack is things to write about.

I’ve had an offer to become an independent contributor for our friends over at El Reg – something that sounded cool at first, though now that I’ve thought about it I am not going to do it. I don’t feel comfortable about the level of quality I could bring (granted, not all of their stuff is high quality either, but I tend to hold myself to high standards). Being a personal blog, here I can compromise more on quality, lean more into my own personal biases, and have less responsibility in general.

I have seen them take on a couple of other bloggers such as myself in recent months and have noticed the quality of their work is not good. In some cases it is sort of depressing (why would you write about that?????????). That sort of stuff belongs on a personal blog, not on a news site.

I’ll have to settle for the articles where they mentioned my name; those I am still sort of proud of for some reason 🙂

March 6, 2013

Another trip to Seattle

Filed under: Random Thought — Nate @ 10:29 am

Well, I’m going again. One of my best friends works at Microsoft over in Boston and finally found a training class to give him an excuse to come out to Seattle – his last trip was about four years ago. So I decided to go up and hang out with him and other friends, go to my favorite places (COW GIRLS COW GIRLS..!) and have a lot of fun…

I’ll be there from this Friday the 8th until the 17th.

I’m pretty excited.

As much as I miss Seattle I’ve come to the conclusion in recent months that I can’t move back — at least not any time real soon. I have been hammered hard by recruiters these past few months (especially since the new year); they have just been relentless, including opportunities in Seattle. I miss friends and places up there, but from a career perspective the Bay Area is a better place to be. I’m not focused on my career at the moment (if I were, I may have jumped ship, as my job has gotten relatively boring and dull the past 6 months, with things having become very stable and growth having leveled out). I’m happy where I am with the flexibility that I get and the management that is in place. I think back to past companies where often times I got to a point of stability in operations but other things were blowing up, be it management, or the economy, or both, which drove me away (leaving always ended up being a good decision in hindsight). But at my current position I feel no similar pressure. So I have been tweaking and tuning and fixing little things here and there, and documenting like crazy.

I could even move back and still keep my same job at the same company — but I wouldn’t be able to walk to the office any more. I’d have to commute, and pay for parking, and the weather isn’t as nice as it is here (and I mean right here – I don’t like the weather in the South Bay Area vs here – which is San Bruno).

So things are going as well as I could hope for, I think. I’d love to have more toys to play with; this is the smallest company from an infrastructure perspective I have worked for pretty much ever (past companies would have compared to some extent had virtualization been leveraged to the extent that it has here). That is my only gripe, but it is a small one. It’s an easy trade off to make. I have little doubt that if another person tried to join my group, especially a senior one, they would probably quit pretty fast because there is nothing interesting for them to do. For once I am happy to be bored, happy to have stress levels that could practically register in negative numbers!

It was a hard decision to make (deciding not to go back), but I’ve made it now, so it’ll be easier to answer that question when friends and recruiters ask.

But I do intend to keep visiting..!

New Seagate Hybrid disks coming!

Filed under: Storage — Nate @ 10:13 am

I first saw this yesterday over on El Reg, which seemed to have found a leaked URL, because at the time there was no mention of the drives elsewhere on the main Seagate site. A short time ago I came across another article via Slashdot, which mentions one major thing that yesterday’s article missed: SSD accelerated write caching.

We spoke with product manager David Burks this afternoon and have new details to report, including the revelation that the latest version of the Adaptive Memory caching technology has the ability to cache some host writes.

[..]

This new, dual-mode NAND is apparently faster than the SLC flash used in the old Momentus XT.

Though I find it interesting that they do force cached writes to the spinning rust in the event of power loss, instead of letting it sit in flash to be written the next time the drive is powered up.

Experience with Momentus XT

I have been a fan of the Momentus XT for a while now; I put one in my main laptop 1-2 years ago, upgrading from the original 320GB Hitachi 7200RPM disk that it came with (a Toshiba laptop carrying a Hitachi drive?? seemed strange to me) to the original 500GB XT.

The speed up was significant – even though my laptop has 8GB of memory and really never goes above 40-50% memory usage, so I always have at least 4GB available as disk buffers — the acceleration the SSD provided was very noticeable. Though it left me wanting more… I have thought on many occasions whether or not to go full SSD – but I needed/wanted something that had a lot of space; the laptop is dual boot and I have VM images as well. With the ability to hold only a single disk internally, hybrid has been the best option.

I just wish it had more flash – I’d be happy to pay much more if it had say 32GB of flash on board, instead of the 4GB that it currently has.

A few months ago I decided to upgrade my desktop at work with a pair of Momentus 750GB drives, which each have 8GB of flash on board. That system is not dual boot but does run a Windows VM 24/7 for the things I need Windows for (the main OS is Ubuntu 10.04 – same as my laptop). I felt that separating the I/O for the VMs (occasionally I run other VMs for testing things locally) would be good – also isolating the flash cache so Windows has its own and Linux gets its own was good too — and hell, the drives were cheap. That system has 16GB of memory so there’s even more room for buffers – the acceleration there was even more dramatic. I had never seen Linux go from the command line to the X windows login screen (GDM) in a (small) fraction of a second. But it did after the cache was warmed up (significantly faster than the XT in my laptop).

The 750GB variant has four advantages over my 500GB:

  • 6G SATA (didn’t matter to me all my systems are 3G)
  • 8GB cache (double what the 500GB has)
  • “Flash Management” – whatever that is
  • “Fast Boot technology” – whatever that is

So a few weeks ago I went and bought a 750GB XT for my laptop (haven’t installed it yet), and here we have the new stuff coming out!

New 2.5″ hybrids vs old XT

There are some significant advantages of these new hybrids –

  • 1TB vs 750GB
  • $99 vs $159 (Newegg – I bought mine online at Best Buy for ~$139, picked up same day!)
  • Ability to cache some writes vs. no write caching at all
  • 64MB cache vs 32MB

I did see one potentially big disadvantage of the new hybrids – power usage. The power draw while seeking on the new hybrids (2.7W) is more than double that of the Momentus XT (1.3W). Power draw at idle is actually 0.2W less than on the XT. I wonder what drives the power usage so much higher? Maybe it is a typo in the data sheet.

The article above reports that Seagate says there is backup power to flush the write buffers in the event of sudden power loss — a problem that seems widespread amongst SSDs in general. Myself, I had a Corsair SSD corrupt itself a little bit a few years ago when it was connected to a system with a UPS that had a dead battery. The UPS did a self test – which cut the power to the system because the battery was dead – and the file system became corrupt. I don’t recall how I recovered the system but I think I managed to without re-installing. I thought the problem was fairly isolated to my cheap crap SSD, so I was interested to learn the problem is much more widespread, covering large sectors of the market, and persists even today.

Seagate of course recently announced they were discontinuing non hybrid 7200RPM laptop drives. Which is a fine idea — when you can get a 1TB hybrid drive for only $99 that’s a pretty amazing price point, even over their existing XT series.

I suspect that especially with the new price point it will cause people to think harder on whether or not they want to go full SSD on their laptops or not.

Availability of the new 2.5″ hybrids is expected in the next week or so.

New 3.5″ Desktop Hybrids

Finally there are desktop hybrids as well, which appear to be identical other than in a larger form factor, and offering a 2TB model. Prices here were reported as $99 for 1TB and $149 for 2TB.

I do have one workstation at home which I set up for some gaming about a year ago, though recently I have not been using it. In it is a 750GB Momentus XT along with a few other drives – including a low end cheap 64GB Corsair SSD that I bought a few years ago. I configured Windows 7 to use that SSD as a cache (using ReadyBoost – I think that is what it is called) — though I have not seen any noticeable performance boost, which surprised me quite a bit (especially given the size of the SSD). I thought it would be caching the data from the games and stuff but load times still seemed relatively normal.

Availability of these 3.5″ hybrids is expected late next month.

Conclusion

I think I will hold onto the 750GB XT I just bought, and not return it. I don’t feel comfortable returning it just because something newer/better might be coming out. Doesn’t seem right. I’ll find a use for it somewhere. I don’t know yet if I will upgrade my laptop with that disk, or buy the new 1TB stuff. I’ll be very interested to see benchmarks of the new drives vs the old ones. Seagate claims the MLC is faster than the SLC used in the XT. Power draw isn’t too much of a consideration – my laptop doesn’t run more than about 2.5 hours on battery anyway (and it spends 99% of its time plugged in) – so I’m not sure how much doubling the power draw of the disk (only during seeks) would affect that number.

I have read that Western Digital is coming out with hybrid drives too – I wonder if they will up the ante by offering a premium model with more cache on it; I can hope at least. I’d love to see 32 or even 64GB of cache in a 2.5″ form factor.
